Download with Python
This page demonstrates how you can access and download the raw data files the Data Connector exposes. This is useful if you want to import these files into your own data lake or want to manually process and analyze the data locally.
If you want to access and query the Data Connector data directly from Python, DataCamp provides the dcdcpy Python package. Installation details and data import examples are in its GitHub README.
With the script below, you can list or download all files in your Data Connector S3 bucket. Just follow the instructions below.
This example script requires only a standard Python 3 installation and the boto3 library, which you can install with the following command:
$ pip3 install boto3
Save the script below in a file named data_connector.py:
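```python
import os
import sys

import boto3

# Your Data Connector credentials
aws_access_key = "ACCESS_KEY"
aws_secret_key = "SECRET_KEY"
aws_bucket = "BUCKET_NAME"

s3 = boto3.resource('s3', aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)


def list_s3_files(bucket_name=aws_bucket):
    """Print and return the key of every file in the bucket."""
    file_list = []
    bucket = s3.Bucket(bucket_name)
    for bucket_object in bucket.objects.all():
        file_list.append(bucket_object.key)
        print(bucket_object.key)
    return file_list


def download_from_s3(dirname, bucket_name=aws_bucket):
    """Download every file under `dirname` into a matching local directory."""
    bucket = s3.Bucket(bucket_name)
    # Add a trailing slash so that e.g. "2021-12-1" does not also match "2021-12-12/"
    prefix = dirname.rstrip('/') + '/'
    for bucket_object in bucket.objects.filter(Prefix=prefix):
        key = bucket_object.key
        if key.endswith('/'):
            continue  # skip empty "folder" placeholder objects
        # Mirror the key (e.g. "2021-12-12/course_dim.csv") under the current directory
        local_path = os.path.join(os.getcwd(), *key.split('/'))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        print('Downloading', key, '...')
        bucket.download_file(key, local_path)
    print('All downloads finished!')
    return True


if __name__ == '__main__':
    # No argument: list all files. One argument: download that directory.
    if len(sys.argv) == 1:
        list_s3_files()
    else:
        download_from_s3(sys.argv[1])
```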
Update the script with your organization's credentials.
In the script, set the three variables aws_access_key, aws_secret_key and aws_bucket to your Data Connector credentials.
The example script above requires you to enter your credentials in the script itself, which is generally considered an insecure practice. Learn more about how to store your access credentials safely in our Storing your Credentials document.
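One common alternative, sketched here under assumptions, is to read the credentials from environment variables instead of hard-coding them. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard variable names that boto3 also recognizes; DATA_CONNECTOR_BUCKET is a hypothetical name chosen for the bucket, and load_credentials and make_s3_resource are illustrative helpers, not part of the Data Connector.

```python
import os
from typing import Mapping, NamedTuple


class DataConnectorCredentials(NamedTuple):
    access_key: str
    secret_key: str
    bucket: str


# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are standard AWS variable names;
# DATA_CONNECTOR_BUCKET is a hypothetical name used only in this sketch.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "DATA_CONNECTOR_BUCKET")


def load_credentials(env: Mapping[str, str] = os.environ) -> DataConnectorCredentials:
    """Read the Data Connector credentials from environment variables."""
    # Treat unset and empty variables alike so the error message lists both
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DataConnectorCredentials(
        access_key=env["AWS_ACCESS_KEY_ID"],
        secret_key=env["AWS_SECRET_ACCESS_KEY"],
        bucket=env["DATA_CONNECTOR_BUCKET"],
    )


def make_s3_resource(creds: DataConnectorCredentials):
    """Create the boto3 S3 resource from the loaded credentials."""
    import boto3  # imported here so load_credentials() also works without boto3

    return boto3.resource(
        "s3",
        aws_access_key_id=creds.access_key,
        aws_secret_access_key=creds.secret_key,
    )
```

In data_connector.py you could then replace the three hard-coded variables and the boto3.resource(...) line with creds = load_credentials(), aws_bucket = creds.bucket and s3 = make_s3_resource(creds), and set the three variables in your shell before running the script.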
Now that the script is ready, here are a couple of examples of how to use it.
Calling the script without any arguments prints every file (object key) in the bucket, as in the example below.
This can produce a very long list!
python3 data_connector.py
2021-12-15/assessment_dim.csv
2021-12-15/chapter_dim.csv
2021-12-15/course_dim.csv
2021-12-15/date_dim.csv
2021-12-15/docs.csv
2021-12-15/exercise_dim.csv
2021-12-15/learning_assessment_fact.csv
2021-12-15/learning_chapter_fact.csv
2021-12-15/learning_course_fact.csv
2021-12-15/learning_exercise_fact.csv
2021-12-15/learning_practice_fact.csv
2021-12-15/learning_project_fact.csv
2021-12-15/learning_track_content_fact.csv
2021-12-15/learning_track_fact.csv
2021-12-15/practice_dim.csv
2021-12-15/project_dim.csv
2021-12-15/team_dim.csv
2021-12-15/track_content_dim.csv
2021-12-15/track_dim.csv
2021-12-15/user_dim.csv
2021-12-15/user_team_bridge.csv
2021-12-15/xp_fact.csv
...
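Because the full listing can be long, you may only want the top-level export directories (the dated folders and latest). A minimal sketch, assuming boto3's low-level S3 client and its list_objects_v2 paginator with Delimiter="/", which makes S3 group keys by their first path segment and return those groups as CommonPrefixes. The function names are hypothetical, and the client here picks up credentials from boto3's default credential chain (for example the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables) rather than from the script's variables.

```python
from typing import Iterable, List


def export_dirs_from_pages(pages: Iterable[dict]) -> List[str]:
    """Collect the top-level "directories" (CommonPrefixes) from list_objects_v2 pages."""
    dirs = []
    for page in pages:
        # Pages without grouped prefixes simply have no "CommonPrefixes" entry
        for common_prefix in page.get("CommonPrefixes", []):
            dirs.append(common_prefix["Prefix"].rstrip("/"))
    return sorted(dirs)


def list_export_dirs(bucket_name: str) -> List[str]:
    """List only the top-level directories of the bucket, not every file."""
    import boto3  # imported here so export_dirs_from_pages() works without boto3

    client = boto3.client("s3")  # uses boto3's default credential chain
    paginator = client.get_paginator("list_objects_v2")
    # Delimiter="/" groups keys such as "2021-12-15/course_dim.csv" under "2021-12-15/"
    pages = paginator.paginate(Bucket=bucket_name, Delimiter="/")
    return export_dirs_from_pages(pages)
```

Each returned name can then be passed as the argument to data_connector.py to download that directory.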
If you specify a directory name (e.g. 2021-12-12), the script downloads all files in that directory into a directory of the same name under your current working directory:
python3 data_connector.py 2021-12-12
Downloading 2021-12-12/assessment_dim.csv ...
Downloading 2021-12-12/chapter_dim.csv ...
Downloading 2021-12-12/course_dim.csv ...
Downloading 2021-12-12/date_dim.csv ...
Downloading 2021-12-12/docs.csv ...
Downloading 2021-12-12/exercise_dim.csv ...
Downloading 2021-12-12/learning_assessment_fact.csv ...
Downloading 2021-12-12/learning_chapter_fact.csv ...
Downloading 2021-12-12/learning_course_fact.csv ...
Downloading 2021-12-12/learning_course_fact_v2.csv ...
Downloading 2021-12-12/learning_exercise_fact.csv ...
Downloading 2021-12-12/learning_exercise_fact_v2.csv ...
Downloading 2021-12-12/learning_practice_fact.csv ...
Downloading 2021-12-12/learning_project_fact.csv ...
Downloading 2021-12-12/learning_track_content_fact.csv ...
Downloading 2021-12-12/learning_track_fact.csv ...
Downloading 2021-12-12/practice_dim.csv ...
Downloading 2021-12-12/project_dim.csv ...
Downloading 2021-12-12/team_dim.csv ...
Downloading 2021-12-12/track_content_dim.csv ...
Downloading 2021-12-12/track_dim.csv ...
Downloading 2021-12-12/user_dim.csv ...
Downloading 2021-12-12/user_team_bridge.csv ...
Downloading 2021-12-12/xp_fact.csv ...
All downloads finished!
Time saver! There is a "smart" directory called "latest" that always contains the most recent export, so you don't need to calculate or specify today's date to get the latest data. Running
python3 data_connector.py latest
downloads the most recent export to your local machine.
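If you still want to know which dated export is the newest (for instance to store your copy under its date in your own data lake), a minimal sketch is to pick the largest date-named top-level directory from the bucket's object keys. This assumes export directories are named YYYY-MM-DD, as in the examples above; latest_export_date is a hypothetical helper, not part of the Data Connector.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_export_date(keys: Iterable[str]) -> Optional[str]:
    """Return the newest YYYY-MM-DD export directory found in a list of object keys."""
    dates = set()
    for key in keys:
        top_level = key.split("/", 1)[0]  # "2021-12-15/course_dim.csv" -> "2021-12-15"
        try:
            dates.add(datetime.strptime(top_level, "%Y-%m-%d").date())
        except ValueError:
            continue  # ignore non-date directories such as "latest"
    # isoformat() turns the date back into a zero-padded YYYY-MM-DD string
    return max(dates).isoformat() if dates else None
```

Inside data_connector.py, for example, latest_export_date(list_s3_files()) would return the name of the newest dated directory (or None if there is none).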