Download with Python

This page demonstrates how you can access and download the raw data files the Data Connector exposes. This is useful if you want to import these files into your own data lake or want to manually process and analyze the data locally.

To access and query Data Connector data directly from Python, DataCamp provides the dcdcpy Python package. Installation details and data import examples are provided in the linked GitHub README.

With the script below you can easily list or download all files from your Data Connector S3 bucket. All you need to do is follow the instructions below.

Step 1: Install the required packages

This example script requires only a standard Python 3 installation and the boto3 library, which you can install with the following command:

$ pip3 install boto3

Step 2: Save the script

Save the script below in a file named data_connector.py

data_connector.py
# Lists or downloads files from your Data Connector S3 bucket.
import os
import sys
import boto3

# Your Data Connector credentials
aws_access_key = "ACCESS_KEY"
aws_secret_key = "SECRET_KEY"
aws_bucket = "BUCKET_NAME"

s3 = boto3.resource('s3', aws_access_key_id=aws_access_key, aws_secret_access_key=aws_secret_key)

def list_s3_files(bucket_name=aws_bucket):
    file_list = []
    bucket = s3.Bucket(bucket_name)
    for bucket_object in bucket.objects.all():
        file_list.append(bucket_object.key)
        print(bucket_object.key)
    return file_list

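def download_from_s3(dirname, bucket_name=aws_bucket):
    bucket = s3.Bucket(bucket_name)
    current_directory = os.getcwd()
    target_directory = os.path.join(current_directory, dirname)
    os.makedirs(target_directory, exist_ok=True)

    for obj in bucket.objects.filter(Prefix=dirname):
        # Skip "folder" placeholder keys, which have no file to download
        if obj.key.endswith('/'):
            continue
        print('Downloading', obj.key, '...')
        # Mirror the key's directory structure under the current directory
        local_path = os.path.join(current_directory, obj.key)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(obj.key, local_path)
    print('All downloads finished!')
    return True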

if len(sys.argv) == 1:
    list_s3_files()
else:
    download_from_s3(sys.argv[1])

Step 3: Add credentials

Update the script with your organization's credentials.

In the script, replace the placeholder values of the three variables aws_access_key, aws_secret_key, and aws_bucket (lines 7 to 9) with your Data Connector credentials.

See our Your Credentials page for instructions on how to retrieve these settings.

The example script above requires you to enter your credentials in the script itself, which is generally considered an insecure practice. Learn how to store your access credentials safely in our Storing your Credentials document.
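
One common alternative to hardcoding is to read the credentials from environment variables. The sketch below assumes three hypothetical variable names (DATA_CONNECTOR_ACCESS_KEY, DATA_CONNECTOR_SECRET_KEY, DATA_CONNECTOR_BUCKET). They are illustrative only and are not defined by the Data Connector, and the helper load_credentials is likewise an assumption rather than part of the original script.

```python
import os

# Hypothetical environment variable names; choose whatever suits your setup.
REQUIRED_VARS = (
    "DATA_CONNECTOR_ACCESS_KEY",
    "DATA_CONNECTOR_SECRET_KEY",
    "DATA_CONNECTOR_BUCKET",
)


def load_credentials(env=None):
    """Return (access_key, secret_key, bucket) read from the environment."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

In data_connector.py, the three credential assignments could then be replaced with aws_access_key, aws_secret_key, aws_bucket = load_credentials().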

Step 4: Run the script

Now that the script is ready, here are a couple of examples of how to use it.

Calling the script without any extra arguments will show a list of all files and directories in the bucket as demonstrated with the example below.

This could potentially output a very long list of items!

python3 data_connector.py

2021-12-15/assessment_dim.csv
2021-12-15/chapter_dim.csv
2021-12-15/course_dim.csv
2021-12-15/date_dim.csv
2021-12-15/docs.csv
2021-12-15/exercise_dim.csv
2021-12-15/learning_assessment_fact.csv
2021-12-15/learning_chapter_fact.csv
2021-12-15/learning_course_fact.csv
2021-12-15/learning_exercise_fact.csv
2021-12-15/learning_practice_fact.csv
2021-12-15/learning_project_fact.csv
2021-12-15/learning_track_content_fact.csv
2021-12-15/learning_track_fact.csv
2021-12-15/practice_dim.csv
2021-12-15/project_dim.csv
2021-12-15/team_dim.csv
2021-12-15/track_content_dim.csv
2021-12-15/track_dim.csv
2021-12-15/user_dim.csv
2021-12-15/user_team_bridge.csv
2021-12-15/xp_fact.csv
...
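
Because the full listing can be long, it may help to summarize it per export directory. The sketch below groups the keys returned by list_s3_files() by their top-level directory. The helpers group_by_export and summarize are hypothetical names introduced for illustration, and the code assumes every key has the directory/filename shape shown in the output above.

```python
from collections import defaultdict


def group_by_export(keys):
    """Group S3 keys by their top-level directory (the export date)."""
    groups = defaultdict(list)
    for key in keys:
        directory, _, filename = key.partition("/")
        if filename:  # ignore keys that have no file part, e.g. "latest/"
            groups[directory].append(filename)
    return dict(groups)


def summarize(keys):
    """Return one summary line per export directory, sorted by name."""
    groups = group_by_export(keys)
    return [f"{name}: {len(files)} files" for name, files in sorted(groups.items())]
```

For example, print("\n".join(summarize(list_s3_files()))) would print one line per directory. Note that list_s3_files() itself still prints every key as it goes.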

If you specify a directory name (e.g. 2021-12-12), the script will download all files from that directory to your local machine.

python3 data_connector.py 2021-12-12

Downloading 2021-12-12/assessment_dim.csv ...
Downloading 2021-12-12/chapter_dim.csv ...
Downloading 2021-12-12/course_dim.csv ...
Downloading 2021-12-12/date_dim.csv ...
Downloading 2021-12-12/docs.csv ...
Downloading 2021-12-12/exercise_dim.csv ...
Downloading 2021-12-12/learning_assessment_fact.csv ...
Downloading 2021-12-12/learning_chapter_fact.csv ...
Downloading 2021-12-12/learning_course_fact.csv ...
Downloading 2021-12-12/learning_course_fact_v2.csv ...
Downloading 2021-12-12/learning_exercise_fact.csv ...
Downloading 2021-12-12/learning_exercise_fact_v2.csv ...
Downloading 2021-12-12/learning_practice_fact.csv ...
Downloading 2021-12-12/learning_project_fact.csv ...
Downloading 2021-12-12/learning_track_content_fact.csv ...
Downloading 2021-12-12/learning_track_fact.csv ...
Downloading 2021-12-12/practice_dim.csv ...
Downloading 2021-12-12/project_dim.csv ...
Downloading 2021-12-12/team_dim.csv ...
Downloading 2021-12-12/track_content_dim.csv ...
Downloading 2021-12-12/track_dim.csv ...
Downloading 2021-12-12/user_dim.csv ...
Downloading 2021-12-12/user_team_bridge.csv ...
Downloading 2021-12-12/xp_fact.csv ...
All downloads finished!
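
Once the files are on disk, a quick look at one of them can confirm the download worked. The following is a minimal sketch using only the standard library. It assumes the files are UTF-8, comma-separated, and start with a header row, none of which is stated on this page, and preview_csv is a hypothetical helper name.

```python
import csv
from itertools import islice


def preview_csv(path, rows=5):
    """Return (header, first_rows) from a CSV file without reading all of it."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        return header, list(islice(reader, rows))
```

For example, preview_csv("2021-12-12/course_dim.csv") would return the column names and up to five data rows of that file.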

Time saver! There is a "smart" directory called "latest" which always contains the latest export. You don't need to dynamically calculate or specify today's date if you need the latest data. Just calling the script with python3 data_connector.py latest will download the most recent export to your local machine!
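
If you call download_from_s3 from your own code rather than from the command line, the choice between "latest" and a dated directory can be wrapped in a small helper. This is only a sketch: export_prefix is a hypothetical name, and it assumes dated directories follow the YYYY-MM-DD pattern seen in the example output above.

```python
import datetime


def export_prefix(day=None):
    """Return the directory name for a given date, or "latest" if none is given."""
    if day is None:
        return "latest"
    # date.isoformat() yields YYYY-MM-DD, matching the directory names shown above.
    return day.isoformat()
```

With this helper, download_from_s3(export_prefix()) would fetch the most recent export, and download_from_s3(export_prefix(datetime.date(2021, 12, 12))) a specific one.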
