Python with boto3

The dcdcpy (deprecated) page explains how to use the DataCamp Data Connector through dcdcpy, a convenient Python package created by DataCamp. If you cannot install that package, you can use the open-source boto3 package (created by AWS) instead.

The dcdcpy package is essentially a wrapper around the boto3 library, custom-built to support our Data Model.

This page describes how to use the boto3 library directly to analyze Data Connector data in Python.

Before you get started, please make sure your credentials are set up properly. How to do this is explained in the Storing your Credentials article.
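
boto3 looks for credentials in several places, including the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. If a script below fails with an authentication error, a quick check like this one may help narrow it down. It is a minimal sketch that assumes you stored your credentials in those standard variables; if you set them up another way (for example, a shared credentials file), adapt it accordingly. The function name `missing_aws_env_vars` is hypothetical.

```python
import os
from typing import List, Mapping

# Standard environment variables boto3 reads for static credentials.
# Assumption: your Data Connector credentials are stored in these variables.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required credential variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("AWS credential environment variables are set.")
```

Passing the environment as a parameter (defaulting to `os.environ`) keeps the check easy to test with a plain dictionary.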

Examples

Get a list of all users

This script retrieves all users from the Data Connector and stores them in a pandas DataFrame.

import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"

# Create the S3 client; boto3 reads the credentials from environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get the dimension CSV file that contains all users
user_dim = getDataFrameFromS3('user_dim')
print(user_dim)
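
To see which tables are available besides user_dim, you can list the objects under the latest/ prefix. The sketch below uses boto3's `list_objects_v2` paginator, because a single `list_objects_v2` call returns at most 1,000 keys. It assumes every table is stored as `latest/<table>.csv`, matching the keys used in the script above; `list_latest_tables` is a hypothetical helper name.

```python
from typing import Any, List


def list_latest_tables(s3_client: Any, bucket: str, prefix: str = "latest/") -> List[str]:
    """Return table names (e.g. 'user_dim') for each CSV file directly under `prefix`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    tables = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no 'Contents' entry when no keys match the prefix.
        for obj in page.get("Contents", []):
            name = obj["Key"][len(prefix):]
            # Skip keys in nested "folders" and files that are not CSVs.
            if "/" not in name and name.endswith(".csv"):
                tables.append(name[: -len(".csv")])
    return sorted(tables)
```

With the client and bucket name from the script above, you could call it as print(list_latest_tables(s3_client, S3_BUCKET_NAME)) and pass any returned name to getDataFrameFromS3.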

Get time spent on courses for a single user

This script prints a DataFrame with all course activity for a single user. For each learning session, it lists the course, the time spent (in seconds), and the date.

import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"
USER_EMAIL = 'john.doe@datacamp.com'

# Create the S3 client; boto3 reads the credentials from environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get required data frames
course_fact = getDataFrameFromS3('course_fact')
course_dim = getDataFrameFromS3('course_dim')
user_dim = getDataFrameFromS3('user_dim')

# Merge the dataframes
result = (
    course_fact
    .merge(course_dim, on='course_id', how='left')
    .merge(user_dim, on='user_id', how='left')
    [['course_id', 'title', 'time_spent', 'date_id', 'email']]
)

# Filter results on a single user
result = result[result['email'] == USER_EMAIL]

print(result)
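
To check the merge and filter logic without connecting to S3, you can run the same steps on small in-memory DataFrames. The sketch below uses made-up rows containing only the columns the script relies on (course_id, user_id, title, time_spent, date_id, email); the real tables have more columns. As an extra step, it sums time_spent per course, giving the total seconds the user spent on each course. The function name `course_time_for_user` and all data values are hypothetical.

```python
import pandas as pd


def course_time_for_user(course_fact, course_dim, user_dim, email):
    """Return total seconds spent per course by the user with the given email."""
    sessions = (
        course_fact
        .merge(course_dim, on="course_id", how="left")
        .merge(user_dim, on="user_id", how="left")
        [["course_id", "title", "time_spent", "date_id", "email"]]
    )
    sessions = sessions[sessions["email"] == email]
    # One row per course, with time_spent summed across all of its sessions.
    return (
        sessions.groupby(["course_id", "title"], as_index=False)["time_spent"]
        .sum()
        .sort_values("time_spent", ascending=False)
        .reset_index(drop=True)
    )


# Made-up sample data for illustration only.
course_fact = pd.DataFrame({
    "course_id": [1, 1, 2, 2],
    "user_id": [10, 10, 10, 20],
    "time_spent": [600, 300, 1200, 900],
    "date_id": [20240101, 20240102, 20240103, 20240101],
})
course_dim = pd.DataFrame({"course_id": [1, 2], "title": ["Intro to Python", "Intro to SQL"]})
user_dim = pd.DataFrame({"user_id": [10, 20], "email": ["john.doe@datacamp.com", "jane.roe@example.com"]})

print(course_time_for_user(course_fact, course_dim, user_dim, "john.doe@datacamp.com"))
```

Because the helper only takes DataFrames, you can pass it the tables loaded with getDataFrameFromS3 once the logic looks right on the sample data.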

More examples?

Reach out to your customer success manager; we are happy to help you get the data you need using our Python library or SQL.
