Python with Boto3

This page shows how to use the boto3 library in Python to load and analyze data from the Data Connector.

Before you begin, make sure your credentials are set up correctly, as explained in the Your credentials article.
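As a quick sanity check before running the examples, the sketch below reports whether the standard AWS credential environment variables are present. It assumes your credentials are supplied through AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which boto3 reads by default. The helper name missing_aws_env_vars is made up for this illustration, and the Your credentials article remains the reference for the actual setup.

```python
import os

# Standard environment variables that boto3 reads for static credentials.
# Assumption: your setup uses these rather than a profile or config file.
REQUIRED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("AWS credential environment variables are set.")
```

This only confirms that the variables exist. It does not verify that the keys are valid or that they grant access to your bucket.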

Examples

Get a list of all users

This script retrieves the list of all users from the Data Connector and loads it into a pandas DataFrame.

import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"

# Create client, authentication is done through environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get the dimension CSV file that contains all users
dim_user = getDataFrameFromS3('dim_user')
print(dim_user)
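To find out which other tables can be passed to the utility method, you can list the CSV files in the bucket. The sketch below is one possible way to do that. It assumes every table is stored as latest/<table>.csv, as in the script above, and the function name list_latest_tables is hypothetical. It uses the boto3 paginator for list_objects_v2 so that buckets with many objects are handled as well.

```python
def list_latest_tables(s3_client, bucket_name, prefix="latest/"):
    """Return the names of all tables stored as CSV files under `prefix`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    tables = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent from a page when no keys match the prefix
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv"):
                # Strip the prefix and the ".csv" extension
                tables.append(key[len(prefix):-len(".csv")])
    return sorted(tables)


# Usage, with the client and bucket name from the script above:
# print(list_latest_tables(s3_client, S3_BUCKET_NAME))
```

The client is passed in as an argument so the same function works with whatever client you have already created.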

Time spent in Learn per technology

Each content type at DataCamp has an associated technology (e.g., R, Python, SQL, or Spark). The code below creates a report of the time spent per technology.

import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"

# Create client, authentication is done through environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get required data frames
fact_learn_events = getDataFrameFromS3('fact_learn_events')
dim_content = getDataFrameFromS3('dim_content')

# Merge the dataframes and keep only the columns needed for the report
result = fact_learn_events \
    .merge(dim_content, on='content_id', how="left") \
    [['technology', 'duration_engaged']]

# Filter for rows where duration_engaged is greater than zero.
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below.
result_filtered = result[result['duration_engaged'] > 0].copy()

# Convert duration_engaged from seconds to hours
result_filtered['duration_engaged'] = result_filtered['duration_engaged'] / 3600

# Calculate time spent per technology
result_grouped = result_filtered.groupby('technology')['duration_engaged'].sum()

print(result_grouped.sort_values(ascending=False))
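To see what the merge, filter, and aggregation steps do without connecting to S3, the sketch below runs the same logic on a few made-up rows. The sample values are invented for illustration, the function name time_spent_per_technology is hypothetical, and the code assumes duration_engaged is measured in seconds, as the division by 3600 above implies.

```python
import pandas as pd


def time_spent_per_technology(fact_learn_events, dim_content):
    """Return the hours engaged per technology, largest first."""
    merged = fact_learn_events.merge(dim_content, on="content_id", how="left")
    # Keep only events with a positive engagement time
    engaged = merged[merged["duration_engaged"] > 0]
    # Assumption: duration_engaged is in seconds
    hours = engaged["duration_engaged"] / 3600
    return hours.groupby(engaged["technology"]).sum().sort_values(ascending=False)


# Made-up sample rows, for illustration only
sample_events = pd.DataFrame({
    "content_id": [1, 1, 2, 3],
    "duration_engaged": [3600, 1800, 7200, 0],
})
sample_content = pd.DataFrame({
    "content_id": [1, 2, 3],
    "technology": ["Python", "R", "SQL"],
})

report = time_spent_per_technology(sample_events, sample_content)
print(report)
```

With these sample rows, R totals 2.0 hours and Python 1.5 hours. SQL does not appear because its only event has a duration of zero.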

More examples?

For more examples, see our sample queries and the queries that recreate key reports in the Groups tab.

If you need further help, reach out to your customer success manager. We are happy to help you get the data you need using Python or SQL.

