# Python with Boto3

This page describes how to use the boto3 library in Python to analyze data from the Data Connector.

{% hint style="info" %}
Before you begin, please ensure that your credentials are correctly set up. How to do this is explained in the [your-credentials](https://enterprise-docs.datacamp.com/integrating-our-data-into-your-tools-via-data-connector-2.0/getting-started-with-data-connector-2.0/your-credentials "mention") article.
{% endhint %}
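
The examples on this page create the S3 client without passing credentials explicitly, so boto3 has to find them in your environment. As a quick sanity check, the sketch below reports whether the standard boto3 variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set. It assumes your credentials are supplied through these variables; the helper name `missing_aws_env_vars` is hypothetical, and boto3 can also read credentials from other sources, such as a shared credentials file, which this check does not cover.

```python
import os

# Standard variable names read by boto3. That your setup relies on them
# (rather than a credentials file or profile) is an assumption.
REQUIRED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the required AWS variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_ENV_VARS if not environ.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("AWS credential environment variables are set.")
```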

## Examples

### Get a list of all users

This script retrieves a list of all users from the Data Connector and stores it in a pandas DataFrame.

```python
import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"

# Create client, authentication is done through environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get the dimension CSV file that contains all users
dim_user = getDataFrameFromS3('dim_user')
print(dim_user)
```
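
If you are unsure which tables are available, listing the objects under the `latest/` prefix may help. The sketch below is not an official helper: the function name `list_latest_tables` is hypothetical, and it assumes that every table is stored as `latest/<table>.csv` (the key pattern used above) and that your credentials allow listing the bucket.

```python
def list_latest_tables(s3_client, bucket, prefix="latest/"):
    """Return the sorted table names of all CSV files stored under the prefix."""
    # A paginator follows continuation tokens, so buckets with more than
    # 1,000 objects under the prefix are listed completely.
    paginator = s3_client.get_paginator("list_objects_v2")
    tables = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv"):
                # Strip the prefix and the ".csv" extension to get the table name
                tables.append(key[len(prefix):-len(".csv")])
    return sorted(tables)
```

With a client created through `boto3.client('s3')`, calling `list_latest_tables(s3_client, S3_BUCKET_NAME)` should return the names you can use as table names when loading files, such as `dim_user`.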

### Time spent in Learn per technology

Each content type at DataCamp has an associated [technology](https://enterprise-docs.datacamp.com/understanding-reports-with-clarity-definitions) (e.g., R, Python, SQL, Spark, etc.). With the code below, you can create a report with the time spent per technology.

```python
import pandas as pd
import boto3

S3_BUCKET_NAME = "<your bucket name here>"

# Create client, authentication is done through environment variables
s3_client = boto3.client('s3')

# Utility method to get a file and load it into a df
def getDataFrameFromS3(table):
    key = f'latest/{table}.csv'
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=key)
    return pd.read_csv(response['Body'])

# Get required data frames
fact_learn_events = getDataFrameFromS3('fact_learn_events')
dim_content = getDataFrameFromS3('dim_content')

# Left-join the events to the content dimension and keep only the report columns
result = fact_learn_events \
    .merge(dim_content, on='content_id', how="left") \
    [['technology', 'duration_engaged']]
    
# Filter for rows where duration_engaged is greater than zero.
# Copy the result so the assignment below does not write to a view of `result`.
result_filtered = result[result['duration_engaged'] > 0].copy()

# Convert duration_engaged from seconds to hours
result_filtered['duration_engaged'] = result_filtered['duration_engaged'] / 3600

# Calculate time spent per technology
result_grouped = result_filtered.groupby('technology')['duration_engaged'].sum()

print(result_grouped.sort_values(ascending=False))
```
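
To see what the merge, filter, and aggregation steps do without connecting to S3, the sketch below runs the same logic on a few made-up rows. The sample values are hypothetical and only stand in for `fact_learn_events` and `dim_content`. It also passes `dropna=False` to `groupby`, which is an addition to the report above: after a left join, events whose `content_id` has no match in `dim_content` get a missing `technology`, and pandas drops such rows from a group-by unless told otherwise.

```python
import pandas as pd

# Made-up rows standing in for fact_learn_events (durations in seconds)
fact_learn_events = pd.DataFrame({
    "content_id": [1, 1, 2, 3, 4],
    "duration_engaged": [3600, 1800, 7200, 0, 900],
})

# Made-up rows standing in for dim_content; content_id 4 is deliberately absent
dim_content = pd.DataFrame({
    "content_id": [1, 2, 3],
    "technology": ["Python", "SQL", "R"],
})

# Left join keeps every event, even when no content row matches
merged = fact_learn_events.merge(dim_content, on="content_id", how="left")

report = (
    merged[merged["duration_engaged"] > 0]                          # drop zero-duration events
    .assign(hours_engaged=lambda df: df["duration_engaged"] / 3600)  # seconds to hours
    .groupby("technology", dropna=False)["hours_engaged"]            # keep unmatched content
    .sum()
    .sort_values(ascending=False)
)

print(report)
```

On this sample data the report contains 2.0 hours for SQL, 1.5 hours for Python, and 0.25 hours under a missing technology. R does not appear because its only event has a duration of zero.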

## More examples?

Please review our [sample queries](https://enterprise-docs.datacamp.com/integrating-our-data-into-your-tools-via-data-connector-2.0/sample-queries) and queries that [recreate key reports in the Groups tab](https://enterprise-docs.datacamp.com/integrating-our-data-into-your-tools-via-data-connector-2.0/queries-to-recreate-key-reports-in-the-groups-tab).

If you need something else, reach out to your customer success manager. We are happy to help you get the data you need using Python or SQL.
