This page demonstrates how to access and download the raw data files that the Data Connector exposes. This is useful if you want to import these files into your own data lake, or to process and analyze the data locally.
If you want to easily access and query the Data Connector data directly from Python, DataCamp provides the dcdcpy Python package. Installation details and examples of data import are provided in the linked GitHub README.
With the script below, you can easily list or download all files from your Data Connector S3 bucket. All you need to do is follow the steps below.
Step 1: Install the required packages
This example script requires only a standard Python 3 installation and the boto3 library, which can be installed by executing the following command:
$ pip3 install boto3
Step 2: Save the script
Save the script below in a file named data_connector.py.
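The listing below is a minimal sketch of such a script, assuming only boto3's standard list_objects_v2 pagination and download_file calls; the credential placeholders on lines 7 to 9 are hypothetical values you must replace with your own.

import os
import sys

import boto3

# Data Connector credentials: replace the placeholder values below.
AWS_ACCESS_KEY_ID = "YOUR_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY = "YOUR_SECRET_ACCESS_KEY"
BUCKET_NAME = "YOUR_BUCKET_NAME"

s3 = boto3.client(
    "s3",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)


def list_files():
    """Print every object key in the bucket."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET_NAME):
        for obj in page.get("Contents", []):
            print(obj["Key"])


def download_prefix(prefix):
    """Download every object under the given prefix, preserving paths."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET_NAME, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip directory placeholder objects
            os.makedirs(os.path.dirname(key) or ".", exist_ok=True)
            print("Downloading", key)
            s3.download_file(BUCKET_NAME, key, key)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        download_prefix(sys.argv[1])
    else:
        list_files()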
Step 3: Update the script with your organization's credentials
In the script, update the three variables on lines 7 to 9 with your Data Connector credentials.
See our Your Credentials page for instructions on how to retrieve these settings.
The example script above requires you to enter your credentials in the script itself, which is generally considered an insecure practice. Learn more about how you can safely store your access credentials in our Storing your Credentials document.
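One common pattern, shown here as a minimal sketch rather than necessarily the approach described in that document, is to read the credentials from environment variables instead of hardcoding them; the variable names used below are hypothetical.

import os

# Hypothetical environment variable names; set these in your shell
# instead of storing secrets in the script itself.
AWS_ACCESS_KEY_ID = os.environ["DC_ACCESS_KEY_ID"]
AWS_SECRET_ACCESS_KEY = os.environ["DC_SECRET_ACCESS_KEY"]
BUCKET_NAME = os.environ["DC_BUCKET_NAME"]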
Step 4: Run the script
Now that the script is ready, below are a couple of examples of how you can use it.
Calling the script without any extra arguments will show a list of all files and directories in the bucket, as demonstrated in the example below.
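For example (the exact listing depends on your bucket's contents):

$ python3 data_connector.py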
This may output a very long list of items!
Time saver! There is a "smart" directory called "latest" that always contains the most recent export, so you don't need to calculate or specify today's date to get the latest data. Simply calling the script with python3 data_connector.py latest will download the most recent export to your local machine!
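For example (with the sketch above, the files land in a latest/ directory relative to where you run the command):

$ python3 data_connector.py latest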