Comment on page
Data Model
The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:
- Assessments
- Assignments
- Certification
- Courses
- Practice
- Projects
- Tracks
- Workspace
The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.
Facts
Dimensions
Bridges
Facts are the measurements/metrics or facts from your user activity on DataCamp.
course_fact
for example contains measures like XP and time_spent on courses by your users. assessment_fact
contains measures like score and percentile for the assessment scores your users have registered on DataCamp.Dimension provides the context surrounding the fact tables. In simple terms, they give who, what, where of a fact. In other words, a dimension is a window to view information in the facts.
The
course_fact
table can for example be joined with the course_dim
table to find out more information about the courses (like title, technology, instructor, ...) or with user_dim
to get information on the users (like email, name, registration date, etc ...) Bridge tables are used to connect fact and/or dimension tables together. Bridge tables are also referred to as "join tables" in classic SQL.
The
user_team_bridge
table for example can be used to connect a user to the teams they are member of. In this case a bridge table is needed because a single user can be member of multiple teams at the same time.You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.
The data model also provides dimension tables at the user level
user_dim
, team_dim
, group_dim
, and bridge tables user_team_bridge
to facilitate analysis at the team or user level.For example, you can aggregate the
xp
gained and time_spent
by week across technology
and team
with the following query.SQL
Python
SELECT team_id,
week_start_date,
technology,
SUM(xp) AS xp,
SUM(time_spent)/3600 AS time_spent_hours
FROM course_fact
INNER JOIN course_dim USING(course_id)
INNER JOIN date_dim USING(date_id)
INNER JOIN user_team_bridge USING(user_id)
INNER JOIN team_dim USING(team_id)
GROUP BY 1, 2, 3
from dcdcpy import DataConnector
dc = DataConnector()
(
dc.course_fact()
.merge(dc.course_dim(), on='course_id', suffixes=('', '_y'))
.merge(dc.date_dim(), left_on='date_id', right_on='id')
.merge(dc.user_team_bridge(), on='user_id')
.merge(dc.team_dim(), left_on='team_id', right_on='id')
.groupby(['week_start_date', 'technology'], as_index=False)
.agg({
'time_spent': lambda d: d.sum()/3600,
'xp': 'sum'
})
)
Note that all fact tables only include learning activity for the dates on which a user was a part of the group. For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).

Warning! For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.
assessment_dim
: The assessment dimension provides descriptive data about assessments.column_name | column_description |
---|---|
assessment_id | The unique id of the assessment. |
title | The title of the assessment. |
slug | The slug of The assessment. |
technology | The assessment technology (e.g., Python, R, SQL) |
id | [DEPRECATED] Use assessment_id instead. |
assessment_fact
: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.id |
assessment_id | The unique id of the assessment. References assessment_dim.id |
date_id | The identifier of the date when the user worked on the assessment References dim_date.id |
started_at | The timestamp when the user started the assessment (as registered by the DataCamp application) |
completed_at | The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment. |
score | The user score for the assessment. |
score_group | The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”. |
percentile | The percentile the user belongs in. |
time_spent | The time (in seconds) the user spent on the assessment. |
assignment_dim
: The assignment dimension table provides data about a specific assignment.column_name | column_description |
---|---|
group_id | The unique identifier for the group. |
assignment_id | The unique identifier for the assignment. |
created_by_id | The user_id of the user who created the assignment. References user_dim.user_id. |
assignment_type | The type of assignment (chapter / course / project / assessment / customtrack / xp) |
assignment_title | The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
assignment_status | The status of the assignemnt (archived / active). |
assignee_type | The type of assignee (user / team / group). |
assignment_xp | The XP achieved by completing the assignment. |
technology | The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
topic | The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
due_at | The timestamp when the assignment is due. |
created_at | The timestamp when the assignment was created. |
updated_at | The timestamp when the assignment was updated. |
deleted_at | The timestamp when the assignment was deleted. |
assignment_fact
: The assignment fact table provides data about the users’ progress on each assignment.column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_id | The unique identifier of the user. References user_dim.user_id |
assignment_id | The unique identifier of the assignment. References assignment_dim.assignment_id. |
is_completed | A boolean indicating if the user completed the assignment. |
completion_status | The status of completion of the assignment. It can be completed / late / in_progress / missed. |
assigned_at | The timestamp when the assignment was created. |
due_at | The timestamp when the assignment is due. |
completed_at | The timestamp when the user completed the assignment. |
email_sent_at | The timestamp when an email was sent to the user. |
reminder_sent_at | The timestamp when a reminder was sent to the user. |
certification_dim
: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.column_name | column_description |
---|---|
certification_id | A unique identifier for the certification. |
certification_name | The name of the certification. |
certification_version | The version of the certification. |
certification_fact
: The certification fact table provides data about the users’ progress through different stages of a certification.column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_certification_id | The unique identifier of a user attempting a certification. A user can have more than one attempt at obtaining a certification. |
user_id | The unique identifier of the user. References user_dim.user_id |
certification_id | The unique identifier of the certification. References certification_dim.certification_id. |
registered_at | The timestamp when the user registered for the certification attempt. |
expired_at | The timestamp when the user certification attempt (will) expire. |
assessments_passed_at | (v1 only) The timestamp when the user passed the assessments. |
coding_challenge_passed_at | (v1 only) The timestamp when the user passed the coding challenge(s). |
first_exam_started_at | (v2 only) The timestamp when the user started their first certification exam in an attempt. |
first_exam_completed_at | (v2 only) The timestamp when the user completed their first certification exam in an attempt. |
exam_last_passed_at | (v2 only) The timestamp when the user passed their last certification exam. |
case_study_ready_at | The timestamp when the user passed all stages prior to the practical exam. |
first_case_study_submitted_at | The timestamp when the first practical exam was submitted for grading. |
last_case_study_submitted_at | The timestamp for then the last practical exam was submitted for grading. |
certificate_granted_at | The date at which the certificate was granted by the admin. |
is_passed_assessments | An indicator whether user passed all timed assessments. |
is_passed_challenges | An indicator whether the user passed all coding challenges. |
is_passed_exams | An indicator whether the user passed all certification exams. |
exercise_dim
: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.column_name | column_description |
---|---|
exercise_id | The unique id of the exercise id. |
title | The title of the exercise. |
type | The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …) |
number | The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order. |
xp | The maximum number of XP a user can get by completing the exercise. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …) |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …) |
course_title | The course title the exercise belongs to. |
course_slug | The course slug the exercise belongs to |
course_xp | The maximum number of XP a user can get by completing the course the exercise belongs to. |
course_description | The course description the exercise belongs to. |
course_short_description | A shorter version of the course description the exercise belongs to. |
course_launched_date | Date at which the course went live. |
chapter_title | The chapter title the exercise belongs to. |
chapter_slug | The chapter slug the exercise belongs to. |
chapter_xp | The maximum number of XP a user can get by completing the chapter the exercise belongs to. |
chapter_nb_exercises | The number of exercises in the chapter the exercise belongs to. |
id | [DEPRECATED] Use exercise_id instead. |
exercise_fact
: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.column_name | column_description |
---|---|
user_id | The unique user id. Reference to user_dim.id. |
exercise_id | The unique exercise_id. Reference to exercise_dim.id. |
date_id | The date id. Reference to date_dim.id. |
started_at | The timestamp at which the user started the exercise (as registered by the DataCamp application). |
completed_at | The timestamp at which the user completed the exercise (as registered by the DataCamp application). |
time_spent | How much time (in seconds) the user spent on the exercise? |
xp | How many XP did the user gain by completing the exercise? |
chapter_dim
: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.column_name | column_description |
---|---|
chapter_id | The unique id of the chapter id. |
title | The title of the chapter. |
slug | The slug of the chapter. |
xp | The maximum number of XP a user can get by completing the chapter. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
nb_exercises | The number of exercises in the chapter. |
course_title | The title of the course the chapter belongs to. |
course_slug | The slug of the course the chapter belongs to. |
course_xp | The maximum number of XP a user can get by completing the course the chapter belongs to. |
course_description | The description of the course the chapter belongs to. |
course_short_description | A shorter description of the course the chapter belongs to. |
course_launched_date | The date on which the course went live. |
id | [DEPRECATED] Use chapter_id instead. |
chapter_fact
: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.column_name | column_description |
---|---|
user_id | The unique id of the user. References to user_dim.user_id. |
chapter_id | The unique id of the chapter. References to chapter_dim.chapter_id. |
date_id | The date on which the user worked on the chapter. References to dim_date.id |
started_at | The timestamp when the user started the chapter (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the chapter (as registered by the DataCamp application). |
time_spent | The time (in seconds) the user spent on the chapter. |
xp | The XP the user gained by working on the chapter. |
course_dim
: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.column_name | column_description |
---|---|
course_id | The unique id of the course. |
title | The title of the course. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
xp | The maximum number of XP a user can get by following the course. |
nb_hours_needed | Time needed in hours to complete the course. |
slug | The slug of the course |
description | A description of the course. |
short_description | A short description of the course. |
launched_date | The date on which the course went live. |
id | [DEPRECATED] Use course_id instead. |
course_fact
: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.user_id |
course_id | The unique id of the course. References course_dim.id |
date_id | The unique id of the date on which the user worked on the course. References to dim_date.id |
started_at | The timestamp when the user started the course (as registered by the DataCamp application). |
completed_at | The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet. |
time_spent | The time (in seconds) the user spent on the course. |
xp | The XP the user gained by working on the course. |
practice_dim
: List of all practicescolumn_name | column_description |
---|---|
practice_id | The practice id (internal mobile_app.pools id) [PK] |
title | The practice name |
status | The practice status (HIDDEN, LIVE) |
technology | The practice’s technology type |
xp | Xp gained by completing the practice |
id | [DEPRECATED] Use practice_id instead. |
practice_fact
: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.column_name | column_description |
---|---|
user_id | The user id (internal main-app id) [PK], [FK to user_dim.id] |
practice_id | The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id] |
date_id | The date at which the user worked on the practice [PK], [FK to dim_date.id] |
started_at | The date at which the user started the practice |
completed_at | The date at which the user completed the practice. It is NULL when the user did not complete the practice |
is_mobile | Whether the user did the practice on mobile or in the browser |
time_spent | How much time (in seconds) the user spent on the practice |
xp | How many XP did the user gain by completing the practice |
project_dim
: The project dimension provides descriptive data about a specific project.column_name | column_description |
---|---|
project_id | The unique id of the project. |
title | The title of the course. |
technology | The course technology (R, Python, SQL) |
xp | The maxmimum number of XP a user can get by completing the project. |
nb_hours_needed | The number of hours needed to complete the project. |
is_guided | A boolean indicating whether the project is guided or not. |
is_certification | A boolean indicating whether the project is used for certification. |
description | A description of the project. |
short_description | A short description of the project. |
id | [DEPRECATED] Use project_id instead. |
project_fact
: The project fact table provides data about the users’ project progress: time spent and XP gained.column_name | column_description |
---|---|
user_id | The unique id of the user. References user_dim.user_id |
project_id | The unique id of the project. References project_dim.project_id |
date_id | The id of the date on which the user worked on the project. References date_dim.date_id |
started_at | The timestamp when the user started the project (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet. |
time_spent | The time (in seconds) the user spent on the project. |
xp | The XP the user gained by working on the project. |
track_content_dim
: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
content_id | The id of the content, a concatenation of content type and id. |
content_type | The content type (course / project). |
content_title | The title of the content. |
position | The position of the content in the track version. |
xp | The total xp earned on completion of content. |
track_content_fact
: Fact table for track content.column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
content_id | The id of the content, a concatenation of content type and id. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
xp | The xp gained by completing the content. |
nb_seconds | The total number of seconds spent on the content. |
track_dim
: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
version_number | The version of the track. |
title | The title concatenated with subtitle of the track version. |
technology | The technology of the track. |
category | The category of the track. |
short_description | A short description of the track |
published_live_at | The timestamp for when the track version was published live. |
archived_at | The timestamp for when the track version was archived. |
is_current_version | A boolean indicating if this is the current version. |
is_custom | A boolean indicating if this is a custom track. |
nb_courses | The number of courses in the track version. |
nb_projects | The number of projects in the track version. |
xp | The total xp gained on completion of the track. |
track_fact
: Fact table for tracks.column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
is_currently_active | A boolean indicating if the user is currently active in the track |
publication_fact
: The publication fact table provides data about the users’ workspace publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
creator_id | The unique identifier of the user who created the workspace. References user_dim.user_id. |
workspace_id | The unique identifier of the workspace. References workspace_dim.workspace_id |
viewed_at | The date on which the workspace publication was viewed. |
nb_viewers | The number of unique viewers who viewed the publication. |
nb_seconds | The number of seconds spent viewing the publication. |
viewer_type | The type of viewer (creator / viewer) |
workspace_dim
: The workspace dimension table provides descriptive data about a specific workspace.column_name | column_description |
---|---|
workspace_id | The unique identifier of the workspace. |
workspace_title | The title of the workspace. |
technology | The language of the workspace (R / Python) |
owner_type | Whether the owner is individual or group. |
category | The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate |
key | The template key (NULL for non-template types). |
is_featured | A boolean indicating if the workspace is featured on the profile. |
group_id | The unique identifier of the group, the workspace creator belonged to. |
workspace_fact
: The workspace fact table provides data about workspaces created by users.column_name | column_description |
---|---|
creator_id | The user id of the creator of the workspace. References user_dim.user_id. |
workspace_id | The id of the workspace. References workspace_dim.workspace_id. |
nb_megabytes | The size of files in the workspace measured in megabytes. |
nb_attempts_to_publish | The number of times the workspace was published. |
nb_times_published_successfully | The number of times the workspace was published successfully. |
nb_upvotes | The number of upvotes received. |
nb_shares | The number of times the workspace was shared. |
created_at | The timestamp for when the workspace is created. |
updated_at | The timstamp for when the workspace is updated. |
first_edited_at | The timestamp for when the workspace was first edited. |
last_edited_at | The timestamp for when the workspace was last edited. |
first_published_at | The timestamp for when the workspace was first published. |
last_published_at | The timestamp for when the workspace was last edited. |
first_integration_added_at | The timestamp for when the first integration was added to the workspace. |
last_integration_added_at | The timestamp for when the last integration was added to the workspace. |
first_upvoted_at | The timestamp for when the first upvote was received. |
last_upvoted_at | The timestamp for when the last upvote was received. |
group_id | The id of the group owner of the workspace. |
workspace_visit_fact
: The workspace visit fact table provides data user visits to workspaces.column_name | column_description |
---|---|
visitor_id | The user id of the visitor to the workspace. References user_dim.user_id. |
workspace_id | The id of the workspace. References workspace_dim.workspace_id. |
visited_at | The date of the visit, granular to the day level. |
nb_seconds | The duration of the visit in seconds. |
group_id | The id of the group. |
team_dim
: The team dimension provides descriptive data about teams.column_name | column_description |
---|---|
team_id | The unique identifier of the team |
name | The team name |
slug | The team slug |
created_date | The team’s creation date |
updated_date | The team’s updated date |
deleted_date | The team’s deleted date |
id | [DEPRECATED] Use team_id instead. |
user_dim
: List of all enterprise userscolumn_name | column_description |
---|---|
user_id | The user id (internal main-app id) [PK] |
first_name | The user first name |
last_name | The user last name |
email | The user email |
slug | The slug to the user profile |