Data Model
The data model for the data connector provides data on learning activity for users in a group across different content types. The currently supported content types are:
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
For every content type, the data model provides a fact table (e.g.
course_fact
) consisting of measures like XP and time spent, and a dimension table (course_dim
) consisting of descriptors like technology, topic etc. You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.Additionally, the data model also provides dimension tables at the user level
user_dim
, team_dim
, group_dim
, and bridge tables user_team_bridge
to facilitate analysis at the team or user level.For example, you can aggregate the
xp
gained and time_spent
by week across technology
and team
with the following query.SQL
Python
SELECT team_id,
week_start_date,
technology,
SUM(xp) AS xp,
SUM(time_spent)/3600 AS time_spent_hours
FROM course_fact
INNER JOIN course_dim USING(course_id)
INNER JOIN date_dim USING(date_id)
INNER JOIN user_team_bridge USING(user_id)
INNER JOIN team_dim USING(team_id)
GROUP BY 1, 2, 3
from dcdcpy import DataConnector
dc = DataConnector()
(
dc.course_fact()
.merge(dc.course_dim(), on='course_id', suffixes=('', '_y'))
.merge(dc.date_dim(), left_on='date_id', right_on='id')
.merge(dc.user_team_bridge(), on='user_id')
.merge(dc.team_dim(), left_on='team_id', right_on='id')
.groupby(['week_start_date', 'technology'], as_index=False)
.agg({
'time_spent': lambda d: d.sum()/3600,
'xp': 'sum'
})
)
Note that all fact tables only include learning activity for the dates on which a user was a part of the group. For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).

assessment_dim
: The assessment dimension provides descriptive data about a specific assessment.column_name | column_description |
---|---|
assessment_id | The unique id of the assessment. |
title | The title of the assessment. |
slug | The slug of The assessment. |
technology | The assessment technology (e.g., Python, R, SQL) |
id | [DEPRECATED] Use assessment_id instead. |
assessment_fact
: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.id |
assessment_id | The unique id of the assessment. References assessment_dim.id |
date_id | The identifier of the date when the user worked on the assessment References dim_date.id |
started_at | The timestamp when the user started the assessment (as registered by the DataCamp application) |
completed_at | The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment. |
score | The user score for the assessment. |
score_group | The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”. |
percentile | The percentile the user belongs in. |
time_spent | The time (in seconds) the user spent on the assessment. |
assignment_dim
: The assignment dimension table provides data about a specific assignment.column_name | column_description |
---|---|
group_id | The unique identifier for the group. |
assignment_id | The unique identifier for the assignment. |
created_by_id | The user_id of the user who created the assignment. References user_dim.user_id. |
assignment_type | The type of assignment (chapter / course / project / assessment / customtrack / xp) |
assignment_title | The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
assignment_status | The status of the assignemnt (archived / active). |
assignee_type | The type of assignee (user / team / group). |
assignment_xp | The XP achieved by completing the assignment. |
technology | The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
topic | The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
due_at | The timestamp when the assignment is due. |
created_at | The timestamp when the assignment was created. |
updated_at | The timestamp when the assignment was updated. |
deleted_at | The timestamp when the assignment was deleted. |
assignment_fact
: The assignment fact table provides data about the users’ progress on each assignment.column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_id | The unique identifier of the user. References user_dim.user_id |
assignment_id | The unique identifier of the assignment. References assignment_dim.assignment_id. |
is_completed | A boolean indicating if the user completed the assignment. |
completion_status | The status of completion of the assignment. It can be completed / late / in_progress / missed. |
assigned_at | The timestamp when the assignment was created. |
due_at | The timestamp when the assignment is due. |
completed_at | The timestamp when the user completed the assignment. |
email_sent_at | The timestamp when an email was sent to the user. |
reminder_sent_at | The timestamp when a reminder was sent to the user. |
certification_dim
: The certification dimension provides data about a specific certification.column_name | column_description |
---|---|
certificate_id | A unique identifier for the certification. |
certification_name | The name of the certification. |
certification_version | The version of the certification. |
created_at | A timestamp when the certification was created. |
updated_at | A timestamp when the certification was updated. |
certification_fact
: The certification fact table provides data about the users’ progress through different stages of a certification.column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_id | The unique identifier of the user. References user_dim.user_id |
certificate_id | The unique identifier of the certification. References certification_dim.certification_id. |
started_at | onboarding date |
assessments_completed_at | The timestamp when the user completed the assessments. |
challenge_started_at | The date on which the (first) coding challenge was started. |
challenge_completed_at | The date at which the (last) coding challenge was completed. |
case_study_started_at | timestamp at which the most recent case study was started. |
case_study_completed_at | timestamp at which the most recent case study was completed. |
certificate_granted_at | The date at which the certificate was granted by the admin. |
is_passed_assessments | An indicator whether user passed all timed assessments |
is_passed_challenges | AN indicator whether the user passed all coding challenges |
case_study_status | The current status of the most recent case study attempt |
case_study_title | The title of the case study. |
nb_assessments_taken | The total number of qualifying assessments taken (prior to onboarding counts as a single attempt). |
nb_assessments_passed | The total number of assessments passed (always in line with current status of product). |
nb_challenges_completed | The number of challenges completed. |
nb_challenges_attempts | The total number of times the user submitted the coding challenge. |
challenge_duration_engaged | The total duration engaged in coding challenge (in seconds). |
assessments_duration_engaged | The total duration engaged in assessments (in seconds). |
exercise_dim
: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.column_name | column_description |
---|---|
exercise_id | The unique id of the exercise id. |
title | The title of the exercise. |
type | The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …) |
number | The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order. |
xp | The maximum number of XP a user can get by completing the exercise. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …) |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …) |
course_title | The course title the exercise belongs to. |
course_slug | The course slug the exercise belongs to |
course_xp | The maximum number of XP a user can get by completing the course the exercise belongs to. |
course_description | The course description the exercise belongs to. |
course_short_description | A shorter version of the course description the exercise belongs to. |
course_launched_date | Date at which the course went live. |
chapter_title | The chapter title the exercise belongs to. |
chapter_slug | The chapter slug the exercise belongs to. |
chapter_xp | The maximum number of XP a user can get by completing the chapter the exercise belongs to. |
chapter_nb_exercises | The number of exercises in the chapter the exercise belongs to. |
id | [DEPRECATED] Use exercise_id instead. |
exercise_fact
: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.column_name | column_description |
---|---|
user_id | The unique user id. Reference to user_dim.id. |
exercise_id | The unique exercise_id. Reference to exercise_dim.id. |
date_id | The date id. Reference to date_dim.id. |
started_at | The timestamp at which the user started the exercise (as registered by the DataCamp application). |
completed_at | The timestamp at which the user completed the exercise (as registered by the DataCamp application). |
time_spent | How much time (in seconds) the user spent on the exercise? |
xp | How many XP did the user gain by completing the exercise? |
chapter_dim
: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.column_name | column_description |
---|---|
chapter_id | The unique id of the chapter id. |
title | The title of the chapter. |
slug | The slug of the chapter. |
xp | The maximum number of XP a user can get by completing the chapter. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
nb_exercises | The number of exercises in the chapter. |
course_title | The title of the course the chapter belongs to. |
course_slug | The slug of the course the chapter belongs to. |
course_xp | The maximum number of XP a user can get by completing the course the chapter belongs to. |
course_description | The description of the course the chapter belongs to. |
course_short_description | A shorter description of the course the chapter belongs to. |
course_launched_date | The date on which the course went live. |
id | [DEPRECATED] Use chapter_id instead. |
chapter_fact
: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.column_name | column_description |
---|---|
user_id | The unique id of the user. References to user_dim.user_id. |
chapter_id | The unique id of the chapter. References to chapter_dim.chapter_id. |
date_id | The date on which the user worked on the chapter. References to dim_date.id |
started_at | The timestamp when the user started the chapter (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the chapter (as registered by the DataCamp application). |
time_spent | The time (in seconds) the user spent on the chapter. |
xp | The XP the user gained by working on the chapter. |
course_dim
: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.column_name | column_description |
---|---|
course_id | The unique id of the course. |
title | The title of the course. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
xp | The maximum number of XP a user can get by following the course. |
nb_hours_needed | Time needed in hours to complete the course. |
slug | The slug of the course |
description | A description of the course. |
short_description | A short description of the course. |
launched_date | The date on which the course went live. |
id | [DEPRECATED] Use course_id instead. |
course_fact
: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.user_id |
course_id | The unique id of the course. References course_dim.id |
date_id | The unique id of the date on which the user worked on the course. References to dim_date.id |
started_at | The timestamp when the user started the course (as registered by the DataCamp application). |
completed_at | The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet. |
time_spent | The time (in seconds) the user spent on the course. |
xp | The XP the user gained by working on the course. |
practice_dim
: List of all practicescolumn_name | column_description |
---|---|
practice_id | The practice id (internal mobile_app.pools id) [PK] |
title | The practice name |
status | The practice status (HIDDEN, LIVE) |
technology | The practice’s technology type |
xp | Xp gained by completing the practice |
id | [DEPRECATED] Use practice_id instead. |
practice_fact
: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.column_name | column_description |
---|---|
user_id | The user id (internal main-app id) [PK], [FK to user_dim.id] |
practice_id | The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id] |
date_id | The date at which the user worked on the practice [PK], [FK to dim_date.id] |
started_at | The date at which the user started the practice |
completed_at | The date at which the user completed the practice. It is NULL when the user did not complete the practice |
is_mobile | Whether the user did the practice on mobile or in the browser |
time_spent | How much time (in seconds) the user spent on the practice |
xp | How many XP did the user gain by completing the practice |
project_dim
: The project dimension provides descriptive data about a specific project.column_name | column_description |
---|---|
project_id | The unique id of the project. |
title | The title of the course. |
technology | The course technology (R, Python, SQL) |
xp | The maxmimum number of XP a user can get by completing the project. |
nb_hours_needed | The number of hours needed to complete the project. |
is_guided | A boolean indicating whether the project is guided or not. |
is_certification | A boolean indicating whether the project is used for certification. |
description | A description of the project. |
short_description | A short description of the project. |
id | [DEPRECATED] Use project_id instead. |
project_fact
: The project fact table provides data about the users’ project progress: time spent and XP gained.column_name | column_description |
---|---|
user_id | The unique id of the user. References user_dim.user_id |
project_id | The unique id of the project. References project_dim.project_id |
date_id | The id of the date on which the user worked on the project. References date_dim.date_id |
started_at | The timestamp when the user started the project (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet. |
time_spent | The time (in seconds) the user spent on the project. |
xp | The XP the user gained by working on the project. |
track_content_dim
: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
content_id | The id of the content, a concatenation of content type and id. |
content_type | The content type (course / project). |
content_title | The title of the content. |
position | The position of the content in the track version. |
xp | The total xp earned on completion of content. |
track_content_fact
: Fact table for track content.column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
content_id | The id of the content, a concatenation of content type and id. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
xp | The xp gained by completing the content. |
nb_seconds | The total number of seconds spent on the content. |
track_dim
: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
version_number | The version of the track. |
title | The title concatenated with subtitle of the track version. |
technology | The technology of the track. |
category | The category of the track. |
short_description | A short description of the track |
published_live_at | The timestamp for when the track version was published live. |
archived_at | The timestamp for when the track version was archived. |
is_current_version | A boolean indicating if this is the current version. |
is_custom | A boolean indicating if this is a custom track. |
nb_courses | The number of courses in the track version. |
nb_projects | The number of projects in the track version. |
xp | The total xp gained on completion of the track. |
track_fact
: Fact table for tracks.column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
is_currently_active | A boolean indicating if the user is currently active in the track |
publication_fact
: The publication fact table provides data about the users’ workspace publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
creator_id | The unique identifier of the user who created the workspace. References user_dim.user_id. |
workspace_id | The unique identifier of the workspace. References workspace_dim.workspace_id |
viewed_at | The date on which the workspace publication was viewed. |
nb_viewers | The number of unique viewers who viewed the publication. |
nb_seconds | The number of seconds spent viewing the publication. |
viewer_type | The type of viewer (creator / viewer) |
workspace_dim
: The workspace dimension table provides descriptive data about a specific workspace.column_name | column_description |
---|---|
group_id | The unique identifier of the group, the workspace creator belonged to. |
workspace_id | The unique identifier of the workspace. |
technology | The language of the workspace (R / Python) |
category | The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate |
key | The template key (NULL for non-template types). |
is_featured | A boolean indicating if the workspace is featured on the profile. |
workspace_fact
: The workspace fact table provides data about group workspaces created by users. Note that if a user in a group creates a workspace as part of their personal account, it is NOT included in this model.column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
creator_id | The user id of the creator of the workspace. References user_dim.user_id. |
workspace_id | The id of the workspace. References workspace_dim.workspace_id. |
nb_megabytes | The size of files in the workspace measured in megabytes. |
nb_hours | The number of hours spent editing the workspace. |
nb_attempts_to_publish | The number of times the workspace was published. |
nb_times_published_successfully | The number of times the workspace was published successfully. |
nb_upvotes | The number of upvotes received. |
created_at | The timestamp for when the workspace is created. |
updated_at | The timstamp for when the workspace is updated. |
first_edited_at | The timestamp for when the workspace was first edited. |
last_edited_at | The timestamp for when the workspace was last edited. |
first_published_at | The timestamp for when the workspace was first published. |
last_published_at | The timestamp for when the workspace was last edited. |
first_integration_added_at | The timestamp for when the first integration was added to the workspace. |
last_integration_added_at | The timestamp for when the last integration was added to the workspace. |
first_upvoted_at | The timestamp for when the first upvote was received. |
last_upvoted_at | The timestamp for when the last upvote was received. |
team_dim
: The team dimension provides descriptive data about teams.column_name | column_description |
---|---|
team_id | The unique identifier of the team |
name | The team name |
slug | The team slug |
created_date | The team’s creation date |
updated_date | The team’s updated date |
deleted_date | The team’s deleted date |
id | [DEPRECATED] Use team_id instead. |
user_dim
: List of all enterprise userscolumn_name | column_description |
---|---|
user_id | The user id (internal main-app id) [PK] |
first_name | The user first name |
last_name | The user last name |
email | The user email |
slug | The slug to the user profile |
registered_at | When the user registered |
deleted_at | When the user was deleted |
last_visit_at | When the user visited (browsed) the plarform for the last time |
last_time_spent_at | When the user spent time on the platform (campus, challenges, projects, mobile) for the last time |
onboarding_completed_at | When the user completed their onboarding |
first_content_completed_at | When the user completed their first content |
id | [DEPRECATED] Use user_id instead |