Data Model
The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:
Assessments
Assignments
Certification
Courses
Practice
Projects
Tracks
DataLab (formerly Workspace)
How to work with the data?
The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.
Facts are the measurements/metrics or facts from your user activity on DataCamp.
course_fact
for example contains measures like XP and time_spent on courses by your users.
assessment_fact
contains measures like score and percentile for the assessment scores your users have registered on DataCamp.
You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.
The data model also provides dimension tables at the user level user_dim
, team_dim
, group_dim
, and bridge tables user_team_bridge
to facilitate analysis at the team or user level.
For example, you can aggregate the xp
gained and time_spent
by week across technology
and team
with the following query.
Warning! The Data Connector's fact tables only include:
learning activity for the dates on which a user was part of the group.
learning activity for the dates on which the group had an active subscription.
learning activity performed by users with all license types, including basic.
learning activity performed in the DataCamp desktop app.
For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.
For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).
Assessments
The Data Connector includes certification and private (test-out) assessments, which differ from standard assessments.
Certification assessments evaluate your knowledge within the Certification process. Private (test-out) assessments are used to skip content within a track if you pass a certain threshold. Reports in the Group Hub do not include certification and private (test-out) assessments. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.
Assessment Dim
assessment_dim
: The assessment dimension provides descriptive data about assessments.
column_name | column_description |
---|---|
assessment_id | The unique id of the assessment. |
title | The title of the assessment. |
slug | The slug of The assessment. |
technology | The assessment technology (e.g., Python, R, SQL) |
id | [DEPRECATED] Use assessment_id instead. |
Assessment Fact
assessment_fact
: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.
column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.id |
assessment_id | The unique id of the assessment. References assessment_dim.id |
date_id | The identifier of the date when the user worked on the assessment References dim_date.id |
started_at | The timestamp when the user started the assessment (as registered by the DataCamp application) |
completed_at | The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment. |
score | The user score for the assessment. |
score_group | The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”. |
percentile | The percentile the user belongs in. |
time_spent | The time (in seconds) the user spent on the assessment. |
Assignments
Assignment Dim
assignment_dim
: The assignment dimension table provides data about a specific assignment.
column_name | column_description |
---|---|
group_id | The unique identifier for the group. |
assignment_id | The unique identifier for the assignment. |
created_by_id | The user_id of the user who created the assignment. References user_dim.user_id. |
assignment_type | The type of assignment (chapter / course / project / assessment / customtrack / xp) |
assignment_title | The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
assignment_status | The status of the assignemnt (archived / active). |
assignee_type | The type of assignee (user / team / group). |
assignment_xp | The XP achieved by completing the assignment. |
technology | The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
topic | The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned. |
due_at | The timestamp when the assignment is due. |
created_at | The timestamp when the assignment was created. |
updated_at | The timestamp when the assignment was updated. |
deleted_at | The timestamp when the assignment was deleted. |
Assignment Fact
assignment_fact
: The assignment fact table provides data about the users’ progress on each assignment.
column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_id | The unique identifier of the user. References user_dim.user_id |
assignment_id | The unique identifier of the assignment. References assignment_dim.assignment_id. |
is_completed | A boolean indicating if the user completed the assignment. |
completion_status | The status of completion of the assignment. It can be completed / late / in_progress / missed. |
assigned_at | The timestamp when the assignment was created. |
due_at | The timestamp when the assignment is due. |
completed_at | The timestamp when the user completed the assignment. |
email_sent_at | The timestamp when an email was sent to the user. |
reminder_sent_at | The timestamp when a reminder was sent to the user. |
Certifications
Certification Dim
certification_dim
: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.
column_name | column_description |
---|---|
certification_id | A unique identifier for the certification. |
certification_name | The name of the certification. |
certification_version | The version of the certification. |
Certification Fact
certification_fact
: The certification fact table provides data about the users’ progress through different stages of a certification.
column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
user_certification_id | The unique identifier of a user attempting a certification. A user can have more than one attempt at obtaining a certification. |
user_id | The unique identifier of the user. References user_dim.user_id |
certification_id | The unique identifier of the certification. References certification_dim.certification_id. |
registered_at | The timestamp when the user registered for the certification attempt. |
expired_at | The timestamp when the user certification attempt (will) expire. |
assessments_passed_at | (v1 only) The timestamp when the user passed the assessments. |
coding_challenge_passed_at | (v1 only) The timestamp when the user passed the coding challenge(s). |
first_exam_started_at | (v2 only) The timestamp when the user started their first certification exam in an attempt. |
first_exam_completed_at | (v2 only) The timestamp when the user completed their first certification exam in an attempt. |
exam_last_passed_at | (v2 only) The timestamp when the user passed their last certification exam. |
case_study_ready_at | The timestamp when the user passed all stages prior to the practical exam. |
first_case_study_submitted_at | The timestamp when the first practical exam was submitted for grading. |
last_case_study_submitted_at | The timestamp for then the last practical exam was submitted for grading. |
certificate_granted_at | The date at which the certificate was granted by the admin. |
is_passed_assessments | An indicator whether user passed all timed assessments. |
is_passed_challenges | An indicator whether the user passed all coding challenges. |
is_passed_exams | An indicator whether the user passed all certification exams. |
Courses
Exercise Dim
exercise_dim
: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.
column_name | column_description |
---|---|
exercise_id | The unique id of the exercise id. |
title | The title of the exercise. |
type | The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …) |
number | The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order. |
xp | The maximum number of XP a user can get by completing the exercise. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …) |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …) |
course_title | The course title the exercise belongs to. |
course_slug | The course slug the exercise belongs to |
course_xp | The maximum number of XP a user can get by completing the course the exercise belongs to. |
course_description | The course description the exercise belongs to. |
course_short_description | A shorter version of the course description the exercise belongs to. |
course_launched_date | Date at which the course went live. |
chapter_title | The chapter title the exercise belongs to. |
chapter_slug | The chapter slug the exercise belongs to. |
chapter_xp | The maximum number of XP a user can get by completing the chapter the exercise belongs to. |
chapter_nb_exercises | The number of exercises in the chapter the exercise belongs to. |
id | [DEPRECATED] Use exercise_id instead. |
Exercise Fact
exercise_fact
: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.
column_name | column_description |
---|---|
user_id | The unique user id. Reference to user_dim.id. |
exercise_id | The unique exercise_id. Reference to exercise_dim.id. |
date_id | The date id. Reference to date_dim.id. |
started_at | The timestamp at which the user started the exercise (as registered by the DataCamp application). |
completed_at | The timestamp at which the user completed the exercise (as registered by the DataCamp application). |
time_spent | How much time (in seconds) the user spent on the exercise? |
xp | How many XP did the user gain by completing the exercise? |
Chapter Dim
chapter_dim
: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.
column_name | column_description |
---|---|
chapter_id | The unique id of the chapter id. |
title | The title of the chapter. |
slug | The slug of the chapter. |
xp | The maximum number of XP a user can get by completing the chapter. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
nb_exercises | The number of exercises in the chapter. |
course_title | The title of the course the chapter belongs to. |
course_slug | The slug of the course the chapter belongs to. |
course_xp | The maximum number of XP a user can get by completing the course the chapter belongs to. |
course_description | The description of the course the chapter belongs to. |
course_short_description | A shorter description of the course the chapter belongs to. |
course_launched_date | The date on which the course went live. |
id | [DEPRECATED] Use chapter_id instead. |
Chapter Fact
chapter_fact
: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.
column_name | column_description |
---|---|
user_id | The unique id of the user. References to user_dim.user_id. |
chapter_id | The unique id of the chapter. References to chapter_dim.chapter_id. |
date_id | The date on which the user worked on the chapter. References to dim_date.id |
started_at | The timestamp when the user started the chapter (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the chapter (as registered by the DataCamp application). |
time_spent | The time (in seconds) the user spent on the chapter. |
xp | The XP the user gained by working on the chapter. |
Course Dim
course_dim
: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.
column_name | column_description |
---|---|
course_id | The unique id of the course. |
title | The title of the course. |
technology | The course technology (e.g., Tableau, SQL, Python, R, …). |
topic | The course topic (e.g., Data visualization, Programming, Machine Learning, …). |
xp | The maximum number of XP a user can get by following the course. |
nb_hours_needed | Time needed in hours to complete the course. |
slug | The slug of the course |
description | A description of the course. |
short_description | A short description of the course. |
launched_date | The date on which the course went live. |
id | [DEPRECATED] Use course_id instead. |
Course Fact
course_fact
: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.
column_name | column_description |
---|---|
user_id | The unique id of the user id. References user_dim.user_id |
course_id | The unique id of the course. References course_dim.id |
date_id | The unique id of the date on which the user worked on the course. References to dim_date.id |
started_at | The timestamp when the user started the course (as registered by the DataCamp application). |
completed_at | The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet. |
time_spent | The time (in seconds) the user spent on the course. |
xp | The XP the user gained by working on the course. |
Practice
Practice Dim
practice_dim
: List of all practices
column_name | column_description |
---|---|
practice_id | The practice id (internal mobile_app.pools id) [PK] |
title | The practice name |
status | The practice status (HIDDEN, LIVE) |
technology | The practice’s technology type |
xp | Xp gained by completing the practice |
id | [DEPRECATED] Use practice_id instead. |
Practice Fact
practice_fact
: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.
column_name | column_description |
---|---|
user_id | The user id (internal id) [PK], [FK to user_dim.id] |
practice_id | The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id] |
date_id | The date at which the user worked on the practice [PK], [FK to dim_date.id] |
started_at | The date at which the user started the practice |
completed_at | The date at which the user completed the practice. It is NULL when the user did not complete the practice |
is_mobile | Whether the user did the practice on mobile or in the browser |
time_spent | How much time (in seconds) the user spent on the practice |
xp | How many XP did the user gain by completing the practice |
Projects
The Data Connector includes certification projects, which differ from standard projects.
Certification projects are used exclusively during the Certification process. Reports in the Group Hub do not include certification projects. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.
Project Dim
project_dim
: The project dimension provides descriptive data about a specific project.
column_name | column_description |
---|---|
project_id | The unique id of the project. |
title | The title of the course. |
technology | The course technology (R, Python, SQL) |
xp | The maxmimum number of XP a user can get by completing the project. |
nb_hours_needed | The number of hours needed to complete the project. |
is_guided | A boolean indicating whether the project is guided or not. |
is_certification | A boolean indicating whether the project is used for certification. |
description | A description of the project. |
short_description | A short description of the project. |
id | [DEPRECATED] Use project_id instead. |
Project Fact
project_fact
: The project fact table provides data about the users’ project progress: time spent and XP gained.
column_name | column_description |
---|---|
user_id | The unique id of the user. References user_dim.user_id |
project_id | The unique id of the project. References project_dim.project_id |
date_id | The id of the date on which the user worked on the project. References date_dim.date_id |
started_at | The timestamp when the user started the project (as registered by the DataCamp application). |
completed_at | The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet. |
time_spent | The time (in seconds) the user spent on the project. |
xp | The XP the user gained by working on the project. |
Tracks
Track Content Dim
track_content_dim
: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.
column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
content_id | The id of the content, a concatenation of content type and id. |
content_type | The content type (course / project). |
content_title | The title of the content. |
position | The position of the content in the track version. |
xp | The total xp earned on completion of content. |
Track Content Fact
track_content_fact
: Fact table for track content.
column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
content_id | The id of the content, a concatenation of content type and id. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
xp | The xp gained by completing the content. |
nb_seconds | The total number of seconds spent on the content. |
Track Dim
track_dim
: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.
column_name | column_description |
---|---|
track_version_id | The id of the track version. |
track_id | The id of the track. |
version_number | The version of the track. |
title | The title concatenated with subtitle of the track version. |
technology | The technology of the track. |
category | The category of the track. |
short_description | A short description of the track |
published_live_at | The timestamp for when the track version was published live. |
archived_at | The timestamp for when the track version was archived. |
is_current_version | A boolean indicating if this is the current version. |
is_custom | A boolean indicating if this is a custom track. |
nb_courses | The number of courses in the track version. |
nb_projects | The number of projects in the track version. |
xp | The total xp gained on completion of the track. |
Track Fact
track_fact
: Fact table for tracks.
column_name | column_description |
---|---|
group_id | The id of the group. |
user_id | The id of the user. |
track_version_id | The id of the track version. |
date_id | The unique id of the date on which the enrolled in the track. References to dim_date.id |
started_at | The timestamp when the user enrolled in the track. |
completed_at | The timestamp when the user completed the track. |
is_currently_active | A boolean indicating if the user is currently active in the track |
DataLab
Note: In March of 2024, Workspaces was renamed DataLab. Individual workspaces are now called workbooks. We have kept the column names in the previous format to prevent any problems with existing connections but have updated other references where appropriate.
Publication Fact
publication_fact
: The publication fact table provides data about the users’ DataLab publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).
column_name | column_description |
---|---|
group_id | The unique identifier of the group. |
creator_id | The unique identifier of the user who created the DataLab workbook. References user_dim.user_id. |
workspace_id | The unique identifier of the DataLab Workbook. References workspace_dim.workspace_id |
viewed_at | The date on which the DataLab publication was viewed. |
nb_viewers | The number of unique viewers who viewed the publication. |
nb_seconds | The number of seconds spent viewing the publication. |
viewer_type | The type of viewer (creator / viewer) |
DataLab Dim
workspace_dim
: The DataLab dimension table provides descriptive data about a specific DataLab workbook.
column_name | column_description |
---|---|
workspace_id | The unique identifier of the DataLab workbook. |
workspace_title | The title of the workbook. |
technology | The language of the workbook (R / Python) |
owner_type | Whether the owner is individual or group. |
category | The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate |
key | The template key (NULL for non-template types). |
is_featured | A boolean indicating if the workbook is featured on the profile. |
group_id | The unique identifier of the group, the workbook creator belonged to. |
DataLab Workbook Fact
workspace_fact
: The workbook fact table provides data about the workbook created by users.
column_name | column_description |
---|---|
creator_id | The user id of the creator of the workbook. References user_dim.user_id. |
workspace_id | The id of the workspace. References workspace_dim.workspace_id. |
nb_megabytes | The size of files in the workbook measured in megabytes. |
nb_attempts_to_publish | The number of times the workbook was published. |
nb_times_published_successfully | The number of times the workbook was published successfully. |
nb_upvotes | The number of upvotes received. |
nb_shares | The number of times the workbook was shared. |
created_at | The timestamp for when the workbook is created. |
updated_at | The timstamp for when the workbook is updated. |
first_edited_at | The timestamp for when the workbook was first edited. |
last_edited_at | The timestamp for when the workbook was last edited. |
first_published_at | The timestamp for when the workbook was first published. |
last_published_at | The timestamp for when the workbook was last edited. |
first_integration_added_at | The timestamp for when the first integration was added to the workbook. |
last_integration_added_at | The timestamp for when the last integration was added to the workbook. |
first_upvoted_at | The timestamp for when the first upvote was received. |
last_upvoted_at | The timestamp for when the last upvote was received. |
group_id | The id of the group owner of the workbook. |
Workspace Visit Fact
workspace_visit_fact
: The workspace (workbook) visit fact table provides data on user visits to workbooks.
column_name | column_description |
---|---|
visitor_id | The user id of the visitor to the workbook. References user_dim.user_id. |
workspace_id | The id of the workbook. References workspace_dim.workspace_id. |
visited_at | The date of the visit, granular to the day level. |
nb_seconds | The duration of the visit in seconds. |
group_id | The id of the group. |
User Teams
Team Dim
team_dim
: The team dimension provides descriptive data about teams.
column_name | column_description |
---|---|
team_id | The unique identifier of the team |
name | The team name |
slug | The team slug |
created_date | The team’s creation date |
updated_date | The team’s updated date |
deleted_date | The team’s deleted date |
id | [DEPRECATED] Use team_id instead. |
User Dim
user_dim
: List of all enterprise users
column_name | column_description |
---|---|
user_id | The user id (internal id) [PK] |
first_name | The user first name |
last_name | The user last name |
The user email | |
slug | The slug to the user profile |
registered_at | When the user registered |
deleted_at | When the user was deleted |
last_visit_at | When the user visited (browsed) the plarform for the last time |
last_time_spent_at | When the user spent time on the platform (campus, challenges, projects, mobile) for the last time |
onboarding_completed_at | When the user completed their onboarding |
first_content_completed_at | When the user completed their first content |
id | [DEPRECATED] Use user_id instead |
User Team Bridge
user_team_bridge
: Bridge table to link users to their teams
column_name | column_description |
---|---|
user_id | The user id (internal id) [PK] |
team_id | The team id (internal id) [PK] |
joined_team_date | Date at which the user joined the team |
left_team_date | Date at which the user left the team |
Others
Xp Fact
xp_fact
: This fact table provides information on XP gained by a user across different content modalities.
column_name | column_description |
---|---|
group_id | The unique identifier of the group. (references dim_groups.group_id) |
user_id | The unique identifier of the user. (References dim_users.user_id) |
event | The event for which XP was gained ( course_exercise_completed, alpa_onboarding_completed etc.) |
created_date | The date on which the XP was gained |
xp | The number of XP gained. |
Last updated