Data Model
The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:
Assessments
Assignments
Certification
Courses
Practice
Projects
Tracks
DataLab (formerly Workspace)
How to work with the data?
The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.
Facts are the measurements/metrics or facts from your user activity on DataCamp.
course_fact
for example contains measures like XP and time_spent on courses by your users.
assessment_fact
contains measures like score and percentile for the assessment scores your users have registered on DataCamp.
You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.
The data model also provides dimension tables at the user level user_dim
, team_dim
, group_dim
, and bridge tables user_team_bridge
to facilitate analysis at the team or user level.
For example, you can aggregate the xp
gained and time_spent
by week across technology
and team
with the following query.
Warning! The Data Connector's fact tables only include:
learning activity for the dates on which a user was part of the group.
learning activity for the dates on which the group had an active subscription.
learning activity performed by users with all license types, including basic.
learning activity performed in the DataCamp desktop app.
For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.
For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).
Assessments
The Data Connector includes certification and private (test-out) assessments, which differ from standard assessments.
Certification assessments evaluate your knowledge within the Certification process. Private (test-out) assessments are used to skip content within a track if you pass a certain threshold. Reports in the Group Hub do not include certification and private (test-out) assessments. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.
Assessment Dim
assessment_dim
: The assessment dimension provides descriptive data about assessments.
assessment_id
The unique id of the assessment.
title
The title of the assessment.
slug
The slug of The assessment.
technology
The assessment technology (e.g., Python, R, SQL)
id
[DEPRECATED] Use assessment_id instead.
Assessment Fact
assessment_fact
: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.
user_id
The unique id of the user id. References user_dim.id
assessment_id
The unique id of the assessment. References assessment_dim.id
date_id
The identifier of the date when the user worked on the assessment References dim_date.id
started_at
The timestamp when the user started the assessment (as registered by the DataCamp application)
completed_at
The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment.
score
The user score for the assessment.
score_group
The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”.
percentile
The percentile the user belongs in.
time_spent
The time (in seconds) the user spent on the assessment.
Assignments
Assignment Dim
assignment_dim
: The assignment dimension table provides data about a specific assignment.
group_id
The unique identifier for the group.
assignment_id
The unique identifier for the assignment.
created_by_id
The user_id of the user who created the assignment. References user_dim.user_id.
assignment_type
The type of assignment (chapter / course / project / assessment / customtrack / xp)
assignment_title
The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned.
assignment_status
The status of the assignemnt (archived / active).
assignee_type
The type of assignee (user / team / group).
assignment_xp
The XP achieved by completing the assignment.
technology
The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.
topic
The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.
due_at
The timestamp when the assignment is due.
created_at
The timestamp when the assignment was created.
updated_at
The timestamp when the assignment was updated.
deleted_at
The timestamp when the assignment was deleted.
Assignment Fact
assignment_fact
: The assignment fact table provides data about the users’ progress on each assignment.
group_id
The unique identifier of the group.
user_id
The unique identifier of the user. References user_dim.user_id
assignment_id
The unique identifier of the assignment. References assignment_dim.assignment_id.
is_completed
A boolean indicating if the user completed the assignment.
completion_status
The status of completion of the assignment. It can be completed / late / in_progress / missed.
assigned_at
The timestamp when the assignment was created.
due_at
The timestamp when the assignment is due.
completed_at
The timestamp when the user completed the assignment.
email_sent_at
The timestamp when an email was sent to the user.
reminder_sent_at
The timestamp when a reminder was sent to the user.
Certifications
Certification Dim
certification_dim
: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.
certification_id
A unique identifier for the certification.
certification_name
The name of the certification.
certification_version
The version of the certification.
Certification Fact
certification_fact
: The certification fact table provides data about the users’ progress through different stages of a certification.
group_id
The unique identifier of the group.
user_certification_id
The unique identifier of a user attempting a certification. A user can have more than one attempt at obtaining a certification.
user_id
The unique identifier of the user. References user_dim.user_id
certification_id
The unique identifier of the certification. References certification_dim.certification_id.
registered_at
The timestamp when the user registered for the certification attempt.
expired_at
The timestamp when the user certification attempt (will) expire.
assessments_passed_at
(v1 only) The timestamp when the user passed the assessments.
coding_challenge_passed_at
(v1 only) The timestamp when the user passed the coding challenge(s).
first_exam_started_at
(v2 only) The timestamp when the user started their first certification exam in an attempt.
first_exam_completed_at
(v2 only) The timestamp when the user completed their first certification exam in an attempt.
exam_last_passed_at
(v2 only) The timestamp when the user passed their last certification exam.
case_study_ready_at
The timestamp when the user passed all stages prior to the practical exam.
first_case_study_submitted_at
The timestamp when the first practical exam was submitted for grading.
last_case_study_submitted_at
The timestamp for then the last practical exam was submitted for grading.
certificate_granted_at
The date at which the certificate was granted by the admin.
is_passed_assessments
An indicator whether user passed all timed assessments.
is_passed_challenges
An indicator whether the user passed all coding challenges.
is_passed_exams
An indicator whether the user passed all certification exams.
Courses
Exercise Dim
exercise_dim
: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.
exercise_id
The unique id of the exercise id.
title
The title of the exercise.
type
The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …)
number
The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order.
xp
The maximum number of XP a user can get by completing the exercise.
technology
The course technology (e.g., Tableau, SQL, Python, R, …)
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …)
course_title
The course title the exercise belongs to.
course_slug
The course slug the exercise belongs to
course_xp
The maximum number of XP a user can get by completing the course the exercise belongs to.
course_description
The course description the exercise belongs to.
course_short_description
A shorter version of the course description the exercise belongs to.
course_launched_date
Date at which the course went live.
chapter_title
The chapter title the exercise belongs to.
chapter_slug
The chapter slug the exercise belongs to.
chapter_xp
The maximum number of XP a user can get by completing the chapter the exercise belongs to.
chapter_nb_exercises
The number of exercises in the chapter the exercise belongs to.
id
[DEPRECATED] Use exercise_id instead.
Exercise Fact
exercise_fact
: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.
user_id
The unique user id. Reference to user_dim.id.
exercise_id
The unique exercise_id. Reference to exercise_dim.id.
date_id
The date id. Reference to date_dim.id.
started_at
The timestamp at which the user started the exercise (as registered by the DataCamp application).
completed_at
The timestamp at which the user completed the exercise (as registered by the DataCamp application).
time_spent
How much time (in seconds) the user spent on the exercise?
xp
How many XP did the user gain by completing the exercise?
Chapter Dim
chapter_dim
: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.
chapter_id
The unique id of the chapter id.
title
The title of the chapter.
slug
The slug of the chapter.
xp
The maximum number of XP a user can get by completing the chapter.
technology
The course technology (e.g., Tableau, SQL, Python, R, …).
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …).
nb_exercises
The number of exercises in the chapter.
course_title
The title of the course the chapter belongs to.
course_slug
The slug of the course the chapter belongs to.
course_xp
The maximum number of XP a user can get by completing the course the chapter belongs to.
course_description
The description of the course the chapter belongs to.
course_short_description
A shorter description of the course the chapter belongs to.
course_launched_date
The date on which the course went live.
id
[DEPRECATED] Use chapter_id instead.
Chapter Fact
chapter_fact
: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.
user_id
The unique id of the user. References to user_dim.user_id.
chapter_id
The unique id of the chapter. References to chapter_dim.chapter_id.
date_id
The date on which the user worked on the chapter. References to dim_date.id
started_at
The timestamp when the user started the chapter (as registered by the DataCamp application).
completed_at
The timestamp when the user completed the chapter (as registered by the DataCamp application).
time_spent
The time (in seconds) the user spent on the chapter.
xp
The XP the user gained by working on the chapter.
Course Dim
course_dim
: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.
course_id
The unique id of the course.
title
The title of the course.
technology
The course technology (e.g., Tableau, SQL, Python, R, …).
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …).
xp
The maximum number of XP a user can get by following the course.
nb_hours_needed
Time needed in hours to complete the course.
slug
The slug of the course
description
A description of the course.
short_description
A short description of the course.
launched_date
The date on which the course went live.
id
[DEPRECATED] Use course_id instead.
Course Fact
course_fact
: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.
user_id
The unique id of the user id. References user_dim.user_id
course_id
The unique id of the course. References course_dim.id
date_id
The unique id of the date on which the user worked on the course. References to dim_date.id
started_at
The timestamp when the user started the course (as registered by the DataCamp application).
completed_at
The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet.
time_spent
The time (in seconds) the user spent on the course.
xp
The XP the user gained by working on the course.
Practice
Practice Dim
practice_dim
: List of all practices
practice_id
The practice id (internal mobile_app.pools id) [PK]
title
The practice name
status
The practice status (HIDDEN, LIVE)
technology
The practice’s technology type
xp
Xp gained by completing the practice
id
[DEPRECATED] Use practice_id instead.
Practice Fact
practice_fact
: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.
user_id
The user id (internal id) [PK], [FK to user_dim.id]
practice_id
The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id]
date_id
The date at which the user worked on the practice [PK], [FK to dim_date.id]
started_at
The date at which the user started the practice
completed_at
The date at which the user completed the practice. It is NULL when the user did not complete the practice
is_mobile
Whether the user did the practice on mobile or in the browser
time_spent
How much time (in seconds) the user spent on the practice
xp
How many XP did the user gain by completing the practice
Projects
The Data Connector includes certification projects, which differ from standard projects.
Certification projects are used exclusively during the Certification process. Reports in the Group Hub do not include certification projects. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.
Project Dim
project_dim
: The project dimension provides descriptive data about a specific project.
project_id
The unique id of the project.
title
The title of the course.
technology
The course technology (R, Python, SQL)
xp
The maxmimum number of XP a user can get by completing the project.
nb_hours_needed
The number of hours needed to complete the project.
is_guided
A boolean indicating whether the project is guided or not.
is_certification
A boolean indicating whether the project is used for certification.
description
A description of the project.
short_description
A short description of the project.
id
[DEPRECATED] Use project_id instead.
Project Fact
project_fact
: The project fact table provides data about the users’ project progress: time spent and XP gained.
user_id
The unique id of the user. References user_dim.user_id
project_id
The unique id of the project. References project_dim.project_id
date_id
The id of the date on which the user worked on the project. References date_dim.date_id
started_at
The timestamp when the user started the project (as registered by the DataCamp application).
completed_at
The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet.
time_spent
The time (in seconds) the user spent on the project.
xp
The XP the user gained by working on the project.
Tracks
Track Content Dim
track_content_dim
: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.
track_version_id
The id of the track version.
track_id
The id of the track.
content_id
The id of the content, a concatenation of content type and id.
content_type
The content type (course / project).
content_title
The title of the content.
position
The position of the content in the track version.
xp
The total xp earned on completion of content.
Track Content Fact
track_content_fact
: Fact table for track content.
group_id
The id of the group.
user_id
The id of the user.
track_version_id
The id of the track version.
content_id
The id of the content, a concatenation of content type and id.
date_id
The unique id of the date on which the enrolled in the track. References to dim_date.id
started_at
The timestamp when the user enrolled in the track.
completed_at
The timestamp when the user completed the track.
xp
The xp gained by completing the content.
nb_seconds
The total number of seconds spent on the content.
Track Dim
track_dim
: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.
track_version_id
The id of the track version.
track_id
The id of the track.
version_number
The version of the track.
title
The title concatenated with subtitle of the track version.
technology
The technology of the track.
category
The category of the track.
short_description
A short description of the track
published_live_at
The timestamp for when the track version was published live.
archived_at
The timestamp for when the track version was archived.
is_current_version
A boolean indicating if this is the current version.
is_custom
A boolean indicating if this is a custom track.
nb_courses
The number of courses in the track version.
nb_projects
The number of projects in the track version.
xp
The total xp gained on completion of the track.
Track Fact
track_fact
: Fact table for tracks.
group_id
The id of the group.
user_id
The id of the user.
track_version_id
The id of the track version.
date_id
The unique id of the date on which the enrolled in the track. References to dim_date.id
started_at
The timestamp when the user enrolled in the track.
completed_at
The timestamp when the user completed the track.
is_currently_active
A boolean indicating if the user is currently active in the track
DataLab
Note: In March of 2024, DataCamp Workspace was renamed DataLab. Individual workspaces are now called workbooks. We have kept the column names in the previous format to prevent any problems with existing connections but have updated other references where appropriate.
Publication Fact
publication_fact
: The publication fact table provides data about the users’ DataLab publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).
group_id
The unique identifier of the group.
creator_id
The unique identifier of the user who created the DataLab workbook. References user_dim.user_id.
workspace_id
The unique identifier of the DataLab Workbook. References workspace_dim.workspace_id
viewed_at
The date on which the DataLab publication was viewed.
nb_viewers
The number of unique viewers who viewed the publication.
nb_seconds
The number of seconds spent viewing the publication.
viewer_type
The type of viewer (creator / viewer)
DataLab Dim
workspace_dim
: The DataLab dimension table provides descriptive data about a specific DataLab workbook.
workspace_id
The unique identifier of the DataLab workbook.
workspace_title
The title of the workbook.
technology
The language of the workbook (R / Python)
owner_type
Whether the owner is individual or group.
category
The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate
key
The template key (NULL for non-template types).
is_featured
A boolean indicating if the workbook is featured on the profile.
group_id
The unique identifier of the group, the workbook creator belonged to.
DataLab Workbook Fact
workspace_fact
: The workbook fact table provides data about the workbook created by users.
creator_id
The user id of the creator of the workbook. References user_dim.user_id.
workspace_id
The id of the workspace. References workspace_dim.workspace_id.
nb_megabytes
The size of files in the workbook measured in megabytes.
nb_attempts_to_publish
The number of times the workbook was published.
nb_times_published_successfully
The number of times the workbook was published successfully.
nb_upvotes
The number of upvotes received.
nb_shares
The number of times the workbook was shared.
created_at
The timestamp for when the workbook is created.
updated_at
The timstamp for when the workbook is updated.
first_edited_at
The timestamp for when the workbook was first edited.
last_edited_at
The timestamp for when the workbook was last edited.
first_published_at
The timestamp for when the workbook was first published.
last_published_at
The timestamp for when the workbook was last edited.
first_integration_added_at
The timestamp for when the first integration was added to the workbook.
last_integration_added_at
The timestamp for when the last integration was added to the workbook.
first_upvoted_at
The timestamp for when the first upvote was received.
last_upvoted_at
The timestamp for when the last upvote was received.
group_id
The id of the group owner of the workbook.
Workspace Visit Fact
workspace_visit_fact
: The workspace (workbook) visit fact table provides data on user visits to workbooks.
visitor_id
The user id of the visitor to the workbook. References user_dim.user_id.
workspace_id
The id of the workbook. References workspace_dim.workspace_id.
visited_at
The date of the visit, granular to the day level.
nb_seconds
The duration of the visit in seconds.
group_id
The id of the group.
User Teams
Team Dim
team_dim
: The team dimension provides descriptive data about teams.
team_id
The unique identifier of the team
name
The team name
slug
The team slug
created_date
The team’s creation date
updated_date
The team’s updated date
deleted_date
The team’s deleted date
id
[DEPRECATED] Use team_id instead.
User Dim
user_dim
: List of all enterprise users
user_id
The user id (internal id) [PK]
first_name
The user first name
last_name
The user last name
The user email
slug
The slug to the user profile
registered_at
When the user registered
deleted_at
When the user was deleted
last_visit_at
When the user visited (browsed) the plarform for the last time
last_time_spent_at
When the user spent time on the platform (campus, challenges, projects, mobile) for the last time
onboarding_completed_at
When the user completed their onboarding
first_content_completed_at
When the user completed their first content
id
[DEPRECATED] Use user_id instead
User Team Bridge
user_team_bridge
: Bridge table to link users to their teams
user_id
The user id (internal id) [PK]
team_id
The team id (internal id) [PK]
joined_team_date
Date at which the user joined the team
left_team_date
Date at which the user left the team
Others
Xp Fact
xp_fact
: This fact table provides information on XP gained by a user across different content modalities.
group_id
The unique identifier of the group. (references dim_groups.group_id)
user_id
The unique identifier of the user. (References dim_users.user_id)
event
The event for which XP was gained ( course_exercise_completed, alpa_onboarding_completed etc.)
created_date
The date on which the XP was gained
xp
The number of XP gained.
Last updated