Enterprise Docs
Search
K
Comment on page

Data Model

The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:
  • Assessments
  • Assignments
  • Certification
  • Courses
  • Practice
  • Projects
  • Tracks
  • Workspace

How to work with the data?

The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.
Facts
Dimensions
Bridges
Facts are the measurements/metrics or facts from your user activity on DataCamp.
course_fact for example contains measures like XP and time_spent on courses by your users.
assessment_fact contains measures like score and percentile for the assessment scores your users have registered on DataCamp.
Dimension provides the context surrounding the fact tables. In simple terms, they give who, what, where of a fact. In other words, a dimension is a window to view information in the facts.
The course_fact table can for example be joined with the course_dim table to find out more information about the courses (like title, technology, instructor, ...) or with user_dim to get information on the users (like email, name, registration date, etc ...)
Bridge tables are used to connect fact and/or dimension tables together. Bridge tables are also referred to as "join tables" in classic SQL.
The user_team_bridge table for example can be used to connect a user to the teams they are member of. In this case a bridge table is needed because a single user can be member of multiple teams at the same time.
You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.
The data model also provides dimension tables at the user level user_dim, team_dim, group_dim, and bridge tables user_team_bridge to facilitate analysis at the team or user level.
For example, you can aggregate the xp gained and time_spent by week across technology and team with the following query.
SQL
Python
SELECT team_id,
week_start_date,
technology,
SUM(xp) AS xp,
SUM(time_spent)/3600 AS time_spent_hours
FROM course_fact
INNER JOIN course_dim USING(course_id)
INNER JOIN date_dim USING(date_id)
INNER JOIN user_team_bridge USING(user_id)
INNER JOIN team_dim USING(team_id)
GROUP BY 1, 2, 3
from dcdcpy import DataConnector
dc = DataConnector()
(
dc.course_fact()
.merge(dc.course_dim(), on='course_id', suffixes=('', '_y'))
.merge(dc.date_dim(), left_on='date_id', right_on='id')
.merge(dc.user_team_bridge(), on='user_id')
.merge(dc.team_dim(), left_on='team_id', right_on='id')
.groupby(['week_start_date', 'technology'], as_index=False)
.agg({
'time_spent': lambda d: d.sum()/3600,
'xp': 'sum'
})
)
Note that all fact tables only include learning activity for the dates on which a user was a part of the group. For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).
Warning! For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.

Assessments

Assessment Dim

assessment_dim: The assessment dimension provides descriptive data about assessments.
column_name
column_description
assessment_id
The unique id of the assessment.
title
The title of the assessment.
slug
The slug of The assessment.
technology
The assessment technology (e.g., Python, R, SQL)
id
[DEPRECATED] Use assessment_id instead.

Assessment Fact

assessment_fact: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.
column_name
column_description
user_id
The unique id of the user id. References user_dim.id
assessment_id
The unique id of the assessment. References assessment_dim.id
date_id
The identifier of the date when the user worked on the assessment References dim_date.id
started_at
The timestamp when the user started the assessment (as registered by the DataCamp application)
completed_at
The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment.
score
The user score for the assessment.
score_group
The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”.
percentile
The percentile the user belongs in.
time_spent
The time (in seconds) the user spent on the assessment.

Assignments

Assignment Dim

assignment_dim: The assignment dimension table provides data about a specific assignment.
column_name
column_description
group_id
The unique identifier for the group.
assignment_id
The unique identifier for the assignment.
created_by_id
The user_id of the user who created the assignment. References user_dim.user_id.
assignment_type
The type of assignment (chapter / course / project / assessment / customtrack / xp)
assignment_title
The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned.
assignment_status
The status of the assignemnt (archived / active).
assignee_type
The type of assignee (user / team / group).
assignment_xp
The XP achieved by completing the assignment.
technology
The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.
topic
The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.
due_at
The timestamp when the assignment is due.
created_at
The timestamp when the assignment was created.
updated_at
The timestamp when the assignment was updated.
deleted_at
The timestamp when the assignment was deleted.

Assignment Fact

assignment_fact: The assignment fact table provides data about the users’ progress on each assignment.
column_name
column_description
group_id
The unique identifier of the group.
user_id
The unique identifier of the user. References user_dim.user_id
assignment_id
The unique identifier of the assignment. References assignment_dim.assignment_id.
is_completed
A boolean indicating if the user completed the assignment.
completion_status
The status of completion of the assignment. It can be completed / late / in_progress / missed.
assigned_at
The timestamp when the assignment was created.
due_at
The timestamp when the assignment is due.
completed_at
The timestamp when the user completed the assignment.
email_sent_at
The timestamp when an email was sent to the user.
reminder_sent_at
The timestamp when a reminder was sent to the user.

Certifications

Certification Dim

certification_dim: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.
column_name
column_description
certification_id
A unique identifier for the certification.
certification_name
The name of the certification.
certification_version
The version of the certification.

Certification Fact

certification_fact: The certification fact table provides data about the users’ progress through different stages of a certification.
column_name
column_description
group_id
The unique identifier of the group.
user_certification_id
The unique identifier of a user attempting a certification. A user can have more than one attempt at obtaining a certification.
user_id
The unique identifier of the user. References user_dim.user_id
certification_id
The unique identifier of the certification. References certification_dim.certification_id.
registered_at
The timestamp when the user registered for the certification attempt.
expired_at
The timestamp when the user certification attempt (will) expire.
assessments_passed_at
(v1 only) The timestamp when the user passed the assessments.
coding_challenge_passed_at
(v1 only) The timestamp when the user passed the coding challenge(s).
first_exam_started_at
(v2 only) The timestamp when the user started their first certification exam in an attempt.
first_exam_completed_at
(v2 only) The timestamp when the user completed their first certification exam in an attempt.
exam_last_passed_at
(v2 only) The timestamp when the user passed their last certification exam.
case_study_ready_at
The timestamp when the user passed all stages prior to the practical exam.
first_case_study_submitted_at
The timestamp when the first practical exam was submitted for grading.
last_case_study_submitted_at
The timestamp for then the last practical exam was submitted for grading.
certificate_granted_at
The date at which the certificate was granted by the admin.
is_passed_assessments
An indicator whether user passed all timed assessments.
is_passed_challenges
An indicator whether the user passed all coding challenges.
is_passed_exams
An indicator whether the user passed all certification exams.

Courses

Exercise Dim

exercise_dim: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.
column_name
column_description
exercise_id
The unique id of the exercise id.
title
The title of the exercise.
type
The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …)
number
The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order.
xp
The maximum number of XP a user can get by completing the exercise.
technology
The course technology (e.g., Tableau, SQL, Python, R, …)
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …)
course_title
The course title the exercise belongs to.
course_slug
The course slug the exercise belongs to
course_xp
The maximum number of XP a user can get by completing the course the exercise belongs to.
course_description
The course description the exercise belongs to.
course_short_description
A shorter version of the course description the exercise belongs to.
course_launched_date
Date at which the course went live.
chapter_title
The chapter title the exercise belongs to.
chapter_slug
The chapter slug the exercise belongs to.
chapter_xp
The maximum number of XP a user can get by completing the chapter the exercise belongs to.
chapter_nb_exercises
The number of exercises in the chapter the exercise belongs to.
id
[DEPRECATED] Use exercise_id instead.

Exercise Fact

exercise_fact: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.
column_name
column_description
user_id
The unique user id. Reference to user_dim.id.
exercise_id
The unique exercise_id. Reference to exercise_dim.id.
date_id
The date id. Reference to date_dim.id.
started_at
The timestamp at which the user started the exercise (as registered by the DataCamp application).
completed_at
The timestamp at which the user completed the exercise (as registered by the DataCamp application).
time_spent
How much time (in seconds) the user spent on the exercise?
xp
How many XP did the user gain by completing the exercise?

Chapter Dim

chapter_dim: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.
column_name
column_description
chapter_id
The unique id of the chapter id.
title
The title of the chapter.
slug
The slug of the chapter.
xp
The maximum number of XP a user can get by completing the chapter.
technology
The course technology (e.g., Tableau, SQL, Python, R, …).
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …).
nb_exercises
The number of exercises in the chapter.
course_title
The title of the course the chapter belongs to.
course_slug
The slug of the course the chapter belongs to.
course_xp
The maximum number of XP a user can get by completing the course the chapter belongs to.
course_description
The description of the course the chapter belongs to.
course_short_description
A shorter description of the course the chapter belongs to.
course_launched_date
The date on which the course went live.
id
[DEPRECATED] Use chapter_id instead.

Chapter Fact

chapter_fact: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.
column_name
column_description
user_id
The unique id of the user. References to user_dim.user_id.
chapter_id
The unique id of the chapter. References to chapter_dim.chapter_id.
date_id
The date on which the user worked on the chapter. References to dim_date.id
started_at
The timestamp when the user started the chapter (as registered by the DataCamp application).
completed_at
The timestamp when the user completed the chapter (as registered by the DataCamp application).
time_spent
The time (in seconds) the user spent on the chapter.
xp
The XP the user gained by working on the chapter.

Course Dim

course_dim: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.
column_name
column_description
course_id
The unique id of the course.
title
The title of the course.
technology
The course technology (e.g., Tableau, SQL, Python, R, …).
topic
The course topic (e.g., Data visualization, Programming, Machine Learning, …).
xp
The maximum number of XP a user can get by following the course.
nb_hours_needed
Time needed in hours to complete the course.
slug
The slug of the course
description
A description of the course.
short_description
A short description of the course.
launched_date
The date on which the course went live.
id
[DEPRECATED] Use course_id instead.

Course Fact

course_fact: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.
column_name
column_description
user_id
The unique id of the user id. References user_dim.user_id
course_id
The unique id of the course. References course_dim.id
date_id
The unique id of the date on which the user worked on the course. References to dim_date.id
started_at
The timestamp when the user started the course (as registered by the DataCamp application).
completed_at
The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet.
time_spent
The time (in seconds) the user spent on the course.
xp
The XP the user gained by working on the course.

Practice

Practice Dim

practice_dim: List of all practices
column_name
column_description
practice_id
The practice id (internal mobile_app.pools id) [PK]
title
The practice name
status
The practice status (HIDDEN, LIVE)
technology
The practice’s technology type
xp
Xp gained by completing the practice
id
[DEPRECATED] Use practice_id instead.

Practice Fact

practice_fact: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.
column_name
column_description
user_id
The user id (internal main-app id) [PK], [FK to user_dim.id]
practice_id
The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id]
date_id
The date at which the user worked on the practice [PK], [FK to dim_date.id]
started_at
The date at which the user started the practice
completed_at
The date at which the user completed the practice. It is NULL when the user did not complete the practice
is_mobile
Whether the user did the practice on mobile or in the browser
time_spent
How much time (in seconds) the user spent on the practice
xp
How many XP did the user gain by completing the practice

Projects

Project Dim

project_dim: The project dimension provides descriptive data about a specific project.
column_name
column_description
project_id
The unique id of the project.
title
The title of the course.
technology
The course technology (R, Python, SQL)
xp
The maxmimum number of XP a user can get by completing the project.
nb_hours_needed
The number of hours needed to complete the project.
is_guided
A boolean indicating whether the project is guided or not.
is_certification
A boolean indicating whether the project is used for certification.
description
A description of the project.
short_description
A short description of the project.
id
[DEPRECATED] Use project_id instead.

Project Fact

project_fact: The project fact table provides data about the users’ project progress: time spent and XP gained.
column_name
column_description
user_id
The unique id of the user. References user_dim.user_id
project_id
The unique id of the project. References project_dim.project_id
date_id
The id of the date on which the user worked on the project. References date_dim.date_id
started_at
The timestamp when the user started the project (as registered by the DataCamp application).
completed_at
The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet.
time_spent
The time (in seconds) the user spent on the project.
xp
The XP the user gained by working on the project.

Tracks

Track Content Dim

track_content_dim: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.
column_name
column_description
track_version_id
The id of the track version.
track_id
The id of the track.
content_id
The id of the content, a concatenation of content type and id.
content_type
The content type (course / project).
content_title
The title of the content.
position
The position of the content in the track version.
xp
The total xp earned on completion of content.

Track Content Fact

track_content_fact: Fact table for track content.
column_name
column_description
group_id
The id of the group.
user_id
The id of the user.
track_version_id
The id of the track version.
content_id
The id of the content, a concatenation of content type and id.
date_id
The unique id of the date on which the enrolled in the track. References to dim_date.id
started_at
The timestamp when the user enrolled in the track.
completed_at
The timestamp when the user completed the track.
xp
The xp gained by completing the content.
nb_seconds
The total number of seconds spent on the content.

Track Dim

track_dim: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.
column_name
column_description
track_version_id
The id of the track version.
track_id
The id of the track.
version_number
The version of the track.
title
The title concatenated with subtitle of the track version.
technology
The technology of the track.
category
The category of the track.
short_description
A short description of the track
published_live_at
The timestamp for when the track version was published live.
archived_at
The timestamp for when the track version was archived.
is_current_version
A boolean indicating if this is the current version.
is_custom
A boolean indicating if this is a custom track.
nb_courses
The number of courses in the track version.
nb_projects
The number of projects in the track version.
xp
The total xp gained on completion of the track.

Track Fact

track_fact: Fact table for tracks.
column_name
column_description
group_id
The id of the group.
user_id
The id of the user.
track_version_id
The id of the track version.
date_id
The unique id of the date on which the enrolled in the track. References to dim_date.id
started_at
The timestamp when the user enrolled in the track.
completed_at
The timestamp when the user completed the track.
is_currently_active
A boolean indicating if the user is currently active in the track

Workspaces

Publication Fact

publication_fact: The publication fact table provides data about the users’ workspace publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).
column_name
column_description
group_id
The unique identifier of the group.
creator_id
The unique identifier of the user who created the workspace. References user_dim.user_id.
workspace_id
The unique identifier of the workspace. References workspace_dim.workspace_id
viewed_at
The date on which the workspace publication was viewed.
nb_viewers
The number of unique viewers who viewed the publication.
nb_seconds
The number of seconds spent viewing the publication.
viewer_type
The type of viewer (creator / viewer)

Workspace Dim

workspace_dim: The workspace dimension table provides descriptive data about a specific workspace.
column_name
column_description
workspace_id
The unique identifier of the workspace.
workspace_title
The title of the workspace.
technology
The language of the workspace (R / Python)
owner_type
Whether the owner is individual or group.
category
The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate
key
The template key (NULL for non-template types).
is_featured
A boolean indicating if the workspace is featured on the profile.
group_id
The unique identifier of the group, the workspace creator belonged to.

Workspace Fact

workspace_fact: The workspace fact table provides data about workspaces created by users.
column_name
column_description
creator_id
The user id of the creator of the workspace. References user_dim.user_id.
workspace_id
The id of the workspace. References workspace_dim.workspace_id.
nb_megabytes
The size of files in the workspace measured in megabytes.
nb_attempts_to_publish
The number of times the workspace was published.
nb_times_published_successfully
The number of times the workspace was published successfully.
nb_upvotes
The number of upvotes received.
nb_shares
The number of times the workspace was shared.
created_at
The timestamp for when the workspace is created.
updated_at
The timstamp for when the workspace is updated.
first_edited_at
The timestamp for when the workspace was first edited.
last_edited_at
The timestamp for when the workspace was last edited.
first_published_at
The timestamp for when the workspace was first published.
last_published_at
The timestamp for when the workspace was last edited.
first_integration_added_at
The timestamp for when the first integration was added to the workspace.
last_integration_added_at
The timestamp for when the last integration was added to the workspace.
first_upvoted_at
The timestamp for when the first upvote was received.
last_upvoted_at
The timestamp for when the last upvote was received.
group_id
The id of the group owner of the workspace.

Workspace Visit Fact

workspace_visit_fact: The workspace visit fact table provides data user visits to workspaces.
column_name
column_description
visitor_id
The user id of the visitor to the workspace. References user_dim.user_id.
workspace_id
The id of the workspace. References workspace_dim.workspace_id.
visited_at
The date of the visit, granular to the day level.
nb_seconds
The duration of the visit in seconds.
group_id
The id of the group.

User Teams

Team Dim

team_dim: The team dimension provides descriptive data about teams.
column_name
column_description
team_id
The unique identifier of the team
name
The team name
slug
The team slug
created_date
The team’s creation date
updated_date
The team’s updated date
deleted_date
The team’s deleted date
id
[DEPRECATED] Use team_id instead.

User Dim

user_dim: List of all enterprise users
column_name
column_description
user_id
The user id (internal main-app id) [PK]
first_name
The user first name
last_name
The user last name
email
The user email
slug
The slug to the user profile