The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:
The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.
Facts are the measurements/metrics or facts from your user activity on DataCamp.
course_factfor example contains measures like XP and time_spent on courses by your users.
assessment_factcontains measures like score and percentile for the assessment scores your users have registered on DataCamp.
Dimension provides the context surrounding the fact tables. In simple terms, they give who, what, where of a fact. In other words, a dimension is a window to view information in the facts.
course_facttable can for example be joined with the
course_dimtable to find out more information about the courses (like title, technology, instructor, ...) or with
user_dimto get information on the users (like email, name, registration date, etc ...)
Bridge tables are used to connect fact and/or dimension tables together. Bridge tables are also referred to as "join tables" in classic SQL.
user_team_bridgetable for example can be used to connect a user to the teams they are member of. In this case a bridge table is needed because a single user can be member of multiple teams at the same time.
You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.
The data model also provides dimension tables at the user level
group_dim, and bridge tables
user_team_bridgeto facilitate analysis at the team or user level.
For example, you can aggregate the
time_spentby week across
teamwith the following query.
SUM(xp) AS xp,
SUM(time_spent)/3600 AS time_spent_hours
INNER JOIN course_dim USING(course_id)
INNER JOIN date_dim USING(date_id)
INNER JOIN user_team_bridge USING(user_id)
INNER JOIN team_dim USING(team_id)
GROUP BY 1, 2, 3
from dcdcpy import DataConnector
dc = DataConnector()
.merge(dc.course_dim(), on='course_id', suffixes=('', '_y'))
.merge(dc.date_dim(), left_on='date_id', right_on='id')
.merge(dc.team_dim(), left_on='team_id', right_on='id')
.groupby(['week_start_date', 'technology'], as_index=False)
'time_spent': lambda d: d.sum()/3600,
Note that all fact tables only include learning activity for the dates on which a user was a part of the group. For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.
In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).
Warning! For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.
assessment_dim: The assessment dimension provides descriptive data about assessments.
assessment_fact: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.
assignment_dim: The assignment dimension table provides data about a specific assignment.
assignment_fact: The assignment fact table provides data about the users’ progress on each assignment.
certification_dim: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.
certification_fact: The certification fact table provides data about the users’ progress through different stages of a certification.
exercise_dim: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.
exercise_fact: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.
chapter_dim: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.
chapter_fact: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.
course_dim: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.
course_fact: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.
practice_dim: List of all practices
practice_fact: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.
project_dim: The project dimension provides descriptive data about a specific project.
project_fact: The project fact table provides data about the users’ project progress: time spent and XP gained.
track_content_dim: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.
track_content_fact: Fact table for track content.
track_dim: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.
track_fact: Fact table for tracks.
publication_fact: The publication fact table provides data about the users’ workspace publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).
workspace_dim: The workspace dimension table provides descriptive data about a specific workspace.
workspace_fact: The workspace fact table provides data about workspaces created by users.
workspace_visit_fact: The workspace visit fact table provides data user visits to workspaces.
team_dim: The team dimension provides descriptive data about teams.
user_dim: List of all enterprise users