Data Model

The data model for the data connector provides data on activity for users in a group across different content types. The currently supported content types are:

  • Assessments

  • Assignments

  • Certification

  • Courses

  • Practice

  • Projects

  • Tracks

  • DataLab (formerly Workspace)

How to work with the data?

The data available from the Data Connector is modelled according to a dimensional model. This means there are fact, dimension and bridge tables available.

Facts are the measurements/metrics or facts from your user activity on DataCamp.

course_fact for example contains measures like XP and time_spent on courses by your users.

assessment_fact contains measures like score and percentile for the assessment scores your users have registered on DataCamp.

You can join the fact tables with the dimension tables to summarize XP and time spent across technology, topic etc.

The data model also provides dimension tables at the user level user_dim, team_dim, group_dim, and bridge tables user_team_bridge to facilitate analysis at the team or user level.

For example, you can aggregate the xp gained and time_spent by week across technology and team with the following query.

SELECT team_id,
       week_start_date,
       technology,
       SUM(xp) AS xp,
       SUM(time_spent)/3600 AS time_spent_hours
  FROM course_fact
       INNER JOIN course_dim USING(course_id)
       INNER JOIN date_dim USING(date_id)
       INNER JOIN user_team_bridge USING(user_id)
       INNER JOIN team_dim USING(team_id)
 GROUP BY 1, 2, 3

Warning! The Data Connector's fact tables only include:

  • learning activity for the dates on which a user was part of the group.

  • learning activity for the dates on which the group had an active subscription.

  • learning activity performed by users with all license types, including basic.

  • learning activity performed in the DataCamp desktop app.

For now, the Data Connector does not include time spent on mobile, this could result in slight differences from other sources of reporting such as exports or our Group Hub.

For example, consider a user A who joined group 1234 on Jan 1st, started a course on Jan 2nd, left the group on Jan 3rd while continuing to work on the course, rejoined the group on Jan 4th, and completed the course on Jan 5th.

In this case, the fact tables for group 1234 will not contain data for User A for January 3rd (even though the user continued working on the course).

Assessments

The Data Connector includes certification and private (test-out) assessments, which differ from standard assessments.

Certification assessments evaluate your knowledge within the Certification process. Private (test-out) assessments are used to skip content within a track if you pass a certain threshold. Reports in the Group Hub do not include certification and private (test-out) assessments. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.

Assessment Dim

assessment_dim: The assessment dimension provides descriptive data about assessments.

column_namecolumn_description

assessment_id

The unique id of the assessment.

title

The title of the assessment.

slug

The slug of The assessment.

technology

The assessment technology (e.g., Python, R, SQL)

id

[DEPRECATED] Use assessment_id instead.

Assessment Fact

assessment_fact: The assessment fact table provides data about the users’ assessment results: score and percentile obtained and time spent on each assessment.

column_namecolumn_description

user_id

The unique id of the user id. References user_dim.id

assessment_id

The unique id of the assessment. References assessment_dim.id

date_id

The identifier of the date when the user worked on the assessment References dim_date.id

started_at

The timestamp when the user started the assessment (as registered by the DataCamp application)

completed_at

The timestamp when the user completed the assessment (as registered by the DataCamp application). It is NULL when the user did not complete the assessment.

score

The user score for the assessment.

score_group

The user score group for the assessment. A score of < 70 is “Novice”, 70 - 100 as “Intermediate Lower”, 100 - 130 as “Intermediate”, 130 - 160 as “Intermediate Upper”, and > 160 as “Advanced”.

percentile

The percentile the user belongs in.

time_spent

The time (in seconds) the user spent on the assessment.

Assignments

Assignment Dim

assignment_dim: The assignment dimension table provides data about a specific assignment.

column_namecolumn_description

group_id

The unique identifier for the group.

assignment_id

The unique identifier for the assignment.

created_by_id

The user_id of the user who created the assignment. References user_dim.user_id.

assignment_type

The type of assignment (chapter / course / project / assessment / customtrack / xp)

assignment_title

The title of the content item assigned. It is NULL for assignment_type = ‘xp’, where no content is assigned.

assignment_status

The status of the assignemnt (archived / active).

assignee_type

The type of assignee (user / team / group).

assignment_xp

The XP achieved by completing the assignment.

technology

The technology of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.

topic

The topic of the associated content item. It is NULL for assignment_type = ‘xp’, where no content is assigned.

due_at

The timestamp when the assignment is due.

created_at

The timestamp when the assignment was created.

updated_at

The timestamp when the assignment was updated.

deleted_at

The timestamp when the assignment was deleted.

Assignment Fact

assignment_fact: The assignment fact table provides data about the users’ progress on each assignment.

column_namecolumn_description

group_id

The unique identifier of the group.

user_id

The unique identifier of the user. References user_dim.user_id

assignment_id

The unique identifier of the assignment. References assignment_dim.assignment_id.

is_completed

A boolean indicating if the user completed the assignment.

completion_status

The status of completion of the assignment. It can be completed / late / in_progress / missed.

assigned_at

The timestamp when the assignment was created.

due_at

The timestamp when the assignment is due.

completed_at

The timestamp when the user completed the assignment.

email_sent_at

The timestamp when an email was sent to the user.

reminder_sent_at

The timestamp when a reminder was sent to the user.

Certifications

Certification Dim

certification_dim: The certification dimension provides data about a specific certification. Certification ID 1 and 2 are the so-called v1 iterations of certifications. From June 2022 onwards, we switched to a process where an official certification registration was needed, and this registration would time out after 30 days.

column_namecolumn_description

certification_id

A unique identifier for the certification.

certification_name

The name of the certification.

certification_version

The version of the certification.

Certification Fact

certification_fact: The certification fact table provides data about the users’ progress through different stages of a certification.

column_namecolumn_description

group_id

The unique identifier of the group.

user_certification_id

The unique identifier of a user attempting a certification. A user can have more than one attempt at obtaining a certification.

user_id

The unique identifier of the user. References user_dim.user_id

certification_id

The unique identifier of the certification. References certification_dim.certification_id.

registered_at

The timestamp when the user registered for the certification attempt.

expired_at

The timestamp when the user certification attempt (will) expire.

assessments_passed_at

(v1 only) The timestamp when the user passed the assessments.

coding_challenge_passed_at

(v1 only) The timestamp when the user passed the coding challenge(s).

first_exam_started_at

(v2 only) The timestamp when the user started their first certification exam in an attempt.

first_exam_completed_at

(v2 only) The timestamp when the user completed their first certification exam in an attempt.

exam_last_passed_at

(v2 only) The timestamp when the user passed their last certification exam.

case_study_ready_at

The timestamp when the user passed all stages prior to the practical exam.

first_case_study_submitted_at

The timestamp when the first practical exam was submitted for grading.

last_case_study_submitted_at

The timestamp for then the last practical exam was submitted for grading.

certificate_granted_at

The date at which the certificate was granted by the admin.

is_passed_assessments

An indicator whether user passed all timed assessments.

is_passed_challenges

An indicator whether the user passed all coding challenges.

is_passed_exams

An indicator whether the user passed all certification exams.

Courses

Exercise Dim

exercise_dim: The exercise dimension provides descriptive data about a specific exercise. Because the model considers users’ progress on all exercises (i.e., even exercises that are no longer available), there is a specific row with the id = -1. This row is used in the exercise fact table to refer to a deleted exercise.

column_namecolumn_description

exercise_id

The unique id of the exercise id.

title

The title of the exercise.

type

The type of exercise type (e.g., NormalExercise, VideoExercise, MultipleChoiceExercise, …)

number

The number of the exercise in the chapter, accounting for subexercises. This can be used to sort exercises in order.

xp

The maximum number of XP a user can get by completing the exercise.

technology

The course technology (e.g., Tableau, SQL, Python, R, …)

topic

The course topic (e.g., Data visualization, Programming, Machine Learning, …)

course_title

The course title the exercise belongs to.

course_slug

The course slug the exercise belongs to

course_xp

The maximum number of XP a user can get by completing the course the exercise belongs to.

course_description

The course description the exercise belongs to.

course_short_description

A shorter version of the course description the exercise belongs to.

course_launched_date

Date at which the course went live.

chapter_title

The chapter title the exercise belongs to.

chapter_slug

The chapter slug the exercise belongs to.

chapter_xp

The maximum number of XP a user can get by completing the chapter the exercise belongs to.

chapter_nb_exercises

The number of exercises in the chapter the exercise belongs to.

id

[DEPRECATED] Use exercise_id instead.

Exercise Fact

exercise_fact: The exercise fact table provides data about the users’ progress: XP gained and time spent on each exercise. The table provides data for exercises that can be no longer available (and deleted from our database). In this case, the exercise_id value is set to -1 and links to a “deleted exercise”” row in the exercise_dim table.Fact table for exercise content. It provides engagement and xp measurements on exercise.

column_namecolumn_description

user_id

The unique user id. Reference to user_dim.id.

exercise_id

The unique exercise_id. Reference to exercise_dim.id.

date_id

The date id. Reference to date_dim.id.

started_at

The timestamp at which the user started the exercise (as registered by the DataCamp application).

completed_at

The timestamp at which the user completed the exercise (as registered by the DataCamp application).

time_spent

How much time (in seconds) the user spent on the exercise?

xp

How many XP did the user gain by completing the exercise?

Chapter Dim

chapter_dim: The chapter dimension provides descriptive data about a specific chapter. Because the model considers users’ progress on all chapters (i.e., even chapters that are no longer available), there is a specific row with the id = -1. This row is used in the chapter fact table to refer to a deleted chapter.

column_namecolumn_description

chapter_id

The unique id of the chapter id.

title

The title of the chapter.

slug

The slug of the chapter.

xp

The maximum number of XP a user can get by completing the chapter.

technology

The course technology (e.g., Tableau, SQL, Python, R, …).

topic

The course topic (e.g., Data visualization, Programming, Machine Learning, …).

nb_exercises

The number of exercises in the chapter.

course_title

The title of the course the chapter belongs to.

course_slug

The slug of the course the chapter belongs to.

course_xp

The maximum number of XP a user can get by completing the course the chapter belongs to.

course_description

The description of the course the chapter belongs to.

course_short_description

A shorter description of the course the chapter belongs to.

course_launched_date

The date on which the course went live.

id

[DEPRECATED] Use chapter_id instead.

Chapter Fact

chapter_fact: The chapter fact table provides data about the users’ progress: XP gained and time spent on each chapter. The table provides data for chapters that can be no longer available (and deleted from our database). In this case, the chapter_id value is set to -1 and links to a “deleted chapter” row in the chapter_dim table.

column_namecolumn_description

user_id

The unique id of the user. References to user_dim.user_id.

chapter_id

The unique id of the chapter. References to chapter_dim.chapter_id.

date_id

The date on which the user worked on the chapter. References to dim_date.id

started_at

The timestamp when the user started the chapter (as registered by the DataCamp application).

completed_at

The timestamp when the user completed the chapter (as registered by the DataCamp application).

time_spent

The time (in seconds) the user spent on the chapter.

xp

The XP the user gained by working on the chapter.

Course Dim

course_dim: The course dimension provides descriptive data about a specific course. Because the model considers users’ progress on all courses (i.e., even courses that are no longer available), there is a specific row with the id = -1. This row is used in the course fact table to refer to a deleted course.

column_namecolumn_description

course_id

The unique id of the course.

title

The title of the course.

technology

The course technology (e.g., Tableau, SQL, Python, R, …).

topic

The course topic (e.g., Data visualization, Programming, Machine Learning, …).

xp

The maximum number of XP a user can get by following the course.

nb_hours_needed

Time needed in hours to complete the course.

slug

The slug of the course

description

A description of the course.

short_description

A short description of the course.

launched_date

The date on which the course went live.

id

[DEPRECATED] Use course_id instead.

Course Fact

course_fact: The course fact table provides data about the users’ progress: XP gained and time spent on each course. The table provides data for courses that can be no longer available (and deleted from our database). In this case, the course value is set to -1 and links to a “deleted course” row in the course_dim table.

column_namecolumn_description

user_id

The unique id of the user id. References user_dim.user_id

course_id

The unique id of the course. References course_dim.id

date_id

The unique id of the date on which the user worked on the course. References to dim_date.id

started_at

The timestamp when the user started the course (as registered by the DataCamp application).

completed_at

The timestamp when the user started the course (as registered by the DataCamp application). It is NULL when the user has NOT completed the course yet.

time_spent

The time (in seconds) the user spent on the course.

xp

The XP the user gained by working on the course.

Practice

Practice Dim

practice_dim: List of all practices

column_namecolumn_description

practice_id

The practice id (internal mobile_app.pools id) [PK]

title

The practice name

status

The practice status (HIDDEN, LIVE)

technology

The practice’s technology type

xp

Xp gained by completing the practice

id

[DEPRECATED] Use practice_id instead.

Practice Fact

practice_fact: Fact table for practice content. It provides engagement and xp measurements on projects. This table includes practices (aka challenges) after practice replaced challenges.

column_namecolumn_description

user_id

The user id (internal id) [PK], [FK to user_dim.id]

practice_id

The practice id the user started (internal mobile_app pools id). [PK], [FK to practice_dim.id]

date_id

The date at which the user worked on the practice [PK], [FK to dim_date.id]

started_at

The date at which the user started the practice

completed_at

The date at which the user completed the practice. It is NULL when the user did not complete the practice

is_mobile

Whether the user did the practice on mobile or in the browser

time_spent

How much time (in seconds) the user spent on the practice

xp

How many XP did the user gain by completing the practice

Projects

The Data Connector includes certification projects, which differ from standard projects.

Certification projects are used exclusively during the Certification process. Reports in the Group Hub do not include certification projects. As a result, the reports in the Group hub may show a slight difference in numbers compared to those reported by the Data Connector.

Project Dim

project_dim: The project dimension provides descriptive data about a specific project.

column_namecolumn_description

project_id

The unique id of the project.

title

The title of the course.

technology

The course technology (R, Python, SQL)

xp

The maxmimum number of XP a user can get by completing the project.

nb_hours_needed

The number of hours needed to complete the project.

is_guided

A boolean indicating whether the project is guided or not.

is_certification

A boolean indicating whether the project is used for certification.

description

A description of the project.

short_description

A short description of the project.

id

[DEPRECATED] Use project_id instead.

Project Fact

project_fact: The project fact table provides data about the users’ project progress: time spent and XP gained.

column_namecolumn_description

user_id

The unique id of the user. References user_dim.user_id

project_id

The unique id of the project. References project_dim.project_id

date_id

The id of the date on which the user worked on the project. References date_dim.date_id

started_at

The timestamp when the user started the project (as registered by the DataCamp application).

completed_at

The timestamp when the user completed the project (as registered by the DataCamp application). It is NULL when the user has NOT completed the project yet.

time_spent

The time (in seconds) the user spent on the project.

xp

The XP the user gained by working on the project.

Tracks

Track Content Dim

track_content_dim: List of content in all live tracks. This table has one row per track_version and content item. It currently only includes courses and projects. Note that a content item can belong to multiple tracks, and different versions of a track might contain different content items.

column_namecolumn_description

track_version_id

The id of the track version.

track_id

The id of the track.

content_id

The id of the content, a concatenation of content type and id.

content_type

The content type (course / project).

content_title

The title of the content.

position

The position of the content in the track version.

xp

The total xp earned on completion of content.

Track Content Fact

track_content_fact: Fact table for track content.

column_namecolumn_description

group_id

The id of the group.

user_id

The id of the user.

track_version_id

The id of the track version.

content_id

The id of the content, a concatenation of content type and id.

date_id

The unique id of the date on which the enrolled in the track. References to dim_date.id

started_at

The timestamp when the user enrolled in the track.

completed_at

The timestamp when the user completed the track.

xp

The xp gained by completing the content.

nb_seconds

The total number of seconds spent on the content.

Track Dim

track_dim: List of all versions of live tracks, along with their title, technology, and descriptions. This table has one row for every track version, for tracks that are live.

column_namecolumn_description

track_version_id

The id of the track version.

track_id

The id of the track.

version_number

The version of the track.

title

The title concatenated with subtitle of the track version.

technology

The technology of the track.

category

The category of the track.

short_description

A short description of the track

published_live_at

The timestamp for when the track version was published live.

archived_at

The timestamp for when the track version was archived.

is_current_version

A boolean indicating if this is the current version.

is_custom

A boolean indicating if this is a custom track.

nb_courses

The number of courses in the track version.

nb_projects

The number of projects in the track version.

xp

The total xp gained on completion of the track.

Track Fact

track_fact: Fact table for tracks.

column_namecolumn_description

group_id

The id of the group.

user_id

The id of the user.

track_version_id

The id of the track version.

date_id

The unique id of the date on which the enrolled in the track. References to dim_date.id

started_at

The timestamp when the user enrolled in the track.

completed_at

The timestamp when the user completed the track.

is_currently_active

A boolean indicating if the user is currently active in the track

DataLab

Note: In March of 2024, Workspaces was renamed DataLab. Individual workspaces are now called workbooks. We have kept the column names in the previous format to prevent any problems with existing connections but have updated other references where appropriate.

Publication Fact

publication_fact: The publication fact table provides data about the users’ DataLab publications: number of viewers and time spent viewing by date and viewer type (creator / viewer).

column_namecolumn_description

group_id

The unique identifier of the group.

creator_id

The unique identifier of the user who created the DataLab workbook. References user_dim.user_id.

workspace_id

The unique identifier of the DataLab Workbook. References workspace_dim.workspace_id

viewed_at

The date on which the DataLab publication was viewed.

nb_viewers

The number of unique viewers who viewed the publication.

nb_seconds

The number of seconds spent viewing the publication.

viewer_type

The type of viewer (creator / viewer)

DataLab Dim

workspace_dim: The DataLab dimension table provides descriptive data about a specific DataLab workbook.

column_namecolumn_description

workspace_id

The unique identifier of the DataLab workbook.

workspace_title

The title of the workbook.

technology

The language of the workbook (R / Python)

owner_type

Whether the owner is individual or group.

category

The category of the source template. - base: - dataset - recipe - playbook - project - boilerplate

key

The template key (NULL for non-template types).

is_featured

A boolean indicating if the workbook is featured on the profile.

group_id

The unique identifier of the group, the workbook creator belonged to.

DataLab Workbook Fact

workspace_fact: The workbook fact table provides data about the workbook created by users.

column_namecolumn_description

creator_id

The user id of the creator of the workbook. References user_dim.user_id.

workspace_id

The id of the workspace. References workspace_dim.workspace_id.

nb_megabytes

The size of files in the workbook measured in megabytes.

nb_attempts_to_publish

The number of times the workbook was published.

nb_times_published_successfully

The number of times the workbook was published successfully.

nb_upvotes

The number of upvotes received.

nb_shares

The number of times the workbook was shared.

created_at

The timestamp for when the workbook is created.

updated_at

The timstamp for when the workbook is updated.

first_edited_at

The timestamp for when the workbook was first edited.

last_edited_at

The timestamp for when the workbook was last edited.

first_published_at

The timestamp for when the workbook was first published.

last_published_at

The timestamp for when the workbook was last edited.

first_integration_added_at

The timestamp for when the first integration was added to the workbook.

last_integration_added_at

The timestamp for when the last integration was added to the workbook.

first_upvoted_at

The timestamp for when the first upvote was received.

last_upvoted_at

The timestamp for when the last upvote was received.

group_id

The id of the group owner of the workbook.

Workspace Visit Fact

workspace_visit_fact: The workspace (workbook) visit fact table provides data on user visits to workbooks.

column_namecolumn_description

visitor_id

The user id of the visitor to the workbook. References user_dim.user_id.

workspace_id

The id of the workbook. References workspace_dim.workspace_id.

visited_at

The date of the visit, granular to the day level.

nb_seconds

The duration of the visit in seconds.

group_id

The id of the group.

User Teams

Team Dim

team_dim: The team dimension provides descriptive data about teams.

column_namecolumn_description

team_id

The unique identifier of the team

name

The team name

slug

The team slug

created_date

The team’s creation date

updated_date

The team’s updated date

deleted_date

The team’s deleted date

id

[DEPRECATED] Use team_id instead.

User Dim

user_dim: List of all enterprise users

column_namecolumn_description

user_id

The user id (internal id) [PK]

first_name

The user first name

last_name

The user last name

email

The user email

slug

The slug to the user profile

registered_at

When the user registered

deleted_at

When the user was deleted

last_visit_at

When the user visited (browsed) the plarform for the last time

last_time_spent_at

When the user spent time on the platform (campus, challenges, projects, mobile) for the last time

onboarding_completed_at

When the user completed their onboarding

first_content_completed_at

When the user completed their first content

id

[DEPRECATED] Use user_id instead

User Team Bridge

user_team_bridge: Bridge table to link users to their teams

column_namecolumn_description

user_id

The user id (internal id) [PK]

team_id

The team id (internal id) [PK]

joined_team_date

Date at which the user joined the team

left_team_date

Date at which the user left the team

Others

Xp Fact

xp_fact: This fact table provides information on XP gained by a user across different content modalities.

column_namecolumn_description

group_id

The unique identifier of the group. (references dim_groups.group_id)

user_id

The unique identifier of the user. (References dim_users.user_id)

event

The event for which XP was gained ( course_exercise_completed, alpa_onboarding_completed etc.)

created_date

The date on which the XP was gained

xp

The number of XP gained.

Last updated