Service Workbench
  • General User Guide
    • Introduction
    • Studies
      • Open Data Studies
      • Controlled Data Studies
        • Human Connectome
        • TCGA (The Cancer Genome Atlas)
        • EMory BrEast Imaging Dataset (EMBED)
      • Organization Studies
    • Research workspaces
      • Workspace statuses
      • Workspace configurations
      • How can I create a workspace linked to a study?
      • How can I create a workspace WITHOUT a study?
      • Installing packages into workspaces
      • Using custom Jupyter Notebook kernels
      • Uploading data into SWB
    • Instances
    • Example Analyses & Notebooks
    • FAQ
      • Why was my workspace stopped when I was working on it?
      • How can I install devtools in R/Rstudio?
      • I don't see a 'studies' folder in my workspace?
      • Error: 'Forbidden' or 'Unable to connect'
      • Error: 'We have a problem! null is not an object'...
      • 403 error page
      • Error provisioning environment sagemaker
      • Workspace in Unknown status
      • Cannot connect to SageMaker workspace
      • I can't get through to the log-in page on SWB.
    • Help / Contact Us
    • Release Notes
  • AIM-AHEAD
    • Accessing OCHIN Data
      • Phase 1: Regulatory Requirements
      • Phase 2: Data Exploration
      • Phase 3: Data Analysis
      • Example Researcher Workflow
    • Understanding the OCHIN Data
      • i2b2 Common Data Model
      • Example Queries
      • Additional information
  • Harvard Medical School
    • Introduction
    • Activating your account
    • Creating your workspace
  • LEAP-DEV
    • Workspace Configurations
Powered by GitBook
On this page
  1. AIM-AHEAD
  2. Understanding the OCHIN Data

i2b2 Common Data Model

PreviousUnderstanding the OCHIN DataNextExample Queries

Last updated 1 year ago

The data views are based on the i2b2 Common Data Model (CDM). Detailed documentation for this can be found at .

However, note that your project-specific views might not include all the columns in the full CDM. Also, some of the fields in this database are used somewhat differently than in the examples used in the i2b2 CDM documentation.

The i2b2 CDM uses a database structure called a star schema. It looks like a star, with one central fact table surrounded by several dimension tables. (Though, here, you will be accessing views of these tables, rather than the tables themselves.)

i2b2 Common Data Model. Note that your project views will be slightly different than this.

The fact table (view named S00.OBSERVATION_FACT) contains clinical observations about a patient. It is important to note that an observation may not represent the onset or date of the condition or event being described, but instead is simply a recording or a notation of something. For example, the observation of 'diabetes' recorded in the database as a 'fact' at a particular time does not mean that the condition of diabetes began exactly at that time, only that a diagnosis was recorded at that time (there may be many diagnoses of diabetes for this patient over time).

The fact table contains the basic attributes about the observation, such as the patient and provider numbers, a concept code for the concept observed, a start and end date, and other parameters. All patient observations are placed in this one large table, such as diagnoses, procedures, medications, and laboratory test results. Though, your project view will only have access to certain data types. Despite being just a subset of the table, your project view might contain hundreds of millions of rows of data. Even simple queries using this view can take more than an hour to run.

Dimension tables contain further descriptive and analytical information about attributes in the fact table. For example, the patient dimension (view named S00.PATIENT_DIMENSION) contains a patient_num (unique identifier for the patient) and demographics such as date of birth, sex, race and ethnicity. The other dimension table views are S00.VISIT_DIMENSION within information about clinical encounters/visits and S00.PROVIDER_DIMENSION about the clinicians linked to each observation.

The S00.ZCTA_CVS view contains a list of ZIP Code Tabulation Areas (ZCTAs) and data about those ZCTAs obtained from various sources. This view can be joined to the patient using the ZIP_CD field in the S00.PATIENT_DIMENSION view. Though, note that the ZCTAs are 5-digit codes, while the ZIP_CD has 9-digit ZIP codes for many patients. So, you will need to join the views using “LEFT(ZIP_CD,5)”.

The common.I2B2 view contains a list of the concept codes that are used in the S00.OBSERVATION_FACT view. The concept codes come from standard medical vocabularies, such as ICD-9, ICD-10, LOINC, RxNorm, CPT4/HCPCS, etc. In the S00.OBSERVATION_FACT view, these codes are stored in a field named “concept_cd”. However, in the common.I2B2 view, the field is named “c_basecode”. In the common.I2B2 view, the name of the concept is in the field “c_name”.

A final view, common.CONCEPT_DIMENSION groups concept codes into hierarchies. The CONCEPT_PATH field in this view looks like a URL or computer file directory path, which indicates the path from the root node of the hierarchy down to the specific concept code.

Note that in the i2b2 web client, you can right-click a concept in the ontology and select “Show More Info”. This opens a tab, which contains SQL code for querying the common.CONCEPT_DIMENSION table to obtain the code for that concept as well as for its child/descendant concepts.

https://community.i2b2.org/wiki/display/BUN/i2b2+Common+Data+Model+Documentation