Data model

The data views are based on the i2b2 Common Data Model (CDM). Detailed documentation for this can be found at https://community.i2b2.org/wiki/display/BUN/i2b2+Common+Data+Model+Documentation.

However, note that your project-specific views might not include all the columns in the full CDM. Also, some of the fields in this database are used somewhat differently than in the examples used in the i2b2 CDM documentation.

The i2b2 CDM uses a database structure called a star schema. It looks like a star, with one central fact table surrounded by several dimension tables. (Though, here, you will be accessing views of these tables, rather than the tables themselves.)

The fact table (view named S00.OBSERVATION_FACT) contains clinical observations about a patient. It is important to note that an observation may not represent the onset or date of the condition or event being described, but instead is simply a recording or a notation of something. For example, the observation of 'diabetes' recorded in the database as a 'fact' at a particular time does not mean that the condition of diabetes began exactly at that time, only that a diagnosis was recorded at that time (there may be many diagnoses of diabetes for this patient over time).

The fact table contains the basic attributes about the observation, such as the patient and provider numbers, a concept code for the concept observed, a start and end date, and other parameters. All patient observations are placed in this one large table, such as diagnoses, procedures, medications, and laboratory test results. Though, your project view will only have access to certain data types. Despite being just a subset of the table, your project view might contain hundreds of millions of rows of data. Even simple queries using this view can take more than an hour to run.

Dimension tables contain further descriptive and analytical information about attributes in the fact table. For example, the patient dimension (view named S00.PATIENT_DIMENSION) contains a patient_num (unique identifier for the patient) and demographics such as date of birth, sex, race and ethnicity. The other dimension table views are S00.VISIT_DIMENSION within information about clinical encounters/visits and S00.PROVIDER_DIMENSION about the clinicians linked to each observation.

The S00.ZCTA_CVS view contains a list of ZIP Code Tabulation Areas (ZCTAs) and data about those ZCTAs obtained from various sources. This view can be joined to the patient using the ZIP_CD field in the S00.PATIENT_DIMENSION view. Though, note that the ZCTAs are 5-digit codes, while the ZIP_CD has 9-digit ZIP codes for many patients. So, you will need to join the views using “LEFT(ZIP_CD,5)”.

The common.I2B2 view contains a list of the concept codes that are used in the S00.OBSERVATION_FACT view. The concept codes come from standard medical vocabularies, such as ICD-9, ICD-10, LOINC, RxNorm, CPT4/HCPCS, etc. In the S00.OBSERVATION_FACT view, these codes are stored in a field named “concept_cd”. However, in the common.I2B2 view, the field is named “c_basecode”. In the common.I2B2 view, the name of the concept is in the field “c_name”.

A final view, common.CONCEPT_DIMENSION groups concept codes into hierarchies. The CONCEPT_PATH field in this view looks like a URL or computer file directory path, which indicates the path from the root node of the hierarchy down to the specific concept code.

Note that in the i2b2 web client, you can right-click a concept in the ontology and select “Show More Info”. This opens a tab, which contains SQL code for querying the common.CONCEPT_DIMENSION table to obtain the code for that concept as well as for its child/descendant concepts.

Last updated