Format of Participant-Level Data
BDC-PIC-SURE provides participant-level data in two formats: Dataframe or CSV, and Portable Format for Biomedical Data (PFB).
Dataframe or CSV Format
Participant-level data brought to an analysis platform in the Dataframe or CSV format is handed off as a single table. Each row represents a participant and each column represents a variable. The table contains the variables that were added as filters to the query, plus the variables that were included in the export with the "Add Variable" action.
Example Table: Dataframe Format
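| patient_id | \example_study\demographics\sex\ | \example_study\demographics\age\ | \example_study\exam1\asthma\ |
| --- | --- | --- | --- |
| 1001 | Male | 31 | Never had asthma |
| 1003 | Male | 56 | Currently has asthma |
| 1004 | Female | 83 | |
| 1005 | Female | 26 | Currently has asthma |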
patient_id is a PIC-SURE-generated participant identifier. Each column is labeled with the variable's concept path. For many BDC studies, this is formatted as \phs (study accession number)\pht (dataset accession number)\phv (variable accession number)\variable name\. For more information, please refer to Data Organization in BDC-PIC-SURE.
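The concept path layout described above can be taken apart with plain string handling. The following is a minimal sketch, not part of PIC-SURE itself: the function name `parse_concept_path` and the accession numbers in the example path are hypothetical, and it assumes the four-segment layout that applies to many (not all) BDC studies.

```python
from typing import List


def parse_concept_path(path: str) -> List[str]:
    """Split a backslash-delimited concept path into its non-empty segments."""
    return [segment for segment in path.split("\\") if segment]


# Hypothetical path that follows the \phs\pht\phv\variable name\ layout.
example = "\\phs000001\\pht000002\\phv00000003\\asthma_status\\"
segments = parse_concept_path(example)

# Unpacking assumes exactly four segments; other studies may use other depths.
study, dataset, variable_id, variable_name = segments
print(study, dataset, variable_id, variable_name)
```

Paths that do not follow this layout will have a different number of segments, so check the segment count before unpacking.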
A tip on data "missing-ness":
In PIC-SURE output, an empty cell indicates that there is no data available for that variable and participant. This is demonstrated with participant 1004 above; there is an empty cell in the asthma column. This means that there is no information available for that participant for asthma status.
This is different from a cell that contains NA. An NA value was recorded by the study submitters and, depending on its context, could be useful information for analysis.
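The distinction between an empty cell and a recorded NA can be kept explicit when reading a CSV export. This is a minimal sketch using only the Python standard library. The short column names stand in for full concept paths, participant 1006 and its NA value are invented for illustration, and `classify_cell` is a hypothetical helper.

```python
import csv
import io

# Illustrative export; participant 1006 is an invented row with a recorded NA.
CSV_TEXT = """patient_id,sex,age,asthma
1001,Male,31,Never had asthma
1003,Male,56,Currently has asthma
1004,Female,83,
1005,Female,26,Currently has asthma
1006,Female,47,NA
"""


def classify_cell(value: str) -> str:
    """Label a cell as empty (no data), a recorded NA, or a regular value."""
    if value == "":
        return "no data available"
    if value == "NA":
        return "NA recorded by study"
    return "value present"


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
status = {row["patient_id"]: classify_cell(row["asthma"]) for row in rows}

print(status["1004"])  # no data available
print(status["1006"])  # NA recorded by study
```

If the export is loaded with pandas instead, note that `read_csv` by default converts both empty cells and the string NA to missing values. Passing `keep_default_na=False` together with `na_values=[""]` is one way to keep the two cases apart.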
Portable Format for Biomedical Data (PFB)
Participant-level data brought to an analysis platform using the PFB format will be handed off in a single file with two tables: the data and data dictionary tables.
The data table is labeled pic_sure_patients_[dataset ID] and contains the participant-level data from PIC-SURE. The columns of this table are the variables, which are labeled with their PIC-SURE concept paths.
The data dictionary table is labeled pic_sure_data_dictionary_[dataset ID] and describes the variables that have been exported. For each variable, it includes information such as the concept path, description, and display name. The data dictionary also includes DRS URIs, which are links to the original data files and can be used to access those files for further analysis in BDC analysis platforms.
Example Table: Data Table of PFB
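| patient_id | \example_study\demographics\sex\ | \example_study\demographics\age\ | \example_study\exam1\asthma\ |
| --- | --- | --- | --- |
| 1001 | Male | 31 | Never had asthma |
| 1003 | Male | 56 | Currently has asthma |
| 1004 | Female | 83 | |
| 1005 | Female | 26 | Currently has asthma |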
Example Table: Data Dictionary Table of PFB
| Concept path | Study | Description | Display name | DRS URI |
| --- | --- | --- | --- | --- |
| \example_study\demographics\sex\ | example_study | Participant sex recorded by the study | Sex | drs://example.com/uniqueID123 |
| \example_study\demographics\age\ | example_study | Participant age recorded by the study | Age | drs://example.com/uniqueID123 |
| \example_study\exam1\asthma\ | example_study | Exam 1: What is your current asthma status? | Asthma status | drs://example.com/uniqueID456 |
Each row of the data dictionary table corresponds to a column in the data table.
DRS URIs link to the study files from which the variable originated.
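Because each data dictionary row corresponds to a data table column, the dictionary can serve as a lookup keyed by concept path. The sketch below uses plain Python structures rather than a PFB reader. The field names (`concept_path`, `display_name`, `drs_uri`) and the helper `describe_columns` are hypothetical, and the values mirror the example tables, assuming the sex and age variables come from the same source file.

```python
# Data dictionary rows from the example, with hypothetical field names.
data_dictionary = [
    {"concept_path": "\\example_study\\demographics\\sex\\",
     "display_name": "Sex", "drs_uri": "drs://example.com/uniqueID123"},
    {"concept_path": "\\example_study\\demographics\\age\\",
     "display_name": "Age", "drs_uri": "drs://example.com/uniqueID123"},
    {"concept_path": "\\example_study\\exam1\\asthma\\",
     "display_name": "Asthma status", "drs_uri": "drs://example.com/uniqueID456"},
]

# Column labels of the data table: patient_id plus one concept path per variable.
data_columns = ["patient_id"] + [entry["concept_path"] for entry in data_dictionary]


def describe_columns(columns, dictionary_rows):
    """Map each column that has a dictionary row to (display name, DRS URI)."""
    by_path = {entry["concept_path"]: entry for entry in dictionary_rows}
    described = {}
    for column in columns:
        entry = by_path.get(column)
        if entry is not None:  # patient_id has no dictionary row
            described[column] = (entry["display_name"], entry["drs_uri"])
    return described


described = describe_columns(data_columns, data_dictionary)

# Group display names by the source file they originated from.
variables_by_file = {}
for display_name, drs_uri in described.values():
    variables_by_file.setdefault(drs_uri, []).append(display_name)

print(variables_by_file)
```

Grouping by DRS URI shows which variables share a source file, which can help when deciding which original files to retrieve for further analysis.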