Data Organization in BDC-PIC-SURE
BDC-PIC-SURE integrates clinical and genomic datasets across BDC, including TOPMed and TOPMed-related studies, COVID-19 studies, and BioLINCC studies. Each variable is organized as a concept path that contains information about the study, variable group, and variable. Although the specifics of a concept path depend on the type of study, the overall information included is the same.
For more information about additional dbGaP, TOPMed, and PIC-SURE concept paths, refer to Appendix 1.
| | Data in dbGaP format | Data not in dbGaP format |
|---|---|---|
| General organization | Data are organized using the format implemented by the database of Genotypes and Phenotypes (dbGaP). More information on the data structure is available in the dbGaP documentation. Generally, a given study will have several tables, and those tables will have several variables. | Data do not follow dbGaP format; there are no phv or pht accessions. Data are organized in groups of like variables, when available. For example, variables like Age, Gender, and Race could be part of the Demographics variable group. |
| Concept path structure (flexible concept path structure) | \phs\pht\phv\variable name\ | \phs\variable name or \phs\form name\variable or \phs\form group\form name\variable group\variable |
| Variable ID | phv corresponding to the variable accession number | Equivalent to variable name |
| Variable name | Encoded variable name that was used by the original submitters of the data | Encoded variable name that was used by the original submitters of the data |
| Variable description | Description of the variable | Description of the variable, as available |
| Dataset ID | pht corresponding to the trait table accession number | Equivalent to dataset name |
| Dataset name / Form Group / Variable Group | Name of the trait table | Name of a group of like variables, as available |
| Dataset description / Form description / Variable description | Description of the trait table | Description of a group of variables, as available |
| Study ID | phs corresponding to the study accession number | phs corresponding to the study accession number |
| Study description | Description of the study from dbGaP | Description of the study from dbGaP |
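To make the concept path layouts listed above more concrete, the following is a minimal Python sketch that splits a backslash-delimited concept path into its study, intermediate group, and variable segments. The `ConceptPath` type, the `parse_concept_path` function, and the accession values in the usage note are hypothetical names and placeholders chosen for illustration; they are not part of the PIC-SURE API.

```python
from typing import NamedTuple, Tuple


class ConceptPath(NamedTuple):
    """Illustrative container for the parts of a concept path (not a PIC-SURE type)."""
    study_id: str            # first segment, e.g. the phs accession
    groups: Tuple[str, ...]  # zero or more intermediate segments (pht/phv, forms, variable groups)
    variable: str            # last segment, the variable name


def parse_concept_path(path: str) -> ConceptPath:
    """Split a backslash-delimited concept path into study, groups, and variable."""
    # Leading and trailing backslashes produce empty strings; drop them.
    segments = [segment for segment in path.split("\\") if segment]
    if len(segments) < 2:
        raise ValueError("expected at least a study segment and a variable segment")
    return ConceptPath(
        study_id=segments[0],
        groups=tuple(segments[1:-1]),
        variable=segments[-1],
    )
```

For a dbGaP-style path such as `"\\phs000001\\pht000001\\phv00000001\\AGE\\"` (made-up accessions), the sketch yields the phs segment as `study_id`, the pht and phv segments as `groups`, and `AGE` as `variable`. A shorter non-dbGaP path such as `"\\phs000002\\Demographics\\Age"` is handled the same way, with a single group segment.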
Note that there are two data types in PIC-SURE: categorical and continuous data. Categorical variables refer to any variables that have categorized values. For example, “Have you ever had asthma?” with values “Yes” and “No” is a categorical variable. Continuous variables refer to any variables that have a numeric range of values. For example, “Age” with a value range from 10 to 90 is a continuous variable. The internal PIC-SURE data load process determines the type of each variable based on the data.
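The distinction between the two data types can be illustrated with a small Python heuristic. The actual rules used by the PIC-SURE data load process are not described here, so the sketch below is only an assumption about how such a check might look: a variable is treated as continuous when every non-missing value parses as a number, and as categorical otherwise. The function name `infer_variable_type` is hypothetical.

```python
def infer_variable_type(values):
    """Classify a variable as 'continuous' or 'categorical' from its values.

    Illustrative heuristic only; this is not the PIC-SURE implementation.
    """
    # Ignore missing entries (None or blank strings).
    observed = [v for v in values if v is not None and str(v).strip() != ""]
    if not observed:
        # Assumption: with no observed values, fall back to categorical.
        return "categorical"
    for value in observed:
        try:
            float(value)
        except (TypeError, ValueError):
            # At least one value is not numeric, so treat the variable as categorical.
            return "categorical"
    return "continuous"
```

Under this heuristic, the answers `["Yes", "No"]` to "Have you ever had asthma?" would be classified as categorical, while ages such as `[10, 45.5, "90"]` would be classified as continuous.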