Appendix 1: BDC Identifiers - dbGaP, TOPMed, and PIC-SURE
Table of BioData Catalyst dbGaP/TOPMed Identifiers
Identifiers | Definition |
---|---|
Patient ID | PIC-SURE HPDS’s internal identifier, also called the HPDS Patient Num. |
Topmed / Parent Study Accession with Subject ID | These are the identifiers used by each team in the consortium to link data. Values must follow this mask: <STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> Eg: phs000007.v30_XXXXXXX |
DBGAP_SUBJECT_ID | A generated ID, controlled by dbGaP, that is unique to each patient within a study. It is not unique across unrelated studies, but a patient is assigned the same ID across related studies. For dbGaP to assign the same dbGaP subject ID, include the two variables SUBJECT_SOURCE and SOURCE_SUBJECT_ID (see SOURCE_SUBJECT_ID). This identifier is used in all the phenotypic data files and is what we sequence to an HPDS Patient Num (Patient ID). All sequenced identifiers are stored in a PatientMapping file, which is kept in S3. These mappings allow HPDS data to be correlated back to the raw data sets. |
SUBJECT_ID | A generated ID, controlled by the submitter of a study, that is unique to each patient in the study. For FHS, phs000007 uses shareid in its place, while phs000974 uses SUBJECT_ID. The values in the two columns are the same, however. |
SHARE_ID | For FHS, phs000007 uses this instead of SUBJECT_ID; phs000974 does not. |
SOURCE_SUBJECT_ID | This is used internally by dbGaP in conjunction with SUBJECT_SOURCE to allow submitters to associate subjects across studies. |
SAMPLE_ID | De-identified sample identifier. These are the IDs that link to the molecular data in dbGaP (VCFs, etc.). |
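
The DBGAP_SUBJECT_ID entry above describes sequencing dbGaP subject IDs to HPDS patient numbers and keeping the result in a PatientMapping file. The following is a minimal sketch of that idea, not the actual PIC-SURE loader: the function names, the starting number, the CSV column layout, and the sample accessions and subject IDs are all hypothetical. It keys the mapping on the study as well as the subject ID, on the assumption that this is needed because DBGAP_SUBJECT_ID is not unique across unrelated studies.

```python
import csv
import io
from itertools import count


def sequence_patients(subjects, start=1):
    """Assign a sequential patient number to each (study, DBGAP_SUBJECT_ID) pair.

    Assumption: the study accession is part of the key, because
    DBGAP_SUBJECT_ID alone is not unique across unrelated studies.
    """
    next_num = count(start)
    mapping = {}
    for study, dbgap_subject_id in subjects:
        key = (study, dbgap_subject_id)
        if key not in mapping:  # a repeated row keeps its first number
            mapping[key] = next(next_num)
    return mapping


def mapping_to_csv(mapping):
    """Serialize the mapping as CSV text (hypothetical column layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["study_accession", "dbgap_subject_id", "patient_num"])
    for (study, subject_id), patient_num in mapping.items():
        writer.writerow([study, subject_id, patient_num])
    return buf.getvalue()


# Made-up accessions and subject IDs, standing in for two unrelated studies.
rows = [("phs000001", "1001"), ("phs000002", "1001"), ("phs000001", "1001")]
mapping = sequence_patients(rows)

# The reverse lookup is what lets a patient number be traced back to raw data.
reverse = {num: key for key, num in mapping.items()}
```

Keeping the reverse lookup alongside the forward mapping mirrors the stated purpose of the mapping file: correlating HPDS data back to the raw data sets.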
Table of PIC-SURE Identifiers
Concept Path | Definition |
---|---|
\_Topmed Study Accession with Subject ID\ | Generated identifier for TOPMed studies. These identifiers are a concatenation of the accession name and the “SUBJECT_ID” from a study’s subject multi file. <STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> Eg: phs000974.v3_XXXXXXX |
\_Parent Study Accession with Subject ID\ | Generated identifier for parent studies. In most studies this follows the same pattern as the TOPMed Study Accession with Subject ID. However, Framingham’s parent study phs000007 does not contain a SUBJECT_ID column; the SHAREID column is used in its place. Eg: phs000007.v3_XXXXXXX |
\_VCF Sample Id\ | The TOPMed DNA sample identifier, stored in the sample multi file of each dbGaP study. It gives each sample/sequence a unique identifier across TOPMed studies. Eg: NWD123456 |
Patient ID (not a concept path but exists in data exports) | This is PIC-SURE’s internal identifier. It is commonly referred to as HPDS Patient num. This identifier is generated and assigned to subjects when they are loaded. It is not meant for data correlation between different data sources. |
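
The concatenation described for the two "Accession with Subject ID" concept paths can be sketched as follows. This is an illustration under stated assumptions rather than the code PIC-SURE uses: the function names are hypothetical, the regular expression is inferred only from the examples in the tables (`phs`, six digits, `.v` plus a version number, an underscore, then the subject ID), the fallback column is spelled `SHAREID` as in the parent study entry, and the subject ID values are made up.

```python
import re

# Assumed shape, inferred from the examples in the tables above:
# "phs" + six digits + ".v" + version number, then "_" and the subject ID.
ACCESSION_WITH_SUBJECT = re.compile(r"^(phs\d{6})\.v(\d+)_(.+)$")


def build_identifier(accession, row):
    """Join a versioned accession and a subject ID with an underscore.

    Uses SUBJECT_ID when present, otherwise falls back to SHAREID,
    as described for the Framingham parent study.
    """
    subject_id = row.get("SUBJECT_ID") or row.get("SHAREID")
    if not subject_id:
        raise ValueError("row has neither SUBJECT_ID nor SHAREID")
    return f"{accession}_{subject_id}"


def parse_identifier(value):
    """Split an identifier into (study, version, subject_id) or raise ValueError."""
    match = ACCESSION_WITH_SUBJECT.match(value)
    if match is None:
        raise ValueError(f"does not match the expected mask: {value!r}")
    study, version, subject_id = match.groups()
    return study, int(version), subject_id


# Made-up subject IDs; the accessions are the ones used in the examples above.
topmed_id = build_identifier("phs000974.v3", {"SUBJECT_ID": "12345"})
parent_id = build_identifier("phs000007.v30", {"SHAREID": "12345"})
```

Parsing with the same pattern used for validation gives a quick check that an exported value really follows the mask before it is used to link records.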