Appendix 1: BDC Identifiers - dbGaP, TOPMed, and PIC-SURE

Table of BioData Catalyst dbGaP/TOPMed Identifiers

Identifiers

Definition

Patient ID

The HPDS Patient num. This is PIC-SURE HPDS’s internal identifier.

Topmed / Parent Study Accession with Subject ID

These are the identifiers used by each team in the consortium to link data. Values must follow this mask: <STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> Eg: phs000007.v30_XXXXXXX

DBGAP_SUBJECT_ID

This is a generated ID, controlled by dbGaP, that is unique to each patient in a study. It is not unique across unrelated studies, but a patient is assigned the same ID across related studies, so patients can be linked across studies (see SOURCE_SUBJECT_ID). For dbGaP to assign the same dbGaP subject ID, the submission must include the two variables SUBJECT_SOURCE and SOURCE_SUBJECT_ID. This identifier is used in all the phenotypic data files and is the value that is sequenced to an HPDS Patient Num (Patient ID). All sequenced identifiers are recorded in a PatientMapping file, which is stored in S3. These mappings allow HPDS data to be correlated back to the raw data sets.

SUBJECT_ID

This is a generated ID, controlled by the submitter of a study, that is unique to each patient in that study. For FHS, phs000007 uses shareid in place of this column, while phs000974 uses SUBJECT_ID. The values in these two columns are the same, however.

SHARE_ID

For FHS phs000007, this was used instead of SUBJECT_ID. It was not used for FHS phs000974.

SOURCE_SUBJECT_ID

This is used internally by dbGaP in conjunction with SUBJECT_SOURCE to allow submitters to associate subjects across studies.

SAMPLE_ID

De-identified sample identifier. These are the IDs that link to the molecular data in dbGaP (VCFs, etc.).
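
The sequencing step described for DBGAP_SUBJECT_ID can be sketched as follows. This is a minimal illustration, not the actual HPDS loader: the function names `sequence_patients` and `source_for`, the in-memory dictionary, and the starting value of 1 are all assumptions, and the real layout of the PatientMapping file is not described in this appendix. Because DBGAP_SUBJECT_ID is not unique across unrelated studies, the sketch keys each entry on the study accession together with the subject ID.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key type: (study accession, DBGAP_SUBJECT_ID).
SourceKey = Tuple[str, str]


def sequence_patients(
    source_keys: Iterable[SourceKey], start: int = 1
) -> Dict[SourceKey, int]:
    """Assign a sequential patient number to each distinct source key.

    Repeated keys keep the number they were given the first time,
    so the mapping is stable within one run.
    """
    mapping: Dict[SourceKey, int] = {}
    next_num = start
    for key in source_keys:
        if key not in mapping:
            mapping[key] = next_num
            next_num += 1
    return mapping


def source_for(mapping: Dict[SourceKey, int], patient_num: int) -> SourceKey:
    """Look up the source key behind a patient number (the reverse direction)."""
    for key, num in mapping.items():
        if num == patient_num:
            return key
    raise KeyError(patient_num)
```

Keeping the mapping is what makes it possible to trace a Patient ID in an export back to the subject in the raw data set; the patient number on its own carries no study information.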

Table of PIC-SURE Identifiers

Concept Path

Identifier

\_Topmed Study Accession with Subject ID\

Generated identifier for TOPMed studies. Each identifier concatenates the study accession with the “SUBJECT_ID” value from the study’s subject multi file.

<STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> Eg: phs000974.v3_XXXXXXX

\_Parent Study Accession with Subject ID\

Generated identifier for parent studies. In most studies this follows the same pattern as the TOPMed Study Accession with Subject ID.

However, Framingham’s parent study phs000007 does not contain a SUBJECT_ID column, so the SHAREID column is used in its place.

Eg: phs000007.v3_XXXXXXX

\_VCF Sample Id\

This variable is stored in the sample multi file in each dbGaP study.

This is the TOPMed DNA sample identifier. This is used to give each sample/sequence a unique identifier across TOPMed studies.

Eg: NWD123456

Patient ID (not a concept path but exists in data exports)

This is PIC-SURE’s internal identifier. It is commonly referred to as the HPDS Patient num.

This identifier is generated and assigned to subjects when they are loaded. It is not meant for data correlation between different data sources.
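
As a rough sketch of how the accession-with-subject-ID identifiers above could be built and split apart, consider the following. It assumes the layout shown in the examples (for instance phs000974.v3_XXXXXXX): an accession of the form `phs` plus six digits, a `.v<number>` version, an underscore, and then the subject ID. The names `build_identifier`, `parse_identifier`, and `ID_PATTERN`, and the use of a dictionary per subject row, are hypothetical and not part of PIC-SURE.

```python
import re
from typing import Mapping, NamedTuple

# Assumed layout, inferred from the examples: phs000974.v3_XXXXXXX
ID_PATTERN = re.compile(r"^(?P<study>phs\d{6})\.(?P<version>v\d+)_(?P<subject>.+)$")


class AccessionSubjectId(NamedTuple):
    study: str
    version: str
    subject: str


def build_identifier(accession: str, row: Mapping[str, str]) -> str:
    """Join a versioned accession (e.g. 'phs000974.v3') with the subject ID.

    Falls back to SHAREID when the row has no SUBJECT_ID, as is the
    case for the Framingham parent study phs000007.
    """
    subject = row.get("SUBJECT_ID") or row.get("SHAREID")
    if not subject:
        raise ValueError("row has neither SUBJECT_ID nor SHAREID")
    return f"{accession}_{subject}"


def parse_identifier(value: str) -> AccessionSubjectId:
    """Split an identifier back into study, version, and subject ID."""
    match = ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unexpected identifier format: {value!r}")
    return AccessionSubjectId(match["study"], match["version"], match["subject"])
```

For example, `build_identifier("phs000007.v30", {"SHAREID": "678"})` yields `phs000007.v30_678`, and `parse_identifier` on that value recovers the three parts. Note that the internal Patient ID is deliberately not involved here, since it is not meant for correlating data between sources.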
