NHLBI BioData Catalyst® Powered by PIC-SURE
Appendix 1: BDC Identifiers - dbGaP, TOPMed, and PIC-SURE

Table of BioData Catalyst dbGaP/TOPMed Identifiers

| Identifiers | Definition |
| --- | --- |
| Patient ID | The HPDS patient number (Patient num). This is PIC-SURE HPDS's internal identifier. |
| TOPMed / Parent Study Accession with Subject ID | The identifiers used by each team in the consortium to link data. Values must follow this mask: <STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> E.g., phs000007.v30_XXXXXXX |
| DBGAP_SUBJECT_ID | A generated ID, controlled by dbGaP, that is unique to each patient within a study. It is not unique across unrelated studies, but a patient is assigned the same ID across related studies, so patients can be linked across studies (see SOURCE_SUBJECT_ID). For dbGaP to assign the same dbGaP subject ID, include the two variables SUBJECT_SOURCE and SOURCE_SUBJECT_ID. This identifier is used in all the phenotypic data files and is what we sequence to an HPDS Patient Num (Patient ID). All sequenced identifiers are stored in a PatientMapping file in S3. These mappings allow HPDS data to be correlated back to the raw data sets. |
| SUBJECT_ID | A generated ID, controlled by the submitter of a study, that is unique to each patient within the study. For FHS phs000007 it is replaced with shareid; phs000974 uses SUBJECT_ID. The values in these two columns are the same, however. |
| SHARE_ID | For FHS phs000007, this was used instead of SUBJECT_ID, but not for FHS phs000974. |
| SOURCE_SUBJECT_ID | Used internally by dbGaP in conjunction with SUBJECT_SOURCE to allow submitters to associate subjects across studies. |
| SAMPLE_ID | De-identified sample identifier. These are the IDs that link to the molecular data in dbGaP (VCFs, etc.). |
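
As an illustration of the accession-with-subject-ID format, the sketch below splits a value such as phs000007.v30_XXXXXXX into its parts. The pattern is an assumption inferred from the example value only (a "phs" accession, a ".v" version, an underscore, then the subject ID); it is not an official PIC-SURE or dbGaP validator, and the names `AccessionSubject` and `parse_accession_subject` are hypothetical.

```python
import re
from typing import NamedTuple


class AccessionSubject(NamedTuple):
    """Parts of a '<study accession>_<subject id>' value (hypothetical helper type)."""
    accession: str   # e.g. "phs000007"
    version: str     # e.g. "v30"
    subject_id: str  # everything after the first underscore


# Assumed pattern, inferred from the example "phs000007.v30_XXXXXXX":
# "phs" + digits, ".v" + digits, "_", then the subject ID.
_PATTERN = re.compile(r"^(phs\d+)\.(v\d+)_(.+)$")


def parse_accession_subject(value: str) -> AccessionSubject:
    """Split an accession-with-subject-ID string; raise ValueError if it does not fit."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"value does not match the assumed pattern: {value!r}")
    return AccessionSubject(*match.groups())


parts = parse_accession_subject("phs000007.v30_XXXXXXX")
print(parts.accession, parts.version, parts.subject_id)
```

Because the accession and version contain no underscore, the subject ID is taken to be everything after the first underscore, so subject IDs that themselves contain underscores would still parse under this assumption.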

Table of PIC-SURE Identifiers

| Concept Path | Description |
| --- | --- |
| \_Topmed Study Accession with Subject ID\ | Generated identifier for TOPMed studies. These identifiers are a concatenation of the accession name and "SUBJECT_ID" from a study's subject multi file. <STUDY_ACCESSION_NUMBER>._<SUBJECT_ID> E.g., phs000974.v3_XXXXXXX |
| \_Parent Study Accession with Subject ID\ | Generated identifier for parent studies. In most studies this follows the same pattern as the TOPMed Study Accession with Subject ID. However, Framingham's parent study phs000007 does not contain a SUBJECT_ID column; the SHAREID column is used in its place. E.g., phs000007.v3_XXXXXXX |
| \_VCF Sample Id\ | The TOPMed DNA sample identifier, stored in the sample multi file in each dbGaP study. It gives each sample/sequence a unique identifier across TOPMed studies. E.g., NWD123456 |
| Patient ID (not a concept path but exists in data exports) | PIC-SURE's internal identifier, commonly referred to as the HPDS Patient num. It is generated and assigned to subjects when they are loaded. It is not meant for data correlation between different data sources. |
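
The concatenation described for the study accession identifiers, including the Framingham fallback to SHAREID, could be sketched as follows. This is a minimal illustration under stated assumptions, not PIC-SURE's actual loading code: the function name `build_accession_subject_id` is hypothetical, each subject record is assumed to be a plain dictionary keyed by column name, the accession string is assumed to already include its version (for example "phs000974.v3"), and the underscore separator is taken from the example values.

```python
from typing import Any, Mapping


def build_accession_subject_id(accession: str, row: Mapping[str, Any]) -> str:
    """Join a versioned study accession and a subject ID with an underscore.

    Uses the SUBJECT_ID column when present; otherwise falls back to SHAREID,
    as described for Framingham's parent study phs000007. Column names and
    the row layout are assumptions made for this sketch.
    """
    subject = row.get("SUBJECT_ID")
    if subject is None:
        subject = row.get("SHAREID")
    if subject is None:
        raise KeyError("row has neither SUBJECT_ID nor SHAREID")
    return f"{accession}_{subject}"


# Most studies: the subject multi file has a SUBJECT_ID column.
print(build_accession_subject_id("phs000974.v3", {"SUBJECT_ID": "XXXXXXX"}))

# Framingham parent study: SHAREID stands in for SUBJECT_ID.
print(build_accession_subject_id("phs000007.v3", {"SHAREID": "XXXXXXX"}))
```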


Last updated 8 months ago