NHLBI BioData Catalyst® Powered by PIC-SURE

Data Dictionaries via PIC-SURE API


Last updated 1 year ago

You can use the PIC-SURE API to extract the data dictionary for one study, several studies, or all studies. Extracting the data dictionary does not require authorization to access the underlying data.
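The PIC-SURE API calls themselves are not shown here. The sketch below assumes the dictionary has already been retrieved and converted into a list of Python dictionaries, keyed by the field names described below (such as studyId). It shows how such records could be narrowed to one, several, or all studies. The helper name select_studies and the example IDs are hypothetical placeholders.

```python
from typing import Iterable, List, Optional, Union


def select_studies(records: Iterable[dict],
                   study_ids: Optional[Union[str, Iterable[str]]] = None) -> List[dict]:
    """Keep data dictionary records that belong to the requested studies.

    Hypothetical helper: study_ids=None keeps all studies, a single string
    keeps one study, and an iterable of strings keeps several studies.
    """
    if study_ids is None:
        return list(records)
    # Treat a bare string as one ID rather than as a sequence of characters.
    wanted = {study_ids} if isinstance(study_ids, str) else set(study_ids)
    return [r for r in records if r.get("studyId") in wanted]


# Hypothetical records; real dictionary exports carry many more fields per variable.
example = [
    {"studyId": "phs000000", "varId": "phv00000001"},
    {"studyId": "phs111111", "varId": "phv00000002"},
    {"studyId": "EXAMPLE_STUDY", "varId": "age"},
]
print(len(select_studies(example, "phs000000")))  # 1
```

In practice, filtering in the API request is usually preferable to downloading everything and filtering locally. The local filter is shown only to make the "one, several, or all studies" distinction concrete.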

Descriptions of Fields in PIC-SURE Data Dictionary

There are three types of studies available in PIC-SURE:

  1. dbGaP format-compliant: ingested by dbGaP in the dbGaP-recommended format (described in the dbGaP submission guide)

  2. dbGaP-ingested, but not format-compliant

  3. Not ingested by dbGaP: these studies are not format-compliant and do not have a dbGaP study accession (phs number)
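The study type can often be inferred from the ID fields described below. The sketch assumes that the "phs", "pht" and "phv" prefixes on studyId, dtId and varId are reliable signals, and it deliberately does not enforce digit counts. This is a hypothetical heuristic for illustration only, not PIC-SURE's own classification logic.

```python
import re

# Heuristic prefixes based on the dbGaP ID formats described on this page:
# studies start with "phs", tables with "pht", variables with "phv".
PHS = re.compile(r"^phs\d+")
PHT = re.compile(r"^pht\d+")
PHV = re.compile(r"^phv\d+")


def classify_study(study_id: str, dt_id: str, var_id: str) -> str:
    """Guess which of the three study types a dictionary record belongs to (hypothetical)."""
    if not PHS.match(study_id):
        # No dbGaP study accession (phs number).
        return "not ingested by dbGaP"
    if PHT.match(dt_id) and PHV.match(var_id):
        # Table and variable IDs follow the dbGaP format.
        return "dbGaP format compliant"
    return "dbGaP ingested, not format-compliant"


print(classify_study("phs000000", "All Variables", "age"))
```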

PIC-SURE Data Dictionary Fields

Not every field in the data dictionary will be relevant to your analysis. Some fields are generated during the PIC-SURE data curation process and duplicate other fields, while others are stored for internal use only; both kinds are identified below.

  • values: An array of all unique values included for the variable.

  • studyId: ID associated with a study. For dbGaP-associated studies, this is in the format phsxxxxxx. Non-dbGaP studies can use other formats. This field is consistent with the DBGAP ACCESSION NUMBER field in BDC Powered by Gen3.

  • dtId: ID of the table within the study that stores the variable. For studies in dbGaP format, this is provided as “phtXXXXXXX”. For non-compliant studies, this can instead be the name of the table or form, or "All Variables" if the variables were not grouped in a table or form.

  • varId: ID associated with the variable. For studies in dbGaP format, this is provided as “phvXXXXXXXX”. Non-compliant studies instead use a short text ID, which can duplicate the columnmeta_name field.

  • is_categorical: Boolean (True/False) value indicating whether a variable is filtered in PIC-SURE as a set of discrete values (categorical).

  • is_continuous: Boolean (True/False) value indicating whether a variable is filtered in PIC-SURE as a numerical range (continuous).

  • columnmeta_is_stigmatized: Boolean (True/False) value that determines whether a variable is shown in Open PIC-SURE. A value of True means that the variable is not shown in Open PIC-SURE. For further information about stigmatizing variables, please refer to the BioData Catalyst stigmatizing variables documentation.

  • columnmeta_name: A short text ID associated with a variable. These IDs are often not human-readable because they are mostly derived from the column names in datasets. For non-compliant studies, this can duplicate the varId field.

  • description: A human-readable text description of the variable. When the study submitters do not provide one, this field duplicates the columnmeta_name field.

  • HPDS_PATH: The concept path used to uniquely identify a variable when data are exported to users. For more information about concept paths and data organization, please refer to the Data Organization in BDC-PIC-SURE page.

  • derived_group_id: The table ID and version number, when applicable.

  • columnmeta_var_group_description: If provided by the study submitters, this field contains a long text description of variable groupings. Variables are not always grouped together in studies.

  • derived_variable_level_data: An array of additional study- and variable-specific information, such as units of measurement. This is only available for some studies.

  • data_hierarchy: A text field displaying a human-readable path that is used in the PIC-SURE user interface. This is only available for some of the studies.

  • columnmeta_data_type: Text field containing "categorical" or "continuous", based on the is_categorical and is_continuous fields.

  • derived_var_id: Variable ID with version number, when applicable.

  • derived_study_abv_name: Short text abbreviation used to refer to a study and shown in the PIC-SURE user interface.

  • derived_study_description: Description of the study, consistent with the “Full Name” field in BDC Powered by Gen3.

  • columnmeta_min: Field generated internally for use in PIC-SURE user interface elements for specific studies. Describes the minimum value of a continuous variable.

  • columnmeta_max: Field generated internally for use in PIC-SURE user interface elements for specific studies. Describes the maximum value of a continuous variable.

  • hashed_var_id: Hashed variable ID for internal use.
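Several of the fields above are related to each other. columnmeta_data_type follows from is_categorical and is_continuous, and columnmeta_is_stigmatized controls visibility in Open PIC-SURE. The sketch below expresses these relationships in code under a few assumptions:

  • each record is a Python dictionary;
  • Boolean fields may arrive either as real booleans or as the strings "True"/"False" (an assumption about the export format);
  • a record with both flags True, or both False, is treated as an error.

All function names are hypothetical.

```python
def as_bool(value) -> bool:
    """Accept real booleans or "True"/"False" strings (assumed export formats)."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def data_type(record: dict) -> str:
    """Derive the columnmeta_data_type value from is_categorical and is_continuous."""
    categorical = as_bool(record["is_categorical"])
    continuous = as_bool(record["is_continuous"])
    if categorical and not continuous:
        return "categorical"
    if continuous and not categorical:
        return "continuous"
    # Assumption: exactly one of the two flags should be True.
    raise ValueError("expected exactly one of is_categorical / is_continuous to be True")


def visible_in_open_picsure(record: dict) -> bool:
    """columnmeta_is_stigmatized == True means the variable is hidden in Open PIC-SURE."""
    return not as_bool(record["columnmeta_is_stigmatized"])


record = {"is_categorical": "False", "is_continuous": "True", "columnmeta_is_stigmatized": False}
print(data_type(record), visible_in_open_picsure(record))  # continuous True
```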

The following fields duplicate data stored in other fields:

  • columnmeta_hpds_path: duplicate of HPDS_PATH

  • columnmeta_var_id: duplicate of varId

  • derived_var_description: duplicate of description

  • derived_group_description: duplicate of columnmeta_var_group_description

  • columnmeta_description: duplicate of description

  • derived_study_id: duplicate of studyId

  • columnmeta_study_id: duplicate of studyId

  • is_stigmatized: duplicate of columnmeta_is_stigmatized

  • derived_var_name: duplicate of columnmeta_name

  • columnmeta_var_group_id: duplicate of dtId

  • columnmeta_HPDS_PATH: duplicate of HPDS_PATH

  • min, max: duplicates of columnmeta_min and columnmeta_max, respectively
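The duplicate fields listed above can be removed before analysis to reduce clutter. The minimal sketch below encodes that list as a mapping from each duplicate field to its canonical field, assuming the maximum field is spelled columnmeta_max. A duplicate is dropped only when its canonical field is actually present in the record, so nothing is lost if an export happens to omit a canonical column. The names DUPLICATE_FIELDS and drop_duplicate_fields are hypothetical.

```python
# Duplicate field -> canonical field, following the list above.
DUPLICATE_FIELDS = {
    "columnmeta_hpds_path": "HPDS_PATH",
    "columnmeta_var_id": "varId",
    "derived_var_description": "description",
    "derived_group_description": "columnmeta_var_group_description",
    "columnmeta_description": "description",
    "derived_study_id": "studyId",
    "columnmeta_study_id": "studyId",
    "is_stigmatized": "columnmeta_is_stigmatized",
    "derived_var_name": "columnmeta_name",
    "columnmeta_var_group_id": "dtId",
    "columnmeta_HPDS_PATH": "HPDS_PATH",
    "min": "columnmeta_min",
    "max": "columnmeta_max",
}


def drop_duplicate_fields(record: dict) -> dict:
    """Return a copy of the record without duplicated fields.

    A duplicate is dropped only when its canonical field is also present.
    """
    return {
        key: value
        for key, value in record.items()
        if not (key in DUPLICATE_FIELDS and DUPLICATE_FIELDS[key] in record)
    }
```

If the dictionary is loaded into a pandas DataFrame instead, the same mapping could be used to build the column list passed to DataFrame.drop(columns=...).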

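Links referenced on this page:

  • dbGaP submission guide: https://www.ncbi.nlm.nih.gov/gap/docs/submissionguide/

  • BioData Catalyst stigmatizing variables documentation: https://github.com/hms-dbmi/biodata_catalyst_stigmatizing_variables/tree/main

  • Data Organization in BDC-PIC-SURE page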