Prepare for Analysis
Last updated
Prepare for Analysis exports participant-level data that matches your filters and variable selections. The export involves several steps, which this page walks through in order.
The first step of the process is to review your cohort details. This provides a tabular summary of the variables that have been filtered and added.
Below the summary is an option to include sample identifiers in the export. This allows you to connect the phenotypic data you have selected to the sample data associated with each participant. If you check the box, sample identifiers are added to your export for any selected participants that have sample information available.
Note: Queries with more than 1,000,000 data points will not be exportable.
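As a rough pre-check against this limit, the sketch below estimates the size of a query before you attempt an export. It assumes that one data point is one variable value for one participant; this page does not define the term, so treat the result as an upper-bound estimate rather than the tool's own count. The function names are hypothetical.

```python
# Assumption: one "data point" = one variable value for one participant.
# The tool may count data points differently; this is only an estimate.
EXPORT_LIMIT = 1_000_000  # limit stated in the note above


def estimate_data_points(num_participants: int, num_variables: int) -> int:
    """Upper bound: every participant has a value for every variable."""
    return num_participants * num_variables


def is_exportable(num_participants: int, num_variables: int) -> bool:
    # The note excludes queries with MORE than the limit, so a query of
    # exactly 1,000,000 estimated data points passes this check.
    return estimate_data_points(num_participants, num_variables) <= EXPORT_LIMIT
```

If the estimate exceeds the limit, narrowing the filters or removing variables brings the query size down.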
To complete the export, choose the format for your participant-level data. There are two options: Export as Data Frame or CSV, or Export as PFB.
The Export as Data Frame or CSV option should be selected if you are interested in exporting your selected data as a Comma-Separated Values file or if you intend to complete your export using the PIC-SURE API via R or Python. This includes using Jupyter Notebooks or RStudio to export your data to BioData Catalyst Powered by Seven Bridges or BioData Catalyst Powered by Terra. For more information about using the PIC-SURE API for export, please refer to the Data Analysis Using the PIC-SURE API section.
In some instances, multiple values may relate to a single variable per participant. For example, some participants may have had several samples sequenced, resulting in many sample identifiers for a single participant. If there are multiple values for a given variable, these values will be separated by a tab (\t) character.
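When you work with the exported CSV, these tab-separated values can be split back into individual items. The following is a minimal sketch using only the Python standard library. The column names (patient_id, sample_id) and the sample rows are hypothetical and stand in for whatever your export contains.

```python
import csv
import io


def split_multi_value(cell: str) -> list[str]:
    """Split a cell on tab characters, dropping empty pieces."""
    return [part for part in cell.split("\t") if part]


# Hypothetical export content: participant 1001 has three sample
# identifiers stored in one cell, separated by tab characters.
sample_csv = 'patient_id,sample_id\n1001,"S-01\tS-02\tS-03"\n1002,S-04\n'

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Map each participant to the list of its sample identifiers.
samples_by_participant = {
    row["patient_id"]: split_multi_value(row["sample_id"]) for row in rows
}
```

Splitting only when you need the individual values keeps the export itself at one row per participant.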
The Export as PFB option should be selected if you are interested in exporting your selected data as a Portable Format for Biomedical Data file or if you intend to send your data to BioData Catalyst Powered by Terra. For more information about this, please refer to PFB Handoff to BioData Catalyst Powered by Terra.
The next step is to save the dataset ID. The dataset ID is the unique identifier that is created for the specific cohort and data that you have selected for export. Type a name for the dataset ID into the field to save it for future reference. For more information about accessing and managing previously saved dataset IDs, please refer to the Manage Datasets section.
The data is now ready for export. The export options displayed depend on the format you selected.
If you chose Export as Data Frame or CSV, the code to complete this export into a data frame in Python or R is provided. Additionally, the file can be downloaded as a CSV file.
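After downloading the CSV, a quick sanity check can confirm the file has the shape you expect. The sketch below assumes the export contains one row per participant (consistent with multiple values being packed into a single cell) and that the identifier column is named patient_id. Both are assumptions, so adjust them to match your file. It does not use the PIC-SURE API.

```python
import csv
import io
from collections import Counter


def duplicate_ids(handle, id_column="patient_id"):
    """Return participant IDs that appear on more than one row."""
    counts = Counter(row[id_column] for row in csv.DictReader(handle))
    return sorted(pid for pid, n in counts.items() if n > 1)


# io.StringIO stands in for the downloaded file; the rows are made up.
demo = io.StringIO("patient_id,age\n1001,54\n1002,61\n1001,54\n")
repeated = duplicate_ids(demo)
```

For a real file, pass an open file handle instead, for example `with open("export.csv", newline="") as f: duplicate_ids(f)`, where export.csv is a placeholder for your downloaded file name.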
If you chose Export as PFB, you have the option to export the file into a Terra workspace. Selecting an export option automatically places the file in the location of your choosing. For more information, please refer to PFB Handoff to BioData Catalyst Powered by Terra.