Phase 3: Data Analysis

This section covers how to use a SageMaker workspace to access saved data subsets for analysis and run analyses on the OCHIN data.

Creating a SageMaker workspace

Create your workspace

Step 1: Navigate to the Studies page. Organization studies are linked to secure Amazon S3 storage, so anything saved in a study folder is stored securely and is accessible from any workspace the study is mounted to.

For more information about studies, view the documentation here.

Step 2: Select the Organizations tab

Step 3: Select the study with your project and user name attached

Step 4: Click Next

Step 5: Select the SageMaker Notebook option

Step 6: Click Next

Step 7: Enter a name. Any name is acceptable, but it can contain only alphanumeric characters (case sensitive) and hyphens, must start with an alphabetic character, and cannot be longer than 128 characters.

Leave the Restricted CIDR field unchanged.

Step 8: Select the Project Id dropdown

Step 9: Select your AIM AHEAD affiliation, for example Research-Fellowship or Consortium-Development-Project

Step 10: Select a sagemaker-small workspace

Step 11: Enter a Description for your own reference. Any text is acceptable, but it must be at least 3 characters long.

Step 12: Click Create Research Workspace

Wait for your workspace to become available.

This may take 12-20 minutes.

Once your workspace is listed as AVAILABLE, you can connect to it.
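The Name and Description constraints from Steps 7 and 11 can be checked before you submit the form. The following is a minimal sketch, not part of the platform itself: the function names are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits and that the description length is counted in raw characters.

```python
import re

# Step 7 rule: starts with a letter, then letters, digits, or hyphens,
# with a total length of at most 128 characters.
# Assumption: "alphanumeric" means ASCII letters and digits only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_workspace_name(name: str) -> bool:
    """Return True if the name satisfies the Step 7 naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description has at least 3 characters (Step 11)."""
    return len(description) >= 3
```

For example, `is_valid_workspace_name("my-analysis-01")` passes, while a name that starts with a digit or contains an underscore is rejected.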

Connecting your project database

Connect to your SageMaker workspace.

Step 1: Click Connections

Step 2: Click Connect. A new tab in your internet browser will open with your SageMaker workspace.

Step 3: In the new tab, select the SageMaker Examples tab at the top of the page.

Step 4: Under the Access to Data and Compute Using Service Workbench section, click the Use button next to one of the example notebooks, such as Connecting to OCHIN DB - R.ipynb.

Step 5: Click Create Copy on the pop-up window. NOTE: This will create a copy of ALL example notebooks listed in the Access to Data and Compute Using Service Workbench section, so you only need to do this action once to get access to all the notebooks.

Step 6: Select the newly created folder Access-to-Data-and-Compute-Using-Service-Workbench

Step 7: Select the notebook you will use to access your data. One example uses R and one uses Python. Choose the programming language you are more comfortable with.

Using R and Python to access tables in your project database

The R and Python example notebooks contain all the steps needed to connect to the OCHIN database.

They walk through the following steps:

  1. Install necessary drivers and packages

  2. Read and parse your DB credentials

  3. Connect to the database

  4. Query the database
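The example notebooks are the authoritative reference for these steps. Purely to illustrate their general shape (steps 2 to 4), here is a hedged Python sketch. Everything specific in it is an assumption: the JSON credential format and its field names, the helper names, and the `patients` table are hypothetical, and the in-memory `sqlite3` database is only a stand-in for whichever driver the notebook installs.

```python
import json
import sqlite3

# Hypothetical field names; the real credential format is defined by the notebooks.
REQUIRED_FIELDS = ("host", "port", "database", "user", "password")


def parse_credentials(text):
    """Parse credentials from JSON text and check that no field is missing."""
    creds = json.loads(text)
    missing = [key for key in REQUIRED_FIELDS if key not in creds]
    if missing:
        raise KeyError(f"missing credential fields: {missing}")
    return creds


def load_credentials(path):
    """Read a credentials file (assumed to be JSON) and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_credentials(handle.read())


def run_query(conn, sql, params=()):
    """Run a query on a DB-API 2.0 connection and return all rows.

    Note: the placeholder style ("?" here) depends on the driver in use.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


def demo():
    """Exercise run_query against an in-memory SQLite stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, 34), (2, 61)])
    rows = run_query(conn, "SELECT id, age FROM patients WHERE age > ?", (50,))
    conn.close()
    return rows
```

In a real workspace, the connection object would come from the database driver set up in step 1 of the notebook rather than from `sqlite3`, and the credentials would be passed to that driver's connect call.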
