Phase 2: Data Exploration

This section covers how to use a SageMaker workspace to access OCHIN data and save your data subsets for later analysis.

Create a SageMaker workspace

Create your workspace

Step 1: Navigate to the Studies page. Organization studies are linked to secure Amazon S3 storage, so anything saved in a study folder is stored securely and is accessible from any workspace the study is mounted to.

For more information about studies, see the documentation here.

Step 2: Select the Organizations tab

Step 3: Select the study with your project and user name attached

Step 4: Click Next

Step 5: Select the SageMaker Notebook

Step 6: Click Next

Step 7: Enter a Name. Any name works, but it can contain only alphanumeric characters (case sensitive) and hyphens, must start with an alphabetic character, and cannot be longer than 128 characters.

No change is necessary for the Restricted CIDR field.

Step 8: Select the Project Id dropdown

Step 9: Select your AIM AHEAD affiliation, for example Research-Fellowship or Consortium-Development-Project

Step 10: Select a sagemaker-small workspace

Step 11: Enter a Description for your own reference. Any description works, but it must be at least 3 characters long.

Step 12: Click Create Research Workspace
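
The Name rule in Step 7 and the Description rule in Step 11 can be checked before you fill in the form. The following is a minimal sketch of those two rules as stated above. The function names are hypothetical, and it assumes "alphanumeric" and "alphabetic" mean ASCII letters and digits, which the form itself may interpret differently.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# One leading letter plus up to 127 letters, digits, or hyphens = max 128 chars.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_workspace_name(name: str) -> bool:
    """Check the Step 7 rule (hypothetical helper, not part of the platform)."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check the Step 11 rule: at least 3 characters."""
    return len(description) >= 3
```

For example, `ochin-explore-01` passes the name check, while `1-explore` (starts with a digit) and `my_workspace` (contains an underscore) do not.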

Wait for your workspace to become available.

This may take 12-20 minutes.

Once your workspace is listed as AVAILABLE, you can connect to it.

Connecting your project database

Connect to your SageMaker workspace.

Step 1: Click Connections

Step 2: Click Connect. A new tab in your internet browser will open with your SageMaker workspace.

Step 3: In the new browser tab, select the SageMaker Examples tab at the top of the page.

Step 4: Under the Access to Data and Compute Using Service Workbench section, click the Use button next to one of the example notebooks, such as Connecting to OCHIN DB - R.ipynb.

Step 5: Click Create Copy in the pop-up window. NOTE: This creates a copy of ALL example notebooks listed in the Access to Data and Compute Using Service Workbench section, so you only need to do this once to get access to all the notebooks.

Step 6: Select the newly created folder Access-to-Data-and-Compute-Using-Service-Workbench

Step 7: Select the notebook you will use to access your data. One example uses R and one uses Python. Choose the language you are more comfortable with.

Using Python to access tables in your project database

The Connecting to Ochin Data.ipynb notebook contains all the code needed to connect to the OCHIN Database.

The Connecting to Ochin Data.ipynb notebook walks you through the following steps:

  1. Install necessary drivers and packages

  2. Read and parse your DB credentials

  3. Connect to the database

  4. Query the database

    1. See the Example Queries if you need help with SQL

  5. Save a chosen subset of data to your studies folder for future analysis
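
The notebook contains the actual code for these steps. As a rough illustration of the overall shape of steps 2 through 5 only, here is a self-contained sketch built on stand-ins: an in-memory SQLite database takes the place of the OCHIN database, the JSON credentials layout is an assumption, and the table `demo_visits`, its made-up rows, and the output file name are all hypothetical. None of these names come from the notebook.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path


def parse_credentials(raw: str) -> dict:
    """Step 2 (sketch): parse credentials. This JSON layout is an assumption."""
    creds = json.loads(raw)
    missing = {"host", "database", "username", "password"} - creds.keys()
    if missing:
        raise ValueError(f"missing credential fields: {sorted(missing)}")
    return creds


def connect(creds: dict) -> sqlite3.Connection:
    """Step 3 (sketch): open a connection.

    Stand-in only: an in-memory SQLite database seeded with made-up rows.
    The credentials are not used here; the real notebook uses its own driver.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_visits (visit_id INTEGER, visit_year INTEGER)")
    conn.executemany(
        "INSERT INTO demo_visits VALUES (?, ?)",
        [(1, 2018), (2, 2021), (3, 2022)],
    )
    return conn


def query_subset(conn: sqlite3.Connection, min_year: int):
    """Step 4 (sketch): run a parameterized query and return header plus rows."""
    cur = conn.execute(
        "SELECT visit_id, visit_year FROM demo_visits "
        "WHERE visit_year >= ? ORDER BY visit_id",
        (min_year,),
    )
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def save_subset(header, rows, study_folder: Path) -> Path:
    """Step 5 (sketch): write the subset as CSV into a study folder."""
    study_folder.mkdir(parents=True, exist_ok=True)
    out_path = study_folder / "subset.csv"  # hypothetical file name
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

In a real workspace, the folder passed to `save_subset` would be the mounted study folder, so the saved subset remains available to any workspace that mounts the same study. Passing the query value as a parameter rather than formatting it into the SQL string is a general habit worth keeping regardless of which database driver the notebook uses.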
