Phase 2: Data Exploration
This section covers how to use a SageMaker workspace to access OCHIN data and save your data subsets for later analysis.
Create a SageMaker workspace
Create your workspace
Step 1: Navigate to the Studies page. The organization studies are linked to secure Amazon S3 storage, so anything saved in a study folder is stored securely and is accessible from any workspace the study is mounted to.
For more information about studies, view the documentation here.
Step 2: Select the Organizations tab
Step 3: Select the study with your project and user name attached
Step 4: Click Next
Step 5: Select the SageMaker Notebook as shown below
Step 6: Click Next
Step 7: Enter a name. Any name works, but it can contain only alphanumeric characters (case sensitive) and hyphens, must start with an alphabetic character, and cannot be longer than 128 characters.
No change is necessary for the Restricted CIDR field.
Step 8: Select the Project Id dropdown
Step 9: Select your AIM AHEAD affiliation, for example Research-Fellowship or Consortium-Development-Project
Step 10: Select a sagemaker-small workspace
Step 11: Enter a Description for your own reference. Any description works, but it must be at least 3 characters long.
Step 12: Click Create Research Workspace
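If you want to check a candidate name and description before filling in the form, the rules from Step 7 and Step 11 can be sketched in Python as follows. This is only an illustration: the function names and the regular expression are assumptions written from the rules stated above (with "alphanumeric" read as ASCII letters and digits), not the validation code the platform itself runs.

```python
import re

# Assumed pattern built from the stated rules: one leading letter,
# then up to 127 letters, digits, or hyphens (128 characters in total).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def is_valid_workspace_name(name: str) -> bool:
    """Return True if the name follows the Step 7 naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description has at least 3 characters (Step 11)."""
    return len(description) >= 3


if __name__ == "__main__":
    # Hypothetical example values.
    print(is_valid_workspace_name("ochin-exploration-01"))
    print(is_valid_workspace_name("1st-workspace"))
    print(is_valid_description("OCHIN data exploration"))
```

A name such as ochin-exploration-01 passes this check, while 1st-workspace does not because it starts with a digit.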

Wait for your workspace to become available.
This may take 12-20 minutes.
Once your workspace is listed as AVAILABLE, you can connect to it.

When you are finished working in your workspace for the day, please STOP the workspace to avoid incurring excess cloud costs.
Connecting your project database
Connect to your SageMaker workspace.
Step 1: Click Connections
Step 2: Click Connect. A new tab in your internet browser will open with your SageMaker workspace.
Step 3: In the new tab, select the SageMaker Examples tab at the top of the page.
Step 4: Under the Access to Data and Compute Using Service Workbench section, click the Use button next to one of the example notebooks, such as Connecting to OCHIN DB - R.ipynb.
Step 5: Click Create Copy on the pop-up window. NOTE: This will create a copy of ALL example notebooks listed in the Access to Data and Compute Using Service Workbench section, so you only need to do this action once to get access to all the notebooks.
Step 6: Select the newly created folder Access-to-Data-and-Compute-Using-Service-Workbench
Step 7: Select a notebook you will use to access your data. There is an example provided using R, and one using Python. It is recommended that you choose the programming language you are more comfortable with.

To keep your code available across different SageMaker workspaces, save code files to the ~Sagemaker/studies/[YOUR-PROJECT-NAME]/ folder, where the project name matches the study you selected in Step 3 of "Create your workspace".
Using Python to access tables in your project database
The Connecting to Ochin Data.ipynb notebook contains all the code needed to connect to the OCHIN Database.
BEFORE BEGINNING WORK: After opening the Connecting to Ochin Data.ipynb notebook, you will want to save a version for yourself, as shown below.



The Connecting to Ochin Data.ipynb notebook walks you through the following steps:
Install necessary drivers and packages
Read and parse your DB credentials
Connect to the database
Query the database
See the Example Queries if needed for SQL assistance
Save a chosen subset of data to your studies folder for future analysis
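The notebook contains the actual connection code, so use it for the driver, credential, and connection steps. As a rough illustration of the last two steps only (query, then save a subset), the sketch below uses Python's built-in sqlite3 in-memory database as a stand-in, because the real database driver and credential format are not described on this page. The function name save_subset, the table patient_visits, and its columns are all hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def save_subset(conn, query, out_path):
    """Run a query and write the result to a CSV file with a header row.

    Returns the number of data rows written.
    """
    out_path = Path(out_path)
    # Create the target folder if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    cursor = conn.cursor()
    cursor.execute(query)
    # Column names come from the cursor description (DB-API convention).
    columns = [col[0] for col in cursor.description]

    row_count = 0
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for row in cursor:
            writer.writerow(row)
            row_count += 1
    return row_count


if __name__ == "__main__":
    # Stand-in database with a hypothetical table; not the OCHIN schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient_visits (visit_id INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO patient_visits VALUES (?, ?)",
        [(1, 34), (2, 71), (3, 66)],
    )

    # A temporary folder stands in for your studies folder here.
    with tempfile.TemporaryDirectory() as studies_dir:
        out_file = Path(studies_dir) / "subsets" / "visits_65_plus.csv"
        n = save_subset(
            conn,
            "SELECT visit_id, age FROM patient_visits WHERE age >= 65",
            out_file,
        )
        print(f"Saved {n} rows to {out_file}")
```

In your workspace, you would pass the connection object created in the notebook and point out_path at a file inside your studies folder, so that the subset remains available from any workspace the study is mounted to.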