Leveraging Relational Databases for Spatial Transcriptomics

This is the repository for the 2022-2023 Harvey Mudd Clinic Team, in collaboration with the Harvard Center for Computational Biomedicine.

Description

At Harvard CCB, researchers are pioneering the study of various biological and spatial genomic datasets using computational methods. These high-resolution biological datasets, collected using imaging techniques, can be quite large. Most workflows rely mainly on Python and R, which struggle to analyze such memory-intensive datasets effectively. We aim to leverage relational database queries in SQL to improve scalability, add the flexibility to analyze larger datasets, and eventually uncover additional spatial relationships in the original data.

Getting Started

Dependencies

To run our scripts and follow along with our process, you'll need to have the following installed.

- Python
- Python packages:
  - pandas
  - tqdm
- Azure Data Studio
- Git
- Docker

Assignment 1

Assignment 1 is an introduction to SQL Server consisting of a Coursera course on Relational Databases and a few corresponding exercises.
For a breakdown of each step in assignment 1, see the assignment 1 README.

Assignment 2

Assignment 2 consists of a few query exercises in SQL Server that give practice with the tools introduced in Assignment 1. The assignment uses flight data and asks us to write queries that answer questions such as which plane logged the most flight miles.
For a breakdown of each step in assignment 2, see the assignment 2 README.
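
The kind of query described above can be sketched as follows. This is only an illustration, not the assignment's actual solution: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name `flights`, its columns `tail_number` and `distance_miles`, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (tail_number TEXT, distance_miles REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [
        ("N100AA", 500.0),
        ("N200BB", 1200.0),
        ("N100AA", 900.0),
        ("N300CC", 300.0),
    ],
)

# Sum the distance flown per plane, then keep the plane with the largest total.
top_plane = conn.execute(
    """
    SELECT tail_number, SUM(distance_miles) AS total_miles
    FROM flights
    GROUP BY tail_number
    ORDER BY total_miles DESC
    LIMIT 1
    """
).fetchone()

print(top_plane)
```

SQLite uses `LIMIT 1` to keep the top row; in SQL Server the equivalent is `SELECT TOP 1 ...` with the same `GROUP BY` and `ORDER BY`.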

Assignment 3

Assignment 3 consists of two subtasks: the first is to read and present on recent reviews of spatially resolved omics profiling, and the second is to practice working with spatial omics data in SQL Server. This repository focuses only on the second subtask.
For a breakdown of each step in this subtask of assignment 3, see the assignment 3 README.

Assignment 4

Assignment 4 serves as a transition into working with spatial data. We are tasked with analyzing two tables: one containing weather data along with the latitude and longitude of each weather station, and one containing geographical information. Our goal is to answer questions such as which stations in Massachusetts are the windiest, or which station in Washington is the rainiest, by performing spatial intersect queries on the tables.
For a breakdown of each step of assignment 4, see the assignment 4 README.
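
To make the idea of a spatial intersect concrete, here is a minimal pure-Python sketch of the underlying test: is a station's point inside a region's polygon? It is not the team's query. SQL Server's spatial types provide this through methods such as `STIntersects`, whereas the sketch uses a simple ray-casting test that treats longitude and latitude as flat coordinates. The rectangular region, the station names, and the wind speeds are all made up.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up rectangular region (NOT a real state boundary) and made-up stations.
region = [(-73.5, 41.2), (-69.9, 41.2), (-69.9, 42.9), (-73.5, 42.9)]
stations = [
    {"name": "station_a", "lon": -71.0, "lat": 42.3, "wind_speed": 12.5},
    {"name": "station_b", "lon": -72.5, "lat": 42.1, "wind_speed": 18.0},
    {"name": "station_c", "lon": -122.3, "lat": 47.6, "wind_speed": 25.0},
]

# Keep only the stations that fall inside the region, then pick the windiest.
in_region = [s for s in stations if point_in_polygon(s["lon"], s["lat"], region)]
windiest = max(in_region, key=lambda s: s["wind_speed"])

print(windiest["name"])
```

Note that `station_c` has the highest wind speed overall but lies outside the region, so the spatial filter has to run before the ranking.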

Assignment 5

Assignment 5 turns our attention to spatial transcriptomics data in SQL Server. We are given multiple subtasks, such as creating a new gene-cell-molecule count table, reshaping that table into a gene expression matrix, and creating a convex hull around the molecules in a given cell.
For a breakdown of each step of assignment 5, see the assignment 5 README. You may also follow along in our assignment 5 notebook.
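
The first two subtasks can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: sqlite3 stands in for SQL Server, and the `molecules` table, its columns, and the gene names are hypothetical. The count table is built with a `GROUP BY`, and the reshape into a matrix is done in plain Python for clarity.

```python
import sqlite3

# Hypothetical molecule table: one row per detected molecule (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE molecules (molecule_id INTEGER, cell_id INTEGER, gene TEXT)"
)
conn.executemany(
    "INSERT INTO molecules VALUES (?, ?, ?)",
    [
        (1, 10, "geneA"),
        (2, 10, "geneA"),
        (3, 10, "geneB"),
        (4, 20, "geneB"),
        (5, 20, "geneB"),
    ],
)

# Long-format count table: number of molecules per (gene, cell) pair.
counts = conn.execute(
    """
    SELECT gene, cell_id, COUNT(*) AS molecule_count
    FROM molecules
    GROUP BY gene, cell_id
    ORDER BY gene, cell_id
    """
).fetchall()

# Reshape into a gene expression matrix: one row per gene, one column per
# cell, with 0 where a gene was not detected in a cell.
genes = sorted({gene for gene, _, _ in counts})
cells = sorted({cell for _, cell, _ in counts})
lookup = {(gene, cell): n for gene, cell, n in counts}
matrix = [[lookup.get((gene, cell), 0) for cell in cells] for gene in genes]

for gene, row in zip(genes, matrix):
    print(gene, row)
```

The long format stores only the pairs that were observed, while the matrix fills every gene-cell combination, so the matrix can grow much faster than the count table on sparse data.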

Assignment 6

Assignment 6 continues the ideas of Assignment 5, but with a significantly larger dataset of tissue images from 26 mouse hypothalami. This dataset is not currently publicly available but was provided for our use. With this larger dataset, we repeated the objectives of Assignment 5 on an institutional computer cluster: we created a molecule count table and generated convex hulls around the molecules belonging to cells in the first z-slice.
For a breakdown of each step of assignment 6, see the assignment 6 README. You may also follow along in our assignment 6 notebook.
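
As a rough picture of the convex hull step, the sketch below groups molecules by cell, keeps only one z-slice, and computes a hull per cell with Andrew's monotone chain algorithm. It is not the team's implementation: SQL Server's geometry type offers `STConvexHull` for this, and the pure-Python version here is only a stand-in. The row layout, the coordinates, and the assumption that the first z-slice is labeled `0` are all hypothetical.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a counterclockwise turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Made-up rows: (cell_id, x, y, z_slice). Slice 0 is assumed to be the first.
molecules = [
    (1, 0.0, 0.0, 0), (1, 2.0, 0.0, 0), (1, 2.0, 2.0, 0), (1, 0.0, 2.0, 0),
    (1, 1.0, 1.0, 0),   # interior point, not part of the hull
    (1, 5.0, 5.0, 1),   # different z-slice, filtered out
    (2, 10.0, 10.0, 0), (2, 12.0, 10.0, 0), (2, 11.0, 13.0, 0),
]

# Group the first-slice molecules by cell, then build one hull per cell.
points_by_cell = defaultdict(list)
for cell_id, x, y, z_slice in molecules:
    if z_slice == 0:
        points_by_cell[cell_id].append((x, y))

hulls = {cell_id: convex_hull(pts) for cell_id, pts in points_by_cell.items()}

for cell_id, hull in hulls.items():
    print(cell_id, hull)
```

Filtering by z-slice before grouping keeps each hull two-dimensional and avoids mixing molecules from different imaging planes into one cell outline.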

Authors

Chris Couto

Alicia Lu

Elizabeth Lucas-Foley

Mads Mansfield

Acknowledgments

Tim Buchheim

Ludwig Geistlinger

Robert Gentleman

Rafael Goncalves

Tyrone Lee

Jeffrey Moffitt

Nathan Palmer

Sunil Poudel

Sam Pullman

Chris Stone
