This repository was derived from a template repository located at https://github.com/cytomining/profiling-template. The purpose of the repository is to weld together a versioned data processing pipeline with versioned processed output data for a single image-based profiling experiment.
(Derived from this template)
DO THE FOLLOWING AFTER GENERATING A NEW REPO:
- Change the title following the format: "Image-based Profiling for [ProjectName]"
- Fill in all sections below the marker with your project's information
- Either fill out the metadata template or link to your external tracking system
- Note that all discussions related to this dataset should happen in the corresponding GitHub issues
- Delete the instructions (from the top of this document to the marker line below so that the README starts with "Image-based Profiling for [ProjectName]"
To correctly initialize the repository, we need to perform several manual steps.
By spinning up a new repo using this repo as a template, you will retain all code, configuration files, computational environments, and directory structure that a standard image-based profiling workflow expects and produces.
We first want to fork the official profiling recipe located at https://github.com/cytomining/profiling-recipe.
- Result: The fork creates a copy of a recipe repository.
- Goals: 1) Remove the connection to official recipe updates to avoid unintended weld versioning reversal; 2) Enable independent updates to fork code that does not impact official recipe.
- Execution: See forking instructions.
Next, we will create a submodule in this repo.
- Result: Adding a submodule initiates the weld.
- Goals: 1) Link the processing code (recipe) with the data (current repo); 2) Require a manual step to update the recipe to enable asynchronous development.
- Execution: See below
# In your terminal, clone the repository you just created (THIS REPO)
USER="INSERT-USERNAME-HERE"
REPO="INSERT-NAME-HERE"
git clone [email protected]:$USER/$REPO.git
# Navigate to this directory
cd $REPO
# Add the recipe submodule
git submodule add https://github.com/$USER/profiling-recipe.git profiling-recipe
Refer to "Adding a submodule" for more details.
Lastly, we will commit the submodule to github.
- Result: Committing this change finalizes the weld.
- Goals: 1) Track the submodule (recipe) version with the current repository.
- Execution: See below
# Add, commit, and push the submodule contents
git add profiling-recipe
git add .gitmodules
git commit -m 'finalizing the recipe weld'
git push
DELETE EVERYTHING ABOVE THIS LINE AND START WITH THE CONTENT BELOW
[Brief description of the experiment/dataset (2-3 sentences)]
- Start Date: YYYY_MM_DD (of the first batch)
- Related data repos: [Links to related data repositories]
- Metadata Location: [Link to external metadata tracking system, if applicable]
- Analysis Repo: [Link to associated analysis repositories]
Note: All discussions related to this dataset should happen in the GitHub issues of this repository.
- Pipeline Modifications: [Any changes from standard pipeline]
- Other notes:
- [Any other notes related to processing]
Below is the template for tracking experimental metadata. Fill this out if not using an external tracking system.
Metadata Template
Cell type : _______ (ex: U2OS)
Cell source: ________ (ex: Collab lab) (ex: GPP)
Plate size : _______ (ex: 384)
Plate brand : _______ (ex: Cell carrier Ultra)
Cell densit(y/ies) : _______ (ex: 2K/well) (ex: Columns A-D 1K/well, Columns G-H 500/well)
Type of perturbation : ___________ (ex: Gene overexpression) (ex: CRISPR KO + compounds)
If (at least partially) genetic:
Genetic introduction method : __________ (ex : lentiviral transduction)
Selection method : _________ (ex: None) (ex: Puromycin 1ug/mL 24 hrs)
Number of genes : __________ (ex: 384)
Number of perturbations per gene : __________ (ex: N/A) (ex: 4 guides per gene)
If (at least partially) chemical :
Number of chemicals : __________ (ex: 384)
Number of dose points per chemical : _________ (ex: 1)
Number of replicates per perturbation : __________ (ex: 5)
Perturbation time point : _____________ (ex: 72 hrs)
Staining protocol : ____________ (ex: CellPainting v3 (LINK)) (ex: LipocytePainting (CITATION)) (ex: 1:500 gt anti tubulin (cat #), 1:1000 A488 anti gt (cat #))
Microscope : ________ (ex: Opera Phenix )
Mode : ________ (ex: Confocal)
Excitation / emission details : ______________ (ex: ex 488 laser, em 550/50; ex 568 laser, em 600/30) (ex: see Index.idx.xml file)
Objective : _____________ (ex: 20X water 1.0NA)
Binning : ____________ (ex: 1x1)
Sites per well : __________ (ex: 9)
Pixel size : ____________ (ex: 0.656um)
Number of Z planes : _______ (ex: 3)
Z plane spacing : ________ (ex: 1um)
git clone [email protected]:<org>/<repo>.git
cd <repo>
dvc pull
The DVC cache is typically stored in an AWS S3 bucket, so you will need run aws configure
before running dvc pull
.
If the DVC location is not publicly accessible, you will need AWS credentials to access it.
If the DVC location is not publicly accessible, to access the files stored via DVC, you will need to created a IAM user with the AmazonS3ReadOnlyAccess
policy attached:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*",
"s3-object-lambda:Get*",
"s3-object-lambda:List*"
],
"Resource": "*"
}
]
}