memento #918

Draft
wants to merge 83 commits into
base: main
83 commits
38c6dfd
Initial revision
atolopko-czi Dec 6, 2023
8c9ac60
python compatibility fix
atolopko-czi Dec 6, 2023
5025ab3
doc updates
atolopko-czi Dec 6, 2023
5eef133
estimators cube schema improvements
atolopko-czi Dec 11, 2023
66f87eb
estimators cube schema improvements
atolopko-czi Dec 11, 2023
6281a9b
data fixes
atolopko-czi Dec 14, 2023
a0c5bc1
memento cube builder validation
atolopko-czi Dec 15, 2023
e6857fa
simplify joins memento cube builder
atolopko-czi Dec 15, 2023
02aa6c5
diff exp fixes
atolopko-czi Dec 15, 2023
c447dae
parameterize census_fixture.py script
atolopko-czi Dec 15, 2023
84f1bc2
Merge branch 'atol/memento/877-cube-schema-updates' into atol/memento…
atolopko-czi Dec 15, 2023
ae091b2
cube builder regression test
atolopko-czi Dec 15, 2023
d30da8c
Fix size_factors join
atolopko-czi Dec 15, 2023
b5d9a6d
log estimator computation anomalies as debug level
atolopko-czi Dec 17, 2023
592ed54
add missing pip requirement to install notes
atolopko-czi Dec 17, 2023
088ddd9
comments
atolopko-czi Dec 17, 2023
7b39961
TODO
atolopko-czi Dec 18, 2023
66c836d
optimize dense data creation
atolopko-czi Dec 19, 2023
0dda902
pandas memory optimization
atolopko-czi Dec 20, 2023
5eadce8
remove memento cube data existence checks
atolopko-czi Dec 21, 2023
e08daaf
Write memento cube batches within each child process
atolopko-czi Dec 21, 2023
c54aaee
Use obs.raw_sum instead of computing equivalent value
atolopko-czi Dec 21, 2023
bd14204
log progress and estimated time to completion
atolopko-czi Dec 21, 2023
c311ec8
suppress numpy/pandas warnings
atolopko-czi Dec 21, 2023
0ef8e07
add more cube schema tiledb filters
atolopko-czi Jan 2, 2024
4e80b50
normalize obs dims into separate array
atolopko-czi Jan 3, 2024
ec04798
fix overwrite
atolopko-czi Jan 3, 2024
06f8d87
fix diff_expr for new cube structure
atolopko-czi Jan 3, 2024
b4034ae
TODOs
atolopko-czi Jan 4, 2024
a4c134f
Merge branch 'atol/memento/879-move-code-to-census-repo' into atol/me…
atolopko-czi Jan 4, 2024
6aefd86
Merge branch 'atol/memento/877-cube-schema-updates' into atol/memento…
atolopko-czi Jan 4, 2024
6b0b656
optimize dense_gene_data()
atolopko-czi Jan 4, 2024
a6b7408
Merge branch 'atol/memento/880-cube-builder-optimizations' into atol/…
atolopko-czi Jan 4, 2024
f442107
Remove more unused estimators & consolidate post-build
atolopko-czi Jan 5, 2024
ba21c6f
remove improper dependency
atolopko-czi Jan 5, 2024
799754f
use raw categorical codes in diff_expr cube
atolopko-czi Jan 5, 2024
0018f1f
handle invalid SEM values in the cube
atolopko-czi Jan 5, 2024
cc547a6
fix numpy warning
atolopko-czi Jan 5, 2024
04e2efe
parallelize diff expr
atolopko-czi Jan 8, 2024
fb38ba3
move n_obs from estimators to obs_groups array
atolopko-czi Jan 8, 2024
70a7705
fix input check
atolopko-czi Jan 8, 2024
8611fcc
diff expr optimizations
atolopko-czi Jan 9, 2024
f4bd2a5
lint & typing
atolopko-czi Jan 9, 2024
fa48934
cube validator
atolopko-czi Jan 9, 2024
9da7e42
add profiler
atolopko-czi Jan 10, 2024
4ac996f
factor out methods for clearer profiling output
atolopko-czi Jan 10, 2024
e2ffcd8
factor out methods for clearer profiling output
atolopko-czi Jan 10, 2024
669f6c7
replace cProfile with bespoke timing logic
atolopko-czi Jan 11, 2024
b5ed266
Factor out methods for profiling visibility
atolopko-czi Jan 11, 2024
0eb58b0
Factor out methods for profiling visibility
atolopko-czi Jan 11, 2024
1c735e4
optimizations
atolopko-czi Jan 12, 2024
ff448bb
Add polars library for DataFrame manipulation
atolopko-czi Jan 14, 2024
dfcd883
polars fix
atolopko-czi Jan 15, 2024
590ab21
rm timing output
atolopko-czi Jan 15, 2024
ecdd465
fast LR; disable cprofile
atolopko-czi Jan 15, 2024
a74e11d
cli fix
atolopko-czi Jan 15, 2024
bc51be5
use float32 in cube; migration scripts; cube query notebook
atolopko-czi Jan 16, 2024
88e8da9
add features_ids.json to cube
atolopko-czi Jan 16, 2024
2b8b087
add requirements.txt
atolopko-czi Jan 16, 2024
24a0317
update and move adhoc query notebook
atolopko-czi Jan 16, 2024
78afc39
Merge branch 'atol/memento/epic' into atol/memento/parallelize-diffex…
atolopko-czi Jan 16, 2024
a1446b2
rm feature_ids.json creation
atolopko-czi Jan 16, 2024
0d6fc85
refactor profiling code; requirements updates
atolopko-czi Jan 16, 2024
8f3ed5f
rm sklearnex
atolopko-czi Jan 16, 2024
bf32601
Update README.md
atolopko-czi Jan 18, 2024
50fe649
fix tiledb config option
atolopko-czi Jan 18, 2024
1f01145
rename "pass" to "step" and renumber
atolopko-czi Jan 18, 2024
8cf7d33
fix regression test logic
atolopko-czi Jan 19, 2024
d68565b
fixes & typing
atolopko-czi Jan 19, 2024
2601348
replace print with logging
atolopko-czi Jan 19, 2024
cbc271d
update and enable cube builder regression test
atolopko-czi Jan 19, 2024
861641a
fix design matrix
atolopko-czi Jan 20, 2024
ed4b1d7
TODO, comments
atolopko-czi Jan 23, 2024
0701209
Merge remote-tracking branch 'origin/main' into atol/memento/epic
atolopko-czi Jan 23, 2024
4f225f6
lint
atolopko-czi Jan 23, 2024
55f9c22
Merge branch 'atol/memento/epic' of github.com:chanzuckerberg/cellxge…
atolopko-czi Jan 23, 2024
17d12a3
Merge branch 'main' into atol/memento/epic
prathapsridharan Jan 30, 2024
3493f06
Add vscode launch.json
atolopko-czi Feb 1, 2024
9d54c36
lint
atolopko-czi Feb 1, 2024
4d05055
feat: allow users to select covariates for memento (#963)
atarashansky Feb 6, 2024
e16ffd7
chore: Write unit test for differential expression computation (#976)
prathapsridharan Feb 7, 2024
866eb47
Merge branch 'main' into atol/memento/epic
prathapsridharan Mar 12, 2024
e726acb
Merge branch 'main' into atol/memento/epic
prathapsridharan Mar 15, 2024
4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -82,10 +82,14 @@ repos:
exclude: ^tools/(cellxgene_census_builder|census_contrib)
args: ["--config", "./tools/pyproject.toml"]
additional_dependencies:
- attrs
- numpy
- pandas-stubs
- typing_extensions
- types-PyYAML
- pytest
- types-click


- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.39.0
72 changes: 72 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,72 @@
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Memento (tiny)",
            "type": "python",
            "request": "launch",
            "cwd": "${workspaceFolder}/api/python/cellxgene_census/src/",
            "module": "cellxgene_census.experimental.diffexp.memento.diff_expr",
            "justMyCode": true,
            "args": [
                "tissue_general_ontology_term_id in ['UBERON:0002405']",
                "sex_ontology_term_id",
                "/mnt/census/estimators-cube-70a7705/",
                "1",
                "1000"],
            "subProcess": true
        },
        {
            "name": "Memento (small)",
            "type": "python",
            "request": "launch",
            "cwd": "${workspaceFolder}/api/python/cellxgene_census/src/",
            "module": "cellxgene_census.experimental.diffexp.memento.diff_expr",
            "justMyCode": true,
            "args": [
                "tissue_general_ontology_term_id in ['UBERON:0000030', 'UBERON:0000992']",
                "tissue_general_ontology_term_id",
                "/mnt/census/estimators-cube-70a7705/",
                "1",
                "1000"],
            "subProcess": true
        },
        {
            "name": "Memento (medium)",
            "type": "python",
            "request": "launch",
            "cwd": "${workspaceFolder}/api/python/cellxgene_census/src/",
            "module": "cellxgene_census.experimental.diffexp.memento.diff_expr",
            "justMyCode": true,
            "args": [
                "tissue_general_ontology_term_id in ['UBERON:0000948', 'UBERON:0001004']",
                "tissue_general_ontology_term_id",
                "/mnt/census/estimators-cube-70a7705/",
                "1",
                "5000"],
            "subProcess": true
        },
        {
            "name": "Memento (large)",
            "type": "python",
            "request": "launch",
            "cwd": "${workspaceFolder}/api/python/cellxgene_census/src/",
            "module": "cellxgene_census.experimental.diffexp.memento.diff_expr",
            "justMyCode": true,
            "args": [
                "tissue_general_ontology_term_id in ['UBERON:0000948', 'UBERON:0001004']",
                "tissue_general_ontology_term_id",
                "/mnt/census/estimators-cube-70a7705/",
                "1",
                ""],
            "subProcess": true
        },
        {
            "name": "Python: File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "justMyCode": true
        }
    ]
}
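
For readers who do not use VS Code: a launch configuration with a `module` entry corresponds roughly to running `python -m <module> <args...>` from the configured `cwd`. The sketch below rebuilds that command line for the "Memento (tiny)" configuration. The helper names `launch_to_argv` and `resolve_cwd` are hypothetical and exist only for illustration. The diff does not document what each positional argument means, so the values are passed through unchanged rather than interpreted.

```python
import shlex
import sys

# Values copied from the "Memento (tiny)" launch configuration in this diff.
launch_config = {
    "cwd": "${workspaceFolder}/api/python/cellxgene_census/src/",
    "module": "cellxgene_census.experimental.diffexp.memento.diff_expr",
    "args": [
        "tissue_general_ontology_term_id in ['UBERON:0002405']",
        "sex_ontology_term_id",
        "/mnt/census/estimators-cube-70a7705/",
        "1",
        "1000",
    ],
}


def launch_to_argv(config, python=sys.executable):
    """Build the argv for `python -m <module> <args...>` (hypothetical helper)."""
    return [python, "-m", config["module"], *config["args"]]


def resolve_cwd(config, workspace_folder):
    """Substitute VS Code's ${workspaceFolder} variable (hypothetical helper)."""
    return config["cwd"].replace("${workspaceFolder}", workspace_folder)


argv = launch_to_argv(launch_config, python="python")

# Print a shell-safe form of the command; shlex.join quotes the filter
# expression so its spaces and quotes survive the shell.
print("cd", shlex.quote(resolve_cwd(launch_config, ".")))
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run(argv, cwd=...)` would avoid shell quoting altogether. The printed form is meant only for copying into a terminal.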
1 change: 1 addition & 0 deletions api/python/cellxgene_census/pyproject.toml
@@ -45,6 +45,7 @@ experimental = [
"torchdata~=0.7",
"scikit-learn~=1.0",
"scikit-misc>=0.2", # scikit-misc 0.3 dropped Python 3.8 support
"polars==0.20.4",
"psutil~=5.0",
"datasets~=2.0",
"tdigest~=0.5",
@@ -0,0 +1,9 @@
# Differential Expression using memento

This directory contains code for a Census-integrated version of the `memento` method for differential expression
analysis, including differential variability and co-expression. The underlying method is described in
the [memento pre-print](https://www.biorxiv.org/content/10.1101/2022.11.09.515836v1).

This implementation relies upon a database of pre-computed estimators that are derived from a given Census data release.
The database is a TileDB array, structured as a multi-dimensional cube. It is built by
the `tools/models/memento/src/estimators_cube_builder/cube_builder.py` script.
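
To illustrate why a pre-computed cube is useful, here is a minimal sketch that aggregates per-cell values into per-group summaries once, so that later comparisons read only the summaries. It is a toy, not the memento estimators or the actual cube schema: the data is made up, the plain mean and standard error stand in for the method's real estimators, and the field names `n_obs` and `sem` are borrowed from the commit messages above only as labels.

```python
import math
from collections import defaultdict

# Toy per-cell records: (tissue, sex, gene, value). All values are made up.
cells = [
    ("lung", "female", "GENE_A", 2.0),
    ("lung", "female", "GENE_A", 4.0),
    ("lung", "male", "GENE_A", 3.0),
    ("lung", "male", "GENE_A", 5.0),
    ("lung", "male", "GENE_A", 7.0),
]


def build_cube(records):
    """Aggregate per-cell values into per-(tissue, sex, gene) summaries."""
    grouped = defaultdict(list)
    for tissue, sex, gene, value in records:
        grouped[(tissue, sex, gene)].append(value)

    cube = {}
    for key, values in grouped.items():
        n = len(values)
        mean = sum(values) / n
        # Sample variance (n - 1 denominator); NaN when the group has one cell.
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else float("nan")
        cube[key] = {"n_obs": n, "mean": mean, "sem": math.sqrt(var / n)}
    return cube


cube = build_cube(cells)
female = cube[("lung", "female", "GENE_A")]
male = cube[("lung", "male", "GENE_A")]

# A comparison between groups now touches only two cube cells, not every cell.
print("female:", female)
print("male:", male)
print("difference in means:", male["mean"] - female["mean"])
```

The design point is that the expensive pass over individual cells happens once at build time, and a query selects and combines a small number of cube cells.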