Add LSI #552

Merged: 35 commits, Sep 25, 2024
Changes shown are from 8 of the 35 commits.

Commits
b4a87a0
Add LSI
SarahOuologuem Sep 13, 2023
033c2c0
Small fixes
SarahOuologuem Sep 28, 2023
95fb617
Add check
SarahOuologuem Sep 28, 2023
c684f32
Add tests
SarahOuologuem Oct 12, 2023
a4cb6cd
Merge branch 'openpipelines-bio:main' into feature/lsi
SarahOuologuem Oct 12, 2023
35efd42
Small fixes
SarahOuologuem Oct 12, 2023
6df6d16
Merge remote-tracking branch 'origin/feature/lsi' into feature/lsi
SarahOuologuem Oct 12, 2023
5b91750
Add LSI component
SarahOuologuem Oct 12, 2023
a5e91c9
Fix typo: obs -> obsm
VladimirShitov Oct 23, 2023
1e74979
Merge remote-tracking branch 'upstream/main' into feature/lsi
rcannood Nov 17, 2023
19d5e85
Merge branch 'openpipelines-bio:main' into feature/lsi
SarahOuologuem Nov 30, 2023
a5fbb3d
Use pre-existing test data
SarahOuologuem Nov 30, 2023
07c478e
Merge branch 'feature/lsi' of https://github.com/SarahOuologuem/openp…
SarahOuologuem Nov 30, 2023
4b6b8da
Add author yaml
SarahOuologuem Nov 30, 2023
e0f681d
fix yaml
rcannood Jan 9, 2024
1703bf9
Merge remote-tracking branch 'upstream/main' into feature/lsi
rcannood Jan 9, 2024
ee9d22c
Add a layer and HVGs to the file to fix tests
VladimirShitov Aug 27, 2024
3b8ec87
Add missing dependencies for h5py
VladimirShitov Aug 27, 2024
23458b0
Merge remote-tracking branch 'upstream/main' into feature/lsi
DriesSchaumont Aug 30, 2024
e54481e
Update for viash 0.9.0-RC7
DriesSchaumont Aug 30, 2024
a3a1c1a
Merge test setup instead of manual writing
VladimirShitov Sep 17, 2024
5822a35
Import setting up logger
VladimirShitov Sep 17, 2024
97c06e1
Add myself to contributors
VladimirShitov Sep 17, 2024
07bc90c
Add min to the number of components
VladimirShitov Sep 17, 2024
5827b55
Change the default of varm_output to lowercase "lsi"
VladimirShitov Sep 17, 2024
fe1759b
Add PR number for the LSI component
VladimirShitov Sep 17, 2024
8df14c7
Use `run_component` to run tests
VladimirShitov Sep 17, 2024
8fcb746
Fix indentation
VladimirShitov Sep 17, 2024
5105789
Change "LSI" in varm to lowercase acc to the new default
VladimirShitov Sep 17, 2024
1731ff5
Fix indentation
VladimirShitov Sep 17, 2024
214833d
Use `run_component` instead of subprocess for all tests
VladimirShitov Sep 17, 2024
5bcd79c
Update muon
VladimirShitov Sep 17, 2024
b598e19
Update python to 3.11
VladimirShitov Sep 17, 2024
c5ad8d3
Update CHANGELOG
DriesSchaumont Sep 25, 2024
2870f95
Merge branch 'main' into feature/lsi
DriesSchaumont Sep 25, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -894,6 +894,7 @@ on each modality instead of on the whole `MuData` object.
## NEW FUNCTIONALITY

* Added `from_10x_to_h5ad` and `download_10x_dataset` components.
* Added `dimred/lsi` component

## MINOR CHANGES
* Workflow `bd_rhapsody_wta`: Minor change to workflow to allow for easy processing of multiple samples with a tsv.
117 changes: 117 additions & 0 deletions src/dimred/lsi/config.vsh.yaml
@@ -0,0 +1,117 @@
functionality:
  name: lsi
  namespace: "dimred"
  description: |
    Runs Latent Semantic Indexing (LSI). Computes cell embeddings, feature loadings and singular values. Uses the scipy implementation.
  argument_groups:
    - name: Inputs
      arguments:
        - name: "--input"
          alternatives: ["-i"]
          type: file
          description: Path to input h5mu file.
          direction: input
          required: true
          example: input.h5mu

        - name: "--modality"
          type: string
          default: "atac"
          description: Which modality to run LSI on.
          required: false

        - name: "--layer"
          type: string
          description: Use specified layer for expression values. If not specified, uses adata.X.
          required: false

        - name: "--var_input"
          type: string
          description: Column name in .var that will be used to select which features to run LSI on. If not specified, uses all features.
          required: false

    - name: LSI options
      arguments:
        - name: "--num_components"
          type: integer
          default: 50
          description: Number of components to compute.
          required: false

        - name: "--scale_embeddings"
          type: boolean
          default: true
          description: Scale embeddings to zero mean and unit variance.

    - name: Outputs
      arguments:
        - name: "--output"
          alternatives: ["-o"]
          type: file
          description: Output h5mu file.
          direction: output
          required: true
          example: output.h5mu

        - name: "--output_compression"
          type: string
          default: "gzip"
          description: The compression format to be used on the output h5mu object.
          choices: ["gzip", "lzf"]
          required: false

        - name: "--obsm_output"
          type: string
          default: "X_lsi"
          description: In which .obsm slot to store the resulting embedding.
          required: false

        - name: "--varm_output"
          type: string
          default: "LSI"
          description: In which .varm slot to store the resulting loadings matrix.
          required: false

        - name: "--uns_output"
          type: string
          default: "lsi"
          description: In which .uns slot to store the stdev.
          required: false

        - name: "--overwrite"
          type: boolean_true
          description: Allow overwriting .obsm, .varm and .uns slots.

  resources:
    - type: python_script
      path: script.py
    - path: ../../utils/subset_vars.py
  test_resources:
    - type: python_script
      path: test.py
    - path: ../../utils/subset_vars.py
    - path: ../../../resources_test/LSI

platforms:
  - type: docker
    image: python:3.9-slim
    setup:
      - type: apt
        packages:
          - procps
      - type: python
        __merge__: [../../../src/base/requirements/anndata_mudata.yaml, .]
        packages:
          - muon~=0.1.5
    test_setup:
      - type: python
        __merge__: [../../../src/base/requirements/anndata_mudata.yaml, .]
        packages:
          - pytest
  - type: nextflow
    directives:
      label:
        - highcpu
        - highmem

Review comment (Collaborator): @SarahOuologuem, do you also want to put yourself in authors? :) You can find an example here: `- __merge__: /src/authors/dries_de_maeyer.yaml` (a minimal sketch of such an entry follows below).
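As an illustration of the reviewer's suggestion above, an authors entry could look like the sketch below. The file path `src/authors/sarah_ouologuem.yaml` and the `roles` field are assumptions modelled on the `dries_de_maeyer.yaml` example from the comment; they are not part of this diff.

```yaml
functionality:
  name: lsi
  namespace: "dimred"
  # Hypothetical authors block; the referenced yaml file and the roles value
  # are assumptions, not taken from this PR.
  authors:
    - __merge__: /src/authors/sarah_ouologuem.yaml
      roles: [ author ]
```

For local testing, a component of this shape can typically be run straight from its config, e.g. `viash run src/dimred/lsi/config.vsh.yaml -- --input input.h5mu --output output.h5mu`; the exact invocation depends on the repository's build setup.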
108 changes: 108 additions & 0 deletions src/dimred/lsi/script.py
@@ -0,0 +1,108 @@
import muon as mu
import mudata as md
from anndata import AnnData
import numpy as np
import sys


## VIASH START
par = {
    "num_components": 50,      # number of components to calculate with SVD
    "scale_embeddings": True,  # scale embeddings to zero mean and unit variance
    "modality": "atac",        # on which modality the LSI should be run
    "layer": None,             # on which layer to run the LSI; if None, runs on adata.X
    "var_input": None,         # column in adata.var marking the highly variable features

    "overwrite": True,
    "obsm_output": "X_lsi",
    "varm_output": "LSI",
    "uns_output": "lsi",
    "output": "output.h5mu",
    "output_compression": "gzip"
}
## VIASH END


sys.path.append(meta["resources_dir"])
from subset_vars import subset_vars


# START TEMPORARY WORKAROUND setup_logger
# reason: resources aren't available when using Nextflow fusion
# from setup_logger import setup_logger
def setup_logger():
    import logging
    from sys import stdout

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console_handler = logging.StreamHandler(stdout)
    logFormatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    console_handler.setFormatter(logFormatter)
    logger.addHandler(console_handler)

    return logger
# END TEMPORARY WORKAROUND setup_logger
logger = setup_logger()


# 1. Read in mudata
logger.info("Reading %s.", par["input"])
mdata = md.read_h5mu(par["input"])

# 2. Subset on modality
if par["modality"] not in mdata.mod:
    raise ValueError(f"Modality '{par['modality']}' was not found in mudata {par['input']}.")
adata = mdata.mod[par["modality"]]


# 3. Specify layer
if par["layer"] and par["layer"] not in adata.layers:
    raise ValueError(f"Layer '{par['layer']}' was not found in modality '{par['modality']}'.")
layer = adata.X if not par["layer"] else adata.layers[par["layer"]]
adata_input_layer = AnnData(layer, var=adata.var)

if not par["layer"]:
    logger.info("Using modality '%s' and adata.X for LSI computation", par["modality"])
else:
    logger.info("Using modality '%s' and layer '%s' for LSI computation", par["modality"], par["layer"])


# 4. Subset on highly variable features if applicable
if par["var_input"]:
    adata_input_layer = subset_vars(adata_input_layer, par["var_input"])


# 5. Run LSI
logger.info("Computing %s LSI components on %s features", par["num_components"], adata_input_layer.X.shape[1])
mu.atac.tl.lsi(adata_input_layer, scale_embeddings=par["scale_embeddings"], n_comps=par["num_components"])


# 6. Store output in object
check_exist_dict = {
    "obsm_output": "obsm",
    "varm_output": "varm",
    "uns_output": "uns"
}
for parameter_name, field in check_exist_dict.items():
    if par[parameter_name] in getattr(adata, field):
        if not par["overwrite"]:
            raise ValueError(f"Requested to create field {par[parameter_name]} in .{field} "
                             f"for modality {par['modality']}, but field already exists.")
        del getattr(adata, field)[par[parameter_name]]

adata.obsm[par["obsm_output"]] = adata_input_layer.obsm["X_lsi"]
adata.uns[par["uns_output"]] = adata_input_layer.uns["lsi"]
if par["var_input"]:
    # Loadings only exist for the selected features; pad the full-size .varm matrix with zeros
    adata.varm[par["varm_output"]] = np.zeros(shape=(adata.n_vars, adata_input_layer.varm["LSI"].shape[1]))
    adata.varm[par["varm_output"]][adata.var[par["var_input"]]] = adata_input_layer.varm["LSI"]
else:
    adata.varm[par["varm_output"]] = adata_input_layer.varm["LSI"]

logger.info("Writing to %s.", par["output"])
mdata.write(filename=par["output"], compression=par["output_compression"])

logger.info("Finished")