Bugfix: #371 #372

Open
wants to merge 4 commits into
base: main
9 changes: 6 additions & 3 deletions besca/Import/_read.py
Contributor

@bhoellbacher I think we want to keep the cache behavior in Besca, but not in the pipeline. I would suggest adding "cache" as an additional parameter to the "read_mtx" function and changing this parameter in the pipeline, while keeping the default in Besca as True. What do you think?

Author

Sounds good.
Let me know what you think of the drafted code changes.

@@ -107,7 +107,7 @@ def assert_adata(adata: AnnData, attempFix=True):


 def read_mtx(
-    filepath, annotation=True, use_genes="SYMBOL", species="human", citeseq=None
+    filepath, annotation=True, use_genes="SYMBOL", species="human", citeseq=None, read_cache=True
 ):
     """Read matrix.mtx, genes.tsv, barcodes.tsv to AnnData object.
     By specifying an input folder this function reads the contained matrix.mtx,
@@ -129,6 +129,9 @@ def read_mtx(
     citeseq: 'gex_only' or 'citeseq_only' or False or None | default = None
         string indicating if only gene expression values (gex_only) or only protein
         expression values ('citeseq_only') or everything is read if None is specified
+    read_cache: `bool` (default=True)
+        boolean indicating whether scanpy should read the AnnData object from the fast h5ad
+        cache or from source

     Returns
     -------
@@ -138,7 +141,7 @@ def read_mtx(
     if gzfiles == "gz":
         print("reading matrix.mtx.gz")
         adata = read(
-            os.path.join(filepath, "matrix.mtx.gz"), cache=True
+            os.path.join(filepath, "matrix.mtx.gz"), cache=read_cache
         ).T  # transpose the data
         print("adding cell barcodes")
         adata.obs_names = pd.read_csv(
@@ -155,7 +158,7 @@ def read_mtx(
     else:
         print("reading matrix.mtx")
         adata = read(
-            os.path.join(filepath, "matrix.mtx"), cache=True
+            os.path.join(filepath, "matrix.mtx"), cache=read_cache
         ).T  # transpose the data
         print("adding cell barcodes")
         adata.obs_names = pd.read_csv(
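
The design agreed on in the conversation (keep caching as the library default, let the pipeline opt out) can be sketched from the caller's side as follows. This is a minimal illustration, not Besca's code: `read_mtx_sketch` and `fake_reader` are hypothetical names, and the reader is injected only so the example runs without scanpy or real data.

```python
import os


def read_mtx_sketch(filepath, reader, read_cache=True):
    """Hypothetical, stripped-down stand-in for read_mtx.

    `reader` plays the role of the `read` call in the diff above.
    """
    # Forward the caller's choice instead of hard-coding cache=True
    return reader(os.path.join(filepath, "matrix.mtx"), cache=read_cache)


calls = []


def fake_reader(path, cache):
    # Records how it was called; a real reader would return an AnnData object
    calls.append((os.path.basename(path), cache))
    return path


read_mtx_sketch("raw", fake_reader)                    # library default: cache on
read_mtx_sketch("raw", fake_reader, read_cache=False)  # pipeline-style call: cache off
print(calls)  # [('matrix.mtx', True), ('matrix.mtx', False)]
```

Because the new parameter has a default of `True`, existing callers that never pass it keep the previous behavior.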
4 changes: 4 additions & 0 deletions besca/pp/_normalization.py
@@ -1,5 +1,6 @@
 import numpy as np
 from scipy.sparse.csr import csr_matrix
+from scipy.sparse._csc import csc_matrix
 from anndata._core.views import SparseCSRView

 def closure(mat):
@@ -180,6 +181,9 @@ def normalize_geometric(adata):
     # need to add a catch for newly encountered datatype
     elif type(X) == SparseCSRView:
         X = X.todense()
+    # need to add a catch for new sparse matrix datatype
+    elif type(X) == csc_matrix:
+        X = X.todense()

     # ensure that X is an array otherwise this will cause type issue with multiplicative replacement function
     X = np.array(X)
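
As a possible alternative to adding one `elif` per sparse type, the densification step could rely on `scipy.sparse.issparse`, which is part of SciPy's public API. The sketch below is a suggestion, not part of this PR; `to_dense_array` is a hypothetical helper name, and whether anndata view classes pass the `issparse` check is an assumption that has not been verified here.

```python
import numpy as np
from scipy import sparse


def to_dense_array(X):
    """Return X as a dense numpy array regardless of its sparse format."""
    if sparse.issparse(X):
        # toarray() yields an ndarray directly; todense() would yield np.matrix
        return X.toarray()
    return np.asarray(X)


dense = np.array([[1.0, 0.0], [0.0, 2.0]])
for make in (sparse.csr_matrix, sparse.csc_matrix, sparse.coo_matrix):
    # Every format converts back to the same dense values
    assert np.array_equal(to_dense_array(make(dense)), dense)
```

This also avoids importing from private modules such as `scipy.sparse._csc`, whose location is not guaranteed to stay stable between SciPy releases.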
2 changes: 1 addition & 1 deletion workbooks/standard_workflow_besca2.ipynb
@@ -760,7 +760,7 @@
 " n_prots = len(adata_prot.var_names)\n",
 " percent_top = (int(round(0.01*n_prots, 0)) if int(round(0.01*n_prots, 0)) >= 1 else 1, int(round(0.1*n_prots, 0)), int(round(0.25*n_prots, 0)))\n",
 " qc_adata = sc.pp.calculate_qc_metrics(adata_prot, percent_top=percent_top, var_type=\"antibodies\", inplace=False)\n",
-" fig = sns.jointplot(\"log1p_total_counts\", \"n_antibodies_by_counts\", qc_adata[0], kind=\"hex\", norm=mpl.colors.LogNorm())\n",
+" fig = sns.jointplot(x=\"log1p_total_counts\", y=\"n_antibodies_by_counts\", data=qc_adata[0], kind=\"hex\", norm=mpl.colors.LogNorm())\n",
 " fig.savefig(os.path.join(results_folder_citeseq, 'citeseq', 'figures', 'CITESEQ_QC_plot.png'))\n",
 " \n",
 " #generate overview of n_counts\n",
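
The `percent_top` one-liner in the context lines above is dense, so here is a small sketch of the arithmetic it performs. `percent_top_sketch` is a hypothetical helper written for illustration only; it is not part of the notebook or of this PR.

```python
def percent_top_sketch(n_features):
    """Mirror the notebook's percent_top tuple: about 1%, 10% and 25% of the features."""
    def share(fraction):
        return int(round(fraction * n_features, 0))

    # As in the notebook, only the first entry is floored at 1
    return (max(share(0.01), 1), share(0.1), share(0.25))


print(percent_top_sketch(200))  # (2, 20, 50)
print(percent_top_sketch(10))   # (1, 1, 2)
```

Note that Python's `round` rounds exact halves to the nearest even number, which is why 25% of 10 features comes out as 2 rather than 3.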