PyTorch and pandas syntax fixes and expression matrix performance improvement #24
+758
−651
Hi @changwn, I've been trying to use scFEA on some of my datasets, and updated the main module `src/scFEA.py` to get it working on my end. Here's a list of the modifications I made:

- Formatted `src/` with black. This is a side-effect of how my editor is set up, and I was too many edits in to revert it.
- `pandas` and `torch` were raising deprecation warnings on some of the functions/methods used. It might be a good idea to lock the versions in your `requirements.txt`. I have pandas 1.4.0 and pytorch 1.10.2 in my environment, and they don't seem to match your versions.
- The `FLUX` class had a `matrix` parameter in its constructor, but it was never used. I'm assuming your `Dataset` class handles the input data already?
- Constructing `geneExprDf` was a slow step. You were copying `geneExpr` into `temp` in each iteration and then transposing `temp`, so simply moving the transpose outside of the loop improves performance by over 50% on your test dataset.

You may also be able to avoid the entire loop by constructing `geneExprDf` through a table join of `moduleGene` and `geneExpr`, followed by pivoting the joined table from long to wide. I'm not sure how this method performs when `geneExpr` is huge, though.
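To make the last two points concrete, here is a minimal sketch on toy data. It is not the code from `src/scFEA.py`: the shapes of `geneExpr` (cells by genes) and `moduleGene` (a long table of module/gene pairs), the column names `module`, `gene`, `cell`, `value`, and the `Module_Gene` label format are all assumptions made for illustration. It shows the transpose computed once outside the loop, then the join-and-pivot alternative, and checks that both give the same values.

```python
import pandas as pd

# Toy stand-ins; the real layouts of geneExpr and moduleGene are assumptions.
geneExpr = pd.DataFrame(
    {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 6.0]},
    index=["cell1", "cell2"],
)  # rows: cells, columns: genes
moduleGene = pd.DataFrame(
    {"module": ["M1", "M1", "M2"], "gene": ["g1", "g2", "g3"]}
)  # one row per (module, gene) pair

# Variant 1: loop over modules, with the loop-invariant transpose hoisted out.
geneExprT = geneExpr.T  # genes x cells, computed once instead of per iteration
parts = []
for module, grp in moduleGene.groupby("module", sort=False):
    block = geneExprT.loc[grp["gene"]].copy()
    block.insert(0, "Module_Gene", [f"{module}_{g}" for g in grp["gene"]])
    parts.append(block)
loop_df = pd.concat(parts, ignore_index=True)

# Variant 2: no loop. Melt expression to long form, join on gene, pivot to wide.
long_expr = (
    geneExpr.rename_axis("cell")
    .reset_index()
    .melt(id_vars="cell", var_name="gene", value_name="value")
)
joined = moduleGene.merge(long_expr, on="gene", how="inner")
joined["Module_Gene"] = joined["module"] + "_" + joined["gene"]
wide = joined.pivot(index="Module_Gene", columns="cell", values="value")

# Both variants should hold the same numbers for every (Module_Gene, cell) pair.
loop_wide = loop_df.set_index("Module_Gene")
same = (wide.values == loop_wide.loc[wide.index, wide.columns].values).all()
print(same)
```

The long-form intermediate has one row per cell, gene, and module membership, so its memory footprint grows quickly with the size of `geneExpr`. That is the likely cost to measure before preferring the join over the loop.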