Add LSI #552

SarahOuologuem · 2023-09-13T12:13:33Z

Changelog

Added LSI component

Issue ticket number and link

Checklist before requesting a review

VladimirShitov

Thank you! Overall, it looks good, but I left a few minor comments. Also, could you please add tests? I would test that:

All output keys exist after running the component
The varm contains the correct information when genes are subsetted
The --overwrite flag works correctly
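A minimal sketch of what those three checks could look like. It runs against a plain-dict stand-in for a MuData modality, not the real component. The function `run_lsi`, the TF-IDF variant, and the output keys `X_lsi`/`lsi` are assumptions for illustration. Filling the loadings of unselected features with NaN is one possible convention for "varm is correct when genes are subsetted", not necessarily the component's.

```python
import numpy as np


def tfidf(counts):
    """Simple log-scaled TF-IDF (one common LSI preprocessing variant, assumed here)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: normalise each cell by its total counts (guard empty cells).
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    # Inverse document frequency: down-weight features that are common across cells.
    idf = counts.shape[0] / np.maximum(counts.sum(axis=0), 1)
    return np.log1p(tf * idf)


def run_lsi(store, var_mask=None, n_components=2, key="lsi", overwrite=False):
    """Hypothetical LSI step writing to obsm/varm/uns of a dict-based modality."""
    X = np.asarray(store["X"])
    n_vars = X.shape[1]
    if var_mask is None:
        var_mask = np.ones(n_vars, dtype=bool)
    obsm_key = f"X_{key}"
    already_present = (obsm_key in store["obsm"] or key in store["varm"]
                       or key in store["uns"])
    if already_present and not overwrite:
        raise ValueError(f"Output key '{key}' already exists; pass overwrite=True.")
    # Truncated SVD of the TF-IDF matrix restricted to the selected features.
    U, S, Vt = np.linalg.svd(tfidf(X[:, var_mask]), full_matrices=False)
    store["obsm"][obsm_key] = U[:, :n_components] * S[:n_components]
    # Loadings for all features (genes/peaks); rows of unselected features stay NaN.
    loadings = np.full((n_vars, n_components), np.nan)
    loadings[var_mask] = Vt[:n_components].T
    store["varm"][key] = loadings
    store["uns"][key] = {"singular_values": S[:n_components]}
    return store


rng = np.random.default_rng(0)
store = {"X": rng.poisson(1.0, size=(20, 10)), "obsm": {}, "varm": {}, "uns": {}}
mask = np.zeros(10, dtype=bool)
mask[:6] = True
run_lsi(store, var_mask=mask, n_components=3)

# 1. All output keys exist after running.
assert "X_lsi" in store["obsm"] and "lsi" in store["varm"] and "lsi" in store["uns"]
# 2. varm is correct when features are subsetted.
assert store["varm"]["lsi"].shape == (10, 3)
assert np.isnan(store["varm"]["lsi"][~mask]).all()
assert np.isfinite(store["varm"]["lsi"][mask]).all()
# 3. Re-running without overwrite fails; with overwrite the output is replaced.
try:
    run_lsi(store, var_mask=mask)
    raise AssertionError("expected a ValueError")
except ValueError:
    pass
run_lsi(store, var_mask=mask, n_components=2, overwrite=True)
assert store["obsm"]["X_lsi"].shape == (20, 2)
```

In the real component, the same assertions would be made on the `.h5mu` file written by the executable rather than on an in-memory dict.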

src/dimred/lsi/config.vsh.yaml

src/dimred/lsi/script.py

DriesSchaumont · 2023-09-28T12:45:28Z

Hi @SarahOuologuem, thanks for opening this PR, and thanks @VladimirShitov for the review! I read through it and left some comments with thoughts on some of the conversations. Let me know if I can be of more help to keep this PR moving forward!

DriesSchaumont · 2023-10-12T14:02:47Z

Hi @SarahOuologuem, I noticed that you implemented tests, which is really great! Currently, the test data has not been uploaded to our test S3 bucket. Could you provide me with a link so that I can download the data (assuming it is public)? I will put it in our bucket. Otherwise, I think we could quickly connect on Slack. Thanks :)

VladimirShitov

Looks great to me. I love the tests! Once they pass, I believe it can be merged.

VladimirShitov · 2023-10-23T13:32:07Z

src/dimred/lsi/config.vsh.yaml

+functionality:
+  name: lsi
+  namespace: "dimred"
+  description: |


@SarahOuologuem, do you also want to add yourself to the authors? :) You can find an example here:

openpipeline/src/neighbors/find_neighbors/config.vsh.yaml

Line 9 in b41a658

- __merge__: /src/authors/dries_de_maeyer.yaml

src/dimred/lsi/script.py

src/dimred/lsi/test.py

VladimirShitov · 2023-10-23T13:49:47Z

A general recommendation: it would be great to have more descriptive commit messages, for example "Change tabulation" or "Remove spaces" instead of "Small fixes". That makes it possible to quickly understand what changed without diving deeper into the code.

src/dimred/lsi/test.py


rcannood · 2024-01-09T08:33:02Z

Hi Sarah!

Have you tried running viash test src/dimred/lsi/config.vsh.yaml?

I get:


=================================== FAILURES ===================================
______________________ test_select_highly_variable_column ______________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0')

    def test_select_highly_variable_column(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"
    
        # run component
        cmd_args = [
        meta["executable"],
         "--input", str(input_path),
         "--output", str(output_path),
         "--var_input", "highly_variable"
        ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:81: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', 'highly_variable']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:17,969 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
2024-01-09 08:26:18,700 INFO     Using modality 'atac' and adata.X for LSI computation
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-NUVvcM.py", line 93, in <module>
    adata_input_layer = subset_vars(adata_input_layer, par["var_input"])
  File "/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/subset_vars.py", line 17, in subset_vars
    raise ValueError(f"Requested to use .var column '{subset_col}' as a selection of genes, but the column is not available.")
ValueError: Requested to use .var column 'highly_variable' as a selection of genes, but the column is not available.
__________________________ test_selecting_input_layer __________________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0')

    def test_selecting_input_layer(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"
    
        # run component
        cmd_args = [
            meta["executable"],
            "--input", str(input_path),
            "--output", str(output_path),
            "--num_components", "20",
            "--layer", "counts"
            ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:136: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', '20', '--layer', 'counts']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:34,271 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-tOA0u2.py", line 80, in <module>
    raise ValueError(f"Layer '{par['layer']}' was not found in modality '{par['modality']}'.")
ValueError: Layer 'counts' was not found in modality 'atac'.
=============================== warnings summary ===============================
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
  /usr/local/lib/python3.9/site-packages/anndata/_core/anndata.py:453: PendingDeprecationWarning: The dtype argument will be deprecated in anndata 0.10.0
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_select_highly_variable_column - subp...
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_selecting_input_layer - subprocess.C...
============= 2 failed, 6 passed, 8 warnings in 117.84s (0:01:57) ==============
====================================================================
ERROR! Only 0 out of 1 test scripts succeeded!
Unexpected error occurred! If you think this is a bug, please post
create an issue at https://github.com/viash-io/viash/issues containing
a reproducible example and the stack trace below.

Does the same error show up when you run it locally?
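Both failures above come from input validation: the updated test file has no `counts` layer in the `atac` modality and no `highly_variable` column in its `.var`. The sketch below re-creates those two checks on plain dicts. The function `select_input` and the toy data are hypothetical; only the error messages are copied from the log. It shows that adding the layer and the column, which is what the later commit "Add a layer and HVGs to the file to fix tests" describes, lets the same calls pass.

```python
def select_input(mdata, modality, layer=None, var_input=None):
    """Hypothetical re-creation of the two input checks that failed in the log."""
    adata = mdata[modality]
    if layer is None:
        X = adata["X"]
    elif layer in adata["layers"]:
        X = adata["layers"][layer]
    else:
        raise ValueError(f"Layer '{layer}' was not found in modality '{modality}'.")
    if var_input is not None:
        if var_input not in adata["var"]:
            raise ValueError(
                f"Requested to use .var column '{var_input}' as a selection of genes, "
                "but the column is not available."
            )
        mask = adata["var"][var_input]
        # Keep only the columns (features) flagged True in the .var column.
        X = [[value for value, keep in zip(row, mask) if keep] for row in X]
    return X


# Toy data resembling the old file: no "counts" layer, no "highly_variable" column.
atac = {"X": [[1, 0, 2], [0, 3, 1]], "layers": {}, "var": {}}
mdata = {"atac": atac}
for kwargs in ({"layer": "counts"}, {"var_input": "highly_variable"}):
    try:
        select_input(mdata, "atac", **kwargs)
    except ValueError as err:
        print(err)

# Adding both to the test data lets the same calls succeed.
atac["layers"]["counts"] = atac["X"]
atac["var"]["highly_variable"] = [True, False, True]
print(select_input(mdata, "atac", layer="counts", var_input="highly_variable"))
# prints [[1, 2], [0, 1]]
```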

rcannood · 2024-01-19T08:34:37Z

Hi Sarah! Just checking in on this PR. When would you have some time to look at the issue I posted?

SarahOuologuem · 2024-01-26T09:31:40Z

Sorry for the very late reply! Yes, the errors make sense. I hadn't checked the new test data; I only ran the tests on my old test data.
I'm currently drowning in work, especially because of exam season. Please feel free to correct it yourself to speed up the process! So sorry! I can't really say when I will have time to resolve the issue myself.

VladimirShitov · 2024-01-30T11:15:09Z

I can take it over :) Once I've swum out of my other work as well...

VladimirShitov · 2024-08-27T18:00:29Z

@rcannood, @DriesSchaumont, I fixed the tests, so it should be ready for merging or the final review :)

CHANGELOG.md

src/dimred/lsi/config.vsh.yaml

DriesSchaumont · 2024-08-30T08:54:18Z

src/dimred/lsi/config.vsh.yaml

+        required: false
+
+      - name: "--scale_embeddings"
+        type: boolean


I prefer boolean_false or boolean_true over just boolean. Could you check if one of these is appropriate?

Thanks! I considered it, but here is the problem: I want the default to be "True". To do that, we could use boolean_false, but then the meaning of the argument would have to be inverted to something like not_scale_embeddings, which I find rather confusing. I'd leave it as is, but I'm open to discussion :)
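To make the trade-off concrete, here is an argparse analogy; it mirrors the semantics discussed in the thread, not how viash actually parses arguments. A value-taking boolean keeps the positive name with a `True` default. A flag that defaults to `True` and flips to `False` when passed (the `boolean_false` behaviour) forces the inverted `--not_scale_embeddings` name. The helper `str_to_bool` and both parsers are illustrative only.

```python
import argparse


def str_to_bool(value):
    """Parse an explicit true/false value, loosely like a plain `boolean` argument."""
    lowered = value.lower()
    if lowered in {"true", "yes", "1"}:
        return True
    if lowered in {"false", "no", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Option 1: a value-taking boolean keeps the positive name and a True default.
valued = argparse.ArgumentParser()
valued.add_argument("--scale_embeddings", type=str_to_bool, default=True)

# Option 2: a flag that defaults to True and becomes False when passed
# (like boolean_false) only reads sensibly with an inverted name.
flag = argparse.ArgumentParser()
flag.add_argument("--not_scale_embeddings", dest="scale_embeddings",
                  action="store_false")

print(valued.parse_args([]).scale_embeddings)                                # True
print(valued.parse_args(["--scale_embeddings", "false"]).scale_embeddings)   # False
print(flag.parse_args([]).scale_embeddings)                                  # True
print(flag.parse_args(["--not_scale_embeddings"]).scale_embeddings)          # False
```

Both options end up with `scale_embeddings == True` by default; the difference is only whether the user types a value or a negatively named flag to turn scaling off.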

src/dimred/lsi/config.vsh.yaml

src/dimred/lsi/test.py

Co-authored-by: Dries Schaumont <[email protected]>

DriesSchaumont

LGTM! Thanks @VladimirShitov @SarahOuologuem

Co-authored-by: Vladimir Shitov <[email protected]>

Add LSI

b4a87a0

VladimirShitov reviewed Sep 18, 2023


SarahOuologuem and others added 7 commits September 28, 2023 15:24

Small fixes

033c2c0

Add check

95fb617

Add tests

c684f32

Merge branch 'openpipelines-bio:main' into feature/lsi

a4cb6cd

Small fixes

35efd42

Merge remote-tracking branch 'origin/feature/lsi' into feature/lsi

6df6d16

Add LSI component

5b91750

SarahOuologuem requested review from VladimirShitov and DriesSchaumont October 12, 2023 12:12

VladimirShitov approved these changes Oct 23, 2023


Fix typo: obs -> obsm

a5e91c9

Merge remote-tracking branch 'upstream/main' into feature/lsi

1e74979

rcannood requested changes Nov 24, 2023


src/dimred/lsi/test.py (outdated, resolved)

SarahOuologuem and others added 4 commits November 30, 2023 10:27

Merge branch 'openpipelines-bio:main' into feature/lsi

19d5e85

Use pre-existing test data

a5fbb3d

Merge branch 'feature/lsi' of https://github.com/SarahOuologuem/openp…

07c478e

…ipeline into feature/lsi

Add author yaml

4b6b8da

SarahOuologuem requested a review from rcannood December 5, 2023 12:37

fix yaml

e0f681d

Merge remote-tracking branch 'upstream/main' into feature/lsi

1703bf9

DriesSchaumont marked this pull request as draft August 12, 2024 11:25

VladimirShitov added 2 commits August 27, 2024 19:44

Add a layer and HVGs to the file to fix tests

ee9d22c

Add missing dependencies for h5py

3b8ec87

VladimirShitov marked this pull request as ready for review August 27, 2024 17:58

DriesSchaumont added 2 commits August 30, 2024 07:17

Merge remote-tracking branch 'upstream/main' into feature/lsi

23458b0

Update for viash 0.9.0-RC7

e54481e

DriesSchaumont requested changes Aug 30, 2024


VladimirShitov and others added 14 commits September 17, 2024 11:18

Merge test setup instead of manual writing

a3a1c1a

Co-authored-by: Dries Schaumont <[email protected]>

Import setting up logger

5822a35

Co-authored-by: Dries Schaumont <[email protected]>

Add myself to contributors

97c06e1

Add min to the number of components

07bc90c

Change the default of varm_output to lowercase "lsi"

5827b55

Add PR number for the LSI component

fe1759b

Use run_component to run tests

8df14c7

Co-authored-by: Dries Schaumont <[email protected]>

Fix indentation

8fcb746

Co-authored-by: Dries Schaumont <[email protected]>

Change "LSI" in varm to lowercase acc to the new default

5105789

Fix indentation

1731ff5

Use run_component instead of subprocess for all tests

214833d

Update muon

5bcd79c

Co-authored-by: Dries Schaumont <[email protected]>

Update python to 3.11

b598e19

Co-authored-by: Dries Schaumont <[email protected]>

Update CHANGELOG

c5ad8d3

DriesSchaumont approved these changes Sep 25, 2024


Merge branch 'main' into feature/lsi

2870f95

DriesSchaumont merged commit 1ea6a6c into openpipelines-bio:main Sep 25, 2024
1 check passed

dorien-er pushed a commit that referenced this pull request Nov 18, 2024

Add LSI (#552)

4563934

Co-authored-by: Vladimir Shitov <[email protected]>

dorien-er pushed a commit that referenced this pull request Nov 18, 2024

Add LSI (#552)

dc38e15

Co-authored-by: Vladimir Shitov <[email protected]>


SarahOuologuem commented Sep 13, 2023

VladimirShitov left a comment

DriesSchaumont commented Sep 28, 2023

DriesSchaumont commented Oct 12, 2023

VladimirShitov left a comment

VladimirShitov Oct 23, 2023

VladimirShitov commented Oct 23, 2023

rcannood commented Jan 9, 2024

rcannood commented Jan 19, 2024

SarahOuologuem commented Jan 26, 2024

VladimirShitov commented Jan 30, 2024

VladimirShitov commented Aug 27, 2024

DriesSchaumont Aug 30, 2024

VladimirShitov Sep 17, 2024

DriesSchaumont left a comment
