Add LSI #552

Merged
merged 35 commits into from
Sep 25, 2024

SarahOuologuem
Contributor

@SarahOuologuem SarahOuologuem commented Sep 13, 2023

Changelog

Added LSI component

Issue ticket number and link

#398

Checklist before requesting a review

  • I have performed a self-review of my code

  • Conforms to the Contributor's guide

  • Check the correct box. Does this PR contain:

    • Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Documentation
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI tests succeed!

Collaborator

@VladimirShitov VladimirShitov left a comment

Thank you! Overall, it looks good, but I left a few minor comments. Also, could you please add tests? I would test that:

  1. All output keys exist after running the component
  2. The varm contains the correct information when genes are subset
  3. The --overwrite flag works correctly
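As a rough illustration of the three checks listed above, here is a minimal, self-contained sketch. It does not call the real component or use MuData: plain dictionaries stand in for the .obsm, .varm and .uns slots, and the function write_lsi_outputs, the key names "X_lsi" and "lsi", and the zero-filled rows for unselected features are all hypothetical placeholders, not the behaviour of this PR.

```python
# Hypothetical stand-in for the component: plain dicts replace the
# .obsm/.varm/.uns slots of a real AnnData object.
def write_lsi_outputs(slots, n_features, selected, overwrite=False,
                      obsm_key="X_lsi", varm_key="lsi", uns_key="lsi"):
    """Store placeholder LSI results, refusing to clobber existing keys."""
    targets = {"obsm": obsm_key, "varm": varm_key, "uns": uns_key}
    for slot, key in targets.items():
        if key in slots[slot] and not overwrite:
            raise ValueError(f"Key '{key}' already exists in .{slot}; "
                             "pass overwrite=True to replace it.")
    # Assumption: features outside the selection get a zero loading row,
    # so .varm keeps one row per feature of the full matrix.
    loadings = [[1.0] if i in selected else [0.0] for i in range(n_features)]
    slots["obsm"][obsm_key] = [[0.5]]  # placeholder embedding
    slots["varm"][varm_key] = loadings
    slots["uns"][uns_key] = {"n_selected": len(selected)}
    return slots


def make_slots():
    """Return empty stand-ins for the three output slots."""
    return {"obsm": {}, "varm": {}, "uns": {}}


def test_output_keys_exist():
    slots = write_lsi_outputs(make_slots(), n_features=4, selected={0, 2})
    assert "X_lsi" in slots["obsm"]
    assert "lsi" in slots["varm"]
    assert "lsi" in slots["uns"]


def test_varm_rows_match_all_features():
    slots = write_lsi_outputs(make_slots(), n_features=4, selected={0, 2})
    assert slots["varm"]["lsi"] == [[1.0], [0.0], [1.0], [0.0]]


def test_overwrite_flag():
    slots = write_lsi_outputs(make_slots(), n_features=4, selected={0})
    try:
        write_lsi_outputs(slots, n_features=4, selected={0})
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError without overwrite")
    # With overwrite=True the same call succeeds and replaces the results.
    write_lsi_outputs(slots, n_features=4, selected={0, 1}, overwrite=True)
    assert slots["uns"]["lsi"]["n_selected"] == 2
```

In the real test suite the same assertions would presumably be made on the .h5mu file written by the component rather than on dictionaries.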

Resolved review threads: src/dimred/lsi/config.vsh.yaml (4, all outdated); src/dimred/lsi/script.py (6, 2 outdated)

@DriesSchaumont
Member

Hi @SarahOuologuem thanks for opening this PR and thanks @VladimirShitov for the review! I read through it and left some comments with thoughts on some of the conversations. Let me know if I can be of more help to keep this PR moving forward!

@DriesSchaumont
Member

DriesSchaumont commented Oct 12, 2023

Hi @SarahOuologuem, I noticed that you implemented tests, which is really great! The test data has not yet been uploaded to our test S3 bucket. Could you provide me with a link so that I can download the data (assuming it is public)? I will put it in our bucket. Otherwise, I think we could quickly connect on Slack. Thanks :)

Collaborator

@VladimirShitov VladimirShitov left a comment

Looks great to me. I love the tests! Waiting for them to pass, and then I believe it can be merged.

functionality:
  name: lsi
  namespace: "dimred"
  description: |
Collaborator

@SarahOuologuem, do you also want to add yourself to the authors? :) You can find an example here:

- __merge__: /src/authors/dries_de_maeyer.yaml

Resolved review threads: src/dimred/lsi/script.py (outdated); src/dimred/lsi/test.py

@VladimirShitov
Collaborator

A general recommendation: it would be great to have more descriptive commit messages. For example, "Change tabulation" or "Remove spaces" instead of "Small fixes". That would make it possible to quickly understand what happened without diving deeper into the code.

Resolved review thread: src/dimred/lsi/test.py (outdated)

@rcannood
Contributor

rcannood commented Jan 9, 2024

Hi Sarah!

Have you tried running viash test src/dimred/lsi/config.vsh.yaml?

I get:


=================================== FAILURES ===================================
______________________ test_select_highly_variable_column ______________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0')

    def test_select_highly_variable_column(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"
    
        # run component
        cmd_args = [
        meta["executable"],
         "--input", str(input_path),
         "--output", str(output_path),
         "--var_input", "highly_variable"
        ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:81: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_select_highly_variable_co0/output_lsi.h5mu', '--var_input', 'highly_variable']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:17,969 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
2024-01-09 08:26:18,700 INFO     Using modality 'atac' and adata.X for LSI computation
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-NUVvcM.py", line 93, in <module>
    adata_input_layer = subset_vars(adata_input_layer, par["var_input"])
  File "/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/subset_vars.py", line 17, in subset_vars
    raise ValueError(f"Requested to use .var column '{subset_col}' as a selection of genes, but the column is not available.")
ValueError: Requested to use .var column 'highly_variable' as a selection of genes, but the column is not available.
__________________________ test_selecting_input_layer __________________________

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0')

    def test_selecting_input_layer(tmp_path):
        output_path = tmp_path / "output_lsi.h5mu"
    
        # run component
        cmd_args = [
            meta["executable"],
            "--input", str(input_path),
            "--output", str(output_path),
            "--num_components", "20",
            "--layer", "counts"
            ]
>       subprocess.run(cmd_args, check=True)

tmp/viash-run-lsi-UhQpoQ.py:136: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_...mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', ...],)
kwargs = {}
process = <Popen: returncode: 1 args: ['/viash_automount/tmp/viash_test_lsi35782840702...>
stdout = None, stderr = None, retcode = 1

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test/lsi', '--input', '/viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu', '--output', '/tmp/pytest-of-root/pytest-0/test_selecting_input_layer0/output_lsi.h5mu', '--num_components', '20', '--layer', 'counts']' returned non-zero exit status 1.

/usr/local/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stdout call -----------------------------
2024-01-09 08:26:34,271 INFO     Reading /viash_automount/tmp/viash_test_lsi3578284070294657065/test_test//concat_test_data/e18_mouse_brain_fresh_5k_filtered_feature_bc_matrix_subset_unique_obs.h5mu.
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/tmp/viash-run-lsi-tOA0u2.py", line 80, in <module>
    raise ValueError(f"Layer '{par['layer']}' was not found in modality '{par['modality']}'.")
ValueError: Layer 'counts' was not found in modality 'atac'.
=============================== warnings summary ===============================
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_lsi
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_raises
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
tmp/viash-run-lsi-UhQpoQ.py::test_output_field_already_present_overwrite
  /usr/local/lib/python3.9/site-packages/anndata/_core/anndata.py:453: PendingDeprecationWarning: The dtype argument will be deprecated in anndata 0.10.0
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_select_highly_variable_column - subp...
FAILED tmp/viash-run-lsi-UhQpoQ.py::test_selecting_input_layer - subprocess.C...
============= 2 failed, 6 passed, 8 warnings in 117.84s (0:01:57) ==============
====================================================================
ERROR! Only 0 out of 1 test scripts succeeded!
Unexpected error occurred! If you think this is a bug, please post
create an issue at https://github.com/viash-io/viash/issues containing
a reproducible example and the stack trace below.

Does the same error show up when you run it locally?
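In the traceback above, pytest reports stdout = None and stderr = None because subprocess.run does not capture the child's output unless asked to. A test that wants to assert on the component's error message could capture it explicitly. The sketch below is self-contained and makes two assumptions: a throwaway Python one-liner stands in for the real lsi executable, and the helper name run_and_capture is hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd_args):
    """Run a command and return (returncode, stderr) without raising."""
    try:
        result = subprocess.run(
            cmd_args, check=True, capture_output=True, text=True
        )
    except subprocess.CalledProcessError as exc:
        # With capture_output=True the exception carries the child's stderr.
        return exc.returncode, exc.stderr
    return result.returncode, result.stderr


# Stand-in for the component: writes an error like the one above, exits 1.
fake_component = [
    sys.executable, "-c",
    "import sys; "
    "sys.stderr.write(\"ValueError: Layer 'counts' was not found\\n\"); "
    "sys.exit(1)",
]

returncode, stderr = run_and_capture(fake_component)
assert returncode == 1
assert "Layer 'counts' was not found" in stderr
```

With the message available in the test, a failure caused by missing test data (no 'highly_variable' column, no 'counts' layer) can be told apart from a genuine bug in the component.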

@rcannood
Contributor

Hi Sarah! Just checking in with this PR. When would you have some time to look at the issue I posted?

@SarahOuologuem
Contributor Author

Sorry for the very late reply! Yes, the errors make sense: I haven't checked the new test data, I only ran the tests on my old test data.
I'm currently drowning in work, especially because of exam season. Please feel free to correct it yourself to speed up the process! So sorry! I can't really say when I will have time to resolve the issue myself.

@VladimirShitov
Collaborator

I can take it over :) Once I swim out of other work as well...

@DriesSchaumont DriesSchaumont marked this pull request as draft August 12, 2024 11:25
@VladimirShitov VladimirShitov marked this pull request as ready for review August 27, 2024 17:58
@VladimirShitov
Collaborator

@rcannood, @DriesSchaumont, I fixed the tests, so it should be ready for merging or the final review :)

Resolved review threads: CHANGELOG.md (outdated); src/dimred/lsi/config.vsh.yaml (2)

  required: false

- name: "--scale_embeddings"
  type: boolean
Member

I prefer boolean_false or boolean_true over just boolean. Could you check if one of these is appropriate?

Collaborator

Thanks! I considered it, but here is the problem. I want to set the default to "True". To do that, we could use boolean_false. But then the meaning of the argument has to be inverted to something like not_scale_embeddings. I find it rather confusing. I'd leave it as is but I'm open to discussion :)
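
The trade-off described in this thread can be illustrated with Python's argparse, used here purely as an analogy and not as a description of how viash parses arguments. A presence-only flag that should default to true has to be spelled as a negation (the hypothetical --no_scale_embeddings below), whereas a value-taking boolean keeps the positive name --scale_embeddings with a default of true. The str2bool helper is an assumption made for this sketch.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' style strings into a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Option A: value-taking boolean, positive name, default True.
parser_a = argparse.ArgumentParser()
parser_a.add_argument("--scale_embeddings", type=str2bool, default=True)

# Option B: presence-only flag; defaulting to True forces an inverted name.
parser_b = argparse.ArgumentParser()
parser_b.add_argument("--no_scale_embeddings", dest="scale_embeddings",
                      action="store_false")

print(parser_a.parse_args([]).scale_embeddings)                               # True
print(parser_a.parse_args(["--scale_embeddings", "false"]).scale_embeddings)  # False
print(parser_b.parse_args([]).scale_embeddings)                               # True
print(parser_b.parse_args(["--no_scale_embeddings"]).scale_embeddings)        # False
```

Both parsers end up with the same scale_embeddings value; the difference is only in what the user has to type, which is the readability concern raised above.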

Resolved review threads: src/dimred/lsi/config.vsh.yaml (3, 2 outdated); src/dimred/lsi/test.py (3, 2 outdated)

Member

@DriesSchaumont DriesSchaumont left a comment

@DriesSchaumont DriesSchaumont merged commit 1ea6a6c into openpipelines-bio:main Sep 25, 2024
1 check passed
dorien-er pushed a commit that referenced this pull request Nov 18, 2024
Co-authored-by: Vladimir Shitov <[email protected]>
dorien-er pushed a commit that referenced this pull request Nov 18, 2024
Co-authored-by: Vladimir Shitov <[email protected]>