Overflow error #31

Open
dawe opened this issue Mar 10, 2020 · 0 comments

dawe commented Mar 10, 2020

I'm trying to apply mnnpy to my data, basically following the README of this project, but it fails:

corrected = mnnpy.mnn_correct(*tn5_data_list, batch_categories=batches)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 126, in mnn_correct
    svd_mode=svd_mode, do_concatenate=do_concatenate, **kwargs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/mnn.py", line 157, in mnn_correct
    var_subset, n_jobs)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/site-packages/mnnpy/utils.py", line 54, in transform_input_data
    in_scaling = p_n.map(l2_norm, in_batches)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/dcittaro/miniconda3/envs/default_env/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB
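
For context, this appears to be the 4 GiB limit of pickle protocols below 4: multiprocessing pickles each batch before sending it to a worker process, and Python 3.7 defaults to protocol 3. A minimal sketch that reproduces the same error (assuming roughly 4 GiB of free memory for the test object):

import pickle

# a bytes object just past the 4 GiB boundary (allocates ~4 GiB of RAM)
big = b"\x00" * (4 * 1024**3 + 1)

try:
    # protocol 3 frames bytes with a 32-bit length, so anything over 4 GiB fails
    pickle.dumps(big, protocol=3)
except OverflowError as err:
    print(err)  # cannot serialize a bytes object larger than 4 GiB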

Indeed, I'm dealing with a large dataset (3 AnnData objects with 10,000 cells and 1M features each).
A possible solution would be to first process the data in a standard way (without MNN) up to the point where I have identified the features that can be retained (on the order of 20k), and then rerun MNN on those features only, as sketched below. Would that give similar results?
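
A rough sketch of that workaround, assuming Scanpy is used for the standard preprocessing and that highly-variable-gene selection is an acceptable way to pick the ~20k retained features (variable names below are illustrative, not from mnnpy):

import scanpy as sc
import mnnpy

# the three AnnData objects and batch labels from the snippet above
adatas = tn5_data_list

# standard preprocessing on the concatenated data, only to choose which features to keep
combined = adatas[0].concatenate(*adatas[1:], batch_categories=batches)
sc.pp.normalize_total(combined, target_sum=1e4)
sc.pp.log1p(combined)
sc.pp.highly_variable_genes(combined, n_top_genes=20000)
keep = combined.var_names[combined.var["highly_variable"]]

# subset each batch to the retained features, then run mnnpy on the much smaller matrices
subsets = [adata[:, keep].copy() for adata in adatas]
corrected = mnnpy.mnn_correct(*subsets, batch_categories=batches)

Subsetting to ~20k features should keep each pickled batch well below the 4 GiB limit; whether the correction itself is comparable to running on the full feature set would need checking.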
