Running into OOM on 256 GB RAM #5

Open
catchmoosa opened this issue Oct 23, 2024 · 1 comment

catchmoosa commented Oct 23, 2024

Traceback (most recent call last):
  File "/teamspace/studios/this_studio/mcgill_fiam/0X-Causal_discovery/discovery.py", line 30, in <module>
    g_prob = model(x=x)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/avici/pretrain.py", line 109, in __call__
    out = onp.array(out)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jax/_src/array.py", line 429, in __array__
    return np.asarray(self._value, dtype=dtype, **kwds)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jax/_src/profiler.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/jax/_src/array.py", line 628, in _value
    self._npy_value = self._single_device_array_to_np_array()
jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: Buffer Definition Event: Error preparing computation: %sOut of memory allocating 332034480032 bytes.

This happens on a dataset with 10,000 rows and 51 variables. Can you help me with this issue?
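For scale, the single failed allocation from the error message can be converted to GB/GiB and compared with the 256 GB of RAM from the title. This is a quick arithmetic sketch; reading "256 GB" as decimal gigabytes is an assumption.

```python
# Figures taken from this report; treating "256 GB" as decimal (10**9 bytes)
# is an assumption.
requested_bytes = 332_034_480_032  # from the XlaRuntimeError message
ram_bytes = 256 * 10**9

print(f"requested: {requested_bytes / 10**9:.1f} GB = {requested_bytes / 2**30:.1f} GiB")
print(f"single allocation exceeds total RAM: {requested_bytes > ram_bytes}")
```

Even if the RAM figure is read as 256 GiB (about 274.9 GB), the one buffer is still larger than the machine's total memory, so that allocation cannot succeed regardless of what else is running.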

@larslorch
Owner

It's quite possible that 10,000 rows is simply too large for the forward pass. One idea -- though I've never tried it -- could be to split the rows into smaller chunks, run a separate forward pass on each chunk, and combine the results into a bootstrapped estimate of the graph.
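A minimal sketch of the chunk-and-average idea, assuming the model can be wrapped as a function that maps an (m, d) data array to a (d, d) matrix of edge probabilities. The helper name `chunked_edge_probs` and its parameters are hypothetical, not part of avici. Strictly, this is a subsample-and-average scheme (rows are partitioned, not resampled with replacement as in a classic bootstrap), and it has not been validated for AVICI.

```python
import numpy as np

def chunked_edge_probs(x, predict_fn, chunk_size=1000, n_rounds=1, seed=0):
    """Average (d, d) edge-probability matrices over row chunks of x.

    Hypothetical helper (not part of avici). Assumes `predict_fn` maps an
    (m, d) data array to a (d, d) array of edge probabilities. Each round
    reshuffles the rows and splits them into roughly equal chunks of about
    `chunk_size` rows; the result is the mean over all forward passes.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    n_chunks = max(1, -(-n // chunk_size))  # ceil(n / chunk_size)
    total = np.zeros((d, d), dtype=np.float64)
    n_passes = 0
    for _ in range(n_rounds):
        perm = rng.permutation(n)  # new random row order each round
        for idx in np.array_split(perm, n_chunks):
            # One forward pass per chunk; assuming memory grows with the
            # number of rows, this should lower the peak usage.
            total += np.asarray(predict_fn(x[idx]), dtype=np.float64)
            n_passes += 1
    return total / n_passes
```

With a loaded model called as in the traceback, usage would look like `g_prob = chunked_edge_probs(x, lambda c: model(x=c), chunk_size=1000, n_rounds=3)`. Whether averaging per-chunk probabilities gives a sensible graph estimate for this model would need to be checked empirically.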

However, it seems that your error occurs at `out = onp.array(out)` in `avici/pretrain.py` (line 109), i.e., after the forward pass is already done. Can you confirm this?
Maybe call jax.block_until_ready before this line to confirm (see the JAX documentation). If the error does occur after the forward pass, I don't currently know what the cause could be and would have to investigate. It would be great if you could provide a minimal example that reproduces this with random synthetic data.
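A sketch of that check, assuming a setup that mirrors the traceback: a jitted forward pass followed by a host conversion with `onp.array`. The function `_forward` is a hypothetical stand-in for the real model, not avici's code. Because JAX dispatches work asynchronously, a failure inside the computation may only surface at the first host access. Blocking right after the forward pass makes a device-side error raise there instead of at the conversion. The synthetic data uses the shape reported in the issue.

```python
import jax
import jax.numpy as jnp
import numpy as onp

@jax.jit
def _forward(x):
    # Hypothetical stand-in for the model's jitted forward pass: maps an
    # (n, d) data matrix to a (d, d) matrix of values in (0, 1).
    return jax.nn.sigmoid(x.T @ x / x.shape[0])

def predict(x):
    out = _forward(jnp.asarray(x))
    # Force the computation to finish here, so an error from the forward
    # pass itself is raised on this line rather than during the copy below.
    out = jax.block_until_ready(out)
    return onp.array(out)  # device -> host copy, like `out = onp.array(out)`

# Random synthetic data with the shape reported in the issue
rng = onp.random.default_rng(0)
x = rng.normal(size=(10_000, 51)).astype(onp.float32)
g_prob = predict(x)
```

In the real setup, the equivalent would be inserting `out = jax.block_until_ready(out)` just before line 109 of `avici/pretrain.py`. If the OOM is then raised on the blocking line, the forward pass itself is the culprit. If it is still raised at `onp.array(out)`, the problem lies in the host conversion.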
