This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

ggml_new_tensor_impl: not enough space in the context's memory pool #47

Open
m1chae1bx opened this issue Aug 13, 2023 · 7 comments

Comments

@m1chae1bx

I tried writing a few lines of code and got my first completion working properly:

# Function that prints hello world
def hello_world():
    print('Hello World!')

hello_world()

But when I started adding more code, Turbopilot failed with the following error:

(turbopilot-test) mbonon@mbonon-tm01926-mbp turbopilot-test % ./turbopilot-bin -m stablecode -f ./models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin
[2023-08-13 14:14:20.759] [info] Initializing StableLM type model for 'stablecode' model type
[2023-08-13 14:14:20.760] [info] Attempt to load model from stablecode
load_model: loading model from './models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin' - please wait ...
load_model: n_vocab = 49152
load_model: n_ctx   = 4096
load_model: n_embd  = 2560
load_model: n_head  = 32
load_model: n_layer = 32
load_model: n_rot   = 20
load_model: par_res = 1
load_model: ftype   = 2007
load_model: qntvr   = 2
load_model: ggml ctx size = 6169.28 MB
load_model: memory_size =  1280.00 MB, n_mem = 131072
load_model: ................................................ done
load_model: model size =  2809.08 MB / num tensors = 388
[2023-08-13 14:14:22.712] [info] Loaded model in 1951.30ms
(2023-08-13 06:14:22) [INFO    ] Crow/1.0 server is running at http://0.0.0.0:18080 using 8 threads
(2023-08-13 06:14:22) [INFO    ] Call `app.loglevel(crow::LogLevel::Warning)` to hide Info level logs.
(2023-08-13 06:19:29) [INFO    ] Request: 127.0.0.1:52574 0x105008200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:31) [INFO    ] Response: 0x105008200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:31) [INFO    ] Request: 127.0.0.1:52575 0x131813a00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:32) [INFO    ] Response: 0x131813a00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:36) [INFO    ] Request: 127.0.0.1:52577 0x13200fa00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:37) [INFO    ] Response: 0x13200fa00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:37) [INFO    ] Request: 127.0.0.1:52578 0x131813a00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:38) [INFO    ] Response: 0x131813a00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:43) [INFO    ] Request: 127.0.0.1:52581 0x13180e000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:46) [INFO    ] Response: 0x13180e000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:47) [INFO    ] Request: 127.0.0.1:52582 0x13200fa00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:48) [INFO    ] Response: 0x13200fa00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:49) [INFO    ] Request: 127.0.0.1:52583 0x132009200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:50) [INFO    ] Response: 0x132009200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:50) [INFO    ] Request: 127.0.0.1:52584 0x125809000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:54) [INFO    ] Response: 0x125809000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:54) [INFO    ] Request: 127.0.0.1:52586 0x13180e000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:56) [INFO    ] Response: 0x13180e000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:56) [INFO    ] Request: 127.0.0.1:52588 0x131814200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:58) [INFO    ] Response: 0x131814200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:58) [INFO    ] Request: 127.0.0.1:52589 0x125809000 HTTP/1.1 POST /v1/engines/codegen/completions
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 510492368, available 268435456)
GGML_ASSERT: /Users/mbonon/coding/turbopilot-test/turbopilot/extern/ggml/src/ggml.c:16810: buf
zsh: abort      ./turbopilot-bin -m stablecode -f 

I'm running on a MacBook Pro with Apple M1 Pro chip and 16 GB of memory.
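For scale, the two numbers in the assertion can be converted to megabytes directly (a quick check using the exact values from the error message above):

```python
# Values taken verbatim from the ggml_new_tensor_impl error above.
needed = 510_492_368      # bytes the new tensor allocation requires
available = 268_435_456   # bytes in the context's fixed memory pool

MB = 1024 * 1024
print(f"needed:    {needed / MB:.1f} MB")     # ~486.8 MB
print(f"available: {available / MB:.1f} MB")  # exactly 256 MB
print(f"shortfall: {(needed - available) / MB:.1f} MB")
```

So the request needs roughly 487 MB against a pool of exactly 256 MB, which is why the process aborts rather than degrading gracefully.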

@ravenscroftj
Owner

Thanks for opening this issue. I think there's a problem with how the simple example code in the GGML repo allocates memory for the models. I'll need to dig around in the ggml code and see if I can get it to allocate more memory.

Did this happen after repeated generations appending to the same file? Are you using the fauxcode vscode plugin or the huggingface plugin?
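For context on the error itself: ggml carves all tensors out of a single fixed-size buffer chosen when the context is created, and allocating past the end of that buffer fails hard. A toy Python sketch of that bump-allocator pattern (hypothetical names, not the real ggml API):

```python
class FixedPool:
    """Bump allocator over a fixed-size buffer, like a ggml context's memory pool."""

    def __init__(self, size: int):
        self.size = size    # total pool size, fixed at creation time
        self.offset = 0     # bytes handed out so far

    def new_tensor(self, nbytes: int) -> int:
        """Reserve nbytes; return the start offset, or fail like ggml does."""
        if self.offset + nbytes > self.size:
            raise MemoryError(
                f"not enough space in the context's memory pool "
                f"(needed {self.offset + nbytes}, available {self.size})"
            )
        start = self.offset
        self.offset += nbytes
        return start

pool = FixedPool(size=256 * 1024 * 1024)  # 256 MB, as in the logs above
pool.new_tensor(100 * 1024 * 1024)        # fits
# pool.new_tensor(200 * 1024 * 1024)      # would raise MemoryError
```

The point is that nothing grows on demand: if an inference pass needs more scratch space than was reserved up front, the only options are reserving more at startup or shrinking the per-pass workload.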

@czkoko

czkoko commented Aug 13, 2023

I have the same problem when using Fauxpilot. If I POST the data with curl instead, there's no problem and codegen-serve works normally.

@m1chae1bx
Author

It happens after repeated generations as I'm typing, I think; that also matches the number of calls to the completions endpoint in the logs. I'm using the Fauxpilot extension.

@gilbok

gilbok commented Aug 23, 2023

I also experienced it.
I'm running the stablecode model on a MacBook Air with an M2 chip and 16 GB of memory.
I'm using the fauxcode VS Code plugin.

@kkqin

kkqin commented Aug 23, 2023

I use Fauxpilot, and when I send a longer prompt it hits this issue and exits.

@ravenscroftj
Owner

ravenscroftj commented Aug 24, 2023

I've deployed a change to allow users to specify smaller batch sizes (#59). Normally we set batch size (number of tokens to attempt to process in the same forward pass) to 512. However, this is quite memory intensive, especially with the larger models (starcoder/wizardcoder). If you build from main and then pass -b 256 or even -b 128 you might find that this issue goes away.

I will package up a new minor release in the next couple of days that will include this change if you don't want to build from main.
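Assuming the same binary, model path, and flags as the original report above, the reduced batch size from #59 would be passed like this:

```shell
# Rebuild from main first, then run with a smaller batch size (-b):
./turbopilot-bin -m stablecode \
  -f ./models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin \
  -b 128
```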

@ElYaiko

ElYaiko commented Aug 26, 2023

> I've deployed a change to allow users to specify smaller batch sizes (#59). Normally we set batch size (number of tokens to attempt to process in the same forward pass) to 512. However, this is quite memory intensive, especially with the larger models (starcoder/wizardcoder). If you build from main and then pass -b 256 or even -b 128 you might find that this issue goes away.
>
> I will package up a new minor release in the next couple of days that will include this change if you don't want to build from main.

The problem is still happening for me with stablecode and santacoder. I even tried -b 64 and I still get:

ggml_new_object: not enough space in the context's memory pool (needed 510481728, available 268435456)
