This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

ggml_new_tensor_impl: not enough space in the context's memory pool #47

Open
m1chae1bx opened this issue Aug 13, 2023 · 7 comments

Comments

@m1chae1bx

I tried writing a few lines of code and got my first completion working properly:

# Function that prints hello world
def hello_world():
    print('Hello World!')

hello_world()

But when I started adding more code, Turbopilot failed with the following error:

(turbopilot-test) mbonon@mbonon-tm01926-mbp turbopilot-test % ./turbopilot-bin -m stablecode -f ./models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin
[2023-08-13 14:14:20.759] [info] Initializing StableLM type model for 'stablecode' model type
[2023-08-13 14:14:20.760] [info] Attempt to load model from stablecode
load_model: loading model from './models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin' - please wait ...
load_model: n_vocab = 49152
load_model: n_ctx   = 4096
load_model: n_embd  = 2560
load_model: n_head  = 32
load_model: n_layer = 32
load_model: n_rot   = 20
load_model: par_res = 1
load_model: ftype   = 2007
load_model: qntvr   = 2
load_model: ggml ctx size = 6169.28 MB
load_model: memory_size =  1280.00 MB, n_mem = 131072
load_model: ................................................ done
load_model: model size =  2809.08 MB / num tensors = 388
[2023-08-13 14:14:22.712] [info] Loaded model in 1951.30ms
(2023-08-13 06:14:22) [INFO    ] Crow/1.0 server is running at http://0.0.0.0:18080 using 8 threads
(2023-08-13 06:14:22) [INFO    ] Call `app.loglevel(crow::LogLevel::Warning)` to hide Info level logs.
(2023-08-13 06:19:29) [INFO    ] Request: 127.0.0.1:52574 0x105008200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:31) [INFO    ] Response: 0x105008200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:31) [INFO    ] Request: 127.0.0.1:52575 0x131813a00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:32) [INFO    ] Response: 0x131813a00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:36) [INFO    ] Request: 127.0.0.1:52577 0x13200fa00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:37) [INFO    ] Response: 0x13200fa00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:37) [INFO    ] Request: 127.0.0.1:52578 0x131813a00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:38) [INFO    ] Response: 0x131813a00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:43) [INFO    ] Request: 127.0.0.1:52581 0x13180e000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:46) [INFO    ] Response: 0x13180e000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:47) [INFO    ] Request: 127.0.0.1:52582 0x13200fa00 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:48) [INFO    ] Response: 0x13200fa00 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:49) [INFO    ] Request: 127.0.0.1:52583 0x132009200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:50) [INFO    ] Response: 0x132009200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:50) [INFO    ] Request: 127.0.0.1:52584 0x125809000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:54) [INFO    ] Response: 0x125809000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:54) [INFO    ] Request: 127.0.0.1:52586 0x13180e000 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:56) [INFO    ] Response: 0x13180e000 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:56) [INFO    ] Request: 127.0.0.1:52588 0x131814200 HTTP/1.1 POST /v1/engines/codegen/completions
(2023-08-13 06:19:58) [INFO    ] Response: 0x131814200 /v1/engines/codegen/completions 200 1
(2023-08-13 06:19:58) [INFO    ] Request: 127.0.0.1:52589 0x125809000 HTTP/1.1 POST /v1/engines/codegen/completions
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 510492368, available 268435456)
GGML_ASSERT: /Users/mbonon/coding/turbopilot-test/turbopilot/extern/ggml/src/ggml.c:16810: buf
zsh: abort      ./turbopilot-bin -m stablecode -f 

I'm running on a MacBook Pro with Apple M1 Pro chip and 16 GB of memory.
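For scale, the two numbers in the assertion can be converted to megabytes directly (a quick check using the exact values from the error message above):

```python
# Values taken verbatim from the ggml_new_tensor_impl error above.
needed = 510_492_368      # bytes the new tensor allocation requires
available = 268_435_456   # bytes in the context's fixed memory pool

MB = 1024 * 1024
print(f"needed:    {needed / MB:.1f} MB")     # ~486.8 MB
print(f"available: {available / MB:.1f} MB")  # exactly 256 MB
print(f"shortfall: {(needed - available) / MB:.1f} MB")
```

So the request needs roughly 487 MB against a pool of exactly 256 MB, which is why the process aborts rather than degrading gracefully.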

@ravenscroftj
Owner

Thanks for opening this issue. I think there's a problem with how the simple example code in the GGML repo allocates memory for the models. I'll need to dig around in the ggml code and see if I can get it to allocate more memory.

Did this happen after repeated generations appending to the same file? Are you using the fauxcode vscode plugin or the huggingface plugin?
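For context on the error itself: ggml carves all tensors out of a single fixed-size buffer chosen when the context is created, and allocating past the end of that buffer fails hard. A toy Python sketch of that bump-allocator pattern (hypothetical names, not the real ggml API):

```python
class FixedPool:
    """Bump allocator over a fixed-size buffer, like a ggml context's memory pool."""

    def __init__(self, size: int):
        self.size = size    # total pool size, fixed at creation time
        self.offset = 0     # bytes handed out so far

    def new_tensor(self, nbytes: int) -> int:
        """Reserve nbytes; return the start offset, or fail like ggml does."""
        if self.offset + nbytes > self.size:
            raise MemoryError(
                f"not enough space in the context's memory pool "
                f"(needed {self.offset + nbytes}, available {self.size})"
            )
        start = self.offset
        self.offset += nbytes
        return start

pool = FixedPool(size=256 * 1024 * 1024)  # 256 MB, as in the logs above
pool.new_tensor(100 * 1024 * 1024)        # fits
# pool.new_tensor(200 * 1024 * 1024)      # would raise MemoryError
```

The point is that nothing grows on demand: if an inference pass needs more scratch space than was reserved up front, the only options are reserving more at startup or shrinking the per-pass workload.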

@czkoko

czkoko commented Aug 13, 2023

I have the same problem when using Fauxpilot. If I POST the data with curl instead, there's no problem and codegen-serve works normally.

@m1chae1bx
Author

It happens after repeated generations as I'm typing, I think; that also matches the number of calls to the completions endpoint in the logs. I'm using the Fauxpilot extension.

@gilbok

gilbok commented Aug 23, 2023

I also experienced it.
I'm running the stablecode model on a MacBook Air with an M2 chip and 16 GB of memory.
I'm using the fauxcode VS Code plugin.

@kkqin

kkqin commented Aug 23, 2023

I use Fauxpilot, and when I send a longer prompt it hits this issue and exits.

@ravenscroftj
Owner

ravenscroftj commented Aug 24, 2023

I've deployed a change to allow users to specify smaller batch sizes (#59). Normally we set batch size (number of tokens to attempt to process in the same forward pass) to 512. However, this is quite memory intensive, especially with the larger models (starcoder/wizardcoder). If you build from main and then pass -b 256 or even -b 128 you might find that this issue goes away.

I will package up a new minor release in the next couple of days that will include this change if you don't want to build from main.
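Assuming the same binary, model path, and flags as the original report above, the reduced batch size from #59 would be passed like this:

```shell
# Rebuild from main first, then run with a smaller batch size (-b):
./turbopilot-bin -m stablecode \
  -f ./models/stablecode-instruct-alpha-3b.ggmlv1.q8_0.bin \
  -b 128
```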

@ElYaiko

ElYaiko commented Aug 26, 2023

> I've deployed a change to allow users to specify smaller batch sizes (#59). Normally we set batch size (number of tokens to attempt to process in the same forward pass) to 512. However, this is quite memory intensive, especially with the larger models (starcoder/wizardcoder). If you build from main and then pass -b 256 or even -b 128 you might find that this issue goes away.
>
> I will package up a new minor release in the next couple of days that will include this change if you don't want to build from main.

The problem is still happening for me with stablecode and santacoder. I even tried -b 64 and I still get:

ggml_new_object: not enough space in the context's memory pool (needed 510481728, available 268435456)
