Releases: OpenBMB/llama.cpp
b3645
llava : the function "clip" should be int (#9237)
b3615
[SYCL] Add oneDNN primitive support (#9091)
* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc
b3621
Merge branch 'prepare-PR-of-minicpm-v2.6' into master
b3209
ggml : remove ggml_task_type and GGML_PERF (#8017)
* ggml : remove ggml_task_type and GGML_PERF
* check abort_callback on main thread only
* vulkan : remove usage of ggml_compute_params
* remove LLAMA_PERF
b3078
llama : offload to RPC in addition to other backends (#7640)
* llama : offload to RPC in addition to other backends
  - fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp
  Co-authored-by: slaren <[email protected]>

Co-authored-by: slaren <[email protected]>
b3026
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models
  Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116. Still needs some more changes to properly support Granite.
* llama: honor add_space_prefix from the model configuration
  Propagate the add_space_prefix configuration from the HF model configuration to the gguf file and honor it with the gpt2 tokenizer.
  Signed-off-by: Giuseppe Scrivano <[email protected]>
* llama: add support for small granite models
  It works only for the small models 3b and 8b. The convert-hf-to-gguf.py script uses the vocabulary size of the granite models to detect granite and set the correct configuration.
  Signed-off-by: Giuseppe Scrivano <[email protected]>

Signed-off-by: Giuseppe Scrivano <[email protected]>
Co-authored-by: Steffen Roecker <[email protected]>
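The Granite detection described in this release note (convert-hf-to-gguf.py keying off the model's vocabulary size to pick the right configuration) can be sketched roughly as below. This is an illustrative sketch only: the vocabulary-size set and both function names are assumed placeholders, not the script's actual code or constants.

```python
# Hypothetical sketch of vocab-size-based model detection; the real
# convert-hf-to-gguf.py logic and the exact Granite vocab sizes may differ.
ASSUMED_GRANITE_VOCAB_SIZES = {49152}  # placeholder, not a verified constant


def looks_like_granite(hf_config: dict) -> bool:
    """Guess whether a LLaMA-architecture HF config is a Granite checkpoint,
    using only its vocabulary size (the heuristic the release note describes)."""
    return hf_config.get("vocab_size") in ASSUMED_GRANITE_VOCAB_SIZES


def needs_mlp_bias(hf_config: dict) -> bool:
    # Granite models require the optional MLP bias added in this release.
    return looks_like_granite(hf_config)
```

A converter following this approach would call `needs_mlp_bias` while writing the gguf metadata, enabling the extra bias tensors only for configs that match the Granite heuristic.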
b3025
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …
b2979
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-…