Releases: OpenBMB/llama.cpp
b3645
llava : the function "clip" should be int (#9237)
b3615
[SYCL] Add oneDNN primitive support (#9091)
* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc
b3621
Merge branch 'prepare-PR-of-minicpm-v2.6' into master
b3209
ggml : remove ggml_task_type and GGML_PERF (#8017)
* ggml : remove ggml_task_type and GGML_PERF
* check abort_callback on main thread only
* vulkan : remove usage of ggml_compute_params
* remove LLAMA_PERF
b3078
llama : offload to RPC in addition to other backends (#7640)
* llama : offload to RPC in addition to other backends
  - fix copy_tensor being called on the src buffer instead of the dst buffer
  - always initialize views in the view_src buffer
  - add RPC backend to Makefile build
  - add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp
  Co-authored-by: slaren <[email protected]>

Co-authored-by: slaren <[email protected]>
b3026
llama : support small Granite models (#7481)
* Add optional MLP bias for Granite models
  Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116. Still needs some more changes to properly support Granite.
* llama: honor add_space_prefix from the model configuration
  Propagate the add_space_prefix configuration from the HF model configuration to the gguf file and honor it with the gpt2 tokenizer.
  Signed-off-by: Giuseppe Scrivano <[email protected]>
* llama: add support for small granite models
  It works only for the small models 3b and 8b. The convert-hf-to-gguf.py script uses the vocabulary size of the granite models to detect granite and set the correct configuration.
  Signed-off-by: Giuseppe Scrivano <[email protected]>

Signed-off-by: Giuseppe Scrivano <[email protected]>
Co-authored-by: Steffen Roecker <[email protected]>
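The Granite detection described in this release note (convert-hf-to-gguf.py keying off the model's vocabulary size to pick the right configuration) can be sketched roughly as below. This is an illustrative sketch only: the vocabulary-size set and both function names are assumed placeholders, not the script's actual code or constants.

```python
# Hypothetical sketch of vocab-size-based model detection; the real
# convert-hf-to-gguf.py logic and the exact Granite vocab sizes may differ.
ASSUMED_GRANITE_VOCAB_SIZES = {49152}  # placeholder, not a verified constant


def looks_like_granite(hf_config: dict) -> bool:
    """Guess whether a LLaMA-architecture HF config is a Granite checkpoint,
    using only its vocabulary size (the heuristic the release note describes)."""
    return hf_config.get("vocab_size") in ASSUMED_GRANITE_VOCAB_SIZES


def needs_mlp_bias(hf_config: dict) -> bool:
    # Granite models require the optional MLP bias added in this release.
    return looks_like_granite(hf_config)
```

A converter following this approach would call `needs_mlp_bias` while writing the gguf metadata, enabling the extra bias tensors only for configs that match the Granite heuristic.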
b3025
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …
b2979
Add missing inference support for GPTNeoXForCausalLM (Pythia and GPT-…