Releases: OpenBMB/llama.cpp

b4049

08 Nov 07:48
76c6e7f
server : minor UI fix (#10207)

b3923

16 Oct 09:20
becfd38
[CANN] Fix cann compilation error (#9891)

Fix the CANN compilation error introduced when llama.cpp merged support for dynamically loadable backends.

b3918

14 Oct 09:31
fix memory leaks in minicpmv

b3917

14 Oct 09:00
a89f75e
server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling <[email protected]>

b3916

14 Oct 07:31
13dca2a
Vectorize load instructions in dmmv f16 CUDA kernel (#9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
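
The idea behind the vectorized loads in b3916 can be illustrated with a small CPU-side analogy. This is a minimal sketch in plain C, not the CUDA kernel from dmmv.cu: the function names `sum_scalar` and `sum_vector` are hypothetical, and 16-bit integers stand in for f16 values. It only shows the general pattern of fetching several narrow elements with one wide load instead of one load per element.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar version: one 16-bit load per element. */
static uint32_t sum_scalar(const uint16_t *x, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Wide-load version: one 64-bit load fetches four 16-bit elements.
 * memcpy is used so the load is valid regardless of alignment;
 * compilers typically lower it to a single load instruction. */
static uint32_t sum_vector(const uint16_t *x, size_t n) {
    uint32_t acc = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint64_t packed;
        memcpy(&packed, x + i, sizeof packed);
        /* Each 16-bit lane holds one whole element. The lane order
         * depends on endianness, but a sum does not care about order. */
        acc += (uint32_t)(packed & 0xFFFF);
        acc += (uint32_t)((packed >> 16) & 0xFFFF);
        acc += (uint32_t)((packed >> 32) & 0xFFFF);
        acc += (uint32_t)((packed >> 48) & 0xFFFF);
    }
    /* Tail: remaining elements that do not fill a full 64-bit load. */
    for (; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

Both functions return the same result; the second issues fewer memory transactions. On a GPU the same trade is made with vector types, which is where the bandwidth-bound speedup described above would come from.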

b3899

09 Oct 08:24
dca1d4b
ggml : fix BLAS with unsupported types (#9775)

* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
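
The three changes in b3899 can be sketched together as follows. This is a simplified illustration under stated assumptions, not ggml's actual code: every `demo_` name, the struct layout, and the three example types are hypothetical. It shows a traits table whose accessor returns a pointer rather than a struct copy, and a BLAS eligibility check that rejects any type lacking a `to_float` conversion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a tensor type enum and its traits. */
enum demo_type { DEMO_TYPE_F32, DEMO_TYPE_Q8, DEMO_TYPE_OPAQUE, DEMO_TYPE_COUNT };

typedef void (*demo_to_float_t)(const void *src, float *dst, size_t n);

struct demo_type_traits {
    const char     *name;
    size_t          type_size;
    demo_to_float_t to_float; /* NULL when the type cannot be converted */
};

static void demo_f32_to_float(const void *src, float *dst, size_t n) {
    const float *s = src;
    for (size_t i = 0; i < n; i++) dst[i] = s[i];
}

static void demo_q8_to_float(const void *src, float *dst, size_t n) {
    const int8_t *s = src;
    for (size_t i = 0; i < n; i++) dst[i] = (float)s[i];
}

static const struct demo_type_traits demo_traits[DEMO_TYPE_COUNT] = {
    [DEMO_TYPE_F32]    = { "f32",    sizeof(float),  demo_f32_to_float },
    [DEMO_TYPE_Q8]     = { "q8",     sizeof(int8_t), demo_q8_to_float  },
    [DEMO_TYPE_OPAQUE] = { "opaque", 1,              NULL              },
};

/* Returns a pointer into the static table, so callers do not copy
 * the whole struct on every lookup. */
static const struct demo_type_traits *demo_get_type_traits(enum demo_type t) {
    return &demo_traits[t];
}

/* A BLAS path needs float inputs, so a type without to_float is skipped. */
static bool demo_blas_supports(enum demo_type t) {
    return demo_get_type_traits(t)->to_float != NULL;
}
```

A caller would check `demo_blas_supports(type)` before dispatching to the BLAS path and fall back to another backend otherwise.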

b3848

30 Sep 08:26
c919d5d
ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau <[email protected]>
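
The usual pattern for coping with missing HWCAP flags on older toolchains is a guarded fallback definition. The sketch below is an assumption about the general technique, not a copy of the change in #9684: the bit positions are taken from the Linux arm64 hwcap header as best recalled and should be checked against the kernel headers, and `demo_has_feature` is a hypothetical helper. It takes the capability word as an argument so the example does not depend on `getauxval` being available.

```c
#include <stdbool.h>

/* Fallbacks for headers that predate these flags. Values are assumed
 * to match the Linux arm64 uapi definitions; verify before relying on them. */
#if !defined(HWCAP_ASIMD)
#define HWCAP_ASIMD   (1UL << 1)
#endif
#if !defined(HWCAP_ASIMDDP)
#define HWCAP_ASIMDDP (1UL << 20)
#endif
#if !defined(HWCAP_SVE)
#define HWCAP_SVE     (1UL << 22)
#endif
#if !defined(HWCAP2_I8MM)
#define HWCAP2_I8MM   (1UL << 13)
#endif

/* Hypothetical helper: tests one flag in a capability word that the
 * caller obtained elsewhere (for example from the auxiliary vector). */
static bool demo_has_feature(unsigned long hwcap, unsigned long flag) {
    return (hwcap & flag) != 0;
}
```

With the guards in place, feature detection code compiles against both old and new system headers, and a flag the running kernel does not report simply tests as absent.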

b3669

05 Sep 15:16
4db0478
cuda : fix defrag with quantized KV (#9319)

b3662

04 Sep 04:31
7605ae7
flake.lock: Update (#9261)

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d?narHash=sha256-XOQkdLafnb/p9ij77byFQjDf5m5QYl9b2REiVClC%2Bx4%3D' (2024-08-01)
  → 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21)
  → 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

b3660

03 Sep 08:16
b69a480
readme : refactor API section + remove old hot topics