Releases · OpenBMB/llama.cpp

08 Nov 07:48

76c6e7f

b4049 Latest

server : minor UI fix (#10207)

Assets 22

cudart-llama-bin-win-cu11.7.1-x64.zip: 293 MB, 2024-11-08T07:48:47Z
cudart-llama-bin-win-cu12.2.0-x64.zip: 413 MB, 2024-11-08T07:48:56Z
llama-b1-bin-win-hip-x64-gfx1030.zip: 236 MB, 2024-11-08T07:49:08Z
llama-b1-bin-win-hip-x64-gfx1100.zip: 238 MB, 2024-11-08T07:49:16Z
llama-b1-bin-win-hip-x64-gfx1101.zip: 238 MB, 2024-11-08T07:49:24Z
llama-b4049-bin-macos-arm64.zip: 52.1 MB, 2024-11-08T07:49:32Z
llama-b4049-bin-macos-x64.zip: 53.7 MB, 2024-11-08T07:49:34Z
llama-b4049-bin-ubuntu-x64.zip: 56.9 MB, 2024-11-08T07:49:37Z
llama-b4049-bin-win-avx-x64.zip: 8.12 MB, 2024-11-08T07:49:39Z
llama-b4049-bin-win-avx2-x64.zip: 8.12 MB, 2024-11-08T07:49:40Z
Source code (zip): 2024-11-07T22:44:38Z
Source code (tar.gz): 2024-11-07T22:44:38Z

16 Oct 09:20

github-actions

b3923

becfd38

b3923

[CANN] Fix cann compilation error (#9891)

Fix the CANN compilation error introduced when llama.cpp merged support for dynamically loadable backends.

Assets 22

14 Oct 09:31

github-actions

b3918

ccc7bb7

b3918

fix memory leaks in minicpmv

Assets 22

14 Oct 09:00

github-actions

b3917

a89f75e

b3917

server : handle "logprobs" field with false value (#9871)

Co-authored-by: Gimling

Assets 22

14 Oct 07:31

github-actions

b3916

13dca2a

b3916

Vectorize load instructions in dmmv f16 CUDA kernel (#9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler

---------

Co-authored-by: Johannes Gäßler

Assets 22

09 Oct 08:24

github-actions

b3899

dca1d4b

b3899

ggml : fix BLAS with unsupported types (#9775)

* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
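
The first two points of this change can be pictured with a minimal sketch. Everything named `example_*` here is a hypothetical stand-in invented for illustration and is not ggml's actual definition; the sketch only assumes, as the commit text says, that some types have no `to_float` conversion and that the traits lookup now returns a pointer rather than a copy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only; not ggml's real enum or struct.
enum example_type { EXAMPLE_TYPE_F32 = 0, EXAMPLE_TYPE_OPAQUE = 1, EXAMPLE_TYPE_COUNT = 2 };

using example_to_float_fn = void (*)(const void * src, float * dst, std::size_t n);

struct example_type_traits {
    const char *        name;
    std::size_t         type_size;
    example_to_float_fn to_float;  // nullptr when the type has no float conversion
};

// Trivial conversion for the f32 entry: copy the values through unchanged.
static void example_f32_to_float(const void * src, float * dst, std::size_t n) {
    const float * s = static_cast<const float *>(src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = s[i];
    }
}

// One static table of traits, indexed by type.
static const example_type_traits k_example_traits[EXAMPLE_TYPE_COUNT] = {
    { "f32",    sizeof(float), example_f32_to_float },
    { "opaque", 1,             nullptr              },
};

// Returns a pointer into the static table, so callers do not copy the struct.
const example_type_traits * example_get_type_traits(example_type t) {
    return &k_example_traits[t];
}

// A BLAS-style path needs float data, so skip it for types without to_float.
bool example_can_use_blas(example_type t) {
    return example_get_type_traits(t)->to_float != nullptr;
}

// Small usage check: the f32 entry qualifies, the opaque entry does not.
void example_usage() {
    assert(example_can_use_blas(EXAMPLE_TYPE_F32));
    assert(!example_can_use_blas(EXAMPLE_TYPE_OPAQUE));
}
```

Returning `const example_type_traits *` keeps every caller reading the same table entry, which is the copy-avoidance the second bullet describes.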

Assets 22

30 Sep 08:26

github-actions

b3848

c919d5d

b3848

ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau

Assets 22

05 Sep 15:16

github-actions

b3669

4db0478

b3669

cuda : fix defrag with quantized KV (#9319)

Assets 19

04 Sep 04:31

github-actions

b3662

7605ae7

b3662

flake.lock: Update (#9261)

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d?narHash=sha256-XOQkdLafnb/p9ij77byFQjDf5m5QYl9b2REiVClC%2Bx4%3D' (2024-08-01)
  → 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21)
  → 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Assets 19

03 Sep 08:16

github-actions

b3660

b69a480

b3660

readme : refactor API section + remove old hot topics

Assets 19
