Llama2 7b model C++ example #3666

Open · wants to merge 57 commits into develop

Conversation


@ototh-htec ototh-htec commented Nov 29, 2024

Implemented an example for the Llama2 7b model (https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-int4-asym-gs128-onnx/tree/main) with MIGraphX.

Details about the example and instructions for running it are available in the README (https://github.com/ROCm/AMDMIGraphX/tree/htec/mgx-llama2-7b-example/examples/transformers/mgx_llama2).

gyulaz-htec and others added 30 commits November 17, 2024 04:33
src/api/api.cpp Outdated
@@ -1522,6 +1522,21 @@ extern "C" migraphx_status migraphx_module_add_instruction(migraphx_instruction_
return api_error_result;
}

extern "C" migraphx_status migraphx_module_get_last_instruction(migraphx_instruction_t* out,
Collaborator

@gyulaz-htec gyulaz-htec Nov 29, 2024

We can remove these changes (commits) related to exposing this API, since we ended up not needing them.

Author

Removed them


codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.23%. Comparing base (2e9104a) to head (b499638).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3666   +/-   ##
========================================
  Coverage    92.23%   92.23%           
========================================
  Files          514      514           
  Lines        21746    21746           
========================================
  Hits         20057    20057           
  Misses        1689     1689           


@ototh-htec ototh-htec changed the title [WIP] Llama2 7b model C++ example Llama2 7b model C++ example Dec 6, 2024
@ototh-htec ototh-htec marked this pull request as ready for review December 6, 2024 09:11
@ototh-htec ototh-htec requested review from a team and causten as code owners December 6, 2024 09:11
@migraphx-bot
Collaborator

| Test | Batch | Rate new (b49963) | Rate old (4b15b6) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,254.17 | 3,257.89 | -0.11% | |
| torchvision-resnet50_fp16 | 64 | 6,988.94 | 6,918.72 | 1.01% | |
| torchvision-densenet121 | 32 | 2,432.26 | 2,432.91 | -0.03% | |
| torchvision-densenet121_fp16 | 32 | 4,085.65 | 4,086.33 | -0.02% | |
| torchvision-inceptionv3 | 32 | 1,627.76 | 1,628.71 | -0.06% | |
| torchvision-inceptionv3_fp16 | 32 | 2,742.43 | 2,745.51 | -0.11% | |
| cadene-inceptionv4 | 16 | 764.96 | 764.62 | 0.04% | |
| cadene-resnext64x4 | 16 | 810.58 | 814.13 | -0.44% | |
| slim-mobilenet | 64 | 7,461.51 | 7,458.12 | 0.05% | |
| slim-nasnetalarge | 64 | 208.46 | 209.03 | -0.27% | |
| slim-resnet50v2 | 64 | 3,440.03 | 3,436.59 | 0.10% | |
| bert-mrpc-onnx | 8 | 1,150.36 | 1,150.32 | 0.00% | |
| bert-mrpc-tf | 1 | 475.69 | 449.82 | 5.75% | 🔆 |
| pytorch-examples-wlang-gru | 1 | 428.15 | 437.44 | -2.12% | |
| pytorch-examples-wlang-lstm | 1 | 436.50 | 383.25 | 13.89% | 🔆 |
| torchvision-resnet50_1 | 1 | 777.19 | 741.45 | 4.82% | 🔆 |
| cadene-dpn92_1 | 1 | 399.33 | 400.49 | -0.29% | |
| cadene-resnext101_1 | 1 | 382.84 | 382.70 | 0.04% | |
| onnx-taau-downsample | 1 | 345.53 | 345.15 | 0.11% | |
| dlrm-criteoterabyte | 1 | 33.35 | 33.31 | 0.11% | |
| dlrm-criteoterabyte_fp16 | 1 | 52.51 | 52.72 | -0.40% | |
| agentmodel | 1 | 8,059.84 | 8,463.32 | -4.77% | 🔴 |
| unet_fp16 | 2 | 58.74 | 58.77 | -0.05% | |
| resnet50v1_fp16 | 1 | 935.52 | 934.37 | 0.12% | |
| resnet50v1_int8 | 1 | 995.64 | 1,030.33 | -3.37% | 🔴 |
| bert_base_cased_fp16 | 64 | 1,169.76 | 1,170.28 | -0.04% | |
| bert_large_uncased_fp16 | 32 | 363.68 | 363.03 | 0.18% | |
| bert_large_fp16 | 1 | 199.12 | 199.90 | -0.39% | |
| distilgpt2_fp16 | 16 | 2,197.04 | 2,197.64 | -0.03% | |
| yolov5s | 1 | 528.96 | 518.78 | 1.96% | |
| tinyllama | 1 | 43.34 | 43.68 | -0.79% | |
| vicuna-fastchat | 1 | 165.26 | 173.15 | -4.56% | 🔴 |
| whisper-tiny-encoder | 1 | 418.07 | 417.60 | 0.11% | |
| whisper-tiny-decoder | 1 | 432.36 | 429.03 | 0.78% | |

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

make -j

# Running the example
export MIOPEN_FIND_ENFORCE=3
Contributor

Is this needed? I don't believe we make any MIOpen calls for this model.

Contributor

@spolifroni-amd spolifroni-amd left a comment

The README looks fine from a grammar/spelling angle.

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

make -j

# Running the example
export MIOPEN_FIND_ENFORCE=3
Collaborator

Suggested change
export MIOPEN_FIND_ENFORCE=3

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

Suggested change
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.

### Getting the pre-quantized model from HuggingFace
```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login YOUR_HF_TOKEN
```
Collaborator

Suggested change
huggingface-cli login YOUR_HF_TOKEN
huggingface-cli login --token <YOUR_HF_TOKEN>

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login YOUR_HF_TOKEN
hugginggface-cli download https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-int4-asym-gs128-onnx
```
Collaborator

Suggested change
hugginggface-cli download https://huggingface.co/amd/Llama-2-7b-chat-hf-awq-int4-asym-gs128-onnx
huggingface-cli download amd/Llama-2-7b-chat-hf-awq-int4-asym-gs128-onnx
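
For reference, applying both suggestions above yields a download sequence along these lines (a sketch; `<YOUR_HF_TOKEN>` is a placeholder for a Hugging Face token):

```bash
# Install the Hugging Face CLI, authenticate, then fetch the pre-quantized ONNX model
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token <YOUR_HF_TOKEN>
huggingface-cli download amd/Llama-2-7b-chat-hf-awq-int4-asym-gs128-onnx
```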


```bash
# Convert dataset to numpy format
./prepocess_dataset.py
```
Collaborator

I think a step is missing. I performed the steps listed to use the pre-quantized model.

Suggested change
./prepocess_dataset.py
gunzip /mgx_llama2/open_orca/open_orca_gpt4_tokenized_llama.calibration_1000.pkl.gz
python3 preprocess_dataset.py


# Building the example
cd mgx_llama2
mkdir build && cd build
Collaborator

There was already a build dir, so the command failed.
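
One way to make this step re-runnable (a sketch, not something the PR currently does) is to let `mkdir` tolerate an existing directory:

```bash
# -p turns mkdir into a no-op when build/ already exists
cd mgx_llama2
mkdir -p build && cd build
```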


# Running the example
export MIOPEN_FIND_ENFORCE=3
./mgxllama2
Collaborator

root@12320f9edbf9:/mgx_llama2/build# ./mgxllama2
terminate called after throwing an instance of 'std::runtime_error'
  what():  hip error (101): invalid device ordinal
Aborted (core dumped)
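
The `invalid device ordinal` error usually means the process is asking for a GPU it cannot see; a possible workaround sketch (using `HIP_VISIBLE_DEVICES` with device index 0 is an assumption about this container setup, not something the example requires):

```bash
# List the GPUs visible inside the container, then pin the example to one of them
rocm-smi
export HIP_VISIBLE_DEVICES=0   # index of a GPU that is actually present
./mgxllama2
```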

./mgxllama2

# Test the accuracy of the output
python3 eval_accuracy.py
Collaborator

root@12320f9edbf9:/mgx_llama2# python3 eval_accuracy.py
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json
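
The 401 comes from `eval_accuracy.py` fetching the gated `meta-llama/Llama-2-7b-chat-hf` config; a hedged fix, assuming the token has been granted access to that repo, is to authenticate inside the container before running the script:

```bash
# Authenticate so huggingface_hub can fetch the gated meta-llama config/tokenizer
huggingface-cli login --token <YOUR_HF_TOKEN>   # or: export HF_TOKEN=<YOUR_HF_TOKEN>
python3 eval_accuracy.py
```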

#### Building and running the example

```bash
# Convert dataset to numpy format
```
Collaborator

When I launched the docker container via run_docker.sh, I landed in the /mgx_llama2/build directory, so the preprocess_dataset.py command's path is wrong.
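
A sketch of the adjusted step, assuming the example lives at `/mgx_llama2` inside the container as the earlier suggestion implies:

```bash
# run_docker.sh starts in /mgx_llama2/build, so move up one level first
cd /mgx_llama2
python3 preprocess_dataset.py
```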
