Add Cohere ONNX export support #1905

Open · wants to merge 1 commit into main
Conversation

@xenova (Contributor) commented Jun 12, 2024

What does this PR do?

This PR adds export support for Cohere models (similar to Llama). However, it requires one patch in transformers, due to a problematic torch.repeat_interleave op within CohereRotaryEmbedding:

with torch.autocast(device_type=device_type, enabled=False):
    freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
-   emb = torch.repeat_interleave(freqs, 2, dim=-1)
+   emb = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)
    cos = emb.cos()
    sin = emb.sin()

I suspect this is a bug in torch/onnx, maybe @fxmarty can confirm? cc @saurabhdash2512 also, who contributed the model in huggingface/transformers#29622.
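As a standalone sanity check (illustrative only, not part of the PR), the two formulations can be verified to produce identical tensors:

```python
import torch

# The expand+reshape rewrite duplicates each element along the last
# dimension, exactly like torch.repeat_interleave(freqs, 2, dim=-1).
freqs = torch.arange(6.0).reshape(1, 2, 3)  # illustrative (batch, seq, dim) shape

via_repeat = torch.repeat_interleave(freqs, 2, dim=-1)
via_expand = freqs[..., None].expand(*freqs.shape, 2).reshape(*freqs.shape[:-1], -1)

assert torch.equal(via_repeat, via_expand)
print(via_expand[0, 0].tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
```

Because expand produces a broadcast view rather than materializing the repeats, the rewrite also traces to shape ops (Unsqueeze/Expand/Reshape) that ONNX Runtime handles, instead of the Expand pattern that fails above.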

Export logs without change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
        - use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
2024-06-12 12:23:32.921382493 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Traceback (most recent call last):
  File "/home/codespace/.python/current/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/workspaces/optimum/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/workspaces/optimum/optimum/commands/export/onnx.py", line 265, in run
    main_export(
  File "/workspaces/optimum/optimum/exporters/onnx/__main__.py", line 352, in main_export
    onnx_export_from_model(
  File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 1170, in onnx_export_from_model
    _, onnx_outputs = export_models(
  File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 776, in export_models
    export(
  File "/workspaces/optimum/optimum/exporters/onnx/convert.py", line 910, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/workspaces/optimum/optimum/exporters/onnx/base.py", line 335, in fix_dynamic_axes
    outputs = session.run(None, onnx_inputs)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'/layers.0/self_attn/rotary_emb/Expand' Status Message: invalid expand shape
Export logs with change:
$ optimum-cli export onnx -m hf-internal-testing/tiny-random-CohereModel o
Framework not specified. Using pt to export the model.
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Automatic task detection to feature-extraction-with-past (possible synonyms are: default-with-past).
/usr/local/python/3.10.13/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: CohereModel *****
Using framework PyTorch: 2.2.2+cu121
Overriding 1 configuration item(s)
        - use_cache -> True
/usr/local/python/3.10.13/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py:1014: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
Post-processing the exported models...
Deduplicating shared (tied) weights...

Validating ONNX model o/model.onnx...
        -[✓] ONNX model output names match reference model (last_hidden_state, present.0.value, present.1.key, present.0.key, present.1.value)
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (2, 16, 32) matches (2, 16, 32)
                -[✓] all values close (atol: 1e-05)
        - Validating ONNX Model output "present.0.key":
                -[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
                -[✓] all values close (atol: 1e-05)
        - Validating ONNX Model output "present.0.value":
                -[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
                -[✓] all values close (atol: 1e-05)
        - Validating ONNX Model output "present.1.key":
                -[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
                -[✓] all values close (atol: 1e-05)
        - Validating ONNX Model output "present.1.value":
                -[✓] (2, 4, 16, 8) matches (2, 4, 16, 8)
                -[✓] all values close (atol: 1e-05)
The ONNX export succeeded and the exported model was saved at: o

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@xenova xenova changed the title [WIP] Add cohere ONNX export support Add Cohere ONNX export support Jun 12, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@fxmarty (Contributor) commented Jun 24, 2024

Thank you @xenova, would you like to use the model patcher to patch this bit of code for the torchscript export?
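As a rough illustration of the idea behind an export-time patch (this is not optimum's actual ModelPatcher API, just a hedged sketch of the pattern), a context manager could temporarily swap the problematic op for the traceable equivalent while the export runs:

```python
import contextlib

import torch


@contextlib.contextmanager
def traceable_repeat_interleave():
    """Temporarily reroute torch.repeat_interleave(x, n, dim=-1) through the
    expand+reshape equivalent, which traces/exports cleanly. Hypothetical
    helper for illustration; restores the original op on exit."""
    original = torch.repeat_interleave

    def patched(input, repeats, dim=None, **kwargs):
        if isinstance(repeats, int) and dim == -1:
            return input[..., None].expand(*input.shape, repeats).reshape(
                *input.shape[:-1], -1
            )
        # Fall back to the real op for every other call pattern.
        return original(input, repeats, dim=dim, **kwargs)

    torch.repeat_interleave = patched
    try:
        yield
    finally:
        torch.repeat_interleave = original
```

The actual patcher in optimum would scope the override to the traced model's forward rather than patching torch globally, but the mechanics are the same: swap in the traceable implementation on entry, restore the original on exit.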
