import time
import onnxruntime
import numpy as np

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
Describe the issue
Starting with commit 2cdc05f, ONNX Runtime (ORT) no longer performs Gelu fusion, which results in a 4x performance slowdown.
Bisect range: de7a02b .. 2cdc05f.
Optimized model of de7a02b
Optimized model of 2cdc05f
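One possible way to confirm whether the fusion happened in a given build is to have ORT write out its optimized graph and count the operator types in it. The sketch below is an illustration, not the method used to produce the models above. The helper names and the output file name `model.optimized.onnx` are hypothetical, and the list of fused op types is an assumption about which contrib ops count as a Gelu fusion. Only top-level graph nodes are inspected.

```python
from collections import Counter

# Assumption: these contrib op types indicate that a Gelu pattern was fused.
FUSED_GELU_OPS = ("Gelu", "FastGelu", "BiasGelu", "QuickGelu")


def count_ops(op_types):
    """Return a histogram of operator types."""
    return Counter(op_types)


def has_gelu_fusion(histogram):
    """True if any fused Gelu op type appears in the histogram."""
    return any(histogram.get(op, 0) > 0 for op in FUSED_GELU_OPS)


def optimized_op_types(model_path, optimized_path):
    """Save the graph ORT runs after optimization and list its op types."""
    # Imported lazily so the pure helpers above work without these packages.
    import onnx
    import onnxruntime

    options = onnxruntime.SessionOptions()
    options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Creating the session writes the optimized model to this path.
    options.optimized_model_filepath = optimized_path
    onnxruntime.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )

    model = onnx.load(optimized_path)
    return [node.op_type for node in model.graph.node]


if __name__ == "__main__":
    # Hypothetical file names; run once per ORT build and compare the output.
    histogram = count_ops(optimized_op_types("model.onnx", "model.optimized.onnx"))
    print("Gelu fused:", has_gelu_fusion(histogram))
    print(histogram.most_common())
```

Running this under both builds and comparing the histograms should show whether the unfused build leaves the Gelu subgraph as separate elementwise nodes.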
Performance Comparison
To reproduce
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
model.zip
Is this a quantized model?
No