Performance and Accuracy Benchmarks #53

jamescho72 · 2024-09-29T21:17:48Z

Set up and run benchmarks against our continue.dev/ollama/granite environment.
Run baselines against our competitors: Deepseek2.5 (2.4B active and 21B active), codestral-mamba 7B, llama3-8B-instruct, and granite 8B instruct with 128k context length.

Find a 100-line code example
Ask chat to document it
Measure latency (how long the request takes to complete)
Measure accuracy (how many lines of documentation were generated, and how accurate/correct the documentation was, e.g. 9 of 10 lines correct)
Measure CPU and memory consumption
Automate/standardize the test as much as possible
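
As a starting point for automating the latency and accuracy steps above, here is a minimal sketch, not a finished harness. It assumes a local Ollama server on its default port, and the helper names (`ollama_generate`, `measure_latency`, `documentation_accuracy`) are hypothetical. Accuracy still relies on a reviewer counting the correct lines by hand; the function only turns those counts into a ratio.

```python
import json
import statistics
import time
import urllib.request
from typing import Callable

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming request to the Ollama server and return the text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def measure_latency(generate: Callable[[str], str], prompt: str, runs: int = 3) -> dict:
    """Time `runs` calls to `generate` and summarize wall-clock latency in seconds."""
    timings = []
    output = ""
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(prompt)
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
        "output": output,  # text from the last run, kept for manual review
    }


def documentation_accuracy(generated_lines: int, correct_lines: int) -> float:
    """Fraction of generated documentation lines that a reviewer judged correct."""
    if generated_lines == 0:
        return 0.0
    return correct_lines / generated_lines
```

To benchmark a model, pass a wrapper such as `lambda p: ollama_generate(model_tag, p)` to `measure_latency`, where `model_tag` is whatever tag the model was pulled under. Taking `generate` as a plain callable keeps the timing loop independent of the backend, so the same loop can be reused for each baseline. CPU and memory measurement of the server process is not covered here.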

harshmittalibm · 2024-10-07T14:28:42Z

I have put my initial findings here -

https://ibm.box.com/s/l69aksjokmnwdb6u2frpd715d6537pq8

It contains a latency comparison across the different models. I will update it with the documentation latency and accuracy results.

jamescho72 assigned harshmittalibm Sep 29, 2024
