Perf #4

Draft
wants to merge 16 commits into
base: main
jerinphilip (Owner) commented Aug 17, 2023

Reports a few batch-level metrics (wps, i.e. words per second, and occupancy) and some aggregates. Mostly to verify that reading from stdin and batching work well before starting more perf analysis.
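
For illustration, here is a minimal Python sketch of how batch-level wps and occupancy could be computed and aggregated. It is not the code in this PR. The `BatchStats` and `aggregate` names are hypothetical, and occupancy is assumed to mean real tokens divided by padded slots (batch size times longest sentence); the PR may define it differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BatchStats:
    """Per-batch record. All names here are hypothetical."""

    lengths: List[int]  # token count of each sentence in the batch
    seconds: float      # wall-clock time spent translating the batch

    @property
    def words(self) -> int:
        return sum(self.lengths)

    @property
    def slots(self) -> int:
        # Padded size of the batch: every row is as wide as the longest one.
        return len(self.lengths) * max(self.lengths) if self.lengths else 0

    @property
    def wps(self) -> float:
        # Words per second for this batch.
        return self.words / self.seconds if self.seconds > 0 else 0.0

    @property
    def occupancy(self) -> float:
        # Assumed definition: fraction of padded slots holding real tokens.
        return self.words / self.slots if self.slots else 0.0


def aggregate(batches: List[BatchStats]) -> Dict[str, float]:
    """Aggregate over totals rather than averaging per-batch ratios."""
    total_words = sum(b.words for b in batches)
    total_seconds = sum(b.seconds for b in batches)
    total_slots = sum(b.slots for b in batches)
    return {
        "wps": total_words / total_seconds if total_seconds > 0 else 0.0,
        "occupancy": total_words / total_slots if total_slots else 0.0,
    }
```

Aggregating over totals keeps large batches from being weighted the same as tiny ones, which a plain mean of per-batch ratios would do.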

# Download WNGT20 dataset into data/wngt20
bash scripts/download-wngt20.sh

# Sort the sources with a Python script for better batching
# The sorted file is available as data/wngt20/sources.shuf.sorted
python3 scripts/order-sources-shuf.py data/wngt20/sources.shuf
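
As a rough idea of why sorting helps batching: if sentences of similar length end up next to each other, each batch needs less padding. The sketch below is not the contents of scripts/order-sources-shuf.py. It assumes the sort key is the whitespace token count, and the `sort_by_length` name is hypothetical.

```python
from typing import Iterable, List


def sort_by_length(lines: Iterable[str]) -> List[str]:
    """Order lines by whitespace token count (assumed sort key).

    sorted() is stable, so lines of equal length keep their input order.
    """
    return sorted(lines, key=lambda line: len(line.split()))


if __name__ == "__main__":
    import sys

    # Read sentences from stdin and write them back out, shortest first.
    sys.stdout.writelines(sort_by_length(sys.stdin))
```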

TODO:

  • gprof
  • I want to try Tracy, mostly as a learning exercise (TIL). See if it's of any use.
  • cachegrind
  • Some mechanism to measure speed per commit, so we can track performance over time. The mandate: continuously improve, and never regress. Something like perf.iree.dev?
  • Conceptual improvements: maybe prune samples as they complete during greedy decoding, to do fewer matmuls?
  • Some trickling down from 512 -> 256 -> 128-bit SIMD registers is possible in certain functions; not sure how much of a speedup this would give.
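
To make the pruning item above concrete, here is a toy Python sketch of greedy decoding that drops samples from the batch once they emit an end-of-sentence token. Everything in it is hypothetical: `EOS`, `greedy_decode_pruned`, and the `step` callback stand in for the real decoder and are not taken from this codebase.

```python
from typing import Callable, List, Sequence

EOS = 0  # hypothetical end-of-sentence token id


def greedy_decode_pruned(
    batch_size: int,
    step: Callable[[Sequence[int], int], List[int]],
    max_steps: int,
) -> List[List[int]]:
    """Greedy decoding that only runs `step` on unfinished samples.

    `step(active, t)` stands in for one decoder step: given the indices of
    the still-active samples and the time step, it returns one next token
    per active sample.
    """
    outputs: List[List[int]] = [[] for _ in range(batch_size)]
    active = list(range(batch_size))  # indices of unfinished samples
    for t in range(max_steps):
        if not active:
            break  # every sample has emitted EOS
        tokens = step(active, t)  # work scales with len(active), not batch_size
        still_active = []
        for idx, tok in zip(active, tokens):
            outputs[idx].append(tok)
            if tok != EOS:
                still_active.append(idx)
        active = still_active
    return outputs
```

In a real decoder, the per-sample state (for example cached keys and values) would also have to be gathered down to the active rows at each step, so the saving in matmuls has to outweigh that copying cost.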
