CodeGen: Rewrite dot product lowering using a dedicated IR instruction #1512

Merged
merged 6 commits into from
Nov 9, 2024

Commits on Nov 8, 2024

  1. CodeGen: Implement support for vdpps AVX instruction

    This will help optimize lowering of vector.dot and vector.normalize
    zeux committed Nov 8, 2024
    d2c008c
  2. CodeGen: Implement DOT_VEC IR opcode

    This exposes vdpps on X64 and allows computing a 3-wide dot product of
    two vectors, returning the result as a number.
    zeux committed Nov 8, 2024
    74a9128
  3. CodeGen: Use DOT_VEC under a fast flag for lowering vector lib

    This is useful for vector.dot, vector.magnitude and vector.normalize.
    zeux committed Nov 8, 2024
    cd73807
  4. CodeGen: Implement a naive version of A64 DOT_VEC

    This uses existing instructions and scalar adds to establish a baseline.
    It is still faster than the original implementation of the vector library ops.
    zeux committed Nov 8, 2024
    8fc458e
  5. CodeGen: Implement faddp opcode for A64

    We now support the scalar version of the faddp opcode, which adds the
    first two floats of a vector into the first scalar of the destination.
    zeux committed Nov 8, 2024
    6ebac70
  6. CodeGen: Rewrite DOT_VEC lowering for A64 using faddp

    This results in about the same performance as the naive version on M2,
    but uses fewer registers and matches what clang generates for similar
    source.
    zeux committed Nov 8, 2024
    81b691b
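
For readers unfamiliar with the operation, the following is a minimal scalar sketch in C++ of what a 3-wide dot product computes and how vector.magnitude and vector.normalize can be expressed in terms of it. It is not the code from this PR: `Vec3`, `dot3`, `dot3Pairwise`, `magnitude` and `normalize` are hypothetical names chosen for illustration, and the pairwise variant only assumes one plausible ordering for a faddp-style reduction (add the first two products, then add the third).

```cpp
#include <cmath>

// Hypothetical 3-component vector type; not the type used by the runtime.
struct Vec3
{
    float x, y, z;
};

// Scalar reference for a 3-wide dot product: multiply lane-wise, then sum.
inline float dot3(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Same result, written in the order assumed for a faddp-style reduction:
// the first two products are added as a pair, then the third is added.
inline float dot3Pairwise(const Vec3& a, const Vec3& b)
{
    float lanes[3] = {a.x * b.x, a.y * b.y, a.z * b.z};
    float pair = lanes[0] + lanes[1]; // role played by a scalar pairwise add
    return pair + lanes[2];
}

// Magnitude is the square root of the vector dotted with itself.
inline float magnitude(const Vec3& v)
{
    return std::sqrt(dot3(v, v));
}

// Normalization scales each component by the inverse magnitude.
// No zero-length check is done here; that is left out of this sketch.
inline Vec3 normalize(const Vec3& v)
{
    float inv = 1.0f / std::sqrt(dot3(v, v));
    return {v.x * inv, v.y * inv, v.z * inv};
}
```

All three library functions reduce to a single dot product plus a little scalar work, which is consistent with the commit messages describing one dedicated instruction as useful for vector.dot, vector.magnitude and vector.normalize.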