matrixmultiply

General matrix multiplication for f32, f64 matrices.

Supports matrices with arbitrary row and column strides.
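As an illustration of what row and column strides mean, here is a minimal reference sketch in plain Rust. It is not this crate's kernel or its API: the function `gemm_ref`, its parameter names, and the helper `demo_transposed_view` are hypothetical and exist only for this example. Element (i, j) of each matrix is read at `i * row_stride + j * col_stride`, and the sketch assumes all offsets are non-negative and in bounds.

```rust
/// Naive reference GEMM (hypothetical helper, not the crate's API):
/// C = alpha * A * B + beta * C.
/// A is m×k, B is k×n, C is m×n. Element (i, j) of a matrix lives at
/// `i * row_stride + j * col_stride` in its slice.
pub fn gemm_ref(
    m: usize, k: usize, n: usize,
    alpha: f32,
    a: &[f32], rsa: isize, csa: isize,
    b: &[f32], rsb: isize, csb: isize,
    beta: f32,
    c: &mut [f32], rsc: isize, csc: isize,
) {
    // Map a (row, column) pair to a flat index using the given strides.
    let at = |i: usize, j: usize, rs: isize, cs: isize| -> usize {
        (i as isize * rs + j as isize * cs) as usize
    };
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[at(i, p, rsa, csa)] * b[at(p, j, rsb, csb)];
            }
            let idx = at(i, j, rsc, csc);
            c[idx] = alpha * acc + beta * c[idx];
        }
    }
}

/// Computes A * Aᵀ for a 2×3 row-major A without copying:
/// the transpose is the same buffer with the two strides swapped.
pub fn demo_transposed_view() -> [f32; 4] {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2×3, row stride 3, column stride 1
    let mut c = [0.0f32; 4];                   // 2×2, row-major
    gemm_ref(2, 3, 2, 1.0, &a, 3, 1, &a, 1, 3, 0.0, &mut c, 2, 1);
    c
}
```

Swapping the two strides is what lets a transposed or column-major matrix be passed without first copying it into a different layout.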

Uses the same microkernel algorithm as BLIS, but in a much simpler and less featureful implementation. See their multithreading page for a very good diagram of how the algorithm partitions the matrix. (Note: this crate does not implement multithreading.)
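The general shape of a microkernel-based multiplication can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this crate: the names `microkernel` and `gemm_tiled` and the tile sizes `MR` and `NR` are hypothetical, both inputs are assumed to be contiguous and row-major, and the outer cache-blocking loops of the full BLIS scheme are left out. Panels of A and B are packed into small contiguous buffers, zero-padded at the edges, and a fixed-size kernel computes one MR×NR tile of C at a time.

```rust
const MR: usize = 4; // tile rows (illustrative value)
const NR: usize = 4; // tile columns (illustrative value)

/// Accumulates one MR×NR tile of A·B over the whole k dimension.
/// `a_panel` holds MR values per step p, `b_panel` holds NR values per step p.
fn microkernel(k: usize, a_panel: &[f32], b_panel: &[f32]) -> [[f32; NR]; MR] {
    let mut acc = [[0.0f32; NR]; MR];
    for p in 0..k {
        let a_col = &a_panel[p * MR..(p + 1) * MR];
        let b_row = &b_panel[p * NR..(p + 1) * NR];
        // Rank-1 update of the tile: acc += a_col * b_row^T.
        for i in 0..MR {
            for j in 0..NR {
                acc[i][j] += a_col[i] * b_row[j];
            }
        }
    }
    acc
}

/// C += A·B for row-major A (m×k), B (k×n) and C (m×n).
pub fn gemm_tiled(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    let mut a_panel = vec![0.0f32; k * MR];
    let mut b_panel = vec![0.0f32; k * NR];
    for j0 in (0..n).step_by(NR) {
        let nr = NR.min(n - j0);
        // Pack a k×NR panel of B, padding with zeros past the right edge.
        for p in 0..k {
            for j in 0..NR {
                b_panel[p * NR + j] = if j < nr { b[p * n + j0 + j] } else { 0.0 };
            }
        }
        for i0 in (0..m).step_by(MR) {
            let mr = MR.min(m - i0);
            // Pack an MR×k panel of A, padding with zeros past the bottom edge.
            for p in 0..k {
                for i in 0..MR {
                    a_panel[p * MR + i] = if i < mr { a[(i0 + i) * k + p] } else { 0.0 };
                }
            }
            let tile = microkernel(k, &a_panel, &b_panel);
            // Write back only the part of the tile that lies inside C.
            for i in 0..mr {
                for j in 0..nr {
                    c[(i0 + i) * n + j0 + j] += tile[i][j];
                }
            }
        }
    }
}
```

Because the kernel always works on full MR×NR tiles of packed data, its inner loops have fixed trip counts, which is what gives the optimizer a chance to unroll and vectorize them. The sketch repacks the A panel for every column block, which a tuned implementation would avoid.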

Please read the API documentation on docs.rs.

Blog posts about this crate:

A Gemmed Rabbit Hole

NOTE: Compile this crate using RUSTFLAGS="-C target-cpu=native" so that the compiler can produce the best output.

Recent Changes

0.1.14
- Avoid an unused code warning
0.1.13
- Pick 8x8 sgemm (f32) kernel when AVX target feature is enabled (with Rust 1.14 or later, no effect otherwise).
- Use rawpointer, a µcrate with raw pointer methods taken from this project.
0.1.12
- Internal cleanup with retained performance
0.1.11
- Adjust sgemm (f32) kernel to optimize better on recent Rust.
0.1.10
- Update doc links to docs.rs
0.1.9
- Work around an optimization regression in Rust nightly (1.12-ish) (#9)
0.1.8
- Improved docs
0.1.7
- Reduce overhead slightly for small matrix multiplication problems by using only one allocation call for both packing buffers.
0.1.6
- Disable manual loop unrolling in debug mode (quicker debug builds)
0.1.5
- Update sgemm to use a 4x8 microkernel (“still in simplistic rust”), which improves throughput by 10%.
0.1.4
- Prepare support for aligned packed buffers
- Update dgemm to use an 8x4 microkernel, still in simplistic Rust, which improves throughput by 10-20% when using AVX.
0.1.3
- Silence some debug prints
0.1.2
- Major performance improvement for sgemm and dgemm (20-30% when using AVX). Since it all depends on what the optimizer does, I'd love to get issue reports that report good or bad performance.
- Made the kernel masking generic, which is a cleaner design
0.1.1
- Minor improvement in the kernel
