-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move math types over to nalgebra with Generic dimensions > 2 and f32/f64 support #96
Conversation
Added the f32x3 optimization, It can be done quite easily to also handle f32x2,3,4, and f64x2,3,4, which I will do later. Some new bench numbers:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, great stuff. There appears to be some dead code that should be handled.
use crate::bounding_hierarchy::{BHShape, BoundingHierarchy}; | ||
use crate::ray::Ray; | ||
|
||
// TODO These all need to be realtyped and bounded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO current?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I plan to expand the unit tests later to handle 2d and 4d cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like to do that in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marstaik ping
src/ray.rs
Outdated
#[inline(always)] | ||
#[cfg(target_arch = "x86_64")] | ||
fn vec_to_mm(vec: &SVector<f32, 3>) -> __m128 { | ||
unsafe { _mm_set_ps(vec.z, vec.z, vec.y, vec.x) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really digging all this custom SIMD but I'm wondering whether perhaps packed_simd might be the way forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
packed_simd says on the readme that std::simd is the way forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm true. I didn't know there's still no good stabilized SIMD stuff in Rust. What do you think instead about wide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it and maybe trying Simba (what nalgebra uses under the hood) again. But with Simba I had performance magnitudes worse that with the pure simd, and I'm unsure why. I was having difficulty looking into it with cargo asm due to windows linking issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also simdeez which might be worth checking out.
src/aabb.rs
Outdated
/// use bvh::aabb::AABB; | ||
/// | ||
/// # fn main() { | ||
/// let aabb :AABB<f32,3> = AABB::infinite(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some dead code here.
@marstaik Are you likely to make more bigger contributions? I could give you contributor access if you're interested to help with development and maintenance but I couldn't find your email or Matrix for further communication. If you're interested, it might be good to talk first. |
@svenstaro I can use SIMD with pure The issue I am having is that I still need the specialization feature from nightly to be safely able to handle all cases and dimensions. I am currently trying to find a way to do this without, but it seems very convoluted. I have a generic ray_intersects_bvh for any Vector size, but I need to manually specialize for the lower dimension cases, as well as for when x86_64 is available. |
Well realistically people will mostly be using this on x86_64. However, even if we make good performance nightly only for the time being, that should be put behind a feature flag so people can still choose to use this on stable. |
I'm not sure it was clear earlier: If we have to use nightly for the speedy stuff anyway, then might as well use the most convenient solution to achieve that so you don't necessarily have to switch to |
…tly and "full_simd" feature
I ended up sticking with I added a feature flag called "full_simd" that only works with nightly enabled, allowing for specialization of the fast vectors. I would like to add some benches and tests for 2d and 4d as well. |
Please make sure to change the CI so it runs twice: Once with and once without the |
Cargo.toml
Outdated
@@ -30,7 +30,7 @@ doc-comment = "0.3" | |||
|
|||
[features] | |||
bench = [] | |||
serde = ["dep:serde", "glam/serde"] | |||
full_simd = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to document this in the README.
Hey, I fixed the CI builds up a bit and added the SIMD builds via nightly only. The maintainer of the script for the rust environment disappeared and CI complained node was out of date, so I just overhauled it a bit. I fixed all of the clippy errors, and all that was left were errors about BVH and AABB naming conventions so I also renamed BVH, AABB, BVHNode to Bvh, Aabb, BvhNode, and made the "Testing" versions start with a "T" to see how it would look. I kind of like it, but if you don't want to do a breaking change name wise I can revert those changes, and we can ignore on clippy again. I fixed a couple of other issues clippy complained about as well. |
Also, the README probably needs an update to reflect new data about rebuilding times versus building the tree from scratch again, because the rebuild numbers have drastically changed. That and the generics/testing of f32/f64 N dimensions would be nice to get done in a separate commit. |
.github/workflows/ci.yml
Outdated
- name: cargo build | ||
uses: actions-rs/cargo@v1 | ||
with: | ||
command: build | ||
args: --workspace | ||
run: cargo build --workspace | ||
|
||
- name: cargo test | ||
uses: actions-rs/cargo@v1 | ||
with: | ||
command: test | ||
|
||
- name: cargo fmt | ||
uses: actions-rs/cargo@v1 | ||
with: | ||
command: fmt | ||
args: --all -- --check | ||
|
||
- name: cargo clippy | ||
uses: actions-rs/cargo@v1 | ||
with: | ||
command: clippy | ||
args: --workspace --all-targets --all-features -- -D warnings | ||
if: matrix.rust == 'nightly' | ||
|
||
# - name: Run cargo-tarpaulin | ||
# uses: actions-rs/[email protected] | ||
# if: matrix.os == 'ubuntu-latest' && matrix.rust == 'stable' | ||
run: cargo test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this just
- run: cargo build
- run: cargo test
Since we're not actually using a workspace, we don't need the extra flag.
.github/workflows/ci.yml
Outdated
- name: cargo build | ||
run: cargo build --workspace --features simd | ||
|
||
- name: cargo test | ||
run: cargo test --features simd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
.github/workflows/ci.yml
Outdated
- name: Run clippy | ||
run: cargo clippy --workspace --all-targets --all-features -- -D warnings | ||
|
||
build-and-test-no-features: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build-and-test-no-features: | |
build-and-test-no-simd: |
.github/workflows/ci.yml
Outdated
run: cargo clippy --workspace --all-targets --all-features -- -D warnings | ||
|
||
build-and-test-no-features: | ||
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no-features] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no-features] | |
name: CI with ${{ matrix.rust }} on ${{ matrix.os }} [no SIMD] |
.github/workflows/ci.yml
Outdated
|
||
- name: Upload coverage report to codecov.io | ||
uses: codecov/codecov-action@v1 | ||
if: matrix.os == 'ubuntu-latest' && matrix.rust == 'stable' | ||
|
||
build-and-test-simd: | ||
name: CI with nightly on ${{ matrix.os }} [simd] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: CI with nightly on ${{ matrix.os }} [simd] | |
name: CI with nightly on ${{ matrix.os }} [SIMD] |
README.md
Outdated
@@ -12,42 +12,42 @@ volume hierarchies.** | |||
## About | |||
|
|||
This crate can be used for applications which contain intersection computations of rays | |||
with primitives. For this purpose a binary tree BVH (Bounding Volume Hierarchy) is of great | |||
use if the scene which the ray traverses contains a huge number of primitives. With a BVH the | |||
with primitives. For this purpose a binary tree Bvh (Bounding Volume Hierarchy) is of great |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert the capitalization changes in docs and other prose where the docs do not reference a type. The code change is fine and in line with Rust best practices, though. I think this was probably renamed by mistake by find and replace.
Also check other files.
Ideally this rename would have been done in another PR since this one is gargantuan anyway but oh well.
src/aabb.rs
Outdated
use nalgebra::ClosedAdd; | ||
use nalgebra::ClosedMul; | ||
use nalgebra::ClosedSub; | ||
use nalgebra::Point; | ||
use nalgebra::SVector; | ||
use nalgebra::Scalar; | ||
use nalgebra::SimdPartialOrd; | ||
use num::Float; | ||
use num::FromPrimitive; | ||
use num::One; | ||
use num::Signed; | ||
use num::Zero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge these.
I've also added a dummy |
I think that should be everything :) |
README.md
Outdated
@@ -83,7 +95,7 @@ it is faster to update the tree, instead of rebuilding it from scratch. | |||
First of all, optimizing is not helpful if more than half of the scene is not static. | |||
This is due to how optimizing takes place: | |||
Given a set of indices of all shapes which have changed, the optimize procedure tries to rotate fixed constellations | |||
in search for a better surface area heuristic (SAH) value. This is done recursively from bottom to top while fixing the AABBs | |||
in search for a better surface area heuristic (SAH) value. This is done recursively from bottom to top while fixing the Aabbs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a few miss-capitalized occurrences like this in prose and docs. Could you do another pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, fixed! I didn't actually find any in the docs though.
@svenstaro is this ready for merge? |
Will take another look in a bit. |
I'm doing one final check now. One thing I was wondering about: Should we kick out optimization? It never performed super well it did degrade the tree quickly. Should we perhaps just focus on super quick tree building using multithreading and SIMD? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, very close now. I'm just missing some docs on the SIMD functions and some capitalization fixes.
//! By passing the indices of shapes that have changed, the function determines possible | ||
//! tree rotations and optimizes the BVH using a SAH. | ||
//! tree rotations and optimizes the Bvh using a SAH. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! tree rotations and optimizes the Bvh using a SAH. | |
//! tree rotations and optimizes the BVH using a SAH. |
src/bvh/optimization.rs
Outdated
use rand::{thread_rng, Rng}; | ||
use std::collections::HashSet; | ||
|
||
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an AABB | ||
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an Aabb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an Aabb | |
// TODO Consider: Instead of getting the scene's shapes passed, let leaf nodes store an AABB |
src/bvh/optimization.rs
Outdated
// that is updated from the outside, perhaps by passing not only the indices of the changed | ||
// shapes, but also their new AABBs into optimize(). | ||
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed. | ||
// shapes, but also their new Aabbs into optimize(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// shapes, but also their new Aabbs into optimize(). | |
// shapes, but also their new AABBs into optimize(). |
src/bvh/optimization.rs
Outdated
// shapes, but also their new AABBs into optimize(). | ||
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed. | ||
// shapes, but also their new Aabbs into optimize(). | ||
// TODO Consider: Stop updating Aabbs upwards the tree once an Aabb didn't get changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO Consider: Stop updating Aabbs upwards the tree once an Aabb didn't get changed. | |
// TODO Consider: Stop updating AABBs upwards the tree once an AABB didn't get changed. |
src/bvh/optimization.rs
Outdated
/// Based on | ||
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBVH/ssBVH.cs`] | ||
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBvh/ssBvh.cs`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBvh/ssBvh.cs`] | |
/// [`https://github.com/jeske/SimpleScene/blob/master/SimpleScene/Util/ssBVH/ssBVH.cs`] |
src/flat_bvh.rs
Outdated
}; | ||
|
||
#[bench] | ||
/// Benchmark the flattening of a BVH with 120,000 triangles. | ||
/// Benchmark the flattening of a Bvh with 120,000 triangles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Benchmark the flattening of a Bvh with 120,000 triangles. | |
/// Benchmark the flattening of a BVH with 120,000 triangles. |
use crate::bounding_hierarchy::{BHShape, BoundingHierarchy}; | ||
use crate::ray::Ray; | ||
|
||
// TODO These all need to be realtyped and bounded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marstaik ping
src/testbase.rs
Outdated
@@ -515,17 +522,17 @@ fn bench_intersect_sponza_list(b: &mut ::test::Bencher) { | |||
intersect_list(&triangles, &bounds, b); | |||
} | |||
|
|||
/// Benchmark intersecting the `triangles` list with `AABB` checks, but without acceleration | |||
/// Benchmark intersecting the `triangles` list with `Aabb` checks, but without acceleration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Benchmark intersecting the `triangles` list with `Aabb` checks, but without acceleration | |
/// Benchmark intersecting the `triangles` list with `AABB` checks, but without acceleration |
src/testbase.rs
Outdated
let mut seed = 0; | ||
b.iter(|| { | ||
let ray = create_ray(&mut seed, bounds); | ||
|
||
// Iterate over the list of triangles. | ||
for triangle in triangles { | ||
// First test whether the ray intersects the AABB of the triangle. | ||
// First test whether the ray intersects the Aabb of the triangle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// First test whether the ray intersects the Aabb of the triangle. | |
// First test whether the ray intersects the AABB of the triangle. |
@svenstaro I'll make the requested changes and look into things later today - but before I do that I was wondering if the Aabb to AABB capitalization changes in the docs make sense - they are generally referring to them via their It doesn't bother me either way but I wanted to note it. |
That's a good point. Let's make them clickable where they refer to the specific classes by doing
|
Once this PR lands I can take a look at adapting my work in #80. I think there's probably 3 PRs worth of stuff in that. The multithreaded builds, the replacement of optimize with the ability to add/remove individual nodes, and the additional shapes you can query with. The return of nalgebra to this crate makes the stuff I had been doing to have 64bit bvhs obsolete which makes me very happy. |
I think we need to re-look at this. The new rebuild times are really really fast (400%) faster eg 200 sec to 40 sec for 50% modified. It might actually be worth it now. Maybe we can look into a better self optimizing bvh on removals and additions as well. We should definitely look at both and compare, especially after multithreaded building is done. I wonder if optimizing can be improved via multithreading too thoough. Maybe @dbenson24 can look into it, as my next focus will probably be better unit tests and some more docs. |
If I recall correctly, the change to make optimize utilize the add/remove node functions was because that method produced trees with much faster traversal times compared to optimize |
I may have gone overboard on the document cleaning :) |
Merging as-is. Amazing work! |
BVH should now work (in-theory) with 2d,3d,4d+ and with f32,f64 using the base nalgebra types.
Consistent performance, everything does mostly get SIMD. On my older first gen threadripper, I see up to 10% performance improvements across the board. Those that match or are slightly above are usually within error.
Axis and some re-exports were simply deleted as they aren't needed when using the nalgebra types.
The only outliers are:
bench_intersects_aabb
, which runs a bit slower than the fastest variant of before nalgebrabench_optimize_bvh_*
The optimization tests run over 200% faster, and blow the old optimize code out the water. It is actually faster to optimize now than to rebuild.