benchmark_kdtree

koide3 · Mar 30, 2024 · fc3f9ce
1 parent 7c31fe1
commit fc3f9ce
Showing 11 changed files with 2,825 additions and 10 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -106,6 +106,20 @@ if(BUILD_BENCHMARKS)
     ${Iridescence_LIBRARIES}
   )
 
+  # KdTree construction benchmark
+  add_executable(kdtree_benchmark
+    src/kdtree_benchmark.cpp
+  )
+  target_include_directories(kdtree_benchmark PUBLIC
+    include
+    ${TBB_INCLUDE_DIRS}
+    ${EIGEN3_INCLUDE_DIR}
+  )
+  target_link_libraries(kdtree_benchmark
+    fmt::fmt
+    ${TBB_LIBRARIES}
+  )
+
   if(BUILD_WITH_PCL)
     # Downsampling benchmark
     add_executable(downsampling_benchmark
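
Review note: the `kdtree_benchmark` target above builds `src/kdtree_benchmark.cpp`, which is not shown in this excerpt. As a rough illustration of how a construction-time measurement could be structured, here is a minimal sketch. `median_time_msec` is a hypothetical helper, not part of small_gicp, and the callable passed to it stands in for whatever tree construction is being timed. Taking the median over several repeats is one way to damp the run-to-run noise that can make thread-scaling curves look non-monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not part of small_gicp): runs `task` `repeats` times
// and returns the median wall-clock duration in milliseconds.
// For an even number of repeats, the upper of the two middle values is returned.
double median_time_msec(const std::function<void()>& task, int repeats) {
  if (repeats <= 0) {
    return 0.0;  // Nothing to measure.
  }

  std::vector<double> times;
  times.reserve(repeats);
  for (int i = 0; i < repeats; i++) {
    // steady_clock is monotonic, so it is not affected by system clock adjustments.
    const auto t1 = std::chrono::steady_clock::now();
    task();
    const auto t2 = std::chrono::steady_clock::now();
    times.push_back(std::chrono::duration<double, std::milli>(t2 - t1).count());
  }

  std::sort(times.begin(), times.end());
  return times[times.size() / 2];
}
```

A caller would wrap the construction step in a lambda, for example `median_time_msec([&] { /* build the tree here */ }, 10)`, and repeat the measurement for each thread count.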

diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
 # small_gicp (fast_gicp2)
 
-**small_gicp** is a header-only C++ library that provides efficient and parallelized fine point cloud registration algorithms (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a regined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), with the following features. 
+**small_gicp** is a header-only C++ library that provides efficient and parallelized fine point cloud registration algorithms (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), with the following features. 
 
-- **Highly optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It can provide up to 2x speed up compared to fast_gicp.
+- **Highly optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It can provide up to 2x speed up compared to fast_gicp.
 - **All parallelized** : small_gicp provides parallelized implementations of several algorithms in the point cloud registration process (Downsampling, KdTree construction, Normal/covariance estimation). As a parallelism backend, either (or both) of [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used. 
 - **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement in many systems.
 - **Customizable** : small_gicp is implemented with the trait mechanism that allows feeding any custom point cloud class to the registration algorithm. Furthermore, the template-based implementation allows customizing the registration process with your original correspondence estimator and registration factors.
@@ -241,13 +241,21 @@ Coming soon.
 
 ### Downsampling
 
-- Single-thread `small_gicp::voxelgrid_sampling` is about 1.3x faster than `pcl::VoxelGrid`.
-- Multi-thread `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about 3.2x faster than `pcl::VoxelGrid`.
+- Single-threaded `small_gicp::voxelgrid_sampling` is about 1.3x faster than `pcl::VoxelGrid`.
+- Multi-threaded `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about 3.2x faster than `pcl::VoxelGrid`.
 - `small_gicp::voxelgrid_sampling` gives accurate downsampling results (almost identical to those of `pcl::VoxelGrid`) while `pcl::ApproximateVoxelGrid` yields spurious points (up to 2x points).
-- `small_gicp::voxelgrid_sampling` can process a larger point cloud with a fine voxel resolution compared to `pcl::VoxelGrid`.
+- `small_gicp::voxelgrid_sampling` can process a larger point cloud with a finer voxel resolution than `pcl::VoxelGrid` (for a point cloud 150 m wide, the minimum voxel resolution is about 0.07 mm).
 
 ![downsampling_comp](docs/assets/downsampling_comp.png)
 
+### KdTree construction
+
+- The multi-threaded implementations (TBB and OMP) can be up to 4x faster than the single-threaded one (all implementations are based on nanoflann).
+- Processing generally gets faster as the number of threads increases, but the speed gain is sometimes not monotonic (possibly because of the scheduling algorithm or CPU-specific issues on the AMD 5995WX).
+- This benchmark compares only the construction time (query time is not included).
+
+![kdtree_time](docs/assets/kdtree_time.png)
+
 ### Odometry estimation
 
 - Single-thread `small_gicp::GICP` is about 2.4x and 1.9x faster than `pcl::GICP` and `fast_gicp::GICP`, respectively.
@@ -259,6 +267,8 @@ Coming soon.
 ## License
 This package is released under the MIT license.
 
+If you find this package useful for your project, please consider leaving a comment here. It would help the author gain internal recognition in his organization and keep working on this project.
+
 ## Papers
 - Kenji Koide, Masashi Yokozuka, Shuji Oishi, and Atsuhiko Banno, Voxelized GICP for Fast and Accurate 3D Point Cloud Registration, ICRA2021
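
Review note on the 0.07 mm figure in the Downsampling section: the voxel coordinates are stored in 21 bits per axis, so one axis can address at most 2^21 = 2,097,152 voxels. Assuming the whole 21-bit range is available to the cloud along that axis, the smallest usable leaf size is the extent divided by 2^21. The sketch below works that out; `min_leaf_size` is a hypothetical helper introduced only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not part of small_gicp): smallest leaf size in meters
// for which a cloud spanning `extent_m` meters along one axis still fits into
// 2^coord_bit_size voxels. Assumes the full bit range is usable on that axis.
constexpr double min_leaf_size(double extent_m, int coord_bit_size = 21) {
  return extent_m / static_cast<double>(std::uint64_t{1} << coord_bit_size);
}
```

For a 150 m extent this gives 150 / 2,097,152 ≈ 7.15e-5 m, which is roughly the 0.07 mm quoted in the README.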
 

diff --git a/docker/Dockerfile.gcc b/docker/Dockerfile.gcc
@@ -18,7 +18,7 @@ RUN rm -rf ./*
 RUN cmake .. -DBUILD_WITH_TBB=ON
 RUN cmake --build . -j$(nproc)
 
-RUN cmake .. -DBUILD_TESTS=ON -DBUILD_WITH_TBB=ON -DBUILD_BENCHMARKS=ON -DBUILD_WITH_PCL=ON
+RUN cmake .. -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON -DBUILD_BENCHMARKS=ON -DBUILD_WITH_TBB=ON -DBUILD_WITH_PCL=ON
 RUN cmake --build . -j$(nproc)
 RUN ctest -j$(nproc)
 

diff --git a/docs/assets/kdtree_time.png b/docs/assets/kdtree_time.png
diff --git a/include/small_gicp/util/downsampling.hpp b/include/small_gicp/util/downsampling.hpp
@@ -29,8 +29,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling(const InputPointCloud& poin
 
   std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
   for (size_t i = 0; i < traits::size(points); i++) {
-    // TODO: Check if coord is within 21bit range
     const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;
+    if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
+      std::cerr << "warning: voxel coord is out of range!!" << std::endl;
+      coord_pt[i] = {0, i};
+      continue;
+    }
 
     // Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
     const std::uint64_t bits =                                 //
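
Review note on the range check added in this hunk: each coordinate component gets 21 bits of the 64-bit key, and the packing code masks each component with `coord_bit_mask` before shifting, so without the check an out-of-range component would silently wrap around to a different voxel. A minimal standalone sketch of the same layout follows. `pack_voxel_key` and the `k`-prefixed constants are hypothetical names for illustration, it uses `std::array` instead of `Eigen::Array4i` to stay dependency-free, and it rejects out-of-range input with `std::nullopt`, whereas the change above prints a warning and assigns key 0.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Bits reserved per axis and the matching mask (2^21 - 1).
constexpr int kCoordBitSize = 21;
constexpr std::int64_t kCoordBitMask = (std::int64_t{1} << kCoordBitSize) - 1;

// Hypothetical helper (not part of small_gicp): packs integer voxel
// coordinates {x, y, z} into one 64-bit key laid out as
// (0|1bit, z|21bit, y|21bit, x|21bit).
// Returns std::nullopt when any component lies outside [0, 2^21 - 1].
std::optional<std::uint64_t> pack_voxel_key(const std::array<std::int64_t, 3>& coord) {
  for (const std::int64_t c : coord) {
    if (c < 0 || c > kCoordBitMask) {
      return std::nullopt;
    }
  }

  // Every component is known to fit in 21 bits, so the fields cannot overlap.
  return (static_cast<std::uint64_t>(coord[0]) << (kCoordBitSize * 0)) |
         (static_cast<std::uint64_t>(coord[1]) << (kCoordBitSize * 1)) |
         (static_cast<std::uint64_t>(coord[2]) << (kCoordBitSize * 2));
}
```

Returning an optional leaves the fallback policy to the caller; the key-0 fallback in the diff instead keeps the point in the output by grouping all out-of-range points into a single voxel.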

diff --git a/include/small_gicp/util/downsampling_omp.hpp b/include/small_gicp/util/downsampling_omp.hpp
@@ -30,9 +30,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling_omp(const InputPointCloud&
   std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
 #pragma omp parallel for num_threads(num_threads) schedule(guided, 32)
   for (size_t i = 0; i < traits::size(points); i++) {
-    // TODO: Check if coord is within 21bit range
     const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;
-
+    if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
+      std::cerr << "warning: voxel coord is out of range!!" << std::endl;
+      coord_pt[i] = {0, i};
+      continue;
+    }
     // Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
     const std::uint64_t bits =                                 //
       ((coord[0] & coord_bit_mask) << (coord_bit_size * 0)) |  //

diff --git a/include/small_gicp/util/downsampling_tbb.hpp b/include/small_gicp/util/downsampling_tbb.hpp
@@ -30,8 +30,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling_tbb(const InputPointCloud&
   std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
   tbb::parallel_for(tbb::blocked_range<size_t>(0, traits::size(points), 64), [&](const tbb::blocked_range<size_t>& range) {
     for (size_t i = range.begin(); i != range.end(); i++) {
-      // TODO: Check if coord is within 21bit range
       const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;
+      if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
+        std::cerr << "warning: voxel coord is out of range!!" << std::endl;
+        coord_pt[i] = {0, i};
+        continue;
+      }
 
       // Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
       const std::uint64_t bits =                                 //