benchmark_kdtree
koide3 committed Mar 30, 2024
1 parent 7c31fe1 commit fc3f9ce
Showing 11 changed files with 2,825 additions and 10 deletions.
14 changes: 14 additions & 0 deletions CMakeLists.txt
@@ -106,6 +106,20 @@ if(BUILD_BENCHMARKS)
${Iridescence_LIBRARIES}
)

# KdTree construction benchmark
add_executable(kdtree_benchmark
  src/kdtree_benchmark.cpp
)
target_include_directories(kdtree_benchmark PUBLIC
  include
  ${TBB_INCLUDE_DIRS}
  ${EIGEN3_INCLUDE_DIR}
)
target_link_libraries(kdtree_benchmark
  fmt::fmt
  ${TBB_LIBRARIES}
)

if(BUILD_WITH_PCL)
# Downsampling benchmark
add_executable(downsampling_benchmark
20 changes: 15 additions & 5 deletions README.md
@@ -1,8 +1,8 @@
# small_gicp (fast_gicp2)

**small_gicp** is a header-only C++ library that provides efficient and parallelized fine point cloud registration algorithms (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a regined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), with the following features.
**small_gicp** is a header-only C++ library that provides efficient and parallelized fine point cloud registration algorithms (ICP, Point-to-Plane ICP, GICP, VGICP, etc.). It is a refined and optimized version of its predecessor, [fast_gicp](https://github.com/SMRT-AIST/fast_gicp), with the following features.

- **Highly optimized** : The implementation of the core registration algorithm is further optimized from that in fast_gicp. It can provide up to a 2x speedup compared to fast_gicp.
- **All parallelized** : small_gicp provides parallelized implementations of several algorithms in the point cloud registration process (downsampling, KdTree construction, normal/covariance estimation). As a parallelism backend, either (or both) of [OpenMP](https://www.openmp.org/) and [Intel TBB](https://github.com/oneapi-src/oneTBB) can be used.
- **Minimum dependency** : Only [Eigen](https://eigen.tuxfamily.org/) (and bundled [nanoflann](https://github.com/jlblancoc/nanoflann) and [Sophus](https://github.com/strasdat/Sophus)) are required at a minimum. Optionally, it provides the [PCL](https://pointclouds.org/) registration interface so that it can be used as a drop-in replacement in many systems.
- **Customizable** : small_gicp is implemented with a trait mechanism that allows feeding any custom point cloud class to the registration algorithm. Furthermore, the template-based implementation allows customizing the registration process with your own correspondence estimator and registration factors.
@@ -241,13 +241,21 @@ Coming soon.

### Downsampling

- Single-thread `small_gicp::voxelgrid_sampling` is about 1.3x faster than `pcl::VoxelGrid`.
- Multi-thread `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about 3.2x faster than `pcl::VoxelGrid`.
- Single-threaded `small_gicp::voxelgrid_sampling` is about 1.3x faster than `pcl::VoxelGrid`.
- Multi-threaded `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about 3.2x faster than `pcl::VoxelGrid`.
- `small_gicp::voxelgrid_sampling` gives accurate downsampling results (almost identical to those of `pcl::VoxelGrid`), while `pcl::ApproximateVoxelGrid` yields spurious points (up to 2x as many points).
- `small_gicp::voxelgrid_sampling` can process a larger point cloud with a fine voxel resolution compared to `pcl::VoxelGrid`.
- `small_gicp::voxelgrid_sampling` can process larger point clouds at finer voxel resolutions than `pcl::VoxelGrid` (for a point cloud 150 m wide, the minimum voxel resolution is about 0.07 mm).

![downsampling_comp](docs/assets/downsampling_comp.png)
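
The 0.07 mm figure is consistent with the 21-bit-per-axis voxel coordinate encoding used in the downsampling code: 2^21 cells per axis spread over 150 m. The arithmetic can be sketched as follows. The helper name `min_leaf_size` and the constant `kBitsPerAxis` are illustrative assumptions, not part of small_gicp, and the sketch assumes the whole extent maps onto the full coordinate range.

```cpp
#include <cassert>
#include <cstdint>

// Bits available per axis in the packed voxel key (from the
// "z|21bit, y|21bit, x|21bit" layout in the downsampling code).
constexpr int kBitsPerAxis = 21;

// Hypothetical helper (not a small_gicp function): smallest leaf size in
// meters for which a cloud spanning `extent_m` meters along one axis still
// fits into 2^21 voxel cells.
inline double min_leaf_size(double extent_m) {
  assert(extent_m > 0.0);
  constexpr double cells = static_cast<double>(std::uint64_t{1} << kBitsPerAxis);
  return extent_m / cells;
}
```

For `min_leaf_size(150.0)` this gives 150 / 2,097,152 m, roughly 7.15e-5 m, i.e. about 0.07 mm.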

### KdTree construction

- Multi-threaded implementations (TBB and OMP) can be up to 4x faster than the single-threaded one (all implementations are based on nanoflann).
- Processing generally gets faster as the number of threads increases, but the speedup is not always monotonic (possibly because of the scheduling algorithm or issues specific to the test CPU, an AMD 5995WX).
- This benchmark only compares the construction time (query time is not included).

![kdtree_time](docs/assets/kdtree_time.png)
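
A construction-only measurement like the one described above can be sketched with a small timing helper. This is an assumption about how such a benchmark might be structured, not the code in `src/kdtree_benchmark.cpp`; the name `median_build_time_msec` and the choice of the median as the summary statistic are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: calls `build` `num_trials` times and returns the
// median wall-clock time of a single call in milliseconds. Only the call
// itself is timed, so query time is excluded as long as `build` performs
// construction only.
template <typename BuildFn>
double median_build_time_msec(BuildFn&& build, int num_trials) {
  assert(num_trials > 0);
  std::vector<double> times;
  times.reserve(num_trials);
  for (int i = 0; i < num_trials; i++) {
    const auto t1 = std::chrono::steady_clock::now();
    build();
    const auto t2 = std::chrono::steady_clock::now();
    times.push_back(std::chrono::duration<double, std::milli>(t2 - t1).count());
  }
  // Partial sort so that the middle element is the median (upper median for even counts).
  const auto mid = times.begin() + times.size() / 2;
  std::nth_element(times.begin(), mid, times.end());
  return *mid;
}
```

The callable passed as `build` would wrap whichever tree constructor is being compared, once per thread count, so that the single-threaded, OMP and TBB variants are measured the same way.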

### Odometry estimation

- Single-threaded `small_gicp::GICP` is about 2.4x and 1.9x faster than `pcl::GICP` and `fast_gicp::GICP`, respectively.
@@ -259,6 +267,8 @@ Coming soon.
## License
This package is released under the MIT license.

If you find this package useful for your project, please consider leaving a comment here. It would help the author gain internal recognition in his organization and keep working on this project.

## Papers
- Kenji Koide, Masashi Yokozuka, Shuji Oishi, and Atsuhiko Banno, Voxelized GICP for Fast and Accurate 3D Point Cloud Registration, ICRA2021

2 changes: 1 addition & 1 deletion docker/Dockerfile.gcc
@@ -18,7 +18,7 @@ RUN rm -rf ./*
RUN cmake .. -DBUILD_WITH_TBB=ON
RUN cmake --build . -j$(nproc)

RUN cmake .. -DBUILD_TESTS=ON -DBUILD_WITH_TBB=ON -DBUILD_BENCHMARKS=ON -DBUILD_WITH_PCL=ON
RUN cmake .. -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON -DBUILD_BENCHMARKS=ON -DBUILD_WITH_TBB=ON -DBUILD_WITH_PCL=ON
RUN cmake --build . -j$(nproc)
RUN ctest -j$(nproc)

Binary file added docs/assets/kdtree_time.png
6 changes: 5 additions & 1 deletion include/small_gicp/util/downsampling.hpp
@@ -29,8 +29,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling(const InputPointCloud& poin

std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
for (size_t i = 0; i < traits::size(points); i++) {
// TODO: Check if coord is within 21bit range
const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;
if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
std::cerr << "warning: voxel coord is out of range!!" << std::endl;
coord_pt[i] = {0, i};
continue;
}

// Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
const std::uint64_t bits = //
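
The range check added in this hunk guards the bit packing that follows it: each axis gets 21 bits of a 64-bit key, so a negative or too-large coordinate would spill into a neighbouring axis's bits. A minimal standalone sketch of the idea, without Eigen, is given below. The function `pack_voxel_key` and the constants are hypothetical names, and returning `std::nullopt` is a choice made for this sketch; the commit itself prints a warning and assigns key 0 to the offending point.

```cpp
#include <array>
#include <cstdint>
#include <optional>

constexpr std::uint64_t kCoordBitSize = 21;                                      // bits per axis
constexpr std::uint64_t kCoordBitMask = (std::uint64_t{1} << kCoordBitSize) - 1;  // 0x1FFFFF

// Hypothetical helper: packs (x, y, z) voxel coordinates into one 64-bit key
// laid out as (0|1bit, z|21bit, y|21bit, x|21bit). Returns std::nullopt if
// any coordinate does not fit into 21 bits.
inline std::optional<std::uint64_t> pack_voxel_key(const std::array<std::int64_t, 3>& coord) {
  for (const std::int64_t c : coord) {
    if (c < 0 || c > static_cast<std::int64_t>(kCoordBitMask)) {
      return std::nullopt;
    }
  }
  return (static_cast<std::uint64_t>(coord[0]) << (kCoordBitSize * 0)) |
         (static_cast<std::uint64_t>(coord[1]) << (kCoordBitSize * 1)) |
         (static_cast<std::uint64_t>(coord[2]) << (kCoordBitSize * 2));
}
```

Without the check, masking with `& coord_bit_mask` would silently wrap an out-of-range coordinate into a valid but wrong voxel, which is why rejecting it before packing is the safer order.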
7 changes: 5 additions & 2 deletions include/small_gicp/util/downsampling_omp.hpp
@@ -30,9 +30,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling_omp(const InputPointCloud&
std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
#pragma omp parallel for num_threads(num_threads) schedule(guided, 32)
for (size_t i = 0; i < traits::size(points); i++) {
// TODO: Check if coord is within 21bit range
const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;

if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
std::cerr << "warning: voxel coord is out of range!!" << std::endl;
coord_pt[i] = {0, i};
continue;
}
// Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
const std::uint64_t bits = //
((coord[0] & coord_bit_mask) << (coord_bit_size * 0)) | //
6 changes: 5 additions & 1 deletion include/small_gicp/util/downsampling_tbb.hpp
@@ -30,8 +30,12 @@ std::shared_ptr<OutputPointCloud> voxelgrid_sampling_tbb(const InputPointCloud&
std::vector<std::pair<std::uint64_t, size_t>> coord_pt(points.size());
tbb::parallel_for(tbb::blocked_range<size_t>(0, traits::size(points), 64), [&](const tbb::blocked_range<size_t>& range) {
for (size_t i = range.begin(); i != range.end(); i++) {
// TODO: Check if coord is within 21bit range
const Eigen::Array4i coord = fast_floor(traits::point(points, i) * inv_leaf_size) + coord_offset;
if ((coord < 0).any() || (coord > coord_bit_mask).any()) {
std::cerr << "warning: voxel coord is out of range!!" << std::endl;
coord_pt[i] = {0, i};
continue;
}

// Compute voxel coord bits (0|1bit, z|21bit, y|21bit, x|21bit)
const std::uint64_t bits = //