
Regenerate processing time graph for Rtabmap process #1336

Open
naitiknakrani-eic opened this issue Sep 3, 2024 · 12 comments

@naitiknakrani-eic

Hi @francoisferland @matlabbe,

Can you help me regenerate the processing-time logs needed to create a graph like the one shown in the figure below?
image

Are any tools available for this? If not, could you describe the methodology used to derive these graphs?

Thanks

@matlabbe
Member

matlabbe commented Sep 5, 2024

After you have created a database, you can extract all timing statistics and export them to MATLAB or Octave to produce a graph like that. To do so, you can first list the available statistics in the database with:

rtabmap-report --stats rtabmap.db

To show the statistics of the figure above (matching its legend; the names may differ slightly):

rtabmap-report \
   Timing/Memory_update/ms \
   Timing/Proximity_by_space/ms \
   Timing/Posterior_computation/ms \
   Timing/Map_optimization/ms \
   RtabmapROS/TimeUpdatingMaps/ms \
   Timing/Reactivation/ms \
   rtabmap.db

In the figure that opens, you should already see something similar to the figure above. To export the data, right-click on the legend area and click "Copy all curve data to clipboard".
image

Paste the data into a text file, then import it into any data analysis software you want (a short Python sketch for plotting it follows the sample data below).

x	Timing/Memory_update/ms	Timing/Proximity_by_space/ms	Timing/Posterior_computation/ms	Timing/Map_optimization/ms	RtabmapROS/TimeUpdatingMaps/ms	Timing/Reactivation/ms	
0.000000	6.505010	0.001907	0.012875	0.025034	NaN	0.000000
1.004863	11.481000	0.000954	0.005960	0.011921	NaN	0.005960
2.071385	8.010860	0.000000	0.006199	0.012875	NaN	0.001907
3.072435	7.767920	0.000000	0.005960	4.143950	NaN	1.037120
4.072935	10.537100	0.000954	0.008106	0.799894	NaN	0.039101
5.073380	10.127100	0.001192	0.005007	0.937939	NaN	0.032902
6.073429	10.460100	0.000954	0.007868	1.488920	NaN	0.041962
7.139824	10.862100	0.000954	0.005960	1.379010	NaN	0.034094
8.139961	7.934090	0.000954	0.005960	1.073120	NaN	0.035048
9.140281	8.811000	0.000954	0.006914	1.636030	NaN	0.039101
10.140464	7.201910	0.000954	0.007868	1.781940	NaN	0.049830
11.140506	8.593800	0.000954	0.010014	2.104040	NaN	0.061989
12.206952	8.467200	0.000954	0.020981	1.698020	NaN	0.044823
13.273479	10.787000	0.000954	0.025034	2.268080	NaN	0.051975
...

Note that in this case the map didn't have "Map Assembling Time" (i.e., RtabmapROS/TimeUpdatingMaps/ms); I took a random database, which may not have been recorded on a robot with map assembling enabled.
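
If you prefer scripting it, here is a minimal sketch (assuming the clipboard content was saved to a hypothetical file named timings.txt, in the tab-separated format shown above) that loads and plots the curves with pandas and matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

# Load the tab-separated clipboard export; the first column "x" is the elapsed
# time in seconds, the other columns are the timing statistics in milliseconds.
df = pd.read_csv("timings.txt", sep="\t")
df = df.dropna(axis=1, how="all")  # drop all-NaN columns (e.g., a statistic that was never logged)

for column in df.columns.drop("x"):
    plt.plot(df["x"], df[column], label=column)

plt.xlabel("Time (s)")
plt.ylabel("Processing time (ms)")
plt.legend()
plt.show()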

@naitiknakrani-eic
Author

@matlabbe Thanks, this helps a lot. For our database, we get a graph like this:
image

We do have the map assembling time, but it is very small and hence invisible in the graph. Below is a separate graph of the map assembling alone.
image

From this graph, it seems that global map assembling is not a time-consuming process. Is that expected? You mentioned in another thread, as well as in the paper, that map assembling is a computationally heavy process. Are we missing something?

Also, we did a detailed time profile of the STM and other modules, but could not find which function performs the global map assembling. Is it part of Rtabmap.cpp or CoreWrapper.cpp? Could you point us to that function?

@matlabbe
Member

matlabbe commented Sep 7, 2024

You mentioned in another thread, as well as in the paper, that map assembling is a computationally heavy process

... (and more precisely) when there is a loop closure and the graph deformation exceeds GridGlobal/UpdateError (default: 0.01 m).

The global map assembling is called from the MapsManager (called from here), but the actual implementations are in these files: https://github.com/introlab/rtabmap/tree/master/corelib/src/global_map (depending on which topic is subscribed). The global maps are not generated if no one is subscribed to any of them.

The biggest update spikes happen when the map has to be fully updated, either because of a loop closure or because memory management retrieved past data back into the current WM (or more explicitly, when this is true). Depending on the GlobalMap flavor, the full update may take more or less time (e.g., 2D maps are relatively fast to regenerate). In Table 10 of this paper, the last two columns show a comparison depending on the map type used and whether a loop closure happened. As you can see, when there is no loop closure, appending new local grids to the global map is relatively fast (maybe that is what you are seeing).
image
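
As a rough illustration of the condition above (a hypothetical sketch, not RTAB-Map's actual code): after graph optimization, if any previously inserted node moved by more than GridGlobal/UpdateError, the whole global map is regenerated; otherwise only the new local grids are appended.

import math

UPDATE_ERROR = 0.01  # GridGlobal/UpdateError, in meters (default 0.01)

def needs_full_update(old_poses, new_poses, update_error=UPDATE_ERROR):
    """old_poses/new_poses: {node_id: (x, y)} before/after graph optimization."""
    for node_id, (x_old, y_old) in old_poses.items():
        if node_id in new_poses:
            x_new, y_new = new_poses[node_id]
            if math.hypot(x_new - x_old, y_new - y_old) > update_error:
                return True  # a node moved too much: regenerate the whole global map
    return False  # no significant deformation: just append the new local grids

# Example: a loop closure moved node 3 by 5 cm, so a full update is triggered.
old = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}
new = {1: (0.0, 0.0), 2: (1.0, 0.001), 3: (2.0, 0.05)}
print(needs_full_update(old, new))  # True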

@samarth-eic

Hi @matlabbe, continuing with the issue above, we tried different decimation values on the point cloud coming from the pointcloudXYZ node, with decimation set to 1, 2, and 4. This can be seen in the graphs below.

Screenshot 2024-09-24 162036

For decimation 1 and decimation 2, the neighbor values are 105 and 292 respectively. Shouldn't more data points result in more neighbors, as in the comparison between decimation 2 and decimation 4? Can you help us understand the reason for this behavior?

@matlabbe
Member

matlabbe commented Sep 25, 2024

The number of neighbor links is roughly the same as the number of nodes added to the graph. I don't know what Rtabmap/DetectionRate you are using or the actual processing time per node, but if the processing time was over the update rate with decimation 1, it could make sense that with decimation 2 there are more nodes, because the processing time would be lower (higher rate of adding nodes). For the difference between decimation 2 and decimation 4, I cannot really say without seeing the actual processing time. If you can share the three databases, that would be useful.
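
As a back-of-the-envelope check (hypothetical numbers, not taken from these databases), the number of nodes, and hence neighbor links, is bounded by the effective update rate, which is the smaller of Rtabmap/DetectionRate and the rate the machine can actually sustain:

def expected_nodes(duration_s, detection_rate_hz, avg_processing_time_s):
    # The effective node-creation rate is limited by both the requested
    # Rtabmap/DetectionRate and the actual processing time per update.
    effective_rate_hz = min(detection_rate_hz, 1.0 / avg_processing_time_s)
    return int(duration_s * effective_rate_hz)

# E.g., a 300 s session with Rtabmap/DetectionRate=30:
print(expected_nodes(300, 30, 2.0))  # ~150 nodes if each update takes ~2 s
print(expected_nodes(300, 30, 1.0))  # ~300 nodes if each update takes ~1 s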

@samarth-eic

samarth-eic commented Sep 25, 2024

@matlabbe Thanks. Is Rtabmap/UpdateRate the same as Rtabmap/DetectionRate? It is 30 in all three cases. For the processing time, below is a zip of the three databases; hope it is useful here:
https://file.io/iUN8rPnCqU3S

@naitiknakrani-eic
Author

@matlabbe If memory management is disabled and the actual processing time exceeds Rtabmap/DetectionRate, what happens?

  • Does it skip node creation, so that fewer nodes are created?
  • Does it start transferring nodes to LTM, i.e., does it do the memory management job?

@matlabbe
Member

@samarth-eic
Yes, Rtabmap/DetectionRate is the update rate I was referring to. I corrected the typo in my previous post.

As I suspected, the processing time is high, so it cannot process in real time at a rate of 30 Hz (Rtabmap/DetectionRate). Here is a breakdown of the computation on the first database (from your computer):
Screenshot from 2024-09-29 15-05-35

Rtabmap/DetectionRate is not meant to be used at 30 Hz, as updating the map is likely to take longer than 33 ms. The default is 1 Hz. If you want more poses without increasing Rtabmap/DetectionRate, you can set Rtabmap/CreateIntermediateNodes to true (there is no benefit to saving more poses; that parameter is mostly used to compare against benchmark datasets that provide poses for every frame).

A lot of time is spent creating the local grids, which could be optimized like this:

  1. Grid/ClusterRadius=0.1 instead of 1 (for node 39, 600 ms -> 20 ms)
  2. Grid/NormalsSegmentation=false instead of true (20 ms -> 10 ms) (with Grid/MaxObstacleHeight=1 and Grid/MaxGroundHeight=0.1)
  3. Grid/NoiseFilteringRadius=0 instead of 0.07 (10 ms -> <1 ms)

Example by reprocessing on my computer with these values:

rtabmap-reprocess \
   --Grid/ClusterRadius 0.1 \
   --Grid/NormalsSegmentation false \
   --Grid/MaxObstacleHeight 1 \
   --Grid/MaxGroundHeight 0.1 \
   --Grid/NoiseFilteringRadius 0 \
   radius_decimation1.db output.db

Screenshot from 2024-09-29 15-21-04

To improve the proximity detection time, if you don't mind not keeping the raw laser scans, you can downsample the scans with --Mem/LaserScanVoxelSize 0.05 and precompute the normals (--Mem/LaserScanNormalK 20). Also, in general for 3D point clouds I use --Icp/Iterations 10. If computing resources are limited, also set --RGBD/ProximityMaxPaths 1 and disable one-to-many detection by setting --RGBD/ProximityPathMaxNeighbors 0:

rtabmap-reprocess \
   --Grid/ClusterRadius 0.1 \
   --Grid/NormalsSegmentation false \
   --Grid/MaxObstacleHeight 1 \
   --Grid/MaxGroundHeight 0.1 \
   --Grid/NoiseFilteringRadius 0 \
   --Mem/LaserScanVoxelSize 0.05 \
   --Mem/LaserScanNormalK 20 \
   --Icp/Iterations 10 \
   --RGBD/ProximityMaxPaths 1 \
   --RGBD/ProximityPathMaxNeighbors 0 \
   radius_decimation1.db output.db

Screenshot from 2024-09-29 15-42-57

@naitiknakrani-eic
Whether memory management is enabled or not, input data are always dropped between rtabmap updates (i.e., the rate is limited either by Rtabmap/DetectionRate or by the actual maximum rate rtabmap can process on that machine). When memory management is disabled, if the processing time increases over time, there will be more and more space between nodes in the graph. With memory management enabled, nodes in the graph will be uniformly spaced based on Rtabmap/DetectionRate, but some nodes may be transferred to LTM and not be available in the current local map.
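
To illustrate that last point with hypothetical numbers, here is a small sketch of how the distance between consecutive nodes grows without memory management as the per-update processing time increases, while it stays roughly constant with memory management enabled:

SPEED = 0.5             # robot speed in m/s (hypothetical)
DETECTION_PERIOD = 1.0  # seconds, i.e., Rtabmap/DetectionRate = 1 Hz
processing_times = [0.2, 0.5, 1.5, 3.0, 6.0]  # seconds, growing with map size

for t in processing_times:
    # Without memory management, the next update has to wait for the previous
    # one to finish, so the spacing between nodes grows with processing time.
    spacing_no_mm = SPEED * max(DETECTION_PERIOD, t)
    # With memory management, the update time stays bounded, so the spacing
    # stays close to the detection period (but old nodes may be moved to LTM).
    spacing_mm = SPEED * DETECTION_PERIOD
    print(f"processing={t:.1f}s  spacing without MM={spacing_no_mm:.2f} m, with MM={spacing_mm:.2f} m")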

@naitiknakrani-eic
Author

naitiknakrani-eic commented Sep 30, 2024

@matlabbe It's fantastic to see how much these parameters help in optimizing computation time.

"the rate is limited either by Rtabmap/DetectionRate or the actual maximum rate rtabmap can process based on the machine": this is the reason we kept it at 30, to get as much throughput as possible. Keeping the detection rate at 1 actually underutilizes the processing capabilities of the NVIDIA boards.

@matlabbe
Member

matlabbe commented Sep 30, 2024

"the rate is limited either by Rtabmap/DetectionRate or the actual maximum rate rtabmap can process based on the machine": this is the reason we kept it at 30, to get as much throughput as possible. Keeping the detection rate at 1 actually underutilizes the processing capabilities of the NVIDIA boards.

Here are some reasons why we set Rtabmap/DetectionRate to 1 Hz by default:

  • In most situations, we don't need more nodes added to the map. This saves RAM, disk space, and CPU time for other tasks.
  • rtabmap's processing time also increases with the map size, in particular loop closure detection, graph optimization (after a loop closure), and global occupancy grid updates: the more nodes there are, the longer an update takes. So the lower Rtabmap/DetectionRate is, the bigger the map (in space) that can be processed online without any memory management.
  • Ideally, we want hundreds of nodes in the map that can be updated online (by doing optimizations like the ones above) before starting to think about enabling rtabmap's memory management (which has some quirks to be aware of).

Where you would want to put more computation resources first is visual or lidar odometry, for which a higher frequency usually means better accuracy and lower drift. Equally important are the robot controller and low-latency obstacle detection. Lastly, re-localization/mapping.

cheers,
Mathieu

@naitiknakrani-eic
Author

Hi @matlabbe, thanks for the explanation and sorry for the delayed reply. We have been studying the code and logic behind Grid/NormalsSegmentation. It was a surprise to see that RTAB-Map still works and creates the occupancy grid with Grid/NormalsSegmentation turned off. It is a process that takes a lot of CPU resources and contributes ~60% of the total timing (refer to the first figure of the earlier response: Timing/total avg of 1266 ms and Timing/occupancy of 781 ms).

This parameter skips the following block in the file "rtabmap/corelib/include/rtabmap/core/impl/util3d_mapping.hpp":

if(segmentFlatObstacles && flatSurfaces->size())
{
	// Cluster the points already labelled as flat surfaces; the index of the
	// largest cluster is returned through biggestFlatSurfaceIndex and is used
	// downstream to identify the main ground surface.
	int biggestFlatSurfaceIndex;
	std::vector<pcl::IndicesPtr> clusteredFlatSurfaces = extractClusters(
			cloud,
			flatSurfaces,
			clusterRadius,
			minClusterSize,
			std::numeric_limits<int>::max(),
			&biggestFlatSurfaceIndex);

	...

Can you please tell us in which scenarios the segmentation is required or a must? What was the original thought process behind implementing it?

@matlabbe
Member

matlabbe commented Oct 16, 2024

The normal segmentation is based on that experiment: http://wiki.ros.org/rtabmap_ros/Tutorials/StereoOutdoorNavigation#Global_costmap :

Since the ground is not even, the ground is segmented by normal filtering: all points with normal in the +z direction (+- fixed angle) are labelled as ground, all the others are labelled as obstacles. The images below show an example.
image

We enable it by default because it is safer in general and makes it easier to detect small obstacles. When we disable it, we do a simple passthrough filtering at some height (e.g., 10 cm) over the ground, which is very fast, but requires more tuning to detect small obstacles and cannot handle a ground that is not perfectly flat (e.g., ramps or slopes). Here is another example from this paper:
image

For less powerful computers, or when the robot always moves on a flat floor, disabling it can give a good performance boost.
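
To make the difference concrete, here is a minimal numpy sketch of both strategies described above (an illustration only, not RTAB-Map's actual implementation in util3d_mapping.hpp): a simple passthrough split by height, versus labelling by normal direction.

import numpy as np

def passthrough_labels(points, max_ground_height=0.1, max_obstacle_height=1.0):
    # Fast approach (Grid/NormalsSegmentation=false): label by height only.
    z = points[:, 2]
    ground = z <= max_ground_height
    obstacle = (z > max_ground_height) & (z <= max_obstacle_height)
    return ground, obstacle  # points above max_obstacle_height are ignored

def normal_labels(normals, max_ground_angle_deg=45.0):
    # Normals-based approach (Grid/NormalsSegmentation=true): points whose
    # normal is close to +z (within a fixed angle) are ground, the rest obstacles.
    cos_threshold = np.cos(np.radians(max_ground_angle_deg))
    ground = np.abs(normals[:, 2]) >= cos_threshold
    return ground, ~ground

# Tiny example: two points on a flat patch and one point on a vertical wall.
pts = np.array([[0.0, 0.0, 0.02], [1.0, 0.0, 0.03], [1.0, 1.0, 0.5]])
nrm = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(normal_labels(nrm))        # first two points = ground, last = obstacle
print(passthrough_labels(pts))   # same result here, but only because the floor is flat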
