Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved Collada loader #569

Open
wants to merge 10 commits into
base: gz-common5
Choose a base branch
from

Conversation

fredmarkus
Copy link

@fredmarkus fredmarkus commented Jan 11, 2024

Bug fix

Summary

This PR accelerates the loading of collada files which, for large meshes, drastically cuts down on the loading times.
This was done using several techniques, ranked below in order of impact:

  1. Pass by reference rather than pass by value. The most significant cut in time was achieved by replacing copy assignments of large objects, with shared pointer copies. This meant that the object that had to be copied became much, much smaller and thus leading to significant boosts in performance
  2. Data structure changes. On several occasions, std::map was changed to std::unordered_map to benefit from the constant time element access.
  3. Smaller algorithmic improvements. Some other algorithmic improvements were done, such as improving the split() function and allocating memory ahead of time for known sized objects.

To see the improvements for yourself, I have linked a folder called torino below, containing a map of the center Turin, Italy with medium-high resolution. You can run this map using:
GZ_SIM_RESOUCE_PATH=/path/to/parent/dir gz sim -r /path/to/torino/torino.sdf

When running the default colladaloader.cc, this map took about 85 seconds to load on my machine. With this new implementation it took me about 10 seconds. For larger maps, I have seen even more drastic performance improvements. A map that previously did not load in more than 2 hours, now only takes 3 minutes to load.

One technique that I did not manage to use was parallelization. A majority of the startup time is still spent in the loadtriangles() function, which itself calls several functions. Based on my understanding it was not possible to parallelize these functions. The other major loss of time is now the xml parsing itself. The current implementation uses tinyxml2. While good, after some research I found that there are several other parsers that are even faster (pugixml, rapidxml). Rewriting the xml sections to use one of these would be a worthwhile endeavor for the future, as it could lead to another large performance gain for big files.

Link to "torino" map:
https://drive.google.com/drive/folders/13TVY2aOQiS-SN2l3HkUXauy1qlJFKaGa?usp=sharing

Checklist

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

Signed-off-by: frederik <[email protected]>
@fredmarkus fredmarkus requested a review from marcoag as a code owner January 11, 2024 16:55
@github-actions github-actions bot added 🌱 garden Ignition Garden 🎵 harmonic Gazebo Harmonic labels Jan 11, 2024
@fredmarkus
Copy link
Author

@marcoag @mjcarroll It appears that 3 tests are failing. I am not sure whether this is to do with the modifications done by me as I ran the tests on main branch and they seem to be failing there as well. Do you have an idea?

@bperseghetti
Copy link
Member

bperseghetti commented Jan 11, 2024

@marcoag @mjcarroll It appears that 3 tests are failing. I am not sure whether this is to do with the modifications done by me as I ran the tests on main branch and they seem to be failing there as well. Do you have an idea?

@frede791 Mind splitting out the linting changes from the functional code changes or remove the linting changes completely? Will make it easier to look at the actual changes and also easier to debug.

Copy link
Contributor

@caguero caguero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this patch looks very promising! I was able to reproduce your numbers as well. I had a few very minor changes around the code. Here's my most important question:

Your description mentions that the optimizations are ranked in order of impact. Could you share some extra information about how much each optimization contributes to the speedup? I'm surprised that the optimization (1) is so important as we were mostly passing references.

graphics/src/ColladaLoader.cc Outdated Show resolved Hide resolved
@@ -330,7 +339,7 @@ namespace gz
void hash_combine(std::size_t &_seed, const double &_v)
{
std::hash<double> hasher;
_seed ^= hasher(_v) + 0x9e3779b9 + (_seed << 6) + (_seed >> 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why removing the constant please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a constant doesn't change the distribution of the hash function but costs about 0.2 seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is unexpected!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I took the code from https://stackoverflow.com/a/2595226. That's a while back. Maybe there's a better way to do it now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of the hash function accounts for about 0.1 seconds on the Torino map.

graphics/src/ColladaLoader.cc Outdated Show resolved Hide resolved
@fredmarkus
Copy link
Author

fredmarkus commented Jan 11, 2024

@caguero I ran the code throught Intel VTune and then analyzed where the majority of the start up time was spent. By far the largest time save comes from changing the _duplicates and _values objects in the LoadTexCoords() function to shared pointers. I applied this idea to the other load functions which had similar implementations. This alone lowered the time taken to load the torino map to about 20 seconds (with a 35 second drop alone being accounted for by the LoadTexCoords() function). The other improvements are relatively minor in comparison but they still seemed significant enough to improve. I'm sure there are still more improvements that could be done (especially with respect to the GeometryVertices class which is now one of the most significant time factors at about 1 second); however, the time saves here are now minimal and for personally it doesn't matter too much if a map loads in 7 seconds or 10 seconds as for larger maps the biggest bottleneck is now tinyxml2 anyways.
Speed up time improvements on torino map:

  1. ca. 65 seconds
  2. ca. 8 seconds
  3. <3 seconds

@bperseghetti @caguero I will undo/improve on the linting changes and other requests tomorrow.

@mjcarroll
Copy link
Contributor

@marcoag @mjcarroll It appears that 3 tests are failing.

I can take a closer look in the morning.

@fredmarkus fredmarkus force-pushed the collada_acceleration branch from 9516485 to c22b4d9 Compare January 12, 2024 09:21
Signed-off-by: frederik <[email protected]>
@fredmarkus fredmarkus force-pushed the collada_acceleration branch from 1098a84 to f0ff34f Compare January 12, 2024 09:28
@fredmarkus
Copy link
Author

@caguero @bperseghetti I have addressed all issues raised above. Codecheck passes as well.

@fredmarkus fredmarkus force-pushed the collada_acceleration branch from 4cff57c to baa748a Compare January 12, 2024 11:22
@fredmarkus fredmarkus force-pushed the collada_acceleration branch from 1571a65 to 6911db6 Compare January 12, 2024 13:04
@fredmarkus fredmarkus requested a review from caguero January 12, 2024 14:53
@mjcarroll
Copy link
Contributor

mjcarroll commented Jan 12, 2024

So I think the failures are "real" in that those tests are segfaulting.

Edit: and of course they don't reproduce locally.

@@ -2028,10 +2241,10 @@ void ColladaLoader::Implementation::LoadPolylist(
std::string offset = polylistInputXml->Attribute("offset");
if (semantic == "VERTEX")
{
unsigned int count = norms.size();
unsigned int count = (*norms).size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the segfault is happening in CI. Since norms is an uninitialized shared_ptr, norms->size() isn't a valid call at this point. I suppose it depends on your compiler/system if something kinda bad (data corruption) or really bad (segfault) happens at this point, because it seems to only consistently happen on my jammy system.

Copy link
Author

@fredmarkus fredmarkus Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this with the latest commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining test run failure for the Collada Loader seems to be the result of a mismatch between the actual mesh vertex and normal count and the expected counts

@mjcarroll
Copy link
Contributor

Overall, I am on board with the approach of minimizing copies where possible here, but I'm not sure if switching to shared_ptrs is the exact approach that we want to do yet. I'm going to spend a little more time with this today and see if I can come up with concrete feedback.

@mjcarroll
Copy link
Contributor

There are also exisiting memory issues in some of these tests that ASAN catches, I'm going to try to get those resolved so we have a clean baseline to discuss from.

Make it work right and then make it work fast, right?

@mjcarroll
Copy link
Contributor

Okay, I have addressed any current ASAN issues in the graphics component in #571

@fredmarkus
Copy link
Author

fredmarkus commented Jan 18, 2024

@mjcarroll I am not quite sure why the test failure is happening for the ColladaLoader. CI tells me that there is a mismatch between expected vertex/normal count and actual count but I am not quite sure why this is happening especially since it is only 1/14 tests in the test suite. I could see if the changes from #571 would have an impact. What do you think?

@azeey
Copy link
Contributor

azeey commented Jun 10, 2024

@mjcarroll Would you be able to take another look at this?

@azeey azeey added the beta Targeting beta release of upcoming collection label Jul 29, 2024
Copy link
Contributor

@iche033 iche033 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a comment with a patch that gets the test to pass. It reverts a check but I'm not sure if that change is intended or not.


// create a map of duplicate indices
// create a map of duplicate indices
if((*prev_vec) != vec){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into the test failure and found that this check here results in more unique normals. Not sure if I understand the logic here. A vector is marked as unique if it's not equal to the previous one? So say if we have the following vectors: A = [0, 0, 1], B= [0, 1, 0], C =[0, 0, 1]. I think with this logic, C will be marked as unique because it does not equal to B?

If I remove the extra check with the following patch, the test passes:

collada.patch
diff --git a/graphics/src/ColladaLoader.cc b/graphics/src/ColladaLoader.cc
index b7979a3..c6c231c 100644
--- a/graphics/src/ColladaLoader.cc
+++ b/graphics/src/ColladaLoader.cc
@@ -1562,8 +1562,6 @@ void ColladaLoader::Implementation::LoadPositions(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
 
   gz::math::Vector3d vec;
-  std::shared_ptr<gz::math::Vector3d> prev_vec =
-    std::make_shared<gz::math::Vector3d>(gz::math::Vector3d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector3d>>();
   if (!_duplicates)
@@ -1579,16 +1577,10 @@ void ColladaLoader::Implementation::LoadPositions(const std::string &_id,
     (*_values).emplace_back(vec);
 
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-    (*prev_vec) = vec;
   }
 
   this->positionDuplicateMap[_id] = _duplicates;
@@ -1728,8 +1720,6 @@ void ColladaLoader::Implementation::LoadNormals(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
 
   gz::math::Vector3d vec;
-  std::shared_ptr<gz::math::Vector3d> prev_vec =
-    std::make_shared<gz::math::Vector3d>(gz::math::Vector3d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector3d>>();
   if (!_duplicates)
@@ -1747,16 +1737,10 @@ void ColladaLoader::Implementation::LoadNormals(const std::string &_id,
     (*_values).emplace_back(vec);
 
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-    (*prev_vec) = vec;
   }
 
   this->normalDuplicateMap[_id] = _duplicates;
@@ -1917,8 +1901,6 @@ void ColladaLoader::Implementation::LoadTexCoords(const std::string &_id,
   auto values = toDoubleVec(valueStr, totCount);
 
   gz::math::Vector2d vec;
-  std::shared_ptr<gz::math::Vector2d> prev_vec =
-    std::make_shared<gz::math::Vector2d>(gz::math::Vector2d::Zero);
   if (!_values)
     _values = std::make_shared<std::vector<gz::math::Vector2d>>();
   if (!_duplicates)
@@ -1933,16 +1915,10 @@ void ColladaLoader::Implementation::LoadTexCoords(const std::string &_id,
     (*_values).emplace_back(vec);
 
     // create a map of duplicate indices
-    if((*prev_vec) != vec){
-      if (unique.find(vec) != unique.end())
-        (*_duplicates)[(*_values).size()-1] = unique[vec];
-      else
-        unique[vec] = (*_values).size()-1;
-    }
+    if (unique.find(vec) != unique.end())
+      (*_duplicates)[(*_values).size()-1] = unique[vec];
     else
       unique[vec] = (*_values).size()-1;
-
-   (*prev_vec) = vec;
   }
 
   this->texcoordDuplicateMap[_id] = _duplicates;

@azeey
Copy link
Contributor

azeey commented Aug 22, 2024

I'll go ahead and remove beta from this PR. I don't think we have enough time to continue iterating on it. Since this will hopefully not break behavior, it can be merged after Ionic is released.

@azeey azeey removed the beta Targeting beta release of upcoming collection label Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🌱 garden Ignition Garden 🎵 harmonic Gazebo Harmonic
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

6 participants