Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Conan create gets stuck generating dependency graph #17612

Open
1 task done
radonish opened this issue Jan 22, 2025 · 13 comments
Open
1 task done

[bug] Conan create gets stuck generating dependency graph #17612

radonish opened this issue Jan 22, 2025 · 13 comments
Assignees

Comments

@radonish
Copy link

radonish commented Jan 22, 2025

Describe the bug

Environment:

  • Conan 2.10.0
  • CMake 3.30.5
  • Red Hat Enterprise Linux

Description:
I've been working on migrating my company to Conan 2.x. At this point I've migrated all recipes (~100) to Conan 2.x. The most complex application with ~55 dependencies successfully builds - let's call it "appx". I'm running into a problem with an application that pulls in a library that the complex application generates - let's call it "utilx". I understand that it's a bit quirky that appx also generates a library for utilx to use.

utilx only depends on appx. The conan create command for utilx gets stuck - the Conan process uses almost all the memory (16GB) and CPU (Intel Xeon with 4 cores @ 3.8GHz) of a Linux workstation. It gets stuck while it's in the "Computing dependency graph" phase. On the other hand, I can do conan graph info for utilx and it generates the HTML graph in a timely manner.

Clearly there is some sort of issue with dependency graph generation since it appears to work for conan graph info but not conan create in this specific case. I'm looking for guidance as to how to debug the conan create command's dependency graph generation.

How to reproduce it

cd utilx
conan create . -pr:b <build profile> -pr:h <host profile>

EDIT: Changed from question format to bug format.

Have you read the CONTRIBUTING guide?

  • I've read the CONTRIBUTING guide
@memsharded memsharded self-assigned this Jan 22, 2025
@memsharded
Copy link
Member

Hi @radonish

Thanks for your question.

The most complex application with ~55 dependencies successfully builds - let's call it "appx". I'm running into a problem with an application that pulls in a library that the complex application generates - let's call it "utilx". I understand that it's a bit quirky that appx also generates a library for utilx to use.

Yes, it seems that it might be trapped in an infinite loop, but I don't know enough details.
It would be great to have:

  • The output of the conan create until it hangs
  • The relationships between the packages, like the exact type of requires() and tool_requires() that the packages have
  • If there is any kind of possible circular dependency, like appx depending on utilx in some context too.
  • The package_type of the packages. Does appx package both the application and the utilx inside its own package
  • If you do the conan create .... -vvv and kill (ctrl+c), do you get any trace? Could you please share the output for that?

@radonish
Copy link
Author

radonish commented Jan 23, 2025

Hello, @memsharded, thanks for your response.

I've attached the conan create log (company packages randomly renamed). I ran it with -vvv - I don't see anything exciting after ctrl+c.

The relationships between the packages, like the exact type of requires() and tool_requires() that the packages have

It's quite a big dependency tree. utilx has a single requires() for appx. General notes that apply to entire dependency tree:

  • All have python_requires() for a conanpytools package
  • All have a tool_requires() for a cmake_utils package within their build_requirements()
  • All dynamically come up with their version using set_version() and the Git branch (why you see a lot of git rev-parse output during dependency graph computation)
  • The rest is standard requires() (note: I had to add what felt like somewhat random transitive_libs=True traits via trial and error as noted in the transitive shared library issue: [question] Conan 2.x Migration - transitive_headers and transitive_libs Trait Defaults #17505)

If there is any kind of possible circular dependency, like appx depending on utilx in some context too.

No circular dependencies that I can tell. I would have thought any issues in that department to have popped up when building appx - what has the dozens of dependencies you can see in the log that are getting pulled in as a result of utilx's requires() of appx.

The package_type of the packages. Does appx package both the application and the utilx inside its own package

I have played with this a bit. Initially both appx and utilx were package_type "application". I removed the package_type altogether - due to the quirk of an application grabbing a lib generated from another - and it did not change things, unfortunately.

conan_create.log

@radonish radonish changed the title [question] Conan Dependency Graph Debug [bug] Conan Dependency Graph Issue Jan 23, 2025
@radonish radonish changed the title [bug] Conan Dependency Graph Issue [bug] Conan create gets stuck generating dependency graph Jan 23, 2025
@memsharded
Copy link
Member

Thanks for the feedback. Some questions:

utilx/1.0.0+feature_tpssw-156-conan2: RUN: git rev-parse --abbrev-ref HEAD

Why the dependencies do this? Once the dependencies are exported to the Cache, the version is metadata, so the set_version() doesn't execute anymore. Are all of these packages editable by any chance? If they are not, could you explain why the git rev-parse?

CONAN_CMAKE_SYSTEM_NAME=Linux

Do you keep the same profiles for Conan 1 and 2? This variable no longer has any effect in new generators (neither Conan 1 nor Conan 2 with generators CMakeToolchain)

I still have no idea where it is getting stuck.
I can try to introduce some logging/printing in the places I think they could give us some information, would you prefer for me to send a patch and you apply it to the Conan source yourself, or maybe would you prefer running Conan from one of my source branches?

@radonish
Copy link
Author

radonish commented Jan 23, 2025

Thanks, @memsharded

Why the dependencies do this? Once the dependencies are exported to the Cache, the version is metadata, so the set_version() doesn't execute anymore. Are all of these packages editable by any chance? If they are not, could you explain why the git rev-parse?

The conanpytools python_requires package I created has a utility function that all recipes call within the Conan requirements() function. It gets what I call a "version modifier" - this is what uses the branch name (with "/" converted to "_" and lowercase conversion) to add a suffix to the version. mylib/1.0.0 becomes mylib/1.0.0+feature_xyz, for example, on a developer feature branch. Long story short, this behavior is due to having a monolithic repo and not currently doing more fine-tuned version management package to package for our internal projects. The branch essentially becomes the version. Once a feature gets merged to the trunk (e.g. develop branch) the version modifier goes away. Consumers of the package simply fetch the latest build of the package so it all just works.

    def requirements(self):
        self.python_requires['conanpytools'].module.init_version_modifier(self, ["develop"])
        self.requires(f"liba/1.0.0{self.package_version_modifier}")
        self.requires(f"libb/1.0.0{self.package_version_modifier}")

So, if appx depends on liba/1.0.0 and libb/1.0.0, it effectively makes appx automagically depend on liba/1.0.0+feature_xyz and libb/1.0.0+feature_xyz if the developer is modifying the code in this monolithic repo on branch feature/xyz.

Do you keep the same profiles for Conan 1 and 2? This variable no longer has any effect in new generators (neither Conan 1 nor Conan 2 with generators CMakeToolchain)

I have updated all of our profiles for Conan 2 but did start with the state of Conan 1 - so, this is just a carry over. I'll remove CONAN_CMAKE_SYSTEM_NAME from them all. Thank you for noting that!

I can try to introduce some logging/printing in the places I think they could give us some information, would you prefer for me to send a patch and you apply it to the Conan source yourself, or maybe would you prefer running Conan from one of my source branches?

That would be great, thank you. Either option should work but the patch may be easier as I've only used official Conan versions installed via pip - so I'd need some guidance to use your branch directly.

Thanks

@memsharded
Copy link
Member

self.python_requires['conanpytools'].module.init_version_modifier(self, ["develop"])

Ok, so it is basically reading the branch of the consumer repository to define the version of the requirement, if I understood correctly, it is not that the set_version() of the dependencies was somehow being executed. That seems indeed too much of coupling between the consumer repository and the dependencies packages, but I understand where you come from, the monolithic repo, etc.

It would be a good starting point if you could test with the following patch:

diff --git a/conans/client/graph/graph_builder.py b/conans/client/graph/graph_builder.py             
index 51ac37afa..7c1e17fee 100644                                                                    
--- a/conans/client/graph/graph_builder.py                                                           
+++ b/conans/client/graph/graph_builder.py                                                           
@@ -53,17 +53,22 @@ class DepsGraphBuilder(object):                                                  
                 (require, node) = open_requires.popleft()                                           
                 if require.override:                                                                
                     continue                                                                        
+                print("Expanding requirement:", require.ref, "from node ", node.ref)                
                 new_node = self._expand_require(require, node, dep_graph, profile_host,             
                                                 profile_build, graph_lock)                          
+                print("    The new node is: ", new_node.ref if new_node else None)                  
                 if new_node and (not new_node.conanfile.vendor                                      
                                  or new_node.recipe == RECIPE_EDITABLE or                           
                                  new_node.conanfile.conf.get("tools.graph:vendor",                  
                                                              choices=("build",))):                  
                     newr = self._initialize_requires(new_node, dep_graph, graph_lock, profile_build,
                                                      profile_host)                                  
+                    print("    New requirements: ", [r.ref for r in newr])                          
                     open_requires.extendleft((r, new_node) for r in reversed(newr))                 
             self._remove_overrides(dep_graph)                                                       
+            print("Checking graph providers")                                                       
             check_graph_provides(dep_graph)                                                         
+            print("Finished checking graph providers")                                              
         except GraphError as e:                                                                     
             dep_graph.error = e                                                                     
         dep_graph.resolved_ranges = self._resolver.resolved_ranges                                  

And see what is the output

@radonish
Copy link
Author

radonish commented Jan 23, 2025

Thanks for the patch, @memsharded. I've attached a new conan create log with the additional debug output.

conan_create_debug.log

As a reference point, running conan graph info for this utilx package completes successfully in about 25 seconds. When I finally hit ctrl+c to stop the process in the stuck conan create case I let it run for about 5 minutes.

@memsharded
Copy link
Member

There is something weird in this log:

Expanding requirement: adc/1.0.0+feature_tpssw-156-conan2 from node  dot_conn/1.0.0+feature_tpssw-156-conan2
    The new node is:  remote/1.0.0+feature_tpssw-156-conan2: RUN: git rev-parse --abbrev-ref HEAD
gpb_out/1.0.0+feature_tpssw-156-conan2: RUN: git rev-parse --abbrev-ref HEAD
ERROR: Exiting with code: 3
adc/1.0.0+feature_tpssw-156-conan2

Expanding a adc/1.0.0 requirement resulted in a new node called remote/1.0.0? This is very surprising.
It also comes with the only ERROR in the log so far, that comes from that git rev-parse of an intermediate gpb_out.
I don't know what is the meaning, but worth double checking those recipes (dot_conn, adc, gpb_out)

Just in case, do you have any kind of [replace_requires] in your profile?

@radonish
Copy link
Author

radonish commented Jan 24, 2025

Expanding a adc/1.0.0 requirement resulted in a new node called remote/1.0.0? This is very surprising.

Unfortunately I think that was a case of stdout getting mixed up when redirecting to the log file. I re-generated a new log (attached below) and that expanding adc/1.0.0 line looks more as expected now. The ERROR is a similar issue - it shows up at the very end as expected when I'm not redirecting stdout/stderr to a log file and just letting it go to the console. Note, for this new log I used the timeout command to have it SIGTERM Conan after 5 minutes (timeout 5m conan create ......) so the error code changed to 5 because of SIGTERM vs. ctrl-c.

conan_create_debug2.log

Just in case, do you have any kind of [replace_requires] in your profile?

No, I do not have any [replace_requires].

client/graph/graph.py

    def propagate_downstream(self, require, node, src_node=None):
        #print("  Propagating downstream ", self, "<-", require)
        assert node is not None
        # This sets the transitive_deps node if it was None (overrides)
        # Take into account that while propagating we can find RUNTIME shared conflicts we
        # didn't find at check_downstream_exist, because we didn't know the shared/static
        existing = self.transitive_deps.get(require)
        if existing is not None and existing.require is not require:
            if existing.node is not None and existing.node.ref != node.ref:
                print("  +++++Runtime conflict!", require, "with", node.ref)
                return True
            print("  ++before aggregate")
            require.aggregate(existing.require)
            print("  --after aggregate")

For what it's worth, once the process has been running for several minutes and the system is bogged down with Conan's high CPU/memory usage, what appears to take a long time is the require.aggregate(existing.require) line above. It will sit at the "++before aggregate" print I added for several minutes before slowly continuing.

@memsharded
Copy link
Member

That is still more surprising 😅

The aggregate() method is very simple, just some boolean operations, should be constant time.
The only explanation is that somehow the machine is resource depleted and just fully stuck.

Also, I am confused, I was expecting some kind of recursive, infinite loop, which means that in 5 minutes, the log would have many thousands of lines. But if I understood correctly, it seems that it is less that 1000 lines, so instead of the dependency loop, we probably need to consider some other hypothesis.

It would seem that evaluating each node might have some operations there that take a lot of time, and cause the very heavy resource consumption. I am having a look again to the self.python_requires['conanpytools'].module.init_version_modifier(self, ["develop"]) and what it is doing. I'd suggest to add some printings, printing the execution time of the init_version_modifier everytime it executes.

Adding these might also help:

diff --git a/conan/internal/methods.py b/conan/internal/methods.py
index b0dbb47b0..e6cb954cc 100644
--- a/conan/internal/methods.py
+++ b/conan/internal/methods.py
@@ -123,11 +123,16 @@ def run_configure_method(conanfile, down_options, profile_options, ref):

     if hasattr(conanfile, "requirements"):
         with conanfile_exception_formatter(conanfile, "requirements"):
+            import time
+            t = time.time()
             conanfile.requirements()
-
+            print(f"Requirements of {conanfile} took: {time.time() - t:.2f}s")
     if hasattr(conanfile, "build_requirements"):
         with conanfile_exception_formatter(conanfile, "build_requirements"):
+            import time
+            t = time.time()
             conanfile.build_requirements()
+            print(f"Tool Requirements of {conanfile} took: {time.time() - t:.2f}s")

@radonish
Copy link
Author

Thanks, @memsharded

You can see in this snippet towards the end of the log that the requirements/build_requirements timings are relatively sane:

Expanding requirement: remote/1.0.0+feature_tpssw-156-conan2 from node  appx/1.0.0+feature_tpssw-156-conan2
remote/1.0.0+feature_tpssw-156-conan2: RUN: git rev-parse --abbrev-ref HEAD
Requirements of remote/1.0.0+feature_tpssw-156-conan2 took: 5.87s
Tool Requirements of remote/1.0.0+feature_tpssw-156-conan2 took: 0.00s
    The new node is:  remote/1.0.0+feature_tpssw-156-conan2
    New requirements:  [tupper/1.0.0+feature_tpssw-156-conan2, chatter/1.0.0+feature_tpssw-156-conan2, gpb_out/1.0.0+feature_tpssw-156-conan2, receipt/1.0.0+feature_tpssw-156-conan2, ip/1.0.0+feature_tpssw-156-conan2, eag_mgg/24.6+feature_tpssw-156-conan2, poco/1.9.0.4+feature_tpssw-156-conan2, utils/1.0.0+feature_tpssw-156-conan2, cmake_utils/1.0.0+feature_tpssw-156-conan2]
Expanding requirement: tupper/1.0.0+feature_tpssw-156-conan2 from node  remote/1.0.0+feature_tpssw-156-conan2
  Existing previous requirements from  cli => tupper/1.0.0+feature_tpssw-156-conan2, Traits: build=False, headers=False, libs=True, run=True, visible=True
Closing a loop from  remote/1.0.0+feature_tpssw-156-conan2 => tupper/1.0.0+feature_tpssw-156-conan2
    The new node is:  None
Expanding requirement: chatter/1.0.0+feature_tpssw-156-conan2 from node  remote/1.0.0+feature_tpssw-156-conan2
  Existing previous requirements from  cli => chatter/1.0.0+feature_tpssw-156-conan2, Traits: build=False, headers=False, libs=True, run=True, visible=True
Closing a loop from  remote/1.0.0+feature_tpssw-156-conan2 => chatter/1.0.0+feature_tpssw-156-conan2
    The new node is:  None
Expanding requirement: gpb_out/1.0.0+feature_tpssw-156-conan2 from node  remote/1.0.0+feature_tpssw-156-conan2
gpb_out/1.0.0+feature_tpssw-156-conan2: RUN: git rev-parse --abbrev-ref HEAD
Requirements of gpb_out/1.0.0+feature_tpssw-156-conan2 took: 0.08s
Tool Requirements of gpb_out/1.0.0+feature_tpssw-156-conan2 took: 0.00s
    The new node is:  gpb_out/1.0.0+feature_tpssw-156-conan2
    New requirements:  [ccp_out/1.0.0+feature_tpssw-156-conan2, chatter/1.0.0+feature_tpssw-156-conan2, eag_mgg/24.6+feature_tpssw-156-conan2, poco/1.9.0.4+feature_tpssw-156-conan2, utils/1.0.0+feature_tpssw-156-conan2, cmake_utils/1.0.0+feature_tpssw-156-conan2, protoc/1.0.1+feature_tpssw-156-conan2]
Expanding requirement: ccp_out/1.0.0+feature_tpssw-156-conan2 from node  gpb_out/1.0.0+feature_tpssw-156-conan2
  Existing previous requirements from  cli => ccp_out/1.0.0+feature_tpssw-156-conan2, Traits: build=False, headers=False, libs=True, run=True, visible=True
Closing a loop from  gpb_out/1.0.0+feature_tpssw-156-conan2 => ccp_out/1.0.0+feature_tpssw-156-conan2

I am having a look again to the self.python_requires['conanpytools'].module.init_version_modifier(self, ["develop"]) and what it is doing.

Here is the source. Yes, I know it's a bit quirky but for now we have to version the mono repo as if the branch is the version.

from conan import ConanFile
from conan.errors import ConanException
from conan.tools.scm import Git

# Initialize the package's version modifier if necessary
def init_version_modifier(self, release_branches):
    if self.package_version_modifier == None or not self.package_version_modifier:
        self.package_version_modifier = get_version_modifier(self, release_branches)

# Determine version modifier to apply to a package
# Note: Version modifiers are used both for setting a package's own version as well as determining the proper version
#       of its requirements.
def get_version_modifier(conanfile_ref, release_branches):
    modifier = ""
    # Use Git branch, if within a repo, to determine the version modifier
    try:
        branch = Git(conanfile_ref).run("rev-parse --abbrev-ref HEAD").replace('/', '_').lower()
        if branch not in release_branches:
            modifier = f"+{branch}"
    # Otherewise use package's version string to determine the version modifier
    # Note: Per semantic versioning, the modifier is assumed to be the "+..." suffix. For example if the version is
    # "1.2.3+feature_tps-12345-abc" the version modifier is "+feature_tps-12345-abc".
    except ConanException:
        if conanfile_ref.version != None and conanfile_ref.version.find("+") >= 0:
            modifier = conanfile_ref.version[conanfile_ref.version.index("+"):]

    return modifier

class ConanPyTools(ConanFile):
    name = "conanpytools"
    version = "1.0"
    package_type = "python-require"

And then as a reminder, it is used like this within each package's requirements() and build_requirements():

    def requirements(self):
        self.python_requires['conanpytools'].module.init_version_modifier(self, ["develop"])
        self.requires(f"liba/1.0.0{self.package_version_modifier}")
        self.requires(f"libb/1.0.0{self.package_version_modifier}")

I will add timings within init_version_modifier() but it should really just be the amount of time it takes the git command to return essentially.

@memsharded
Copy link
Member

You can see in this snippet towards the end of the log that the requirements/build_requirements timings are relatively sane:

Requirements of remote/1.0.0+feature_tpssw-156-conan2 took: 5.87s

Not that much, the time of the requirements() method is expected to be milliseconds, as this method can execute many times, specially for graph expansions that contains multiple repeated nodes, for example when using test_requires("sometest/0.1") and sometest contains transitive requirements, because the test requires are private for each node of the graph. If you have a graph of lets, say, 50 dependencies, and each one has a test_requires("sometest/0.1") and it has, lets say 2 transitive levels of dependencies, then the time would be 50 x 3 levels x 5 seconds = 750 seconds = 12.5 minutes, which is intractable in practice for a graph of just 150 nodes (if having complicated tool-requires or test-requires, graphs can have thousands of nodes).

@radonish
Copy link
Author

radonish commented Jan 24, 2025

Not that much, the time of the requirements() method is expected to be milliseconds

Understood. For reference, that same 5.87s time for remote/1.0.0+feature_tpssw-156-conan2 when doing conan create for utilx only takes 0.01s for its single dependency, appx (that has the dozens of dependencies).

So, that begs the question, why would the remote/1.0.0+feature_tpssw-156-conan2 package requirements() be fast when processed as a direct dependency of appx but slow as a transitive dependency of utilx's direct dependency of appx?

@memsharded
Copy link
Member

Yes, it is certainly a mistery, maybe it depends also on the git repo size, it might be slower.

Maybe it would be useful to trace every requires expansion (something like this, not tested):

                 (require, node) = open_requires.popleft()                                           
                 if require.override:                                                                
                     continue                                                                        
+                print("Expanding requirement:", require.ref, "from node ", node.ref) 
+                import time
+                t = time.time()               
                 new_node = self._expand_require(require, node, dep_graph, profile_host,             
                                                 profile_build, graph_lock)                          
+                print("    Expanded in ", time.time() - t)                 
+                print("    The new node is: ", new_node.ref if new_node else None)                  
                 if new_node and (not new_node.conanfile.vendor                                      
                                  or new_node.recipe == RECIPE_EDITABLE or                           
                                  new_node.conanfile.conf.get("tools.graph:vendor",                  
                                                              choices=("build",))):                  
                     newr = self._initialize_requires(new_node, dep_graph, graph_lock, profile_build,
                                                      profile_host)                                  
+                    print("    New requirements: ", [r.ref for r in newr])                          
                     open_requires.extendleft((r, new_node) for r in reversed(newr))                 
             self._remove_overrides(dep_graph)                                                       
+            print("Checking graph providers")                                                       
             check_graph_provides(dep_graph)                                                         
+            print("Finished checking graph providers")                                              
         except GraphError as e:                                                                     
             dep_graph.error = e                                                                     
         dep_graph.resolved_ranges = self._resolver.resolved_ranges  

And have it run for at least those 5 minutes, to see if there is some pattern we can see in some requirements and not others, or an evolution over time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants