Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Corrupted packages in local cache #17622

Open
mrjoel opened this issue Jan 23, 2025 · 9 comments · May be fixed by #17628
Open

[bug] Corrupted packages in local cache #17622

mrjoel opened this issue Jan 23, 2025 · 9 comments · May be fixed by #17628
Assignees

Comments

@mrjoel
Copy link
Contributor

mrjoel commented Jan 23, 2025

Describe the bug

Version: 2.11.0
OS: Debian bookworm

Issue

I'm in a state where running conan cache clean and subsequently conan cache check-integrity results in error messages like the following.

ERROR: Corrupted package 'lmnop/1.2.3@user/channel:2222222222222222222222222222222222222222' without conaninfo.txt in: /home/user/.conan2/p/b/lmnop333333333333/p/conaninfo.txt

I've been working on doing final migration of our remaining Conan 1 recipes. It's entirely expected that some of my local package builds which encountered errors during the process caused the issue. On inspection, the indicated paths don't even exist. I've manually edited the cache.sqlite3 to remove the invalid entries from the packages table. The check integrity only ever shows a single issue.

Expected

Assuming that check-integrity is meant to essentially be an fsck for the cache, I have the following expectations:

  1. Show all invalid entries, don't just stop at the first one. This way at the very least I could awk script a bulk removal using sqlite3.
  2. Provide some mechanism to remove invalid entries without resorting to raw sqlite editing under the covers.

Perhaps there's some other recommended approach to recover from this situation, but I wasn't able to find one. Blowing away a local Conan cache is less than desirable, particularly when I'm operating in a testing mode with no Conan remote and doing all builds locally.

How to reproduce it

No response

@mrjoel mrjoel changed the title [bug] Corrupted package in local cache [bug] Corrupted packages in local cache Jan 23, 2025
@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

After manually removing all of the entries, running conan cache check-integrity "*" starts to process, gives "ok" integrity check for some packages, then encounters some packages missing in an apparently different manner that isn't caught by the first pass check?

I now get errors like the following, where the indicated directory also does not exist at all., but is now looking for conanmanifest.txt instead of conaninfo.txt. I'm continuing on my journey of manual sqlite table edits.

pkgA/1.2.3@user/channel: Integrity checked: ok
pkgA/1.2.3@user/channel:8888888888888888888888888888888888888888: Integrity checked: ok
pkgB/4.5.6@user/channel: Integrity checked: ok
pkgB/4.5.6@user/channel: Integrity checked: ok
ERROR: Traceback (most recent call last):
  File "/home/username/.local/lib/python3.11/site-packages/conan/cli/cli.py", line 307, in main
    cli.run(args)
  File "/home/username/.local/lib/python3.11/site-packages/conan/cli/cli.py", line 192, in run
    command.run(self._conan_api, args[0][1:])
  File "/home/username/.local/lib/python3.11/site-packages/conan/cli/command.py", line 196, in run
    sub.run(conan_api, parser, *args)
  File "/home/username/.local/lib/python3.11/site-packages/conan/cli/command.py", line 214, in run
    info = self._method(conan_api, parent_parser, self._parser, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/conan/cli/commands/cache.py", line 112, in cache_check_integrity
    conan_api.cache.check_integrity(package_list)
  File "/home/username/.local/lib/python3.11/site-packages/conan/api/subapi/cache.py", line 73, in check_integrity
    checker.check(package_list)
  File "/home/username/.local/lib/python3.11/site-packages/conan/internal/cache/integrity_check.py", line 28, in check
    corrupted = self._package_corrupted(pref) or corrupted
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/conan/internal/cache/integrity_check.py", line 55, in _package_corrupted
    read_manifest, expected_manifest = layout.package_manifests()
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/conan/internal/cache/conan_reference_layout.py", line 114, in package_manifests
    readed_manifest = FileTreeManifest.load(package_folder)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/conans/model/manifest.py", line 40, in load
    text = load(os.path.join(folder, CONAN_MANIFEST))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/conans/util/files.py", line 142, in load
    with open(path, 'r', encoding=encoding, newline="") as handle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/username/.conan2/p/b/pkgb2222222222222/p/conanmanifest.txt'

ERROR: [Errno 2] No such file or directory: '/home/username/.conan2/p/b/pkgb2222222222222/p/conanmanifest.txt'

@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

Subsequently, I run into the following issue with a package_type = "python-require" utility package we use.

pkgx/7.7.7@user/channel: Integrity checked: ok
pkgx/7.7.7@user/channel:1111111111111111111111111111111111111111: Integrity checked: ok
ERROR: pkgy/8.8.8@user/channel: Manifest mismatch
ERROR: Folder: /home/username/.conan2/p/pkgy2222222222222/e
ERROR:     '__pycache__/file_name.cpython-311.pyc' (manifest: None, file: 33333333333333333333333333333333)
pkgy/8.8.8@user/channel: Integrity checked: ok
pkgy/8.8.8@user/channel: Integrity checked: ok
pkgy/8.8.8@user/channel: Integrity checked: ok
ERROR: pkgy/8.8.8@user/channel: Manifest mismatch
ERROR: Folder: /home/username/.conan2/p/pkgy4444444444444/e
ERROR:     '__pycache__/file_name.cpython-311.pyc' (manifest: None, file: 55555555555555555555555555555555)
pkgy/8.8.8@user/channel: Integrity checked: ok
pkgy/8.8.8@user/channel: Integrity checked: ok
pkgy/8.8.8@user/channel: Integrity checked: ok
pkgy/8.8.8@user/channel: Integrity checked: ok

The information presented (reference without package id, rrev, etc.) makes all instances identical, not even providing a copyable or identifiable handle to potentially use for conan remove. To resolve these, the only option I found was to manually remove from the filesystem, then rerunning the integrity check reasonably flagged it as missing conanmanifest.txt for removal from sqlite3 (this time from the recipes table).

@memsharded memsharded self-assigned this Jan 24, 2025
@memsharded
Copy link
Member

Hi @mrjoel

Thanks for your report.
There might be different issues in this report.

The last one, about packages that contain python code.
If the python_requires are used as pure python requires by Conan, they shouldn't generate __pycache__ folders at one. It is expected that the creation of extra files inside the cache lead to corrupted packages. The approach to avoid this would be:

  • For pure python_requires that are used and imported as documented, the pycache folders should never appear, as Conan is deactivating that, so it wouldn't be an issue.
  • If the python_requires contain code that are tools launched in python subprocesses or something like that, then it is likely that it is not a python_requires, but a tool_requires that happens to use Python.
  • For that later case, the approach to avoid the issue is to use the finalize() method, see https://blog.conan.io/2024/09/03/New-finalize-method.html

I'll follow up in a new comment regarding the other points.

@memsharded memsharded linked a pull request Jan 24, 2025 that will close this issue
@memsharded
Copy link
Member

Regarding the other points, the PR #17628 proves that:

  • conan check-integrity does not stop at the first integrity error, but reports all of them
  • conan remove command can remove those corrupted packages without problems, and without having to deal directly with the DB

The main issue seems to be when the conanmanifest.txt is missing. At the moment this was not considered a corrupted package, but a broken one. The corruption is a feature intended to check that:

  • No headers were accidentally changed by a developer incorrectly modifying them from the IDE
  • The execution of tool_requires like python-based tools were creating temporary files that would be uploaded to the server.

So there would be some unknowns that would be great to understand better, for example how is it possible for a package in the cache to end without a conanmanifest.txt. Also the conan cache clean does not touch the folders containing the conanmanifests.txt, it would be good to have a sequence of exact commands that we can reproduce on our end to understand the possible issue.

In any case, I am adding in #17628 checking and protection against those broken packages cases, I understand it can be convenient to be able to remove them with a simple conan remove command.

@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

For pure python_requires that are used and imported as documented, the pycache folders should never appear, as Conan is deactivating that, so it wouldn't be an issue.

I believe that we're using it purely as intended, certainly no subprocess or anything. The python_requires package we have is used as a common utility for handling, including a class FooBase(object): which is used as the python_requires_extend for many of our recipes.

Looking further, we do however do something like the following, which seems to be what causes the separately imported file to be generated in __pycache__.

class FooBase(object):
    def _helper_wrapper(self, arg: str):
        sys.path.append(self.python_requires["foo_util"].path)
        from foo_util_shared import helper_method

        return helper_method(arg)

Obviously the manifest shouldn't cover the __pycache__ content, but would it be reasonable to ignore their presence?

At the moment this was not considered a corrupted package, but a broken one.

Ok, so it sounds like conan cache check-integrity is focused on ensuring each package independently is consistent, but not the soundness of the cache itself. I had assumed the later was included in the scope.

conan check-integrity does not stop at the first integrity error, but reports all of them

Looks like #17628 changes to make that happen, a welcome update, thanks.

conan remove command can remove those corrupted packages without problems, and without having to deal directly with the DB

Yes, with using the wildcard as in the test like t.run("remove pkg2* -c"). However I wasn't wanting to remove all packages of pkg2, not even all at a given version, and in the case of the missing manifest, I didn't see a clear way to get the revision handle to be able to remove just the single broken package. Looks like that is also now added more clearly in that PR, thanks!

@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

So there would be some unknowns that would be great to understand better, for example how is it possible for a package in the cache to end without a conanmanifest.txt. Also the conan cache clean does not touch the folders containing the conanmanifests.txt, it would be good to have a sequence of exact commands that we can reproduce on our end to understand the possible issue.

I'll try to review my changes and see if I can catch what steps led to the situation. Given that it was incremental builds during final remaining transitions to add Conan 2 support, it's likely to have been something ephemeral in an intermediate state which failed.

@memsharded
Copy link
Member

Looking further, we do however do something like the following, which seems to be what causes the separately imported file to be generated in pycache.

yes, that could be related

Obviously the manifest shouldn't cover the pycache content, but would it be reasonable to ignore their presence?

We did ignore their presence in the past in Conan 1, and know what? It was problematic! 😅 because it happened that a few users wanted to actually package pycache and pyc files as "binaries". So even if not great for other cases, we came to the conclusion that Conan shouldn't try to be smart in that sense, what is in the "package" folder is part of the package, and if anything changes there, then it is a corruption. This is also why we created the finalize() method, to be able to decouple the cases that needed temporary files to be created.

Ok, so it sounds like conan cache check-integrity is focused on ensuring each package independently is consistent, but not the soundness of the cache itself. I had assumed the later was included in the scope.

No, only checks the "files" consistency, to prevent against file modification. It doesn't fully cross check the consistency of the DB and the filesystem (there hasn't been a problem there so far either).

Looks like #17628 changes to make that happen, a welcome update, thanks.

Yes, the intent of conan cache check-integrity was never to fail on the first error, but to report all. The problem is that Conan is actually crashing because of the unexpected missing file (broken package, not corrupted package). The changes in #17628 fixes those cases to be processed as corruptions instead of crashing, so it will continue.

Yes, with using the wildcard as in the test like t.run("remove pkg2* -c"). However I wasn't wanting to remove all packages of pkg2, not even all at a given version, and in the case of the missing manifest, I didn't see a clear way to get the revision handle to be able to remove just the single broken package. Looks like that is also now added more clearly in that PR, thanks!

yes, exactly, the conan remove <pattern> allows to specify down to the package revision, so now the error messages in those cases are printing the full reference, so it can be copy&pasted in the conan remove command.

@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

Conan shouldn't try to be smart in that sense, what is in the "package" folder is part of the package.

I can fully get behind that motivation. Now that we know the cause we'll look at adding finalize() in our FooBase base recipe where we use the other constructs.

@mrjoel
Copy link
Contributor Author

mrjoel commented Jan 24, 2025

Now that we know the cause we'll look at adding finalize() in our FooBase base recipe where we use the other constructs.

  1. In my initial look at this it appears concerning in our usage of FooBase as a lightweight helper recipe base. Since FooBase is where the helpers doing the module import are, we need to add finalize() to the base. However that seems to explode the disk footprint when used for recipes like Qt, Boost, and other large packages since the entire child recipe contents will likely end up being copied. I'll experiment a bit more, but if there's a different way to handle that I'd be all ears.

  2. We are using conan cache save/restore to transfer consistent caches between systems. Is that supported in combination with the finalize method? I'd imagine that save needs to export the immutable package directory, and that restore needs to call the finalize method upon importing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants