Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with threads #24

Open
vpavlin opened this issue Sep 8, 2015 · 15 comments
Open

Problems with threads #24

vpavlin opened this issue Sep 8, 2015 · 15 comments

Comments

@vpavlin
Copy link

vpavlin commented Sep 8, 2015

Hi, I wanted to extend your project so I created a Dockerfile to run tests in container. I did some modifications and when I ran tests they failed from time to time. So I reverted my changes and ran it again against current HEAD.

See results here http://pastebin.centos.org/34206/

Simply put, tests for threading randomly fail - in this case in 6th run. I am not very familiar with multi threaded applications, but will be happy to assist with debugging.

@pydanny
Copy link
Owner

pydanny commented Sep 9, 2015

Yikes! I have some questions:

  1. Are you saying cached_property fails on 1 out of every 6 runs on Fedora?
  2. How many attempts have you made?
  3. My docker-fu is weak. How exactly is cached_property being installed. Is it these lines: https://github.com/vpavlin/cached-property/blob/disk-stored-caching/Dockerfile#L6-L8

Tomorrow I'll try and replicate this on Fedora and Ubuntu.

@vpavlin
Copy link
Author

vpavlin commented Sep 9, 2015

  1. No, in this case it was 1 of 6, in other run it was one of 10..it seems to be completely random. I'll let it run for longer time today and see the ratio.
  2. Not many 5, maybe
  3. Yes, ADD line copies repo content to the image, WORKDIR changes working directory so that I don't have to use absolute path for requirements.txt

@pydanny
Copy link
Owner

pydanny commented Sep 9, 2015

@ncoghlan, do you have any idea why this is happening? I can't reproduce it on Mac OS X or on the CI server.

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

I was able to reproduce it without Docker on Fedora 22 - same symptoms as @vpavlin, with only 4 entries in the cache rather than the expected 5.

Dependencies were installed in the virtualenv via "pip -r requirements.txt".

Looking at the test case, I'm not seeing anything wrong except that I'm not sure the block comment is right about there being a race condition: run_threads() joins all the created threads before returning, so "cached_total" should always have been updated correctly before the assertions are checked.

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

Running py.test --pdb rather than plain py.test so it drops into the debugger when the test fails, I can see that check.cached_total is 10, while check.add_cached is only 9.

The particular case that failed there was the one in test_threads_ttl (using freeze_time). The first failure I saw was in test_threads.

Fedora does patch the threading module, but we shouldn't be hitting that code since we aren't passing a timeout to Thread.wait: http://pkgs.fedoraproject.org/cgit/python.git/tree/00181-allow-arbitrary-timeout-in-condition-wait.patch

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

Oh, now I understand that block comment in test_threads - I didn't realise what cached_property actually did :)

In that case, I suppose it's possible that for some reason one of the threads is dragging its feet and not starting running until after one of the other threads has already shadowed the property with a normal value.

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

I'm heading out, but I've kicked off a test run using this modified implementation of run_threads in the tests:

    def run_threads(self, num_threads):
        launch = Event()
        def _replace_in_thread(started):
            return lambda: (started.set(), launch.wait(), self.add_cached)
        threads = []
        start_events = []
        for _ in range(num_threads):
            started = Event()
            thread = Thread(target=_replace_in_thread(started))
            thread.start()
            threads.append(thread)
            start_events.append(started)
        for started in start_events:
            started.wait()
        launch.set()
        for thread in threads:
            thread.join()

That means the sleep in the cached property test is only covering the threads reacting to the event being signalled, rather than actually starting them. (There's still be a problem to report to the Fedora python and/or kernel teams though, so it would be weird for 5 threads to be taking an entire second to all start)

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

Even with that modified implementation, that still failed within 28 runs of the test suite. I'd suggest pinging the Fedora python-devel list at this point.

@vpavlin
Copy link
Author

vpavlin commented Sep 9, 2015

@ncoghlan Do you want me to do it, or will you? (Anyway thanks for looking at this)

@ncoghlan
Copy link

ncoghlan commented Sep 9, 2015

@vpavlin I was suggesting you do that. I'm now assuming there's something odd in the Fedora Python patches, and I'm the wrong person to ask about that.

@vpavlin
Copy link
Author

vpavlin commented Sep 9, 2015

@rkuska asked me to file a bug against python in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1261445

@audreyfeldroy
Copy link
Collaborator

We just moved cached-property's Travis builds to use the container-based infrastructure. There, we ran into what appears to be the same issue: https://travis-ci.org/pydanny/cached-property/jobs/93074100

@cedk
Copy link

cedk commented Nov 20, 2017

I got a similar bug report for the Gentoo Ebuild: https://bugs.gentoo.org/638250

@pydanny
Copy link
Owner

pydanny commented Sep 30, 2018

Is this still an issue? All current tests are passing: https://travis-ci.org/pydanny/cached-property

@DarthGandalf
Copy link

With 1.5.1:

==================== FAILURES ======================
____ TestCachedPropertyWithTTL.test_threads_ttl_expiry _________

self = <tests.test_cached_property.TestCachedPropertyWithTTL testMethod=test_threads_ttl_expiry>

    def test_threads_ttl_expiry(self):
        Check = CheckFactory(self.cached_property_factory(ttl=100000), threadsafe=True)
        check = Check()
        num_threads = 5
    
        # Same as in test_threads
        check.run_threads(num_threads)
        self.assert_cached(check, num_threads)
        self.assert_cached(check, num_threads)
    
        # The cache expires in the future
        with freeze_time("9999-01-01"):
            check.run_threads(num_threads)
>           self.assert_cached(check, 2 * num_threads)

tests/test_cached_property.py:207: 
_ _ _ _ _ _ _ _ 
tests/test_cached_property.py:69: in assert_cached
    self.assertEqual(check.add_cached, expected)
E   AssertionError: 6 != 10
======= 1 failed, 29 passed in 8.89 seconds ==========

All current tests are passing

Perhaps Travis CPU isn't loaded enough for the race to trigger?

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Aug 17, 2020
* Mask problematic 'test_threads_ttl_expiry' test fails
  with high core count:
    pydanny/cached-property#24

Bug: https://bugs.gentoo.org/638250
Package-Manager: Portage-3.0.3, Repoman-3.0.0
Signed-off-by: David Seifert <[email protected]>
NeddySeagoon pushed a commit to NeddySeagoon/gentoo-arm64 that referenced this issue Aug 18, 2020
* Mask problematic 'test_threads_ttl_expiry' test fails
  with high core count:
    pydanny/cached-property#24

Bug: https://bugs.gentoo.org/638250
Package-Manager: Portage-3.0.3, Repoman-3.0.0
Signed-off-by: David Seifert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants