[not for merge] DESC release branch #2012

Draft: wants to merge 968 commits into base: master

@benclifford (Collaborator) commented Apr 20, 2021

This branch is in ongoing use by various DESC users running parsl with the LSST DESC workflows. This PR exists to get the branch tested in parsl CI.

This branch is made from an stgit patch stack that lives on @benclifford's laptop, with regular merges from master. Don't push changes to this branch; instead, get them onto master and poke @benclifford for an update.

Over the years I've been migrating patches from this stack onto master and adding new ones on the end. Hopefully the patch stack, and this branch with it, will eventually evaporate entirely.

See issue #2178 for this particular report,
and #2014 for an overview of tasks[id] elimination.

Without this, the tasks continue to run while subsequent local
tests happen. At the least, this is confusing in the logs; I haven't
checked whether it introduces any actual testing problems.
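The commit message describes the symptom but not the fix. As a minimal sketch of the general idea, assuming the fix is to wait for every outstanding task before a test ends, here is a hypothetical `TrackingPool` helper built on the standard library's `concurrent.futures` rather than on parsl's own API:

```python
import concurrent.futures
import time


class TrackingPool:
    """Hypothetical test helper: wraps a ThreadPoolExecutor and remembers
    every future it hands out, so teardown can wait for all of them."""

    def __init__(self, max_workers=2):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
        self.futures = []

    def submit(self, fn, *args, **kwargs):
        fut = self._pool.submit(fn, *args, **kwargs)
        self.futures.append(fut)
        return fut

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Block until every submitted task has finished (successfully or not),
        # so no task is still running when the next test starts.
        concurrent.futures.wait(self.futures)
        self._pool.shutdown(wait=True)
        return False


def slow_task(x):
    time.sleep(0.1)
    return x * 2


with TrackingPool() as pool:
    # The test body only checks the first result and ignores the others.
    first = pool.submit(slow_task, 1)
    pool.submit(slow_task, 2)
    pool.submit(slow_task, 3)
    assert first.result() == 2

# After the with-block, nothing submitted by this "test" is left running.
print(all(f.done() for f in pool.futures))  # True
```

Waiting explicitly at teardown, rather than relying on interpreter exit, keeps each test's tasks out of the next test's logs.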

The full repr of an executor is usually very long and includes all
of the configuration information for that executor, so previously
a BadStateException looked like this, which is awkwardly large:

parsl.executors.errors.BadStateException: Executor HighThroughputExecutor(
    address=None,
    address_probe_timeout=None,
    cores_per_worker=1,
    cpu_affinity='none',
    heartbeat_period=30,
    heartbeat_threshold=120,
    interchange_port_range=(55000, 56000),
    label='htex_local',
    launch_cmd='executable_that_hopefully_does_not_exist_1030509.py',
    managed=True,
    max_workers=1,
    mem_per_worker=None,
    poll_period=1,
    prefetch_capacity=0,
    provider=LocalProvider(
        channel=LocalChannel(
            envs={},
            script_dir='/home/runner/work/parsl/parsl/runinfo/030/submit_scripts',
            userhome='/home/runner/work/parsl/parsl'
        ),
        cmd_timeout=30,
        init_blocks=1,
        launcher=SimpleLauncher(debug=True),
        max_blocks=1,
        min_blocks=0,
        move_files=None,
        nodes_per_block=1,
        parallelism=1,
        worker_init=''
    ),
    storage_access=None,
    worker_debug=True,
    worker_logdir_root=None,
    worker_port_range=(54000, 55000),
    worker_ports=None,
    working_dir=None
) failed due to:        STDERR: /home/runner/work/parsl/parsl/runinfo/030/submit_scripts/parsl.localprovider.1642801165.5067718.sh: line 3: executable_that_hopefully_does_not_exist_1030509.py: command not found

After this commit, the exception instead looks like this:

parsl.executors.errors.BadStateException: Executor htex_local failed due to:    STDERR: /home/benc/parsl/src/parsl/runinfo/000/submit_scripts/parsl.localprovider.1643124620.9775066.sh: line 3: executable_that_hopefully_does_not_exist_1030509.py: command not found
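A minimal sketch of the same idea in plain Python, using a hypothetical `ToyExecutor` and `BadStateError` rather than parsl's real classes: the error's `__str__` formats the executor's short `label` instead of its full repr.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor; the dataclass-generated
    repr includes every field, like the long repr shown above."""
    label: str
    max_workers: int = 1
    heartbeat_period: int = 30
    options: dict = field(default_factory=dict)


class BadStateError(Exception):
    """Hypothetical error that names the executor by label, not repr."""

    def __init__(self, executor, reason):
        super().__init__(executor, reason)
        self.executor = executor
        self.reason = reason

    def __str__(self):
        # The short label keeps the message to one readable line;
        # repr(self.executor) would dump the whole configuration.
        return f"Executor {self.executor.label} failed due to: {self.reason}"


ex = ToyExecutor(label="htex_local", options={"poll_period": 1})
err = BadStateError(ex, "command not found")
print(str(err))
# Executor htex_local failed due to: command not found
print(repr(ex))
# ToyExecutor(label='htex_local', max_workers=1, heartbeat_period=30, options={'poll_period': 1})
```

The full configuration is still reachable through `err.executor` when needed; it just isn't forced into every log line.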

Without this, the interchange process keeps running for the rest
of the pytest run, overlapping with other tests.

This makes CI hangs harder to debug.
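The commit message describes the symptom rather than the fix. As a minimal sketch of the general pattern, assuming the fix is to stop the interchange process at shutdown, here is a hypothetical `background_process` context manager that terminates and reaps a child process using only the standard library `subprocess` module. A `sys.executable` sleep command stands in for the interchange.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def background_process(args, timeout=5):
    """Hypothetical helper: start a long-running child process (standing in
    for an interchange) and guarantee it is stopped when the block exits."""
    proc = subprocess.Popen(args)
    try:
        yield proc
    finally:
        # Ask politely first, then force-kill if it does not exit in time.
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


# A child that would otherwise sleep for a minute.
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]

with background_process(sleeper) as proc:
    assert proc.poll() is None  # still running inside the block

print(proc.poll() is not None)  # True: the process has exited
```

Putting the cleanup in a `finally` means the child is stopped even if the test body raises.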

…led test suite rather than a hang"

This reverts commit 1b4adc5.

This commit was causing a different shutdown problem in normal htex operation, so I'm reverting it while I debug.

… values, rather than repr of unresolved app futures, are recorded in the database
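The commit title above is truncated and the code is not shown. As a minimal sketch of the idea under that limitation, a hypothetical `describe_arg` helper records the result of an already-completed standard-library `concurrent.futures.Future` instead of the future's own repr; parsl's AppFuture and monitoring database are not used here.

```python
from concurrent.futures import Future


def describe_arg(arg):
    """Hypothetical helper: if arg is a successfully completed future,
    record repr() of its result; otherwise fall back to repr(arg)."""
    if (isinstance(arg, Future) and arg.done()
            and not arg.cancelled() and arg.exception() is None):
        return repr(arg.result())
    return repr(arg)


dep = Future()
dep.set_result(42)

print(repr(dep))          # e.g. <Future at 0x... state=finished returned int>
print(describe_arg(dep))  # 42

recorded = [describe_arg(a) for a in (dep, "x")]
print(recorded)           # ['42', "'x'"]
```

The `cancelled()` check comes first because `Future.exception()` raises `CancelledError` on a cancelled future.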

This gave a measurable performance improvement in very high load and
shared filesystem situations, but it needs a lot of additional code
complexity to achieve that.

Reducing import cost is probably a good goal, but this is probably not
the way to do it.