Concurrency implementation using batch processor #106
Conversation
Force-pushed from `2221324` to `8e8b029`
Code runs through the tests. Using concurrency on the test workloads leads to a slow-down: the penalty of sending dataframes back and forth between Python processes outweighs the benefit from concurrency.
Self-assessment: there is no concurrent code in EnsembleSet. Parallelizing this has very little potential, assuming there are always more realizations in an ensemble than ensembles in an ensemble set.
FlowNet might be an exception to that assumption, @anders-kiaer?
I actually think @wouterjdb did some tests on concurrency earlier, using …
This holds for FlowNet just as well.
Force-pushed from `7a484fc` to `c2da73d`
This is now potentially done.
Force-pushed from `76ccda8` to `1759358`

Force-pushed from `4f7cfbb` to `56934c4`
Looks good 👍 Left some comments.
Force-pushed from `4555268` to `6ef5556`
```python
loaded_reals = [
    executor.submit(
        ScratchRealization,
        realdir,
        realidxregexp=realidxregexp,
        autodiscovery=autodiscovery,
        batch=batch,
    ).result()
    for realdir in globbedpaths
]
```
Not really concurrent, as the `.result()` call waits for the subprocess to finish on each iteration. Should rather do something like below (which is how it is done in most of the rest of this PR, so this is probably just something you forgot to fix).
Suggested change:

```python
reals_futures = [
    executor.submit(
        ScratchRealization,
        realdir,
        realidxregexp=realidxregexp,
        autodiscovery=autodiscovery,
        batch=batch,
    )
    for realdir in globbedpaths
]
loaded_reals = [x.result() for x in reals_futures]
```
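To make the difference concrete, here is a minimal sketch (not from the PR; `slow_task` is a stand-in for the heavy per-realization load, and `ThreadPoolExecutor` is used only so the example is self-contained — the same reasoning applies to `ProcessPoolExecutor`):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(x):
    """Stand-in for a heavy per-realization load."""
    time.sleep(0.2)
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    # Anti-pattern: .result() inside the comprehension blocks before
    # the next task is submitted, so the tasks run one after another.
    start = time.perf_counter()
    serial = [executor.submit(slow_task, i).result() for i in range(4)]
    serial_time = time.perf_counter() - start

    # Fixed pattern: submit all tasks first, then collect the results.
    start = time.perf_counter()
    futures = [executor.submit(slow_task, i) for i in range(4)]
    concurrent = [f.result() for f in futures]
    concurrent_time = time.perf_counter() - start

print(serial == concurrent)           # True: same results either way
print(serial_time > concurrent_time)  # True: roughly 4x slower serially
```

With four workers the fixed pattern finishes in about one task's duration, while the anti-pattern takes the sum of all task durations.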
`src/fmu/ensemble/ensemble.py` (outdated diff)
```python
with ProcessPoolExecutor() as executor:
    loaded_reals = [
        executor.submit(
            ScratchRealization,
            row.runpath,
            index=int(row.index),
            autodiscovery=False,
            find_files=[row.eclbase + ".DATA", row.eclbase + ".UNSMRY"],
            batch=batch,
        ).result()
        for row in runpath_df.itertuples()
    ]
```
Same as above
Suggested change:

```python
with ProcessPoolExecutor() as executor:
    reals_futures = [
        executor.submit(
            ScratchRealization,
            row.runpath,
            index=int(row.index),
            autodiscovery=False,
            find_files=[row.eclbase + ".DATA", row.eclbase + ".UNSMRY"],
            batch=batch,
        )
        for row in runpath_df.itertuples()
    ]
    loaded_reals = [x.result() for x in reals_futures]
```
```python
if use_concurrent():
    # In concurrent mode, caching is not used as
    # we do not pickle the loaded EclSum objects
    cache = False
```
This change is significant. Each call to `ScratchRealization.get_eclsum()` is quite heavy, and many methods are written with several calls to `get_eclsum()`, such that the loss of caching might actually increase runtime even with concurrency.
Also tested setting `lazy_loading=True` in the `EclSum` construction in `get_eclsum` to reduce the cost of each `get_eclsum()` call, but that was not a game changer (in some cases slower).
As I see it, all methods that currently run `get_eclsum()` somewhere in their dependencies are not safe to merge as-is, without the risk of a performance decrease for some users.
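The caching being lost can be illustrated with a minimal sketch (hypothetical: `Realization`, `_load_eclsum`, and `load_count` are illustrative names, not the fmu-ensemble API — the point is only that a cached heavy loader runs once, while `cache=False` re-runs it on every call):

```python
class Realization:
    """Toy model of a realization with a cached heavy loader."""

    def __init__(self, path):
        self._path = path
        self._eclsum_cache = None
        self.load_count = 0  # for demonstration only

    def _load_eclsum(self):
        # Stand-in for the expensive EclSum construction.
        self.load_count += 1
        return {"path": self._path}

    def get_eclsum(self, cache=True):
        # Reuse the cached object when allowed, otherwise reload.
        if cache and self._eclsum_cache is not None:
            return self._eclsum_cache
        eclsum = self._load_eclsum()
        if cache:
            self._eclsum_cache = eclsum
        return eclsum

real = Realization("/scratch/real-0")
real.get_eclsum()
real.get_eclsum()
print(real.load_count)  # 1 with caching; would be 2 with cache=False
```

With `cache=False` forced by the concurrent mode, every method that calls the loader pays the full cost again, which is exactly the concern raised above.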
Can one of the admins verify this patch?
Rebased on top of #182
* Batch processing after init on ensembles
* Functionality for turning off concurrency
* Concurrent apply()
* Parallelize add_from_runpathfile
* Allow running find_files at init of realizations
* Parallelize get_smry()
```python
    str(env_var) == "0"
    or str(env_var).lower() == "false"
    or str(env_var).lower() == "no"
):
```
Maybe this looks better and reduces the usage of `str(..)` and `.lower()`?

```python
env_var = str(os.environ[ENV_NAME]).lower()
if env_var == "0" or env_var == "false" or env_var == "no":
    ...
```
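The whole toggle could be sketched as follows (hypothetical: the actual value of `ENV_NAME` and the default-on behavior are assumptions for illustration, not taken from the PR):

```python
import os

ENV_NAME = "FMU_CONCURRENCY"  # assumed variable name, for illustration

def use_concurrent():
    """Return False when the env var is set to 0/false/no (any case)."""
    if ENV_NAME not in os.environ:
        return True  # assumption: concurrency defaults to on
    env_var = str(os.environ[ENV_NAME]).lower()
    return env_var not in ("0", "false", "no")

os.environ[ENV_NAME] = "No"
print(use_concurrent())  # False

os.environ[ENV_NAME] = "1"
print(use_concurrent())  # True
```

Using a membership test against a tuple keeps all accepted "off" spellings in one place, which is the simplification suggested above.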
Superseded by #206
Very much work-in-progress.