BUG: overriding intermediate data when no outputs asked #25

ankostis · 2019-10-02T01:44:50Z

In the following diagram, all data are given, and asked is different, depending on whether we expicitly ask it in the outputs:

when asked output asked, it checks overriden is already given and dose not recompute it,
without outputs, it overrides it.

Code to reproduce it:

def test_pruning_not_overrides_given_intermediate():
    # Test #25: not overriding intermediate data when an output is not asked
    graph = compose(name="graph")(
        operation(name="unjustly run", needs=["a"], provides=["overriden"])(lambda a: a),
        operation(name="op", needs=["overriden", "c"], provides=["asked"])(add),
    )
    graph.net.plot('t.png')
    assert graph({'a': 5, 'overriden': 1, "c": 2}, ['asked']) == {'asked': 3}  # that's ok
    assert graph({'a': 5, 'overriden': 1, "c": 2}) == {'a': 5, 'overriden': 1, "c": 2, 'asked': 3}  # FAILs

Root cause:

The code that prunes the predecessors of intermediate data runs only when outputs asked.
The existing "pruning" TCs were too simple to catch this.

Note that the pruning code in v1.2.4 is buggy (#24), so it cannot be used as is.

The text was updated successfully, but these errors were encountered:

needed to refactor the new pruning algorithm. - expected 2 TCs fail for yet-to-be-solved yahoo#24 & yahoo#25 bugs.

ankostis · 2019-10-02T12:48:21Z

Another "too pruning" but more complex example is when a multi-output operation MUST NOT run, not to override a given intermediate input, but it MUST run, to provide other outputs.

In the diagram below, must run is needed for e but must not override overriden.

Code to reproduce:

def test_pruning_multiouts_not_override_intermediates():
    # Test #25: v.1.2.4 overrides intermediate data when a previous operation
    # must run for its other outputs (outputs asked or not)
    netop = compose(name="netop")(
        operation(name="must run", needs=["a"], provides=["overriden", "e"])
        (lambda x: (x, 2 * x)),
        operation(name="op1", needs=["overriden", "c"], provides=["d"])(add),
        operation(name="op2", needs=["d", "e"], provides=["asked"])(lambda x, y: x * y),
    )

    # FAILs
    # - on v1.2.4 with KeyError: 'e',
    # - # - on #18(unsatisfied) + #23(ordered-sets) with empty result.
    assert netop({"a": 5, "overriden": 1, "c": 2}, ["asked"]) == {"asked": 3}
    # FAILs
    # - on v1.2.4 with (overriden, asked) = (5, 70) instead of (1, 13)
    # - # - on #18(unsatisfied) + #23(ordered-sets) like v1.2.4.
    assert (
        netop({"a": 5, "overriden": 1, "c": 2})
        ==
        {"a": 5, "overriden": 1, "c": 2, "asked": 3})

ankostis · 2019-10-02T12:59:18Z

Or this even simpler one:

def test_pruning_multiouts_not_override_intermediates1():
    # Test #25: v.1.2.4 overrides intermediate data when a previous operation
    # must run for its other outputs (outputs asked or not)
    netop = compose(name="netop")(
        operation(name="must run", needs=["a"], provides=["overriden", "calced"])
        (lambda x: (x, 2 * x)),
        operation(name="add", needs=["overriden", "calced"], provides=["asked"])(add),
    )
    netop.net.plot('t.png')
    # FAILs
    # - on v1.2.4 with KeyError: 'e',
    # - on #18(unsatisfied) + #23(ordered-sets) with empty result.
    assert netop({"a": 5, "overriden": 1, "c": 2}, ["asked"]) == {"asked": 3}
    # FAILs
    # - on v1.2.4 with (overriden, asked) = (5, 15) instead of (1, 1)
    # - on #18(unsatisfied) + #23(ordered-sets) like v1.2.4.
    assert (
        netop({"a": 5, "overriden": 1})
        ==
        {"a": 5, "overriden": 1, "calced": 10, "asked": 3})

+ Pruning behaves correctly also when outputs given; this happens by breaking incoming provide-links to any given intermedediate inputs. + Unsatisfied detection now includes those without outputs due to broken links (above). + Remove some uneeded "glue" from unsatisfied-detection code, leftover from previous compile() refactoring. + Renamed satisfiable --> satisfied. + Improved unknown output requested raise-message. + x2 TCs, in yahoo#24 and 1st in yahoo#25 now PASS. - 2x TCs in yahoo#25 still FAIL, and need "Pinning" of given-inputs (the operation MUST and MUST NOT run in these cases).

+ Pruning behaves correctly also when outputs given; this happens by breaking incoming provide-links to any given intermedediate inputs. + Unsatisfied detection now includes those without outputs due to broken links (above). + Remove some uneeded "glue" from unsatisfied-detection code, leftover from previous compile() refactoring. + Renamed satisfiable --> satisfied. + Improved unknown output requested raise-message. + x3 TCs PASS, x1 in yahoo#24 and the first x2 in yahoo#25. - 1x TCs in yahoo#25 still FAIL, and need "Pinning" of given-inputs (the operation MUST and MUST NOT run in these cases).

+ Insert "PinInstructions" in the execution-plan to avoid overwritting. + Add `_overwrite_collector` in `compose()` to collect re-calculated values. + FIX the last TC in yahoo#25.

…must run + Insert "PinInstructions" in the execution-plan to avoid overwritting. + Add `_overwrite_collector` in `compose()` to collect re-calculated values. + FIX the last TC in yahoo#25.

…must run - WIP: PARALLEL execution not adding PINS! + Insert "PinInstructions" in the execution-plan to avoid overwritting. + Add `_overwrite_collector` in `compose()` to collect re-calculated values. + FIX the last TC in yahoo#25.

+ ALL TCs OK after ~10 fails with yahoo#25, yahoo#25 bugs, BUT... - WIP: STILL no PIN on PARALLEL. + move check if value in asked outputs before cache-evicting it in build-execution-plan time - compute methods don't need outputs anymore. + test: speed up parallel/multihtread TCs by reducing delays & repetitions. + refact: network rightfully adopted stray functions for parallel processing - they all worke on the net.graph, + upd: networkx api by indexing on `dag.nodes` views. + enh: add log message when deleting in parallel (in par with sequential code). + refact: var-renames, if-then-else simplifications, pythonisms.

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

test(dag,yahoo#25): FAILing TC for overriding intermediate data

63f76ca

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

test(dag,yahoo#25): FAILing TC for overriding intermediate data

fe76b31

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

test(dag,yahoo#25): FAILing TC for overriding intermediate data

b5b9d6f

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

test(dag,yahoo#25): FAILing TC for overriding intermediate data

6913e91

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

test(dag,yahoo#25): FAILing TC for overriding intermediate data

de02885

ankostis added a commit to ankostis/graphtik that referenced this issue Oct 2, 2019

MERGE (prune_unsatified, ordered_sets) into fix-pruning ...

3736738

needed to refactor the new pruning algorithm. - expected 2 TCs fail for yet-to-be-solved yahoo#24 & yahoo#25 bugs.

ankostis mentioned this issue Oct 3, 2019

FIX: refactor Network and DAG SOLVER to fix bad pruning #26

Open

This was referenced Oct 14, 2019

WIP: Pre-review for v1.3.0 release #31

Open

Release v1.3: new DAG solver, better plotting & "sideffect" huyng/graphkit#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: overriding intermediate data when no outputs asked #25

BUG: overriding intermediate data when no outputs asked #25

ankostis commented Oct 2, 2019 •

edited

Loading

ankostis commented Oct 2, 2019

ankostis commented Oct 2, 2019

BUG: overriding intermediate data when no outputs asked #25

BUG: overriding intermediate data when no outputs asked #25

Comments

ankostis commented Oct 2, 2019 • edited Loading

Code to reproduce it:

Root cause:

ankostis commented Oct 2, 2019

Code to reproduce:

ankostis commented Oct 2, 2019

ankostis commented Oct 2, 2019 •

edited

Loading