Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cyclic pipelines should run stable #171

Merged
merged 22 commits into from
Jan 24, 2025
Merged

fix: cyclic pipelines should run stable #171

merged 22 commits into from
Jan 24, 2025

Conversation

mathislucka
Copy link
Member

@mathislucka mathislucka commented Jan 20, 2025

Related Issues

Proposed Changes:

How did you test it?

Notes for the reviewer

Note that the PR does not add any new code in base.py. Check the original branch to see the diff. The PR only removes code from base.py.

Also note, that the AsyncPipeline will not run anymore when this change is merged to haystack main as it relies on some methods in PipelineBase that I removed.

Checklist

@mathislucka mathislucka requested a review from a team as a code owner January 20, 2025 15:26
@mathislucka mathislucka requested review from davidsbatista and removed request for a team January 20, 2025 15:26
@coveralls
Copy link

coveralls commented Jan 20, 2025

Pull Request Test Coverage Report for Build 12952190375

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-14.3%) to 64.394%

Totals Coverage Status
Change from base Build 12948231503: -14.3%
Covered Lines: 1483
Relevant Lines: 2303

💛 - Coveralls

@davidsbatista
Copy link
Contributor

@mathislucka I think this PR should have at least one more reviewer, ideally +2, this is a big change

@mathislucka
Copy link
Member Author

@mathislucka I think this PR should have at least one more reviewer, ideally +2, this is a big change

Yes, @Amnah199 already started reviewing the PR in Haystack (core). @julian-risch will also review.

@davidsbatista
Copy link
Contributor

we need to add Pipeline to the correct __init__.py hierarchy so that after the release users can simply do:

from haystack_experimental import Pipeline

I've been using it to test several pipelines and always have to do a long import, i.e.:

from haystack_experimental.core.pipeline.pipeline import Pipeline

pyproject.toml Outdated Show resolved Hide resolved
@davidsbatista
Copy link
Contributor

regarding the test files - I tried to run them but it failed due to a mixing fixture - fixture 'spying_tracer' not found - then later, talking with @julian-risch, I understood that we want to keep the behavioral tests out of this repo (at least for now).

I would suggest we completely remove them or add them in total/working - just to keep the PR in a clean state - we can then later add the full tests in another PR - so that if we need to rollback for whatever reason things are in a clean state.

@mathislucka
Copy link
Member Author

mathislucka commented Jan 24, 2025

I found another bug that affects non-cyclic pipelines while working on super components.

Consider the pipeline that is created here:

And have a look at the example notebook for the AutoFileConverter.

I found that depending on the insertion and connection order, the tabular_joiner might run with partial inputs which leads to a change in results.

The tabular_joiner will run twice in these cases and it will only ever output either the documents coming from joiner or from the csv converter.

This happens because after running all the file converters, the tabular_joiner can be the only component in the run queue. Once we pop it from the run queue it will be empty.

This is a problem because: https://github.com/deepset-ai/haystack/blob/c989d9c483a4fc74147e17065779fd6103f4a084/haystack/core/pipeline/pipeline.py#L429

Will check the empty run queue (all([])) which is True. The component will then be executed, removing (for example) the output from the csv converter from its inputs. This will lead to incomplete inputs when it runs again.

Edit: To clarify, this bug affects the original logic but probably not this one. We should add a test case though.

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!
We tested many different pipelines and will continue with comparing traces of pipelines running with the old and the new logic after this PR is merged. We're checking for any differences in pipeline execution order, changes in component inputs/outputs, and decreases/increases in speed.
In addition, we're still testing the new logic with a custom component that is greedy variadic and has an optional input as described here. And we'll need to add a test case for the tabular_joiner issue described above.

@julian-risch julian-risch merged commit 03d9fa2 into main Jan 24, 2025
8 checks passed
@julian-risch julian-risch deleted the fix/pipeline_run branch January 24, 2025 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants