[Bug]: Vertex AI tests sometimes don't configure requirements correctly #29501

Closed
1 of 16 tasks
tvalentyn opened this issue Nov 20, 2023 · 5 comments
Labels
awaiting triage bug done & done Issue has been reviewed after it was closed for verification, followups, etc. P2 python

@tvalentyn
Contributor

What happened?

It appears that some Postcommits fail with errors like:

apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.8/site-packages/apache_beam/internal/dill_pickler.py", line 418, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.8/site-packages/apache_beam/examples/inference/vertex_ai_image_classification.py", line 35, in <module>
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

It seems that we sometimes execute the Vertex AI tests without providing the proper requirements file, or that we run the Vertex AI tests multiple times, with one of the runs configured incorrectly, and nondeterministically choose which result to report.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@tvalentyn
Contributor Author

Successful job:

https://console.cloud.google.com/dataflow/jobs/us-central1/2023-11-19_10_00_34-4583011806972063427

:~$ gsutil cat gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-runner-1119180029-434421-
...
google-cloud-aiplatform>=1.26.0
tensorflow>=2.12.0

The failed job has:

:~$ gsutil cat gs://temp-storage-for-end-to-end-tests/staging-it/beamapp-jenkins-1120143716-765175-z6xnthva.1700491036.765352/requirements.txt
pyhamcrest!=1.10.0,<2.0.0
mock<3.0.0
parameterized>=0.7.1,<0.8.0
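
A pre-launch check along these lines might surface this kind of mismatch before the job is submitted. This is only a sketch: `require_packages` is a hypothetical helper, not something that exists in the Beam test scripts, and the name matching is deliberately simple (it does not handle extras such as `name[extra]`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard (not part of Beam): return non-zero if any expected
# package name is missing from the given requirements file.
require_packages() {
  local file="$1"
  shift
  local pkg
  for pkg in "$@"; do
    # Match the package name at the start of a line, followed by a
    # version specifier character, a space, or the end of the line.
    if ! grep -Eq "^${pkg}([<>=!~ ]|\$)" "$file"; then
      echo "missing requirement: ${pkg} in ${file}" >&2
      return 1
    fi
  done
}

# Example (assumed invocation):
#   require_packages postcommit_requirements.txt tensorflow google-cloud-aiplatform
```

With the two files shown above, the check would pass for the successful job's file and fail on `tensorflow` for the failed job's file.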

@tvalentyn
Contributor Author

I wonder if there is a race in

cp $REQUIREMENTS_FILE postcommit_requirements.txt

that results in some other test suite overwriting this file.
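
If that is the cause, one possible way to avoid it would be to give each invocation its own copy rather than a shared `postcommit_requirements.txt`. A minimal sketch, assuming the suites run on the same machine and share a working directory; `stage_requirements` is a hypothetical helper and this is not the fix that was applied:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy a suite's requirements file to a path unique
# to this invocation, so that concurrent suites cannot overwrite each
# other's copy. Prints the path of the new copy.
stage_requirements() {
  local src="$1"
  local dest
  # mktemp creates the file atomically under a unique name.
  dest="$(mktemp "${TMPDIR:-/tmp}/postcommit_requirements.XXXXXX")"
  cp "$src" "$dest"
  printf '%s\n' "$dest"
}

# Example (assumed invocation):
#   REQS="$(stage_requirements "$REQUIREMENTS_FILE")"
#   ...then pass "$REQS" to the test run instead of the shared file.
```

The trade-off is that every caller has to use the returned path instead of a fixed file name, and the temporary copies need to be cleaned up afterwards.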

@tvalentyn
Contributor Author

Let's reopen if this reappears

@github-actions github-actions bot added this to the 2.53.0 Release milestone Nov 21, 2023
@tvalentyn tvalentyn added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Dec 19, 2023