fix(Unstructured): Pin Client version before breaking changes #1309
base: main
Conversation
Hello, @lambda-science! Before pinning the version, I would like to reproduce the bug (so that we can also understand how to update the integration in the future). Could you share more details or a reproducible example?
Hi @anakin87
dependencies = [
    "haystack-ai==2.9.0",
    "unstructured-fileconverter-haystack>=0.4.1",
    "unstructured-client>=0.26",
]

Run this script next to sample4.docx:

from haystack_integrations.components.converters.unstructured import UnstructuredFileConverter

UNSTRUCTURED_SETTINGS = {
    "skip_infer_table_types": "[]",
    "chunking_strategy": "by_title",
    "combine_under_n_chars": "1000",
    "new_after_n_chars": "1500",
    "max_characters": "2000",
    "pdf_infer_table_structure": "True",
    "languages": ["eng", "fra"],
    "strategy": "fast",
}

converter = UnstructuredFileConverter(
    api_url="http://localhost:8002/general/v0/general",
    document_creation_mode="one-doc-per-element",
    unstructured_kwargs=UNSTRUCTURED_SETTINGS,
)
documents = converter.run(paths=["sample4.docx"])
print(documents)

Results:

Converting files to Haystack Documents: 0it [00:00, ?it/s]WARNING: Unstructured could not process file sample4.docx. Error: 1 validation error for PartitionParameters
skip_infer_table_types
  Input should be a valid list [type=list_type, input_value='[]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/list_type
Converting files to Haystack Documents: 1it [00:00, 1.67it/s]
{'documents': []}

If you remove the …

Converting files to Haystack Documents: 0it [00:00, ?it/s]WARNING: Unstructured could not process file sample4.docx. Error: General.partition() takes 1 positional argument but 2 were given
Converting files to Haystack Documents: 1it [00:00, 1.78it/s]
{'documents': []}

That is, an empty result instead of the list of documents you get if you pin <0.26.
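The first failure comes from the settings rather than from the converter: the validation error says `skip_infer_table_types` must be a list, but `UNSTRUCTURED_SETTINGS` passes the string "[]". A minimal sketch of a settings-side workaround, assuming list-valued options should simply be passed as native Python lists; `normalize_unstructured_kwargs` is a hypothetical helper, not part of Haystack or unstructured-client:

```python
import json
from typing import Any


# Hypothetical helper (not part of Haystack or unstructured-client).
# Converts list-looking strings such as "[]" or '["pdf"]' into real lists,
# since the PartitionParameters validation error shows that
# skip_infer_table_types must be a list, not a string.
def normalize_unstructured_kwargs(settings: dict[str, Any]) -> dict[str, Any]:
    normalized: dict[str, Any] = {}
    for key, value in settings.items():
        if isinstance(value, str) and value.strip().startswith("["):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                parsed = None  # not valid JSON: keep the original string
            normalized[key] = parsed if isinstance(parsed, list) else value
        else:
            normalized[key] = value  # everything else is passed through unchanged
    return normalized


UNSTRUCTURED_SETTINGS = {
    "skip_infer_table_types": "[]",
    "chunking_strategy": "by_title",
    "combine_under_n_chars": "1000",
    "new_after_n_chars": "1500",
    "max_characters": "2000",
    "pdf_infer_table_structure": "True",
    "languages": ["eng", "fra"],
    "strategy": "fast",
}

fixed = normalize_unstructured_kwargs(UNSTRUCTURED_SETTINGS)
print(fixed["skip_infer_table_types"])  # []
```

This only addresses the validation error. With unstructured-client>=0.26 the converter still fails afterwards with the `General.partition()` TypeError shown in the second log.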
Here is the sample docx.
Thanks for the detailed report! I'll take a look...
@lambda-science
Could you please check?
I'm on 3.11. I'll check whether it's 3.11-specific when I get a moment :)
Related Issues
There are breaking changes in unstructured-client 0.26 that are quite annoying.
See: https://docs.unstructured.io/api-reference/api-services/sdk-python#migration-guide
Mainly we get this error:
TypeError: General.partition() takes 1 positional argument but 2 were given
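This message is what Python raises when a method accepts only keyword arguments after `self` and is called with one positional argument (Python counts `self` as the first positional argument). A minimal stdlib sketch, assuming that 0.26 made the `request` parameter of `partition()` keyword-only, which is consistent with the error; `General` and `PartitionRequest` below are hypothetical stand-ins, not the real SDK classes:

```python
from dataclasses import dataclass


@dataclass
class PartitionRequest:
    """Hypothetical stand-in for the SDK's request object."""

    file_name: str


class General:
    """Hypothetical stand-in for the client's `general` namespace.

    The bare `*` makes `request` keyword-only, which is the assumed
    shape of `partition()` from unstructured-client 0.26 on.
    """

    def partition(self, *, request: PartitionRequest) -> str:
        return f"partitioned {request.file_name}"


general = General()
req = PartitionRequest(file_name="sample4.docx")

# Old-style positional call: `self` plus `req` makes 2 positional arguments.
try:
    general.partition(req)
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)

# The keyword form is accepted by the keyword-only signature.
print(general.partition(request=req))
```

Under that assumption, supporting 0.26+ would mean switching the converter to the keyword form of the call, whereas this PR avoids the new signature by pinning the client.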
Proposed Changes:
Solved by pinning unstructured-client to the latest compatible version, i.e. below 0.26, where the breaking changes were introduced.
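The exact specifier used in the PR is not shown here; the discussion only says to pin <0.26. For users who cannot change their lock file right away, a hedged sketch of a hypothetical fail-fast check (`is_compatible` and `check_installed_client` are illustrative names, and the 0.26 cutoff is taken from the discussion):

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumed cutoff: the first unstructured-client release with the breaking changes.
BREAKING_RELEASE = (0, 26)


def is_compatible(version_str: str) -> bool:
    """Return True if (major, minor) of the client version is below 0.26."""
    match = re.match(r"(\d+)\.(\d+)", version_str)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_str!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor < BREAKING_RELEASE


def check_installed_client() -> None:
    """Report whether the installed unstructured-client satisfies the pin."""
    try:
        installed = version("unstructured-client")
    except PackageNotFoundError:
        print("unstructured-client is not installed")
        return
    status = "compatible" if is_compatible(installed) else "needs the <0.26 pin"
    print(f"unstructured-client {installed}: {status}")


check_installed_client()
```

Comparing (major, minor) as integers matters here: as plain strings, "0.9" sorts after "0.26".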
How did you test it?
In my own project
Notes for the reviewer
I don't know why the tests don't catch it, or whether it only happens on my local setup with the latest self-hosted unstructured-api (0.82.0, Dec 2024).
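One possible reason, assuming the integration's unit tests replace the client with a plain `MagicMock` (not verified against the actual test suite): a plain mock accepts any call signature, so a positional call that the real 0.26 client rejects still passes. `unittest.mock.create_autospec` checks calls against the real signature instead. `General` and `call_like_old_converter` below are hypothetical stand-ins:

```python
from unittest.mock import MagicMock, create_autospec


class General:
    """Hypothetical stand-in with the assumed 0.26-style keyword-only signature."""

    def partition(self, *, request):
        raise NotImplementedError("the real SDK would make a network call here")


def call_like_old_converter(general) -> None:
    # Hypothetical pre-0.26 call site: passes the request positionally.
    general.partition("request-object")


# A plain MagicMock accepts any arguments, so this "test" passes silently.
call_like_old_converter(MagicMock())
plain_mock_passed = True

# An autospecced mock validates calls against General.partition's signature.
try:
    call_like_old_converter(create_autospec(General, instance=True))
    autospec_caught_it = False
except TypeError:
    autospec_caught_it = True

print(plain_mock_passed, autospec_caught_it)
```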
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.