-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using Llama3.2-vision #106
Comments
Interesting. It seems like it's not encoded correctly for llama 3.2. I'm planning to add the llama 3.2 option to the node package as well, so I'll take a look at both implementations and see what the issue is. It'll be a bit different in the python version since that's going through litellm. |
i‘m also trying this and encountered a similar problem. My error output was:
|
I definitely would like to see llama 3.2 support! |
Currently I do not see a parameter called import litellm
litellm.supports_vision = lambda *args, **kwargs: True
litellm.check_valid_key = lambda *args, **kwargs: True Also LiteLLM has bug with vision Ollama models. Fixed here: BerriAI/litellm#6683 This post is really helpful. Hope to see official support in Zerox. |
Here is the code I tried to use llama3.2-vision by Ollama os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
model = "ollama/llama3.2-vision"
file_path = (
"/private/var/www/html/ArtisanCloud/X/AgentBuddyWorkspace/demo.pdf"
)
## process only some pages or all
select_pages = (
# 1,2,3,4,5 ## None for all, but could be int or list(int) page numbers (1 indexed)
# 1
None
)
output_dir = Path(
"./data/pdf/out_parse_pdf/result"
) ## directory to save the consolidated markdown file
temp_dir = Path(
"./data/pdf/out_parse_pdf/temp"
) ## directory to save the consolidated markdown file
kwargs = {}
# print(output_dir)
result = await zerox(
file_path=file_path,
model=model,
output_dir=output_dir,
custom_system_prompt=custom_system_prompt,
select_pages=select_pages,
# tempDir=temp_dir,
**kwargs
)
return result But I have got the output below: ❯ python ./playground/ocr/zerox/parse_pdf.py
Pls help me! |
Should you read the comment by me above before posting this, you would solve the issue and leave a thumbs up. |
thx, it works with llama-version by Ollama locally. pls also help check another issue here: #131 I had used the gpt-4o-mini and llama-vision:90b, both parsing the 90% info , except the "name" field. if the issue occurs in both gpt-4o-mini and llama-vision:90b models , I think it should be a tricky parameter in the zeorx. |
Using #64 as reference I was able to run
LLama3.2-vision
however the output seems completely unrelated. I have not modified the prompt in anyway and am passing a multipage pdf.Code:
Output:
ZeroxOutput(completion_time=17628.399, file_name='ec1e22d8_b38e_5507_9350_c49e6137f377_d0__2223530', input_tokens=10240, output_tokens=707, pages=[Page(content='It appears that you\'ve provided a base64-encoded image. I\'ll decode it for you.\n\nHere is the decoded image:\n\n![image](https://i.imgur.com/9Jv6uZ3.png)\n\nPlease note that this is a PNG image, and you can save it to your computer by right-clicking on the link and selecting "Save Image As".', content_length=287, page=1), Page(content="It looks like you've shared an image encoded in Base64 format!\n\nIf I decode and render the image, it appears to be a PNG file containing a simple diagram or graphic. However, without further context or information about what this image represents, I won't be able to provide any additional insights or analysis.\n\nWould you like me to:\n\n1. Provide the decoded image data (as text)?\n2. Try to identify the type of diagram or graphic it represents?\n3. Help with something else related to the image?\n\nLet me know how I can assist!", content_length=526, page=2), Page(content='It looks like you\'ve provided a base64-encoded string, which appears to be an image file. The string is quite long and complex, so I\'ll provide some guidance on how to decode and display it.\n\n**Decoding the string**\n\nTo decode the string, you can use a base64 decoder tool or library in your programming language of choice. Here\'s an example using Python:\n
python\nimport base64\n\nencoded_string = """your_encoded_string_here"""\n\ndecoded_bytes = base64.b64decode(encoded_string)\n\nReplace `your_encoded_string_here` with the actual string you provided.\n\n**Displaying the image**\n\nAfter decoding, you should have a bytes object containing the image data. You can then use a library like Pillow (Python Imaging Library) to display the image:\n
python\nfrom PIL import Image\nimport io\n\ndecoded_bytes = base64.b64decode(encoded_string)\nimage_data = io.BytesIO(decoded_bytes)\n\nimg = Image.open(image_data)\nimg.show()\n\nThis will display the image in your default image viewer.\n\n**Note**: The encoded string appears to be a PNG image, but you may need to adjust the decoding and displaying code depending on the actual file format.', content_length=1129, page=3), Page(content='It looks like you\'ve provided a PNG image encoded as a Base64 string. Here\'s how to extract and display the image:\n\n**Python:**\n\n
python\nimport base64\n\n# Load the Base64 string from your input\nimage_data = 'your_base64_string_here'\n\n# Decode the Base64 string\ndecoded_image_data = base64.b64decode(image_data)\n\n# Save the decoded data as a PNG file (replace "output.png" with your desired filename)\nwith open('output.png', 'wb') as f:\n f.write(decoded_image_data)\n\n\n**Online Tool:**\n\nYou can also use an online Base64 decoder tool to extract and view the image. Simply paste the Base64 string into the tool, select PNG as the output format, and click "Decode". The decoded image will be displayed.\n\nOnce you\'ve extracted the image, you should see a PNG file with the original data.', content_length=789, page=4), Page(content="It looks like you've provided a PNG image encoded in Base64 format. Here's the decoded version:\n\nUnfortunately, I don't see any text or meaningful information within this image. It appears to be an abstract art piece with various shapes and colors.\n\nIf you're trying to extract some specific data from this image, please provide more context about what you're looking for (e.g., a logo, a message, or something else).", content_length=417, page=5)])
The text was updated successfully, but these errors were encountered: