
[Feature Request] Consider Adding Batch Processing Support to Reduce Azure AI Costs #91

Open
lambolambert opened this issue Nov 7, 2024 · 3 comments

Comments

lambolambert commented Nov 7, 2024

I'd like to propose adding batch processing capabilities to optimize costs when processing documents through Azure AI services. Currently, it seems each page/document requires individual API calls, which could become costly at scale.

Current Challenge
From what I understand, the system processes documents individually, which means:

  • Each document/page requires a separate API call
  • Higher costs due to per-request pricing
  • Potential performance bottlenecks with large volumes

Benefits

  • Lower processing costs through batch operations
  • Improved throughput for large-scale processing
  • More efficient resource utilization
  • Better handling of rate limits

~enhancement ~cost-optimization

tylermaran (Contributor) commented

Hey @lambolambert. This is absolutely something we've planned on adding. It will probably start with the OpenAI Batch API and then expand to Azure.

This would change the implementation quite a bit, though: we'd make a single request with all the documents, and then you'd get back a batch ID.

Workflow would go something like:

  1. Run zerox with a folder of files in batch mode. You will also need to pass in AWS S3 credentials.
  2. Zerox creates a .jsonl file with the files.
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "Markdown pls."},{"role": "user", "content": "https:s3.aws.com/my_file!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "Markdown pls."},{"role": "user", "content": "https:s3.aws.com/my_file!"}]}}
  3. Upload that file to OpenAI, which returns the job response:
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file-abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1714508499,
  "in_progress_at": null,
  "expires_at": 1714536634,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "metadata": null
}
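The upload and job creation map fairly directly onto the OpenAI Python SDK; a minimal sketch, assuming the .jsonl file from the previous step already exists:

```python
from openai import OpenAI

client = OpenAI()

# Upload the batch input file, then create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)  # e.g. "batch_abc123", "validating"
```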
  4. ??? You have the job ID, and results will be delivered within 24 hours, but I'm not sure what the next step would be on the zerox side. Does it make sense to have a pingForResults function, since we would need to aggregate all the completion responses into the expected markdown format?
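A pingForResults helper could simply poll the batch and read the output file once it completes. A minimal sketch (the function name, poll interval, and error handling are placeholders, not a settled design):

```python
import json
import time

def ping_for_results(client, batch_id, interval_seconds=60):
    """Poll the batch until it finishes, then return the parsed result lines."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status == "completed":
            output = client.files.content(batch.output_file_id)
            return [json.loads(line) for line in output.text.splitlines()]
        if batch.status in ("failed", "expired", "cancelled"):
            raise RuntimeError(f"Batch {batch_id} ended with status {batch.status}")
        time.sleep(interval_seconds)
```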

lambolambert (Author) commented

Hi @tylermaran, exciting to hear that batch processing is on the roadmap.

Aggregating responses into a unified format (e.g., markdown or JSON) would be essential. Perhaps implementing a handler that formats the results and stores them in a standardized way would streamline post-processing. This could also allow for automatic handling of individual file results, error logging, and even partial retries for any failed documents in the batch. The custom_id could encode the source (e.g., doc1_page1, doc1_page2, etc.) so the orchestrator can bring everything back together and output it in the expected class.
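For example, if the custom_ids follow a doc1_page1, doc1_page2 pattern, the aggregation step could split on that suffix and stitch pages back into per-document markdown. A rough sketch, assuming the batch output format documented by OpenAI (one JSON line per request, with the completion under response.body):

```python
from collections import defaultdict

def aggregate_markdown(result_lines):
    """Group batch results by document, using custom_ids like 'doc1_page2'."""
    pages = defaultdict(dict)
    for line in result_lines:
        doc_id, page_num = line["custom_id"].rsplit("_page", 1)
        content = line["response"]["body"]["choices"][0]["message"]["content"]
        pages[doc_id][int(page_num)] = content
    # Join each document's pages in order into a single markdown string.
    return {
        doc: "\n\n".join(text for _, text in sorted(doc_pages.items()))
        for doc, doc_pages in pages.items()
    }
```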

Having a pingForResults function could work well for managing the asynchronous nature of batch processing, especially for use cases where it’s critical to track the status of each batch job over extended periods (like 24 hours). A pingForResults function could regularly check for updates and retrieve results once the batch is complete.

Thanks for considering this enhancement. Looking forward to seeing how it develops, and happy to help once it’s underway!

kzbao commented Nov 14, 2024

Hi @lambolambert, @tylermaran (fellow YC founder here),

We ran into this exact feature request as we're processing thousands of PDF files asynchronously. We initially used zerox and then realized there was a chance to use the new batch APIs, so we put together a library to help do so: https://github.com/Summed-AI/parallex. Would love any feedback, and Tyler, perhaps a chance to collaborate if that's something you're interested in.
