Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add notebook example for Batch API #77

Merged
merged 4 commits into from
Jan 2, 2025
Merged

Add notebook example for Batch API #77

merged 4 commits into from
Jan 2, 2025

Conversation

boqiny
Copy link
Member

@boqiny boqiny commented Jan 2, 2025

User description

Description

  • Refactor the code in python script to Jupyter notebook for better vis
  • Fix minor type in ReadME
  • Remove unnecessary .py file.

Related Issue

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

How Has This Been Tested?

Screenshots (if applicable)

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes


PR Type

Documentation, Enhancement


Description

  • Added two new Jupyter notebooks for Batch API examples:

    • parse_batch_fetch.ipynb for fetching responses.
    • parse_batch_upload.ipynb for uploading folders.
  • Updated the JSONL file path in parse_batch_fetch.py.

  • Added a TODO comment for status check in response processing.

  • Improved readability in the README.md documentation.


Changes walkthrough 📝

Relevant files
Enhancement
parse_batch_fetch.py
Update JSONL file path and add TODO comment                           

examples/parse_batch_fetch.py

  • Updated the sample JSONL file path.
  • Added a TODO comment for status check in response processing.
  • +2/-2     
    Documentation
    README.md
    Improve readability in documentation list                               

    README.md

    • Added a blank line for better readability in the list.
    +1/-0     
    parse_batch_fetch.ipynb
    Add notebook for batch API fetch example                                 

    examples/parse_batch_fetch.ipynb

  • Added a new notebook for batch API fetch example.
  • Demonstrated concurrent response processing and markdown retrieval.
  • Included sample outputs and logging configuration.
  • +206/-0 
    parse_batch_upload.ipynb
    Add notebook for batch API upload example                               

    examples/parse_batch_upload.ipynb

  • Added a new notebook for batch API upload example.
  • Demonstrated folder upload and response saving.
  • Included status checking and sample outputs.
  • +205/-0 

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    github-actions bot commented Jan 2, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Status Check

    A TODO comment was added to include a status check in the process_response function, but it has not been implemented. This could lead to incomplete or incorrect processing of responses.

    if markdown:  # TODO: add status check here
    Error Handling

    In the notebook, the process_response function logs errors but does not handle them in a way that ensures the program can recover or retry. This could lead to incomplete processing of responses.

    "def process_response(response):\n",
    "    \"\"\"Process a single response by retrieving markdown content\"\"\"\n",
    "    request_id = response[\"requestId\"]\n",
    "    try:\n",
    "        markdown = ap.batches.retrieve(request_id)\n",
    "        if markdown:\n",
    "            response[\"result\"] = [markdown.result[0] if markdown.result else \"\"]\n",
    "            response[\"requestStatus\"] = \"COMPLETED\"\n",
    "            response[\"completionTime\"] = markdown.completionTime\n",
    "    except Exception as e:\n",
    "        logger.error(f\"Error processing {request_id}: {str(e)}\")\n",
    "        response[\"error\"] = [str(e)]\n",
    "    return response"
    

    Copy link

    github-actions bot commented Jan 2, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    General
    Add error handling for exceptions in threads during concurrent processing

    Add error handling for the concurrent processing block to ensure exceptions in
    threads do not crash the entire program.

    examples/parse_batch_fetch.ipynb [115-124]

     with ThreadPoolExecutor(max_workers=MAX_WORKER) as executor:
         future_to_response = {
             executor.submit(process_response, response): response
             for response in responses
         }
         updated_responses = []
         for future in as_completed(future_to_response):
    -        updated_response = future.result()
    -        updated_responses.append(updated_response)
    +        try:
    +            updated_response = future.result()
    +            updated_responses.append(updated_response)
    +        except Exception as e:
    +            logger.error(f"Error in thread execution: {str(e)}")
    Suggestion importance[1-10]: 10

    Why: The suggestion introduces error handling for exceptions in threads, ensuring that the entire program does not crash due to errors in individual threads. This significantly enhances the robustness and fault tolerance of the concurrent processing block.

    10
    Possible issue
    Validate the existence of the response_file before attempting to open it

    Add a validation step to check if the response_file exists before attempting to open
    it to avoid runtime errors.

    examples/parse_batch_fetch.ipynb [73-75]

     response_file = "./sample_data_20250102103047.jsonl"
    +if not os.path.exists(response_file):
    +    raise FileNotFoundError(f"Response file {response_file} does not exist.")
     with open(response_file, "r") as f:
         responses = [json.loads(line) for line in f]
    Suggestion importance[1-10]: 9

    Why: Adding a validation step to check if the response_file exists before opening it prevents runtime errors and ensures the program handles missing files gracefully. This is a critical improvement for reliability.

    9
    Ensure the markdown object is properly validated before processing

    Add a proper status check for the markdown object to ensure it contains valid and
    complete data before processing it further.

    examples/parse_batch_fetch.py [39-42]

    -if markdown:  # TODO: add status check here
    -    response["result"] = [markdown.result[0] if markdown.result else ""]
    +if markdown and hasattr(markdown, 'result') and markdown.result:
    +    response["result"] = [markdown.result[0]]
         response["requestStatus"] = "COMPLETED"
         response["completionTime"] = markdown.completionTime
    Suggestion importance[1-10]: 8

    Why: The suggestion adds a proper validation check for the markdown object, ensuring it contains valid and complete data before processing. This improves the robustness of the code and prevents potential runtime errors.

    8

    @lingjiekong lingjiekong requested a review from Copilot January 2, 2025 16:10

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

    Comments suppressed due to low confidence (2)

    examples/parse_batch_fetch.ipynb:90

    • [nitpick] The nested if statement could be simplified for better readability.
    if markdown and markdown.result:
    

    examples/parse_batch_fetch.ipynb:43

    • Document the purpose of the MAX_WORKER variable for better understanding.
    MAX_WORKER = 10
    
    Copy link
    Member

    @lingjiekong lingjiekong left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    LGTM with comments

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Shall we remove these .py file and only use ipynb?

    Copy link
    Member Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    removed

    @lingjiekong lingjiekong requested a review from Copilot January 2, 2025 16:12

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

    Copy link
    Member

    @lingjiekong lingjiekong left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    can we merge upload and fetch into a single notebook to ease the use

    @lingjiekong lingjiekong merged commit ca4ac5f into main Jan 2, 2025
    1 check passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants