Inconsistent Document Recognition and Indexing in Application #61

Open
casualcomputer opened this issue Jun 22, 2024 · 1 comment
Labels
bug (Something isn't working), windows (Windows related usability issues)

casualcomputer commented Jun 22, 2024

Issue
The application fails to recognize newly uploaded documents despite logs showing that the documents are indexed.

Troubleshooting
I launched the user interface and changed the settings to retrieve up to 10 documents per process, up from the default of 3. I then uploaded two separate batches of documents for processing. The first batch contained four documents. After the upload, I asked the LLM (LLAMA3-8B from Ollama) how many documents had been uploaded, and it correctly identified all four. This matched my expectations and was confirmed by the text pre-processing visible in the logs. I validated it further by requesting summaries of these four documents, which the LLM provided accurately.

Next, I uploaded a second batch of 10 documents. Unlike the first batch, the logs showed these documents being indexed quickly, but the pre-processing progress bar seen during the previous upload did not appear. When I asked the LLM how many documents had been uploaded, it still answered with the original four. To check whether the new documents were available, I referred to them by name in my query (the new files are also present in the "data" folder), but the LLM did not recognize any of them.

How To Reproduce
1. Launch the application UI.
2. Change the setting to retrieve 10 documents per process (default is 3).
3. Upload the first batch of 4 documents:
  • Observe and confirm via the LLM query and logs that 4 documents are processed and indexed.
4. Upload a second batch of 10 documents:
  • Notice the absence of the pre-processing progress bar.
5. Query the system for the count of uploaded documents; it incorrectly reports only the initial 4 documents.

Expected Behavior
The application should index each new batch of documents and update the document count accordingly. The pre-processing progress bar should appear for each batch, indicating that processing is occurring. Queries about the document count should reflect the total number of documents successfully uploaded and processed.

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Chrome
@casualcomputer casualcomputer added the bug Something isn't working label Jun 22, 2024
@jonfairbanks jonfairbanks added the windows Windows related usability issues label Jun 24, 2024
jonfairbanks (Owner) commented

Please check out the Troubleshooting Guide and see whether all of the documents are making it into the documents state.
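
One rough way to run that check is to compare the file names in the "data" folder against whatever names the documents state reports. The sketch below is only an illustration: `missing_from_state` is a hypothetical helper, not part of this project, and it assumes the state can be reduced to a plain list of file names.

```python
from pathlib import Path


def missing_from_state(data_dir, state_names):
    """Return names of files in data_dir that are absent from state_names.

    Hypothetical helper: `state_names` is assumed to be a list of file
    names copied out of the app's documents state.
    """
    on_disk = {p.name for p in Path(data_dir).iterdir() if p.is_file()}
    return sorted(on_disk - set(state_names))
```

Calling `missing_from_state("data", names_from_state)` after the second upload would list any files that were saved to disk but never reached the state. An empty list would suggest the problem lies later, in indexing or retrieval.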

This may be similar to #48. Streamlit reruns the whole script from top to bottom on every interaction and does not handle partial changes well. The Streamlit cache setup is currently a major pain point.
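
To illustrate how a cache can produce this kind of symptom, here is a minimal sketch in plain Python. It uses `functools.lru_cache` as a stand-in for the app's caching and is not the project's code. The names `UPLOADED`, `build_index_stale`, and `build_index_keyed` are hypothetical, and whether this is the actual cause of the issue is an assumption, not something confirmed in this thread.

```python
from functools import lru_cache

# Stand-in for the files uploaded so far (hypothetical).
UPLOADED = []


@lru_cache(maxsize=None)
def build_index_stale():
    # Takes no arguments, so the first result is cached and reused
    # even after more files are uploaded.
    return tuple(UPLOADED)


@lru_cache(maxsize=None)
def build_index_keyed(file_names):
    # The tuple of file names is the cache key, so a new upload
    # produces a new key and the index is rebuilt.
    return tuple(file_names)


UPLOADED.extend(["a.pdf", "b.pdf", "c.pdf", "d.pdf"])
first = build_index_stale()

UPLOADED.extend(f"new_{i}.pdf" for i in range(10))
second = build_index_stale()                 # still the cached first batch
fresh = build_index_keyed(tuple(UPLOADED))   # reflects both batches

print(len(first), len(second), len(fresh))   # 4 4 14
```

The stale version keeps answering with four documents after the second upload, which resembles the behavior reported above. Making the cache key depend on the current file list is one possible design choice for avoiding that.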
