This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
[NeuralChat] Enable RAG's table extraction and summary #1417
Open: xmx-521 wants to merge 13 commits into main from manxin/rag_table_summary
Commits (13)
- 7781a6e enable table and table summary for rag pdf (xmx-521)
- 222ee81 fix code format (xmx-521)
- dc0faf8 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- d5b93dd fix environment issue (xmx-521)
- d8687ef fix key error (xmx-521)
- 824c48e fix two parameters (xmx-521)
- 81ca43d fix line too long (xmx-521)
- e7e5331 clear code, update README (xmx-521)
- f71b602 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- 39bd8d7 polish pr (ClarkChin08)
- dddf4de Merge branch 'main' into manxin/rag_table_summary (xmx-521)
- 460110a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
- ee601db polish readme (ClarkChin08)
Binary file added (+365 KB): intel_extension_for_transformers/neural_chat/assets/docs/LLAMA2_page6.pdf (not shown)
From the code, it seems the "fast" table_strategy only returns None instead of table content; isn't that somewhat unreasonable?
It appears the "hq" strategy uses the unstructured package to extract tables. I have also used this package and found it actually performed worse than table-transformer.
Also, does the "llm" strategy return reliable table contents? From the code, it looks like it uses an LLM and a prompt to generate a table summarization of the document, but in my previous experience, this approach sometimes generates results that deviate significantly from the table content.
Thanks for the insightful comments; my opinion on these issues is as follows:
In fact, by default, our program uses OCR to extract all text information in files, including table information; this was implemented in other PRs. This PR only further enhances the understanding of tables, so no content is returned in "fast" mode (which is also the default mode).
At present, we do use unstructured to extract table information, and the extraction quality is quite satisfactory. We have not tried table-transformer, but it is indeed worth considering.
Your understanding of what "llm" mode does is correct. It is true that the LLM's table summary is not completely reliable, but according to our experimental results, "llm" mode gives much better table QA performance overall.