[Question]: RAPTOR Stage Stuck After Document Parsing #4173

mayanlong2020 · 2024-12-23T02:08:28Z

Describe your problem

I started parsing a document 10 hours ago. At first everything looked normal. Because I had enabled the RAPTOR policy, the task moved on to the RAPTOR stage once document parsing finished (at normal speed), and it has made no progress since. Of the 10 hours, less than one hour was spent parsing the document; for the rest of the time, up to now, the task has been stuck at the RAPTOR stage. CPU, memory, and disk are all idle, and the logs contain no errors. What could be the reason?

RAGFlow version: v0.15.0 full (doc engine: es)

> 
> Started at:
> Sun, 22 Dec 2024 23:57:01 GMT
> Duration:
> 36189.00 s
> Progress:
> Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
> Task has been received.
> Page(1~13): OCR started
> Page(1~13): OCR finished (6.26s)
> Page(1~13): Layout analysis (8.27s)
> Page(1~13): Table analysis (0.00s)
> Page(1~13): Text extraction (0.01s)
> Page(1~13): Start to generate keywords for every chunk ...
> Page(1~13): Keywords generation completed in 55.18s
> Page(1~13): Start to generate questions for every chunk ...
> Page(1~13): Question generation completed in 71.36s
> Page(1~13): Generate 85 chunks
> Page(1~13): Embedding chunks (3.12s)
> Page(1~13): Done (2.58s)
> Task has been received.
> Page(13~25): OCR started
> Page(13~25): OCR finished (6.11s)
> Page(13~25): Layout analysis (7.83s)
> Page(13~25): Table analysis (0.00s)
> Page(13~25): Text extraction (0.01s)
> Page(13~25): Start to generate keywords for every chunk ...
> Page(13~25): Keywords generation completed in 57.58s
> Page(13~25): Start to generate questions for every chunk ...
> Page(13~25): Question generation completed in 80.04s
> Page(13~25): Generate 96 chunks
> Page(13~25): Embedding chunks (2.73s)
> Page(13~25): Done (2.44s)
> Task has been received.
> Page(25~37): OCR started
> Page(25~37): OCR finished (6.12s)
> Page(25~37): Layout analysis (7.96s)
> Page(25~37): Table analysis (0.00s)
> Page(25~37): Text extraction (0.01s)
> Page(25~37): Start to generate keywords for every chunk ...
> Page(25~37): Keywords generation completed in 45.91s
> Page(25~37): Start to generate questions for every chunk ...
> Page(25~37): Question generation completed in 58.98s
> Page(25~37): Generate 74 chunks
> Page(25~37): Embedding chunks (2.10s)
> Page(25~37): Done (1.38s)
> Task has been received.
> Page(37~49): OCR started
> Page(37~49): OCR finished (6.81s)
> Page(37~49): Layout analysis (8.68s)
> Page(37~49): Table analysis (0.00s)
> Page(37~49): Text extraction (0.01s)
> Page(37~49): Start to generate keywords for every chunk ...
> Page(37~49): Keywords generation completed in 51.67s
> Page(37~49): Start to generate questions for every chunk ...
> Page(37~49): Question generation completed in 67.50s
> Page(37~49): Generate 82 chunks
> Page(37~49): Embedding chunks (2.64s)
> Page(37~49): Done (2.83s)
> Task has been received.
> Page(49~59): OCR started
> Page(49~59): OCR finished (5.52s)
> Page(49~59): Layout analysis (6.60s)
> Page(49~59): Table analysis (0.00s)
> Page(49~59): Text extraction (0.01s)
> Page(49~59): Start to generate keywords for every chunk ...
> Page(49~59): Keywords generation completed in 59.46s
> Page(49~59): Start to generate questions for every chunk ...
> Page(49~59): Question generation completed in 70.61s
> Page(49~59): Generate 94 chunks
> Page(49~59): Embedding chunks (3.12s)
> Page(49~59): Done (1.81s)
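The "less than one hour of parsing" estimate can be checked by summing the timed entries in the progress log above. The following is a minimal sketch, not part of RAGFlow: `summarize`, `LOG`, and `TOTAL_DURATION_S` are illustrative names, and it assumes each "(Ns)" or "in Ns" value is the wall-clock time of that stage. Time spent waiting in the task queue is not logged, and the page-range tasks may have overlapped, so treat the result as a rough figure only.

```python
import re
from collections import defaultdict

# Timed progress lines (plus chunk counts) copied from the task log above.
LOG = """\
Page(1~13): OCR finished (6.26s)
Page(1~13): Layout analysis (8.27s)
Page(1~13): Table analysis (0.00s)
Page(1~13): Text extraction (0.01s)
Page(1~13): Keywords generation completed in 55.18s
Page(1~13): Question generation completed in 71.36s
Page(1~13): Generate 85 chunks
Page(1~13): Embedding chunks (3.12s)
Page(1~13): Done (2.58s)
Page(13~25): OCR finished (6.11s)
Page(13~25): Layout analysis (7.83s)
Page(13~25): Table analysis (0.00s)
Page(13~25): Text extraction (0.01s)
Page(13~25): Keywords generation completed in 57.58s
Page(13~25): Question generation completed in 80.04s
Page(13~25): Generate 96 chunks
Page(13~25): Embedding chunks (2.73s)
Page(13~25): Done (2.44s)
Page(25~37): OCR finished (6.12s)
Page(25~37): Layout analysis (7.96s)
Page(25~37): Table analysis (0.00s)
Page(25~37): Text extraction (0.01s)
Page(25~37): Keywords generation completed in 45.91s
Page(25~37): Question generation completed in 58.98s
Page(25~37): Generate 74 chunks
Page(25~37): Embedding chunks (2.10s)
Page(25~37): Done (1.38s)
Page(37~49): OCR finished (6.81s)
Page(37~49): Layout analysis (8.68s)
Page(37~49): Table analysis (0.00s)
Page(37~49): Text extraction (0.01s)
Page(37~49): Keywords generation completed in 51.67s
Page(37~49): Question generation completed in 67.50s
Page(37~49): Generate 82 chunks
Page(37~49): Embedding chunks (2.64s)
Page(37~49): Done (2.83s)
Page(49~59): OCR finished (5.52s)
Page(49~59): Layout analysis (6.60s)
Page(49~59): Table analysis (0.00s)
Page(49~59): Text extraction (0.01s)
Page(49~59): Keywords generation completed in 59.46s
Page(49~59): Question generation completed in 70.61s
Page(49~59): Generate 94 chunks
Page(49~59): Embedding chunks (3.12s)
Page(49~59): Done (1.81s)
"""

TOTAL_DURATION_S = 36189.00  # the task's reported "Duration" field

PREFIX = re.compile(r"^Page\((\d+~\d+)\): (.*)$")
SECONDS = re.compile(r"(\d+(?:\.\d+)?)s\)?$")  # matches "(6.26s)" or "in 55.18s"
CHUNKS = re.compile(r"^Generate (\d+) chunks$")


def summarize(log: str) -> tuple[dict[str, float], int]:
    """Return logged stage seconds per page range and the total chunk count."""
    seconds: dict[str, float] = defaultdict(float)
    chunks = 0
    for line in log.splitlines():
        m = PREFIX.match(line.strip())
        if not m:
            continue
        page_range, message = m.groups()
        if c := CHUNKS.match(message):
            chunks += int(c.group(1))
        elif s := SECONDS.search(message):
            seconds[page_range] += float(s.group(1))
    return dict(seconds), chunks


per_range, total_chunks = summarize(LOG)
logged = sum(per_range.values())
print(f"chunks: {total_chunks}")
for page_range, secs in per_range.items():
    print(f"Page({page_range}): {secs:.2f}s")
print(f"logged stage time: {logged:.2f}s of {TOTAL_DURATION_S:.2f}s "
      f"({logged / TOTAL_DURATION_S:.1%})")
```

Run against this excerpt, it counts 431 chunks and about 713 s of logged stage time, roughly 2% of the 36189 s duration, so nearly all of the elapsed time is not accounted for by any logged parsing stage.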

Additionally, according to the processing flow, RAPTOR should be the last step executed. However, the RAPTOR message from that final stage appears at the top of the progress log instead of at the end. Is this expected behavior or a bug?



mayanlong2020 · 2024-12-23T08:06:26Z

thanks & cheers~

mayanlong2020 added the "question" label (Further information is requested) on Dec 23, 2024

KevinHuSh mentioned this issue Dec 23, 2024

Fetch chunk by batches. #4177 (Merged)

KevinHuSh added a commit that referenced this issue Dec 23, 2024

Fetch chunk by batches. (#4177)

31d67c8

### What problem does this PR solve?
#4173
### Type of change
- [x] Performance Improvement
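PR #4177 is titled "Fetch chunk by batches." The thread does not show the diff, so the following is only a generic sketch of that idea, not RAGFlow's code: rather than loading every chunk of a document in a single request before building the RAPTOR tree, page through them in fixed-size batches. `iter_chunks`, `fetch_page`, and `batch_size` are hypothetical names, and `fetch_page` stands in for whatever query the document store (here Elasticsearch) would actually run.

```python
from typing import Callable, Iterator, List

# Hypothetical page-fetch signature: (offset, limit) -> list of chunk dicts.
FetchPage = Callable[[int, int], List[dict]]


def iter_chunks(fetch_page: FetchPage, batch_size: int = 1000) -> Iterator[dict]:
    """Yield every chunk by requesting fixed-size pages until a short page arrives."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        yield from page
        if len(page) < batch_size:  # last (possibly empty) page: stop
            return
        offset += len(page)


# Usage with an in-memory stand-in for the store, holding 431 chunks.
STORE = [{"id": i, "content": f"chunk {i}"} for i in range(431)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[dict]:
    calls.append((offset, limit))  # record each request for inspection
    return STORE[offset:offset + limit]


chunks = list(iter_chunks(fake_fetch, batch_size=100))
print(len(chunks), len(calls))
```

With 431 chunks and `batch_size=100`, the generator issues five requests and stops when the last one returns only 31 items, so no separate count query is needed. Note that Elasticsearch caps `from + size` at `index.max_result_window` (10,000 by default), so a real implementation over large result sets would more likely page with `search_after` or the scroll API; the sketch leaves that choice to `fetch_page`.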
