docs: add Docling loader docs (#29104)
### Description
This adds the docs for the Docling document loader.
[Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX,
HTML, and other formats into a rich, unified representation that
includes document layout and tables, making them ready for generative
AI workflows such as RAG.

Some references:
- https://research.ibm.com/blog/docling-generative-AI
- https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:
- use various document types in their LLM applications with ease and
speed, and
- leverage Docling's rich representation for advanced, document-native
grounding.
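
A minimal usage sketch, not taken from this commit: it assumes the loader
is importable as `from langchain_docling import DoclingLoader` and accepts
a `file_path` argument, as in the separately published `langchain-docling`
package. The helper name `load_documents` is hypothetical.

```python
from typing import Any, List


def load_documents(source: str) -> List[Any]:
    """Load one source (a local path or a URL) into LangChain documents.

    Assumption: `langchain_docling.DoclingLoader` exists and takes a
    `file_path` keyword. The import is done lazily so this module can be
    imported even when the package is not installed.
    """
    from langchain_docling import DoclingLoader  # assumed import path

    loader = DoclingLoader(file_path=source)
    # `load()` is the standard LangChain loader method; it returns a list
    # of Document objects.
    return loader.load()
```

Calling `load_documents("https://arxiv.org/pdf/2408.09869")` would, under
these assumptions, fetch and convert the Docling Technical Report; the
number and shape of the returned documents depend on the loader's export
settings.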

### Issue
Replaces PR #27987, as discussed with @efriis in a comment on #27987.

### Dependencies
None

---------

Signed-off-by: Panos Vagenas <[email protected]>
vagenas authored Jan 9, 2025
1 parent cc55e32 commit 858f655
Showing 4 changed files with 621 additions and 0 deletions.