HURIDOCS

32 repositories

uwazi
Public
Uwazi is a web-based, open-source solution for building and sharing document collections
open-source pdf data-science database ai documents non-profit
TypeScript • MIT License • 80 • 241 • 441 • 6 • Updated Nov 5, 2024
pdf_metadata_extraction
Public
pdf_information_extraction
Python • 0 • 4 • 0 • 8 • Updated Nov 3, 2024
trainable-entity-extractor
Public
Trainable Entity Extractor
Python • Apache License 2.0 • 0 • 0 • 0 • 7 • Updated Nov 1, 2024
pdf-document-layout-analysis
Public
A Docker-powered service for PDF document layout analysis. It segments and classifies the parts of each PDF page, identifying elements such as text, titles, pictures, and tables.
Python • Apache License 2.0 • 23 • 168 • 1 • 6 • Updated Nov 1, 2024
queue-processor
Public
queue-processor
Python • 0 • 0 • 0 • 0 • Updated Nov 1, 2024
react-text-selection-handler
Public
Text selection handling and highlighting
TypeScript • Apache License 2.0 • 0 • 0 • 6 • 1 • Updated Oct 31, 2024
dummy_extractor_services
Public
Python • 0 • 0 • 0 • 0 • Updated Oct 22, 2024
pdf-document-layout-analysis-async
Public
pdf-document-layout-analysis-async
Python • 0 • 1 • 0 • 5 • Updated Oct 22, 2024
pdf-labeled-data
Public
TypeScript • Apache License 2.0 • 0 • 3 • 0 • 0 • Updated Oct 17, 2024
uwazi-documentation
Public
HTML • MIT License • 3 • 2 • 6 • 0 • Updated Oct 16, 2024
docker-translation-service
Public
docker-translation-service
Python • Apache License 2.0 • 0 • 0 • 0 • 6 • Updated Oct 9, 2024
ml-cloud-connector
Public
ml-cloud-connector
Python • 0 • 0 • 0 • 0 • Updated Oct 3, 2024
pdf_ocr_service
Public
An HTTP service to OCR PDFs, based on a Redis queue.
Python • MIT License • 0 • 1 • 3 • 0 • Updated Sep 23, 2024
convert-to-pdf-service
Public
An HTTP service to convert documents to PDF, based on a Redis queue.
Python • MIT License • 0 • 0 • 3 • 7 • Updated Sep 19, 2024
pdf-tokens-type-labeler
Public
Python • 4 • 3 • 1 • 6 • Updated Jul 4, 2024
pdf_paragraphs_extraction
Public
Python • MIT License • 7 • 49 • 1 • 4 • Updated Jul 4, 2024
pdf-table-of-contents-extractor
Public
This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging the segmentation and classification capabilities of the underlying analysis tool, this project automates the process of identifying and structuring the document's TOC.
Python • Apache License 2.0 • 1 • 5 • 0 • 0 • Updated Jun 10, 2024
pdf-text-extraction
Public
This project extracts text from PDF files using the outputs generated by the pdf-document-layout-analysis service, relying on the segmentation and classification capabilities of that underlying analysis tool.
Python • Apache License 2.0 • 0 • 18 • 0 • 0 • Updated Jun 4, 2024
pdf-reading-order
Public
Python • 2 • 11 • 0 • 0 • Updated Apr 26, 2024
preserve
Public
Preserve is a tool for capturing and saving online digital content. Integrated with Uwazi, Preserve captures content from websites, social media and communication platforms, and archives them with accompanying key metadata to ensure evidentiary value by establishing and demonstrating authenticity and chain of custody.
TypeScript • MIT License • 1 • 6 • 12 • 7 • Updated Feb 23, 2024
uwazi-design
Public
0 • 4 • 6 • 0 • Updated Jul 3, 2023
topic-classification
Public
Python • MIT License • 4 • 5 • 10 • 4 • Updated May 25, 2023
twitter_crawler
Public
Twitter crawler
Python • 0 • 1 • 0 • 1 • Updated Apr 3, 2023
semantic-search
Public
Python • 3 • 3 • 1 • 3 • Updated Dec 27, 2022
mock-semantic-ml-server
Public
Mock server that simulates the ML server that processes documents for semantic search
JavaScript • 0 • 0 • 0 • 1 • Updated Dec 10, 2022
classification-utils
Public
Python • 2 • 0 • 0 • 3 • Updated Nov 21, 2022
uwazi-fixtures
Public archive
Shell • 3 • 3 • 0 • 0 • Updated Jul 1, 2022
python_uwazi_API
Public
Python API to interact with Uwazi
Python • 0 • 2 • 0 • 0 • Updated Nov 5, 2021
casebox
Public archive
Casebox: Secure all your information and team communication in one place
JavaScript • Other • 117 • 49 • 0 • 0 • Updated Oct 22, 2020
OpenEvSys
Public archive
OpenEvSys is free, open-source software designed for organisations that need a tool to manage information on human rights violations.
PHP • GNU Affero General Public License v3.0 • 20 • 30 • 0 • 3 • Updated Oct 22, 2020