Name		Name	Last commit message	Last commit date
parent directory ..
imgs		imgs
util		util
weights		weights
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
demo.ipynb		demo.ipynb
omniparser.py		omniparser.py
requirements.txt		requirements.txt
utils.py		utils.py

README.md

Thanks for sharing! I’ll align the OmniParser project details to this template for the README.

OmniParser - Multimodal Document and Screen Parsing Application

This repository features an advanced multimodal parsing application using OmniParser by Microsoft, a robust vision-language model (VLM) developed to interpret complex visual and text elements in documents and screen layouts. OmniParser offers precise, context-aware parsing, making it ideal for applications in automated workflows and agent pipelines.

Overview

OmniParser is a sophisticated model for extracting data from documents, screens, and layouts that integrate both text and visuals. Its specialized architecture allows it to recognize both what an element is and where it is located, significantly enhancing tasks like form automation and screen parsing. OmniParser handles structured, semi-structured, and unstructured data, adapting easily to diverse formats.

This repo includes:

Streamlit applications for image parsing and screen layout analysis.
Simple setup scripts to enable you to deploy OmniParser with minimal configuration.

Tools & Frameworks:

OmniParser: Vision-Language Model specialized for parsing multimodal layouts.
Streamlit: Web app framework to demonstrate OmniParser’s parsing and visual layout capabilities.

Steps to Run

1. Clone the Repository

git clone https://github.com/aryankargwal/genai-tutorials
cd omniparser

2. Install Dependencies

Use the command below to install the necessary requirements:

pip install -e .

3. Download Weights

Then download the model ckpts files in: https://huggingface.co/microsoft/OmniParser, and put them under weights/, default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.

4. Run the Streamlit Applications

For document parsing and layout analysis:

streamlit run app.py


## Features

- **Multimodal Parsing**: OmniParser interprets text, tables, forms, and other layout components with high accuracy.
- **Screen Layout Analysis**: Extracts and visualizes text and element positions, ideal for automating software interaction and support workflows.
- **Efficient and Scalable**: Optimized for real-time performance, providing rapid insights even with complex layouts.

## Future Work
- **Enhanced Fine-tuning**: Additional tutorials for fine-tuning OmniParser on specialized datasets.
- **Expanded Use Cases**: Further explorations into applications for customer service, virtual assistance, and automated form-filling.

## License
This project is licensed under the Apache 2.0 License. See the full license [here](LICENSE).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

omniparser

omniparser

README.md

OmniParser - Multimodal Document and Screen Parsing Application

Overview

Tools & Frameworks:

Steps to Run

1. Clone the Repository

2. Install Dependencies

3. Download Weights

4. Run the Streamlit Applications

Files

omniparser

Directory actions

More options

Directory actions

More options

Latest commit

History

omniparser

Folders and files

parent directory

README.md

OmniParser - Multimodal Document and Screen Parsing Application

Overview

Tools & Frameworks:

Steps to Run

1. Clone the Repository

2. Install Dependencies

3. Download Weights

4. Run the Streamlit Applications