Skip to content

Merge pull request #44 from VikParuchuri/dev #17

Merge pull request #44 from VikParuchuri/dev

Merge pull request #44 from VikParuchuri/dev #17

Workflow file for this run

name: Integration test with benchmark
on: [push]
env:
TESSDATA_PREFIX: "/usr/share/tesseract-ocr/4.00/tessdata"
TORCH_DEVICE: "cpu"
OCR_ENGINE: "tesseract" # So we don't have to install ghostscript, which takes a long time
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.11
uses: actions/setup-python@v4
with:
python-version: 3.11
- name: Install system dependencies
run: |
sudo apt-get update
cat scripts/install/apt-requirements.txt | xargs sudo apt-get install -y
- name: Show tessdata folders
run: ls /usr/share/tesseract-ocr/
- name: Install python dependencies
run: |
pip install poetry
poetry install
poetry remove torch
poetry run pip install torch --index-url https://download.pytorch.org/whl/cpu
- name: Download benchmark data
run: |
wget -O benchmark_data.zip "https://drive.google.com/uc?export=download&id=1ktVDYPEeyHlKLaF56FnHjI5VjVnYa1xL"
unzip benchmark_data.zip
- name: Run benchmark test
run: |
poetry run python benchmark.py benchmark_data/pdfs benchmark_data/references report.json
poetry run python scripts/verify_benchmark_scores.py report.json