Marker v2 #116

Merged
merged 42 commits into from
May 10, 2024

@VikParuchuri (Owner) commented May 9, 2024

Basically a full rewrite!

  • Extracts and saves images
  • Improved table formatting
  • Better markdown wrapping
  • Better reading order on complex docs
  • Improved OCR engine with more language options
  • Simple pip package install (no more required system dependencies), so it can be used easily on Windows
  • Can be used commercially (PyMuPDF and LayoutLMv3 dependencies removed)

It now takes about 2x as long to run, but that seems like a decent tradeoff.

See the README for installation and usage.

@VikParuchuri merged commit 6f8b239 into master on May 10, 2024
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators May 10, 2024