implemented text recognition (ocr) #272
Conversation
I implemented another comparison method based on OCR. This could be a useful addition in cases where modern game rendering and visual effects (clutter) make it difficult to find good comparison images.

It currently depends on pytesseract and Tesseract-OCR, but tests with EasyOCR have also been conducted. Both achieve similarly good recognition results. EasyOCR appears to cause a higher CPU load than Tesseract. Tesseract, on the other hand, is an external dependency that needs to be installed separately.

The text comparison of the expected and recognized string has two modes: a perfect 1:1 match, or the Levenshtein ratio.

I also introduced two new file config options:

* Rectangle position (only used for text files)
* FPS limit per text or image file

Please let me know what you think of this feature.
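For reference, the fuzzy comparison mode can be illustrated with a minimal, dependency-free sketch. The function names here are hypothetical, not AutoSplit's actual API, and one common definition of the Levenshtein ratio (1 minus the edit distance divided by the longer string's length) is assumed:

```python
def levenshtein_distance(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution
            ))
        prev = curr
    return prev[-1]


def match_ratio(expected: str, recognized: str) -> float:
    # 1.0 is a perfect 1:1 match; lower values allow fuzzy matching
    # against a user-configured threshold.
    if not expected and not recognized:
        return 1.0
    dist = levenshtein_distance(expected, recognized)
    return 1.0 - dist / max(len(expected), len(recognized))
```

A run would then compare `match_ratio(expected, ocr_output)` against the split image's similarity threshold, exactly as image comparison does with its own similarity score.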
for more information, see https://pre-commit.ci
I'll look into the code changes after I come back from GDQ, but I love the idea of a comparison method that specializes in text comparison/recognition. Is the per-file FPS limit necessary to the implementation? Could you split it into a different PR? Idk about the rectangle position option, but maybe it'll make sense once I give the implementation a proper look.
Hi @Avasam, first of all, have fun and good luck at GDQ. To your question: yes, I find the FPS limit necessary so we don't max out CPU usage too much.
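To illustrate why a per-file limit helps: an OCR call is far more expensive than an image diff, so the comparison loop can be throttled to a target FPS. This is a minimal sketch with hypothetical names, not the actual AutoSplit code:

```python
import time


def throttled_loop(compare_once, fps_limit: float, iterations: int) -> float:
    """Run `compare_once` at most `fps_limit` times per second.

    Sleeping out the remainder of each frame interval keeps an expensive
    external OCR call from pegging the CPU. Returns total elapsed seconds.
    """
    interval = 1.0 / fps_limit
    start_total = time.monotonic()
    for _ in range(iterations):
        start = time.monotonic()
        compare_once()
        elapsed = time.monotonic() - start
        # Only sleep if the comparison finished faster than the frame budget.
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return time.monotonic() - start_total
```

With the default OCR limit of 1 FPS mentioned later in this thread, each text file would be checked at most once per second regardless of the capture frame rate.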
Heya! Div2 Content Creator here. As for the user perspective regarding Div2: with OCR this issue would probably be solved, and better yet: there are different activities, and I would love to split "complete 5 activities", for example. With text pop-ups like "Broadcast Restored", "All Hostages Saved" and "Perimeter Secured", one could scan for the "ed" at the end and properly split those, while keeping "Watch Level up" and other false positives away. :) Hopefully info from the user perspective is helpful here as well; if not, ignore my comment :D Enjoy GDQ, and looking forward to testing the new method if it gets approved =)
Hello again - not meaning to stress, but I'm really, really looking forward to using this method for a variety of auto-split scenarios.
* rewrite text files to contain the rectangle position
* switch to easyocr since there was no way to use pytesseract or tesserocr reliably without PIL
* display the text that is searched for
* set default FPS limit for OCR to 1
* minor fixes
* switch back to tesseract
* ditch all python binding libraries to not include Pillow
* call tesseract ourselves
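Dropping the Python binding libraries means invoking the tesseract CLI directly. A minimal sketch of that approach (hypothetical function names; it assumes the `tesseract` binary is on PATH, and relies on `stdin`/`stdout` being the CLI's special filenames for piping the image in and the recognized text out):

```python
import shutil
import subprocess


def tesseract_cmd(language: str = "eng") -> list[str]:
    # "stdin" and "stdout" are special filenames understood by the
    # tesseract CLI: read the image bytes from stdin, write text to stdout.
    return ["tesseract", "stdin", "stdout", "-l", language]


def recognize(png_bytes: bytes, language: str = "eng") -> str:
    """Run OCR on an encoded image without pytesseract or Pillow."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_cmd(language),
        input=png_bytes,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8").strip()
```

The image bytes can come from encoding the cropped capture region (e.g. with `cv2.imencode(".png", frame)`), so no Python imaging dependency is needed beyond what the capture pipeline already uses.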
Before I forget, just writing down some ideas:
- We could validate that user-provided characters are all supported by tesseract
- We could let the user provide a set of "characters that could appear on screen", to give tesseract as an allow-list, improving consistency and speed.
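A sketch of both ideas (hypothetical helper names; note that `tessedit_char_whitelist` has historically only been honored by Tesseract's legacy engine, hence the `--oem 0` here, and LSTM-engine support for it varies by version):

```python
def unsupported_chars(text: str, supported: str) -> set[str]:
    # Characters the user asked for that the configured traineddata /
    # allow-list cannot produce; useful for an up-front validation warning.
    return set(text) - set(supported)


def whitelist_args(chars: str) -> list[str]:
    # Extra CLI arguments restricting recognition to an allow-list,
    # which can improve both consistency and speed.
    return ["--oem", "0", "-c", f"tessedit_char_whitelist={chars}"]
```

These arguments would simply be appended to the tesseract command line before the image is piped in.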
@Avasam the
* moved some code around
* implemented fps_limit getter
* switch to PATH variable use
* minor fixes
It's not an "allowed path", it's a "trigger the build on changing these files" list. README doesn't need to trigger lint/type/build checks. It's just that the workflow requires approval.
We're getting close! Mostly polishing documentation.
Hello again :) For Division 2 it was way easier to set up the splits for autosplitting, though I made some human errors along the way. Regarding the FPS limit and Tesseract: during missions, a new objective pops up in the middle of the screen for a short time, travels to the top (still centered), stays still shortly, then travels to the left where the other objectives are listed and stays there until it's done. I think there is a trade-off:
Is there any interest, or does it even make sense, for me to test stuff like this and report on it here?
A few typos and small improvements to new code.
Here's some more user feedback and testing notes:

1. False-positive issue with OCR on few characters: when OCR checks again and again, in some images it misreads the number, and as there is no hold-variable yet, it simply splits right then.
2. Capture-device out-of-bounds crash on start-up
3. AutoSplit Integration can't split
I think that's just gonna be a limitation of the technology. A hold flag (#120) is still the best solution I can think of.
If I understand correctly, this should be easy to replicate by just setting the OCR crop outside of the capture area. Will need to be fixed first. Not certain if I wanna send an error popup (and reset, otherwise you'd be stuck in an error loop) or gracefully handle it. I guess there's no valid reason to change the capture size mid-run, unless you're testing, so we could include that as part of the initial checks, and if it happens mid-run, then reset AutoSplit.
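The initial check suggested here could be as simple as this hypothetical bounds test (assuming the OCR rectangle is given as two corner points, as in this PR's text-file format):

```python
def ocr_rect_in_bounds(
    top_left: tuple[int, int],
    bottom_right: tuple[int, int],
    capture_width: int,
    capture_height: int,
) -> bool:
    """Return True if the OCR rectangle lies fully inside the capture area.

    Running this once when split files are loaded (and again if the
    capture size changes) would turn the out-of-bounds crash into a
    clear validation error instead.
    """
    x1, y1 = top_left
    x2, y2 = bottom_right
    return 0 <= x1 < x2 <= capture_width and 0 <= y1 < y2 <= capture_height
```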
I can't immediately think of a reason why. From the main logic's PoV, there should be no difference between OCR and regular images (other than for displaying the current split). Will have to test.
Unsure if this is relevant or I made a mistake, but for testing I used this version until now. It worked great, especially in RDO to figure out a mission start via
It always had a 100% match with strings like "Go to the shack", "Kill or capture Gustavo", "Help deliver the goods to Wallace Station", etc. Yesterday I tried going down this list and downloaded newer versions. Two versions (didn't document which ones, sorry D:) crashed upon trying to get started, and two other versions worked. However, those newer versions didn't work on the first two missions I tested, so I checked... On the next mission the text "Go to the shack." only got a 50% match (and I use a 98% or 100% threshold, because it worked amazingly with the first version I tested). Did something crucial about the OCR matching change? I went back to the earliest version I used for all the tests and will keep it that way for now... :D
Probably related to the out-of-bounds coordinate crash, but when in the settings I'm using my Capture-Card while also having an 2024-02-16.14-14-04.mp4
@ston1th There is now a merge conflict due to moving out the tutorial/user guide into its own file.
@Avasam Noted. I'll fix this along with the rest once I find some free time again.
this commit improves the handling of the rectangle coordinates. the new scheme uses the top_left and bottom_right (X/Y) coordinates.

the migration from the old scheme works as follows:

```
top_left = [<top_left>, <bottom_left>]
bottom_right = [<top_right>, <bottom_right>]

old:
top_left = 275
top_right = 540
bottom_left = 70
bottom_right = 95

new:
top_left = [275, 70]
bottom_right = [540, 95]
```

you can now specify multiple matching methods and look for the best `text : method` match:

```
old:
method = 0

new:
methods = [0]

or:
methods = [2, 1, 0]
```
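The migration described in this commit message can be sketched as a small conversion function (hypothetical name, operating on the four old scalar keys):

```python
def migrate_rect(old: dict[str, int]) -> dict[str, list[int]]:
    # Old scheme: four scalar keys (top_left/top_right are X coordinates,
    # bottom_left/bottom_right are Y coordinates).
    # New scheme: two [x, y] corner points of the rectangle.
    return {
        "top_left": [old["top_left"], old["bottom_left"]],
        "bottom_right": [old["top_right"], old["bottom_right"]],
    }
```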
Hey @Avasam when you have time, could you please review the latest changes?
Oh sorry I completely forgot about this!! Thanks for the ping. I'll test the latest changes when I have time (not today), and I think as long as it doesn't break any existing feature, I'll get it in and publish a new release where it's clearly marked as experimental (so I'm allowed to introduce a breaking change for this feature if I wanna change something) |
```python
if is_valid_image(self.split_image.byte_array):
    if self.split_image.ocr:
        text = "\nor\n".join(self.split_image.texts)
        self.current_split_image.setText(f"Looking for OCR text:\n{text}")
```
I won't block on this given the TODO comment. Just bumping as a reminder that we should test it to confirm.
Other than the variable names in the OCR text file and the removal of a potentially redundant 1:1 comparison, I don't think any of my requests include functional changes. I might just do the changes directly in your PR and handle linting / type checking fixes to move this along.
I could actually use this to split on Shaman Shop in Pitfall 100%, so I can properly test by dogfooding.
@Avasam I just looked at your changes; may I ask why you changed the rectangle format back to the old one? I think the new one was more self-explanatory and easier to understand, using just the X/Y coordinates of two points in the image.
@ston1th This had been stalling for too long since I forgot about it, and I didn't want to keep you waiting any longer. I brought the PR to a state I was happy merging, and it doesn't affect existing functionality, so I did. Feel free to open a follow-up PR for any fix and improvement! Any follow-up should be much easier and faster to review at this point. As for the coordinates, I found it really odd that this was the only place using two points, especially since we effectively just immediately split it up again in code. If you still disagree, I can always put it up to a vote with the users on Discord to see what they think. And once again, thanks a lot for implementing this awesome feature!