implemented text recognition (ocr) #272
Conversation
I implemented another comparison method based on OCR. This could be a useful addition in cases where modern game rendering and visual effects (clutter) make it difficult to find good comparison images.

It currently depends on pytesseract and Tesseract-OCR, but tests with EasyOCR have also been conducted. Both achieve similarly good recognition results. EasyOCR appears to cause a higher CPU load than Tesseract. Tesseract, on the other hand, is an external dependency that needs to be installed separately.

The text comparison of the expected and recognized string has two modes: a perfect 1:1 match, or the Levenshtein ratio.

I also introduced two new file config options:

* Rectangle position (only used for text files)
* FPS limit per text or image file

Please let me know what you think of this feature.
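For reference, the fuzzy comparison mode can be illustrated with a minimal, dependency-free sketch. The function names here are hypothetical, not AutoSplit's actual API, and one common definition of the Levenshtein ratio (1 minus the edit distance divided by the longer string's length) is assumed:

```python
def levenshtein_distance(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution
            ))
        prev = curr
    return prev[-1]


def match_ratio(expected: str, recognized: str) -> float:
    # 1.0 is a perfect 1:1 match; lower values allow fuzzy matching
    # against a user-configured threshold.
    if not expected and not recognized:
        return 1.0
    dist = levenshtein_distance(expected, recognized)
    return 1.0 - dist / max(len(expected), len(recognized))
```

A run would then compare `match_ratio(expected, ocr_output)` against the split image's similarity threshold, exactly as image comparison does with its own similarity score.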
for more information, see https://pre-commit.ci
I'll look into the code changes after I come back from GDQ, but I love the idea of a comparison method that specializes in text comparison/recognition. Is the per-file FPS limit necessary to the implementation? Could you split it into a different PR? Idk about the rectangle position option, but maybe it'll make sense once I give the implementation a proper look.
Hi @Avasam, first of all, have fun and good luck at GDQ. To your question: yes, I find the FPS limit necessary so we don't max out CPU usage too much.
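To illustrate why a per-file limit helps: an OCR call is far more expensive than an image diff, so the comparison loop can be throttled to a target FPS. This is a minimal sketch with hypothetical names, not the actual AutoSplit code:

```python
import time


def throttled_loop(compare_once, fps_limit: float, iterations: int) -> float:
    """Run `compare_once` at most `fps_limit` times per second.

    Sleeping out the remainder of each frame interval keeps an expensive
    external OCR call from pegging the CPU. Returns total elapsed seconds.
    """
    interval = 1.0 / fps_limit
    start_total = time.monotonic()
    for _ in range(iterations):
        start = time.monotonic()
        compare_once()
        elapsed = time.monotonic() - start
        # Only sleep if the comparison finished faster than the frame budget.
        if elapsed < interval:
            time.sleep(interval - elapsed)
    return time.monotonic() - start_total
```

With the default OCR limit of 1 FPS mentioned later in this thread, each text file would be checked at most once per second regardless of the capture frame rate.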
Heya! Div2 Content Creator here. As for the user perspective regarding Div2: with OCR this issue would probably be solved, and better yet: there are different activities, and I would love to split "complete 5 activities", for example. With text pop-ups like "Broadcast Restored", "All Hostages Saved" and "Perimeter Secured", one could scan for the "ed" at the end and properly split those, while keeping "Watch Level up" and other false positives away. :) Hopefully info from the user perspective is helpful here as well; if not, ignore my comment :D Enjoy GDQ, and looking forward to testing the new method if it gets approved =)
Hello again - not meaning to stress, but I'm really, really looking forward to using this method for a variety of auto-split scenarios.
* rewrite text files to contain the rectangle position
* switch to easyocr since there was no way to use pytesseract or tesserocr reliably without PIL
* display the text that is searched for
* set default FPS limit for OCR to 1
* minor fixes
* switch back to tesseract
* ditch all python binding libraries to not include Pillow
* call tesseract ourselves
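Dropping the Python binding libraries means invoking the tesseract CLI directly. A minimal sketch of that approach (hypothetical function names; it assumes the `tesseract` binary is on PATH, and relies on `stdin`/`stdout` being the CLI's special filenames for piping the image in and the recognized text out):

```python
import shutil
import subprocess


def tesseract_cmd(language: str = "eng") -> list[str]:
    # "stdin" and "stdout" are special filenames understood by the
    # tesseract CLI: read the image bytes from stdin, write text to stdout.
    return ["tesseract", "stdin", "stdout", "-l", language]


def recognize(png_bytes: bytes, language: str = "eng") -> str:
    """Run OCR on an encoded image without pytesseract or Pillow."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    result = subprocess.run(
        tesseract_cmd(language),
        input=png_bytes,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8").strip()
```

The image bytes can come from encoding the cropped capture region (e.g. with `cv2.imencode(".png", frame)`), so no Python imaging dependency is needed beyond what the capture pipeline already uses.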
Before I forget, just writing down some ideas:
- We could validate that user-provided characters are all supported by tesseract
- We could let the user provide a set of "characters that could appear on screen", to give tesseract as an allow-list, improving consistency and speed.
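A sketch of both ideas (hypothetical helper names; note that `tessedit_char_whitelist` has historically only been honored by Tesseract's legacy engine, hence the `--oem 0` here, and LSTM-engine support for it varies by version):

```python
def unsupported_chars(text: str, supported: str) -> set[str]:
    # Characters the user asked for that the configured traineddata /
    # allow-list cannot produce; useful for an up-front validation warning.
    return set(text) - set(supported)


def whitelist_args(chars: str) -> list[str]:
    # Extra CLI arguments restricting recognition to an allow-list,
    # which can improve both consistency and speed.
    return ["--oem", "0", "-c", f"tessedit_char_whitelist={chars}"]
```

These arguments would simply be appended to the tesseract command line before the image is piped in.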
@Avasam the
* moved some code around
* implemented fps_limit getter
* switch to PATH variable use
* minor fixes
It's not an "allowed path", it's a "trigger the build on changing these files" list. README doesn't need to trigger lint/type/build checks. It's just that the workflow requires approval.
We're getting close! Mostly polishing documentation.
Hello again :) For Division 2 it was way easier to set up the splits for autosplitting, though I made some human errors along the way. Regarding the FPS limit and Tesseract: during missions, a new objective pops up in the middle of the screen for a short time, travels to the top (still centered), stays still shortly, then travels to the left where the other objectives are listed and stays there until it's done. I think there is a trade-off:
Is there any interest, or does it even make sense, for me to test stuff like this and report on it here?
A few typos and small improvements to new code.
Here's some more user feedback and testing notes:

1. False-positive issue with OCR on few characters: when OCR checks again and again, in some images it misreads the number, and as there is no hold-variable yet, it simply splits right then.
2. Capture-device out-of-bounds crash on start-up
3. AutoSplit Integration can't split
I think that's just gonna be a limitation of the technology. A hold flag (#120) is still the best solution I can think of.
If I understand correctly, this should be easy to replicate by just setting the OCR crop outside of the capture area. Will need to be fixed first. Not certain if I wanna send an error popup (and reset, otherwise you'd be stuck in an error loop) or gracefully handle it. I guess there's no valid reason to change the capture size mid-run, unless you're testing, so we could include that as part of the initial checks, and if it happens mid-run, then reset AutoSplit.
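The initial check suggested here could be as simple as this hypothetical bounds test (assuming the OCR rectangle is given as two corner points, as in this PR's text-file format):

```python
def ocr_rect_in_bounds(
    top_left: tuple[int, int],
    bottom_right: tuple[int, int],
    capture_width: int,
    capture_height: int,
) -> bool:
    """Return True if the OCR rectangle lies fully inside the capture area.

    Running this once when split files are loaded (and again if the
    capture size changes) would turn the out-of-bounds crash into a
    clear validation error instead.
    """
    x1, y1 = top_left
    x2, y2 = bottom_right
    return 0 <= x1 < x2 <= capture_width and 0 <= y1 < y2 <= capture_height
```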
I can't immediately think of a reason why. From the main logic's PoV, there should be no difference between OCR and regular images (other than for displaying the current split). Will have to test.
Unsure if this is relevant or I made a mistake, but for testing I used this version until now. It worked great, especially in RDO to figure out a mission start via
It always had a 100% match with strings like "Go to the shack", "Kill or capture Gustavo", "Help deliver the goods to Wallace Station", etc. Yesterday I tried going down this list and downloaded newer versions. Two versions (didn't document which ones, sorry D:) crashed upon trying to get started, and two other versions worked. However, those newer versions didn't work on the first two missions I tested, so I checked... On the next mission the text "Go to the shack." only got a 50% match (and I use a 98% or 100% threshold, because it worked amazingly with the first version I tested). Did something crucial about the OCR matching change? I went back to the earliest version I used for all the tests and will keep it that way for now... :D
Probably related to the out-of-bounds coordinate crash, but when in the settings I'm using my Capture-Card while also having an 2024-02-16.14-14-04.mp4
@ston1th There is now a merge conflict due to moving out the tutorial/user guide into its own file.
@Avasam Noted. I'll fix this along with the rest once I find some free time again.
this commit improves the handling of the rectangle coordinates. the new scheme uses the top_left and bottom_right (X/Y) coordinates.

the migration from the old scheme works as follows:

```
top_left = [<top_left>, <bottom_left>]
bottom_right = [<top_right>, <bottom_right>]

old:
top_left = 275
top_right = 540
bottom_left = 70
bottom_right = 95

new:
top_left = [275, 70]
bottom_right = [540, 95]
```

you can now specify multiple matching methods and look for the best `text : method` match:

```
old:
method = 0

new:
methods = [0]

or:
methods = [2, 1, 0]
```
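The migration described in this commit message can be sketched as a small conversion function (hypothetical name, operating on the four old scalar keys):

```python
def migrate_rect(old: dict[str, int]) -> dict[str, list[int]]:
    # Old scheme: four scalar keys (top_left/top_right are X coordinates,
    # bottom_left/bottom_right are Y coordinates).
    # New scheme: two [x, y] corner points of the rectangle.
    return {
        "top_left": [old["top_left"], old["bottom_left"]],
        "bottom_right": [old["top_right"], old["bottom_right"]],
    }
```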
Hey @Avasam when you have time, could you please review the latest changes?
Oh sorry I completely forgot about this!! Thanks for the ping. I'll test the latest changes when I have time (not today), and I think as long as it doesn't break any existing feature, I'll get it in and publish a new release where it's clearly marked as experimental (so I'm allowed to introduce a breaking change for this feature if I wanna change something) |
```python
if is_valid_image(self.split_image.byte_array):
    if self.split_image.ocr:
        text = "\nor\n".join(self.split_image.texts)
        self.current_split_image.setText(f"Looking for OCR text:\n{text}")
```
I won't block on this given the TODO comment. Just bumping as a reminder that we should test it to confirm.
Other than the variable names in the OCR text file and the removal of a potentially redundant 1:1 comparison, I don't think any of my requests include functional changes. I might just do the changes directly in your PR and handle linting / type checking fixes to move this along.
I could actually use this to split on Shaman Shop in Pitfall 100%, so I can properly test by dogfooding.
@Avasam I just looked at your changes; may I ask why you changed the rectangle format back to the old one? I think the new one was more self-explanatory and easier to understand, using just the X/Y coordinates of two points in the image.
@ston1th This had been stalling for too long since I forgot about it, and I didn't want to keep you waiting any longer. I brought the PR to a state I was happy merging, and it doesn't affect existing functionality, so I did. Feel free to open a follow-up PR for any fix and improvement! Any follow-up should be much easier and faster to review at this point. As for the coordinates, I found it really odd that this was the only place using two points, especially since we effectively just immediately split it up again in code. If you still disagree, I can always put it up to a vote with the users on Discord to see what they think. And once again, thanks a lot for implementing this awesome feature!