Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Color and font information for chars, words and boxes #39

Merged
merged 11 commits into from
Mar 15, 2024

Conversation

kreuzberger
Copy link
Contributor

The PR is for discussion of issue #38. The non-stroking color information is stored if all characters in a word have same non-stroking color.

Copy link
Member

@ubmarco ubmarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.
Besides my 2 comments, another question: I think it makes sense to extend the ncolor attribute to also show up on HorizontalLine and HorizontalBox in case all words/lines have the same ncolor value.

libpdf/models/horizontal_box.py Outdated Show resolved Hide resolved
libpdf/models/horizontal_box.py Outdated Show resolved Hide resolved
libpdf/models/horizontal_box.py Outdated Show resolved Hide resolved
@ubmarco
Copy link
Member

ubmarco commented Feb 19, 2024

Can we try to merge #36 first to get the linting insights? It's almost done.

@ubmarco
Copy link
Member

ubmarco commented Feb 21, 2024

I merged PR #36, please add modified files to the tox.ini line that specifies the ruff paths, currently poetry run ruff tests/conftest.py tests/test_rects.py

@kreuzberger
Copy link
Contributor Author

I merged PR #36, please add modified files to the tox.ini line that specifies the ruff paths, currently poetry run ruff tests/conftest.py tests/test_rects.py

How can i use different python versions with tox to check? Currently i have 3.10 installed in my system and 3.11 in a local venv. How can is use 3.12 (required for format/lint) in a special local tox environment?
This would help me not to localy modify tox.ini all the time. or could the format/lint be more "Opened" for other python versions?

@ubmarco ubmarco changed the title #38 Storing color information for words if all characters in word have same color Color and font information for chars, words and boxes Feb 21, 2024
Copy link
Member

@ubmarco ubmarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for the fixes & some more comments on the new changes.
I had a bad feeling putting the linting fixes of our code onto you, so I fixed all issues in utils.py and horizontal_box.py in #40. Will try to merge this tomorrow after a review. Please rebase your work onto this.

I use https://github.com/pyenv/pyenv to install any Python version I need.
If you cannot do that, just change the basepython in tox.ini to anything you have from 3.8 to 3.12. It should produce the same output. The CI will tell us whether that assumption is valid.

libpdf/models/horizontal_box.py Outdated Show resolved Hide resolved
):
"""Init the class with plain char of a character and its rectangular coordinates."""
self.x0 = x0
self.y0 = y0
self.x1 = x1
self.y1 = y1
self.text = text
self.ncolor = ncolor
self.fontname = fontname
self.bold = True if "Bold" in fontname else False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that logic can break easily or at least produce unexpected results. There is no real standard in font names, I think exposing the fontname is good enough. The bold logic can be determined by the end user (or us in case we write test cases on known PDFs). See here:
image
image

tests/test_word_colors.py Outdated Show resolved Hide resolved
tests/test_word_colors.py Outdated Show resolved Hide resolved
libpdf/models/horizontal_box.py Outdated Show resolved Hide resolved
@ubmarco
Copy link
Member

ubmarco commented Feb 22, 2024

I just merged #40, please rebase your change on top of it. There might some merge conflicts. Btw, if you create your fork under your personal account, you can allow me to push to your PR branch (flag Allow edits by maintainers).

@kreuzberger
Copy link
Contributor Author

I just merged #40, please rebase your change on top of it. There might some merge conflicts. Btw, if you create your fork under your personal account, you can allow me to push to your PR branch (flag Allow edits by maintainers).

Hi! I merged everything some weeks ago, but the PR is still open. Is there something missing? Or Just no time left 😄

Copy link
Member

@ubmarco ubmarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is just one typo in the test case (italc). If that is fixed let's merge the PR!
Thanks for your work on this.

@ubmarco ubmarco self-requested a review March 15, 2024 09:49
Copy link
Member

@ubmarco ubmarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thanks for your contribution!

@ubmarco ubmarco merged commit 62086d0 into useblocks:master Mar 15, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants