Update Tesseract for testing #216

Closed
nok opened this issue Jul 27, 2019 · 14 comments · Fixed by #217

Comments

@nok
Contributor

nok commented Jul 27, 2019

Hello,

it would be great if we could add the next stable version of Tesseract for testing. In addition we could update the version of the operating system. I mean, why not 😌?

  1. Tesseract (current 3.04.01) to 4.1.0 (ppa ~alex-p/+archive/ubuntu/tesseract-ocr, required)
  2. Ubuntu Xenial 16.04 (current) to Bionic 18.04 (optional)

Reasons:

Solution:

We could use Travis to test pytesseract with Tesseract-OCR 3.04.01 and 4.1.0 on Bionic 18.04 using a simple Dockerfile. I can do that and create a PR, but in this case it would be great to get some opinions first.

Edit:

Fix and change Xenial version from 14.04 to 16.04.

@johnthagen
Contributor

johnthagen commented Jul 27, 2019

FYI, Xenial is 16.04, not 18.04: http://releases.ubuntu.com/16.04/

It does look like Bionic was recently added to Travis CI: https://docs.travis-ci.com/user/reference/bionic/

So it should be as simple as bumping to dist: bionic and making sure everything works.

In general I think testing against a newer tesseract on Bionic makes sense, though first maybe we should target native bionic before adding Docker complexity?
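A minimal sketch of what targeting native Bionic could look like, assuming the distribution's stock tesseract-ocr packages are sufficient and no PPA is added. The package list mirrors the one used elsewhere in this thread; nothing here is taken from the project's actual configuration.

```yaml
# Hypothetical minimal .travis.yml fragment: run on Bionic with its stock Tesseract.
language: python

dist: bionic

before_install:
  # Refresh the package index, then install the distribution's own Tesseract build.
  - sudo apt-get update
  - sudo apt-get install -y tesseract-ocr tesseract-ocr-fra

install:
  - pip install tox

script:
  - tox
```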

@nok
Contributor Author

nok commented Jul 27, 2019

Hmm, do we want to support Tesseract 3? If so, we should use Xenial, because it still has prebuilt packages of Tesseract 3. Otherwise we have to compile it manually or skip this version. In general I prefer supporting both versions 3 and 4.

Image ubuntu:bionic:

root@0376b17f81e6:/# apt-cache madison tesseract-ocr
PPA -> tesseract-ocr | 4.1.0-1ppa1~bionic1 | http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu bionic/main amd64 Packages
DEF -> tesseract-ocr | 4.00~git2288-10f4998a-2 | http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages

Image ubuntu:xenial:

root@cea4bffed83f:/# apt-cache madison tesseract-ocr
PPA -> tesseract-ocr | 4.1.0-1ppa1~xenial1 | http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu xenial/main amd64 Packages
DEF -> tesseract-ocr |  3.04.01-4 | http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

Yes, we can try to avoid a Docker image by using additional environment variables. The following configuration is a simple draft to demonstrate how to run the unit tests against Tesseract 3 and 4:

language: python

dist: xenial

matrix:
    include:
      - python: 2.7
        env: TOXENV=py27 TESSERACT_VERSION=3.04.01-4
      - python: 3.5
        env: TOXENV=py35 TESSERACT_VERSION=3.04.01-4
      - python: 3.6
        env: TOXENV=py36 TESSERACT_VERSION=3.04.01-4
      - python: 3.7
        env: TOXENV=py37 TESSERACT_VERSION=3.04.01-4
      - python: 3.7
        env: TOXENV=pep8 TESSERACT_VERSION=3.04.01-4
      - python: 2.7
        env: TOXENV=py27 TESSERACT_VERSION=4.1.0-1ppa1~xenial1
      - python: 3.5
        env: TOXENV=py35 TESSERACT_VERSION=4.1.0-1ppa1~xenial1
      - python: 3.6
        env: TOXENV=py36 TESSERACT_VERSION=4.1.0-1ppa1~xenial1
      - python: 3.7
        env: TOXENV=py37 TESSERACT_VERSION=4.1.0-1ppa1~xenial1
      - python: 3.7
        env: TOXENV=pep8 TESSERACT_VERSION=4.1.0-1ppa1~xenial1

before_install:
  - sudo apt-get update
  - sudo apt-get install -y software-properties-common
  - sudo add-apt-repository -y ppa:alex-p/tesseract-ocr
  - sudo apt-get update
  - sudo apt-get install -y tesseract-ocr=${TESSERACT_VERSION}
  - sudo apt-get install -y tesseract-ocr-fra

install:
  pip install tox

script:
  tox

notifications:
  email: false

@johnthagen
Contributor

What if, instead of a custom PPA, we put both xenial and bionic in the matrix and use their native tesseract? Would that cover v3 and v4? That seems like a simple first step.

@nok
Contributor Author

nok commented Jul 27, 2019

It's a good idea, but I have never seen a configuration where the distribution is set by an environment variable from a matrix. And I couldn't find any information about that.

@johnthagen
Contributor

Perhaps this?

https://docs.travis-ci.com/user/multi-os/#example-multi-os-build-matrix

matrix:
  include:
    - os: linux
      dist: trusty
    - os: osx
      osx_image: xcode7.2

Maybe we could do something like:

matrix:
  include:
    - os: linux
      dist: xenial
    - os: linux
      dist: bionic

@nok
Contributor Author

nok commented Jul 28, 2019

That looks good! But do we have to combine these?

language: python

matrix:
    include:
      - os: linux
        dist: xenial
        python: 2.7
        env: TOXENV=py27
      - os: linux
        dist: xenial
        python: 3.5
        env: TOXENV=py35
      - os: linux
        dist: xenial
        python: 3.6
        env: TOXENV=py36
      - os: linux
        dist: xenial
        python: 3.7
        env: TOXENV=py37
      - os: linux
        dist: xenial
        python: 3.7
        env: TOXENV=pep8
      - os: linux
        dist: bionic
        python: 2.7
        env: TOXENV=py27
      - os: linux
        dist: bionic
        python: 3.5
        env: TOXENV=py35
      - os: linux
        dist: bionic
        python: 3.6
        env: TOXENV=py36
      - os: linux
        dist: bionic
        python: 3.7
        env: TOXENV=py37
      - os: linux
        dist: bionic
        python: 3.7
        env: TOXENV=pep8

before_install:
  - sudo apt-get install -y tesseract-ocr
  - sudo apt-get install -y tesseract-ocr-fra

install:
  pip install tox

script:
  tox

notifications:
  email: false

Can you try it out on a separate branch? Travis will checkout this new branch and run the tests.

@johnthagen
Contributor

If you create a PR, Travis will run with your updates to the Travis config if you want to try it out.

@nok
Contributor Author

nok commented Jul 28, 2019

The build looks good (120988151), but we have two open issues:

  1. Travis doesn't support the combination of Bionic and Python 3.5 out of the box, so I removed it.
  2. With Bionic a beta of Tesseract 4 will be installed (tesseract 4.0.0-beta.1). Is that okay?
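As a rough illustration of both points, an abbreviated matrix might look like the sketch below. It is not the configuration from the PR: only two of the jobs are shown, and the `tesseract --version` step is an assumed addition that prints which Tesseract build each job actually installed.

```yaml
# Hypothetical, abbreviated sketch: Python 3.5 is tested on Xenial only.
language: python

matrix:
  include:
    - os: linux
      dist: xenial
      python: 3.5
      env: TOXENV=py35
    - os: linux
      dist: bionic
      python: 3.7
      env: TOXENV=py37

before_install:
  - sudo apt-get update
  - sudo apt-get install -y tesseract-ocr tesseract-ocr-fra
  # Print the installed version so the build log records what was tested.
  - tesseract --version

install:
  - pip install tox

script:
  - tox
```

Logging the version makes it visible in each job whether the stock package is Tesseract 3, the 4.0.0 beta, or a stable release.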

@bozhodimitrov
Collaborator

bozhodimitrov commented Jul 28, 2019

About 2: I don't think it will be a big problem if some use cases fail (I don't think we will hit the exotic scenarios). We only support the basic functionality.

@johnthagen
Contributor

@nok

I think it's fine if we don't test Python 3.5 on Bionic as long as it's tested on Xenial. With the different dists, we're really testing the different tesseract versions, rather than making sure each Python version works on each dist (since that shouldn't affect things).

@johnthagen
Contributor

With Bionic a beta of Tesseract 4 will be installed

I think this is fine as well. The fact that we're doing any testing in this area is a huge improvement, so I think it's still a great step forward. I doubt that pytesseract's interactions would be affected going from beta to final.

@nok
Contributor Author

nok commented Jul 29, 2019

Yesterday I tested the original images from Ubuntu (https://hub.docker.com/_/ubuntu). By using ubuntu:bionic I could install Tesseract 4 (stable) out of the box (without ppa).

If the beta version is fine, you can merge the PR 😃. Otherwise we can create a small Docker image.

@johnthagen
Contributor

@int3l I don't have a strong preference, but I'd recommend we just use the stock beta version to keep the Travis configuration simpler.

@bozhodimitrov
Collaborator

Yep, that's fine by me.
