This repository has been archived by the owner on Aug 5, 2020. It is now read-only.

End of file error when running b64filter in Bitextor #6

Open
mbanon opened this issue Aug 3, 2020 · 3 comments

Comments

@mbanon
Member

mbanon commented Aug 3, 2020

Hi,
I am running Bitextor for martacrawl es-eu, and it crashed after outputting this:

2020/08/03 10:52:15 b64filter.go:198: writeDocs: written 22200 docs, 2233737 lines in 2m46.222660934s
2020/08/03 10:52:16 b64filter.go:198: writeDocs: written 22300 docs, 2242692 lines in 2m46.230375568s
terminate called after throwing an instance of 'util::EndOfFileException'
  what():  End of file
2020/08/03 10:52:16 b64filter.go:258: error waiting for command: signal: aborted

@lpla told me that this is a b64filter issue, which is why I'm reporting it here :)

Any clue about what's going on? I roughly know which shards are failing, so I can provide them if you need to reproduce the issue.

Thanks!

@kpu
Member

kpu commented Aug 4, 2020

I thought this particular repo was dead, replaced by a C++ version written by @jelmervdl?

@jelmervdl
Member

jelmervdl commented Aug 4, 2020

I'm not using this version. @mbanon try https://github.com/jelmervdl/bitextor/blob/doctools/document-aligner/b64filter.cpp

That version has more instrumentation around what might go wrong, and will tell you what failed.

(note it's in the doctools branch of my copy of bitextor because… I don't really know where to put it. It borrows code from docalign, but otherwise has nothing to do with it.)

@mbanon
Member Author

mbanon commented Aug 4, 2020

To be honest, I have no idea which version of b64filter my Bitextor uses. I posted here because @lpla gave me the URL of this repo when I first reported the error to the UA team ¯\_(ツ)_/¯

I'll investigate tomorrow and keep you posted.
