Rotate / flip PDFs? #554
Comments
Thanks for the suggestion! Yes, I thought about it and it is something I would like to add. But to be honest it is not at the top of the list currently, which may change if people want this more than other things… I would like to be able to remove blank pages and rotate the PDF pages manually. This was one reason for converting everything to PDF: it can be manipulated later without changing the original. And whatever can be done automatically, should be :-) I would also think that rotation should be possible to automate.
Another idea could be automatic page trimming, or automatic blank page detection. I experimented with different methods in my scanner script. In the end I just used a simple threshold on the standard deviation of the image pixels: https://github.com/eresturo/scanadf2docspell/blob/5aa3d05c3669c4715db3b24400226d9db42d1c4f/src/preprocessor.py#L29
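The standard-deviation check described above can be sketched in plain Python roughly as follows. This is a minimal illustration, not the code from the linked script. It assumes each page has already been rendered to a flat list of 8-bit grayscale values. The threshold of 10 is a hypothetical value that would need tuning for a given scanner.

```python
from statistics import pstdev

# Hypothetical threshold: pages whose pixel standard deviation is below this
# value are treated as blank. Real scans need tuning (paper tint, noise).
BLANK_STDDEV_THRESHOLD = 10.0


def is_blank_page(gray_pixels, threshold=BLANK_STDDEV_THRESHOLD):
    """Return True if the page looks blank.

    gray_pixels: flat sequence of 8-bit grayscale values (0-255).
    A nearly uniform page (plain paper) has a low standard deviation;
    printed text or images push it up.
    """
    if not gray_pixels:
        return True  # an empty raster is treated as blank
    return pstdev(gray_pixels) < threshold


def drop_blank_pages(pages, threshold=BLANK_STDDEV_THRESHOLD):
    """Keep only the pages that are not detected as blank."""
    return [page for page in pages if not is_blank_page(page, threshold)]
```

A white page with slight noise (`[250, 255] * 500`, standard deviation 2.5) counts as blank. A page with 10% black pixels (`[255] * 900 + [0] * 100`, standard deviation about 76.5) does not.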
Yeah, that would be nice too. Fortunately my scanner does empty page removal for me, so that isn't something I thought about.
Yes, that would be nice indeed. A step after PDF conversion could do all this. What I think would also be nice is the ability to split PDFs based on some stamp or sign that marks the last page (or on a separator page).
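The splitting idea can be sketched as a single pass over the pages. This is an illustration under assumptions, not an existing Docspell feature. `is_marker` is a hypothetical predicate (for example, stamp detection on a rendered page). `marker_is_last_page` selects between the two variants mentioned: a stamped last page that stays in its document, or a separator page that is dropped.

```python
def split_pages(pages, is_marker, marker_is_last_page=False):
    """Split a sequence of pages into documents.

    is_marker: hypothetical predicate that returns True for a marker page.
    marker_is_last_page:
      - False: the marker is a separator sheet and is dropped.
      - True: the marker is a stamp on the last page of a document and is kept.
    Empty documents (e.g. two separators in a row) are skipped.
    """
    documents, current = [], []
    for page in pages:
        if is_marker(page):
            if marker_is_last_page:
                current.append(page)  # the stamped page ends its own document
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:  # trailing pages after the last marker form a final document
        documents.append(current)
    return documents
```

For example, `split_pages(["a1", "a2", "SEP", "b1"], lambda p: p == "SEP")` yields `[["a1", "a2"], ["b1"]]`.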
This is a really good idea; it would be cool!
Just a small update: parts of this can now be achieved using this addon. I would still like to have this as a first-class feature in Docspell, but until then the addon is an alternative that can be used right now.
Either I am doing something wrong or this alternative doesn't solve the problem. I did use the addon you mentioned; however, it rotates only the processed/result PDF, not the original. What I'd like to achieve is rotating the original, then re-processing it.
@dariuszszyc Hm, the addon should also overwrite the extracted text in Docspell so that you can use full-text search etc. Does this not work (without an additional reprocess)? The original file is never touched, though. But the "converted" file should be rotated and the extracted text should be updated as well.
@eikek It didn't work for me. I did a few more tests and here are the results:
Thank you for these details, @dariuszszyc. I think point 2 is a bug, then; I need to look into it. Points 3 and 4: with JPG, the orientation is often stored as metadata (kind of), and viewers may or may not interpret it. Some tools don't actually rotate the image; they only change the orientation setting. When you edit the image data in some way (e.g. by adding a single dot), the tool has to store the image anew. Could you maybe send me an example JPG file so I can reproduce this?
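The metadata-only rotation described above refers to the EXIF `Orientation` tag, which takes values 1 to 8. A tool that only changes that tag leaves the pixels as they were, so a viewer that ignores the tag still shows the unrotated image. The sketch below shows what "baking in" the orientation means. It is a toy model on a 2D list of pixels, not Docspell or addon code. The function names are hypothetical. For real files, Pillow's `ImageOps.exif_transpose` performs this kind of normalization.

```python
def copy_rows(rows):
    return [list(row) for row in rows]

def flip_horizontal(rows):
    return [row[::-1] for row in rows]

def flip_vertical(rows):
    return [list(row) for row in rows[::-1]]

def rotate_180(rows):
    return [row[::-1] for row in rows[::-1]]

def rotate_cw(rows):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*rows[::-1])]

def rotate_ccw(rows):
    """Rotate 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*rows)][::-1]

def transpose(rows):
    """Mirror along the main diagonal."""
    return [list(col) for col in zip(*rows)]

def transverse(rows):
    """Mirror along the anti-diagonal."""
    return rotate_180(transpose(rows))

# Transform that must be applied to the stored pixels for each EXIF
# Orientation value so the image displays upright without the tag.
ORIENTATION_FIXES = {
    1: copy_rows,
    2: flip_horizontal,
    3: rotate_180,
    4: flip_vertical,
    5: transpose,
    6: rotate_cw,
    7: transverse,
    8: rotate_ccw,
}

def apply_exif_orientation(rows, orientation):
    """Return new pixel rows with the orientation baked in.

    Unknown orientation values are treated like 1 (no change).
    """
    return ORIENTATION_FIXES.get(orientation, copy_rows)(rows)
```

After this step, the orientation tag would be reset to 1. Any viewer then shows the same result, whether or not it honours the tag.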
Also, maybe we can use a new ticket for this problem: I just created docspell/rotate-pdf-addon#1, copying your notes.
Forgive the stupid question (I'm not an advanced user): how can I share the JPG with you so that it's available only to you and not visible here?
No worries! You can send me an e-mail or message me privately on Matrix (see the README). Of course, if you can just create a new file with some sample content, you could also post it here.
Thank you, @dariuszszyc. I'll test with these.
Linking #1437
I have a bunch of documents which are upside down (but which I didn't realise until Docspell slurped them in and previewed them).
Have you thought yet about adding the option to rotate a PDF? Eventually, you may even be able to use some heuristics to propose flips/rotations automatically (e.g. if OCR finds little text in the current orientation but much more after rotating, or by training a model to detect upside-down text).
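The OCR-based heuristic suggested above could look roughly like this: try each of the four right-angle orientations, score each one, and keep the best. This is a sketch under assumptions, not an existing Docspell feature. `ocr_score` is a hypothetical callback; in practice it might count words an OCR engine recognizes with good confidence. Pages are modelled as 2D lists for illustration.

```python
def rotate_cw(rows):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(col) for col in zip(*rows[::-1])]


def best_rotation(rows, ocr_score):
    """Return (angle, rotated_rows) for the clockwise rotation in
    {0, 90, 180, 270} with the highest ocr_score.

    ocr_score: hypothetical callable mapping a page to a number, where
    higher means "more readable text found". Ties keep the earlier angle,
    so an unrotated page wins unless another orientation scores better.
    """
    best_angle = 0
    best_rows = [list(row) for row in rows]
    best_score = ocr_score(best_rows)
    candidate = best_rows
    for angle in (90, 180, 270):
        candidate = rotate_cw(candidate)  # each step adds another 90 degrees
        score = ocr_score(candidate)
        if score > best_score:
            best_angle, best_rows, best_score = angle, candidate, score
    return best_angle, best_rows
```

With a toy scorer that counts rows reading "HI", an upside-down page `[["I", "H"]]` is detected as needing a 180-degree turn. As an alternative to scoring all four orientations, Tesseract's orientation and script detection mode (`--psm 0`) reports a suggested rotation directly.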