PDF type: scanned or digitally created? #1853

Answered by JorjMcKie
Amitdedhia6 asked this question in Q&A

This is detectable, though sometimes only with some uncertainty.

  • You can have pages with regular text and other pages that are scanned images and may or may not contain text generated by some OCR engine.
  • You can have PDFs consisting exclusively of image pages - no text whatsoever.
  • You can have pages with regular text plus images on the page for which there also exists OCRed text.
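
To make the cases above concrete, here is a minimal sketch of a per-page classifier. Everything in it is an assumption for illustration: the names `PageStats` and `classify_page`, the 0.9 "full page" threshold, and the idea that you have already measured the amount of extractable text and the fraction of the page covered by images with your PDF library of choice. It is a rough heuristic, not a reliable detector.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page measurements, gathered with any PDF library."""
    text_chars: int        # number of extractable characters on the page
    image_coverage: float  # fraction of the page area covered by images (0.0 to 1.0)


def classify_page(stats: PageStats, full_page_threshold: float = 0.9) -> str:
    """Roughly sort a page into one of the cases listed above.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    has_text = stats.text_chars > 0
    has_images = stats.image_coverage > 0
    full_page_image = stats.image_coverage >= full_page_threshold

    if not has_text and not has_images:
        return "no text, no images"
    if not has_text:
        return "image only, no text"
    if full_page_image:
        # Text on top of (or under) a page-filling image often means OCR.
        return "scanned image, probably with OCR text"
    if has_images:
        return "regular text plus images"
    return "regular text only"


if __name__ == "__main__":
    print(classify_page(PageStats(text_chars=0, image_coverage=1.0)))
    print(classify_page(PageStats(text_chars=1800, image_coverage=0.97)))
    print(classify_page(PageStats(text_chars=1800, image_coverage=0.2)))
```

Note that the third case in the list (regular text plus images that also carry OCRed text) cannot be told apart from plain "text plus images" by these two numbers alone.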

Text generated by OCR engines is stored hidden. Here again, there are several options:

  1. Store the text "underneath" the scanned image with otherwise regular properties, i.e. a regular font like Helvetica and the color black.
  2. Store the text "hidden" using a special PDF attribute, "text rendering mode 3" (content stream operator 3 Tr).
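
As a rough illustration of the second option, the sketch below scans an already decompressed page content stream for the text rendering mode operator. The function names are made up for this example, and how you obtain the stream bytes is left to your PDF library (in PyMuPDF I would expect something like `page.read_contents()`, but treat that as an assumption). The regular expression is a heuristic: it does not parse the stream, so an operand/operator pair that happens to appear inside a string literal would be counted as well.

```python
import re

# Matches "<digit> Tr" as a standalone operand/operator pair.
# Valid text rendering modes are 0..7; mode 3 means "neither fill nor stroke",
# i.e. invisible text.
_TR_PATTERN = re.compile(rb"(?:^|\s)([0-7])\s+Tr(?=\s|$)")


def text_render_modes(content_stream: bytes) -> list[int]:
    """Return every text rendering mode set in a decompressed content stream."""
    return [int(mode) for mode in _TR_PATTERN.findall(content_stream)]


def has_hidden_text(content_stream: bytes) -> bool:
    """True if the stream switches to rendering mode 3 at least once."""
    return 3 in text_render_modes(content_stream)


if __name__ == "__main__":
    # Hand-written sample streams, not taken from a real file.
    ocr_layer = b"BT /F1 12 Tf 3 Tr 72 700 Td (hello) Tj ET"
    visible = b"BT /F1 12 Tf 0 Tr 72 700 Td (hello) Tj ET"
    print(has_hidden_text(ocr_layer), has_hidden_text(visible))
```

This only catches text hidden via 3 Tr. Text stored underneath the image with regular properties (the first option) looks like ordinary text to this check.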

Some sc…

Answer selected by Amitdedhia6

This discussion was converted from issue #1852 on August 03, 2022 06:18.