page.search_for return extra height #2454
ashifaliclientpoint asked this question in Q&A (unanswered)
-
This is a typical "Discussions" item. Let me first transfer it.
-
Hello,
I am using PyMuPDF (1.19.6) to search for strings in a PDF file and then redact them. However, the rectangles returned by page.search_for are taller than the text appears in the original PDF file, and the redaction also removes other text that these rectangles overlap.
Please help me with this strange issue.
I am attaching the original file and the converted file.
DRAFT_Executive.pdf
highlighted_file.pdf
Steps to reproduce
import re
import fitz  # PyMuPDF
import sys, json

file_path = "DRAFT_Executive.pdf"
# Tag pattern. The backslashes escaping the outer brackets appear to have been
# lost when the post was rendered, so they are restored here (an assumption).
pattern = r'\[\s*([scdit]):([a-z]):([or])\s*\]'  # Replace with your desired regex pattern
doc = fitz.open(file_path)
resultOutput = []
tagsPerPage = {}
addedTags = set()

# Pass 1: find every tag on each page and look up its rectangles.
for page in doc:
    text = page.get_text()
    tagsPerPage[page.number] = []
    matches = re.finditer(pattern, text, re.IGNORECASE | re.MULTILINE | re.DOTALL)
    for match in matches:
        start, end = match.span()
        coordinates = page.search_for(match.group())
        tempDict = {}
        firstCoordStr = ""
        singleTagArr = []
        needleStarted = 0
        for rect in coordinates:
            # A fitz.Rect unpacks as (x0, y0, x1, y1) with the origin at the
            # top-left, so here y2 is the top edge and y1 the bottom edge.
            x1, y2, x2, y1 = rect

# Pass 2: flip the stored y values relative to the page height.
if tagsPerPage:
    for page in doc:
        if tagsPerPage[page.number]:
            for item in tagsPerPage[page.number]:
                currPage = page.number + 1
                if item['page'] == currPage:
                    y1 = page.rect.height - item['y1']
                    y2 = page.rect.height - item['y2']

doc.save("highlighted_file.pdf", garbage=3, deflate=True)
doc.close()
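A note on the regex in the script above: as it appears in the post, with unescaped outer brackets, the pattern does not compile in Python, which suggests the backslashes were dropped when the post was rendered. The sketch below checks that with the standard library only. `ESCAPED` is an assumed reconstruction of the intended pattern, and `compiles`, `find_tags`, and the sample text are hypothetical names and data introduced for illustration.

```python
import re

# Pattern as it appears in the post (no backslashes before the outer brackets).
POSTED = r'[\s*([s|c|d|i|t]):([a-z]):([o|r])\s*]'

# Assumed intent: a literal "[" and "]" around three colon-separated
# one-letter fields. Inside a character class "|" is a literal, so it is dropped.
ESCAPED = r'\[\s*([scdit]):([a-z]):([or])\s*\]'


def compiles(pattern):
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find_tags(text):
    """Return every substring of `text` that matches the assumed tag pattern."""
    return [m.group() for m in re.finditer(ESCAPED, text, re.IGNORECASE)]


if __name__ == "__main__":
    print(compiles(POSTED), compiles(ESCAPED))
    print(find_tags("Sign here [s:a:r] and initial here [ i:b:o ]"))
```

Running a check like this before calling page.search_for helps separate pattern problems from geometry problems.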
Configuration
OS ubuntu
Python 3.8
PyMuPDF 1.19.6
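One possible workaround, offered as a sketch rather than a confirmed fix: as far as I know, the rectangles from page.search_for cover the full line height of the font rather than just the visible glyphs, and a redaction removes text that overlaps its rectangle. Shrinking each rectangle vertically before redacting may therefore spare the neighboring lines. The helper below is plain Python on `(x0, y0, x1, y1)` tuples; `shrink_height` and its `keep` parameter are hypothetical names, and a suitable `keep` value would have to be found by trial on the actual file.

```python
def shrink_height(rect, keep=0.8):
    """Scale the height of (x0, y0, x1, y1) to the fraction `keep`,
    trimming the same amount from the top and the bottom."""
    if not 0 < keep <= 1:
        raise ValueError("keep must be in the interval (0, 1]")
    x0, y0, x1, y1 = rect
    trim = (y1 - y0) * (1 - keep) / 2
    return (x0, y0 + trim, x1, y1 - trim)


if __name__ == "__main__":
    # A 20-unit-tall hit reduced to half its height, centered vertically.
    print(shrink_height((10, 100, 50, 120), keep=0.5))
```

With PyMuPDF, the result could be wrapped as `fitz.Rect(*shrink_height(tuple(rect)))` and passed to `page.add_redact_annot`. If your version provides it, `fitz.TOOLS.set_small_glyph_heights(True)` may be another way to get tighter rectangles from text search.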