Missing segment if subpath has h operator in the middle #1864
-
Please provide all mandatory information!
Describe the bug (mandatory)
Hi, I am trying to extract drawing data from this PDF, but I found a missing segment in drawing 153.
from fitz import Document
document = Document("path/to/file.pdf")
page = document.load_page(0)
drawing = page.get_drawings()[153]
print(drawing)
Got 4 segments from the drawing:
{'items': [('l',
Point(513.6799926757812, 585.3599853515625),
Point(440.0, 585.3599853515625)),
('l',
Point(440.0, 585.3599853515625),
Point(513.6799926757812, 587.9400024414062)),
('l',
Point(440.0, 585.3599853515625),
Point(513.6799926757812, 587.9400024414062)),
('l',
Point(513.6799926757812, 587.9400024414062),
Point(437.41998291015625, 587.9400024414062))],
'closePath': False,
'type': 'fs',
'stroke_opacity': 1.0,
'color': (0.3019999861717224, 0.3019999861717224, 0.3019999861717224),
'width': 0.0,
'lineCap': (1, 1, 1),
'lineJoin': 0.0,
'dashes': '[] 0',
'rect': Rect(437.41998291015625, 585.3599853515625, 513.6799926757812, 587.9400024414062),
'seqno': 154,
'even_odd': False,
'fill_opacity': 1.0,
'fill': (0.3019999861717224, 0.3019999861717224, 0.3019999861717224)}
The page content related to the drawing:
page.clean_contents()
splitted_contents = page.read_contents().split(b"\n")
# list of painting operators
painting_operators = [b"S", b"s", b"f", b"F", b"f*", b"B", b"B*", b"b", b"b*", b"n"]
painting_counter = -1
seqno = drawing['seqno']
for i, data in enumerate(splitted_contents):
    # the operator is the last token on each cleaned content line
    try:
        values, cmd = data.rsplit(b" ", 1)
    except ValueError:
        values = None
        cmd = data
    if cmd in painting_operators:
        painting_counter += 1
        if painting_counter == seqno:
            break
else:
    # the loop ended without reaching the requested painting operator
    raise Exception("TARGET NOT FOUND")
print(splitted_contents[i - 20: i + 1])
[
b'0 G',
b'S', # <<< seqno 153
b'Q',
b'EMC',
b'q',
b'/OC /MC2 BDC',
b'1 0 0 1 8328 4044 cm',
b'0 j',
b'0 w',
b'0 0 m', # <<< P1 (0, 0), subpath started
b'-1228 0 l', # <<< P2
b'0 -43 l', # <<< P3
b'h', # <<< P4 (0, 0), subpath closed
b'-1228 0 m', # <<< P5 (-1228, 0), subpath started
b'0 -43 l', # <<< P6
b'-1271 -43 l', # <<< P7
b'h', # <<< P8 (-1228, 0), subpath closed
b'h', # <<< P9 (-1228, 0), should do nothing because the subpath is already closed
b'.302 .302 .302 rg',
b'.302 .302 .302 RG',
b'B' # <<< seqno 154
]
To Reproduce (mandatory)
It's been explained above.
Expected behavior (optional)
6 segments expected in the drawing items:
[
(P1, P2),
(P2, P3),
(P3, P4),
(P5, P6),
(P6, P7),
(P7, P8)
]
Your configuration (mandatory)
import sys, fitz
print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
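The expected segment list above can be derived mechanically from the content stream. The following is a minimal sketch, independent of PyMuPDF, that expands only the `m`, `l` and `h` operators into line segments. It assumes the PDF specification's semantics for `h` (append a straight segment from the current point back to the start of the subpath, and do nothing if the subpath is already closed). The function name `expand_path` and the type aliases are hypothetical and introduced only for illustration.

```python
from typing import List, Optional, Tuple

Pt = Tuple[float, float]
Segment = Tuple[Pt, Pt]


def expand_path(ops: List[bytes]) -> List[Segment]:
    """Expand m / l / h path operators into straight line segments."""
    segments: List[Segment] = []
    start: Optional[Pt] = None    # first point of the current subpath
    current: Optional[Pt] = None  # current point
    closed = False                # True right after an h operator
    for op in ops:
        *args, cmd = op.split()
        if cmd == b"m":
            start = current = (float(args[0]), float(args[1]))
            closed = False
        elif cmd == b"l":
            nxt = (float(args[0]), float(args[1]))
            segments.append((current, nxt))
            current = nxt
            closed = False
        elif cmd == b"h":
            # Close only once: a repeated h on a closed subpath adds nothing.
            if not closed and current is not None and current != start:
                segments.append((current, start))
            current = start
            closed = True
    return segments


# Path operators copied from the content stream shown above.
ops = [b"0 0 m", b"-1228 0 l", b"0 -43 l", b"h",
       b"-1228 0 m", b"0 -43 l", b"-1271 -43 l", b"h", b"h"]
segments = expand_path(ops)
for seg in segments:
    print(seg)
```

The coordinates here are in the path's own coordinate system, before the `cm` matrix and any page transformation are applied, so they will not match the `Point` values printed by `get_drawings()`.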
Replies: 4 comments 6 replies
-
Sorry for my English and thank you for your response.
-
I think we can close this as an issue/bug and convert it to a post in the "Discussions" section.
-
No, there is not. Except, of course, if you want to (semi-)manually dig your way through the
I don't understand your argument:
In page.get_drawings() I am not reading the page's /Contents; I am accessing a general MuPDF interface that also works for all supported document types. MuPDF abstracts away how each document type does its rendering, so you cannot derive the contents line number from the "seqno" key in a drawing.
But independently of this, I count like this:
There are only the following 4 points:
p1 = (0,0)
p2 = (-1228, 0)
p3 = (0, -43)
p4 = (-1271, -43)
And the lines being drawn are (p1, p2), (p2, p3), (p2, p3) (again!), (p3, p4). So 4 lines should be drawn, where 2 of them are identical.
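This count can be checked against the drawing reported at the top of the thread. The sketch below is only an illustration: it copies the four reported `'l'` items as plain tuples, with coordinates rounded to two decimals for readability (the rounding is an assumption, not something `get_drawings()` does), and counts distinct points and repeated segments.

```python
from collections import Counter

# Segments as reported by get_drawings(), rounded to 2 decimals (assumption).
reported = [
    ((513.68, 585.36), (440.0, 585.36)),
    ((440.0, 585.36), (513.68, 587.94)),
    ((440.0, 585.36), (513.68, 587.94)),
    ((513.68, 587.94), (437.42, 587.94)),
]

counts = Counter(reported)                        # occurrences per segment
points = {p for seg in reported for p in seg}     # distinct end points
duplicates = [seg for seg, n in counts.items() if n > 1]

print(len(points), "distinct points")
print(len(counts), "distinct segments")
print("repeated:", duplicates)
```

The reported output therefore contains four distinct points and one segment listed twice, which matches the count given here; the closing segments that `h` would add are not among the reported items.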