Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain security.txt files can cause parser to hang indefinitely #74

Closed
mxsasha opened this issue Jul 4, 2024 · 1 comment · Fixed by #75
Closed

Certain security.txt files can cause parser to hang indefinitely #74

mxsasha opened this issue Jul 4, 2024 · 1 comment · Fixed by #75

Comments

@mxsasha
Copy link
Collaborator

mxsasha commented Jul 4, 2024

Some specific examples to follow by mail, but certain security.txt files can hang indefinitely in the call to pgpy, specifically in the regex. For example:

  File "/Users/sasha/.virtualenvs/internet.nl/lib/python3.9/site-packages/pgpy/types.py", line 198, in from_blob
    po = obj.parse(bytearray(blob))
  File "/Users/sasha/.virtualenvs/internet.nl/lib/python3.9/site-packages/pgpy/pgp.py", line 1270, in parse
    unarmored = self.ascii_unarmor(packet)
  File "/Users/sasha/.virtualenvs/internet.nl/lib/python3.9/site-packages/pgpy/types.py", line 121, in ascii_unarmor
    m = Armorable.__armor_regex.search(text)
KeyboardInterrupt

Some attempts at isolating this issue to a smaller sample have not shown any particular content that triggers the issue. It may also be dependent on specific Python patch versions or platforms. In some cases, the hang is indefinite, in some cases the parsing is only very slow (several seconds).

Suggestion: consider removing the PGP parsing check until we're certain this issue is resolved.

@bwbroersma
Copy link
Contributor

bwbroersma commented Jul 5, 2024

Some pointers:

The regex might have Catastrophic Backtracking problems?

The regex cleaned of newlines and comment:

(^-{5}BEGIN\ PGP\ SIGNED\ MESSAGE-{5}(?:\r?\n)(Hash:\ (?P<hashes>[A-Za-z0-9\-,]+)(?:\r?\n){2})?(?P<cleartext>(.*\r?\n)*(.*(?=\r?\n-{5})))(?:\r?\n))?^-{5}BEGIN\ PGP\ (?P<magic>[A-Z0-9 ,]+)-{5}(?:\r?\n)(?P<headers>(^.+:\ .+(?:\r?\n))+)?(?:\r?\n)?(?P<body>([A-Za-z0-9+/]{1,76}={,2}(?:\r?\n))+)^=(?P<crc>[A-Za-z0-9+/]{4})(?:\r?\n)^-{5}END\ PGP\ (?P=magic)-{5}(?:\r?\n)?

see this regex101.com, which results in:

Catastrophic backtracking has been detected and the execution of your expression has been halted. To find out more and what this is, please read the following article: Runaway Regular Expressions

Update:
The problem are the six dashes (while only 5 are valid PGP):

------BEGIN PGP SIGNATURE-----
------END PGP SIGNATURE-----

which does not match, and because of some nested * capture, there is a backtracking chaos.

Update 2:
Fixed regex:

(^-{5}BEGIN\ PGP\ SIGNED\ MESSAGE-{5}(?:\r?\n)(Hash:\ (?P<hashes>[A-Za-z0-9\-,]+)(?:\r?\n){2})?(?P<cleartext>((^|(([^-]|- )[^\r\n]*))\r?\n)+)?)^-{5}BEGIN\ PGP\ (?P<magic>[A-Z0-9 ,]+)-{5}(?:\r?\n)(?P<headers>(^[a-zA-Z]+:\ [^\r\n]+(?:\r?\n))+)?(?:\r?\n)(?P<body>([A-Za-z0-9+/]{1,76}={,2}(?:\r?\n))+)^=(?P<crc>[A-Za-z0-9+/]{4})(?:\r?\n)^-{5}END\ PGP\ (?P=magic)-{5}(?:\r?\n)?$

see this regex101.com.

This regex seems to be way faster, from 9.9s to 0.2s on my machine on match and no-match.
In terms of types.py diff:

55c55
<                           (?P<cleartext>(.*\r?\n)*(.*(?=\r?\n-{5})))(?:\r?\n)
---
>                           (?P<cleartext>((^|(([^-]|- )[^\r\n]*))\r?\n)+)?
61c61
<                          (?P<headers>(^.+:\ .+(?:\r?\n))+)?(?:\r?\n)?
---
>                          (?P<headers>(^[a-zA-Z]+:\ [^\r\n]+(?:\r?\n))+)?(?:\r?\n)?
68c68
<                          ^-{5}END\ PGP\ (?P=magic)-{5}(?:\r?\n)?
---
>                          ^-{5}END\ PGP\ (?P=magic)-{5}(?:\r?\n)?$

The last diff also fixed a more than five dash ending.

Update 3: the regex still seems wrong, since RFC 4880 - OpenPGP Message Format § 6.2 Forming ASCII Armor states:

Concatenating the following data creates ASCII Armor:

  • An Armor Header Line, appropriate for the type of data
  • Armor Headers
  • A blank (zero-length, or containing only whitespace) line
  • The ASCII-Armored data
  • An Armor Checksum
  • The Armor Tail, which depends on the Armor Header Line

Optional whitespace is not matched because of (?:\r?\n){2}, resulting in Armor Headers being matched as ASCII-Armored data if there is whitespace in the black line.

Update 4:
Tweaked the regex a bit so it passes all tests (e.g. there is an invalid header Comment0 case that should pass, so I added [0-9]) and the last newline of the cleartext should not be captured.

Created issue and PR upstream:

Update 5:
I informed devolksbank about 4 issues with their security.txt via HackerOne.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants