Add option to pv_uni_display for better tr/// output
tr/// has a special malformed UTF-8 character as a sentinel; teach
pv_uni_display about that.
khwilliamson committed Jan 19, 2025
1 parent bdc8044 commit b2d48c4
Showing 2 changed files with 17 additions and 0 deletions.
14 changes: 14 additions & 0 deletions utf8.c
@@ -4748,7 +4748,13 @@ See also L</sv_uni_display>.
=for apidoc Amnh||UNI_DISPLAY_QQ
=for apidoc Amnh||UNI_DISPLAY_REGEX
=cut
Undocumented is UNI_DISPLAY_TR_ which is used internally to display an operand
of the tr/// operation. These operands have a peculiar, deliberate UTF-8
malformation which this flag enables the proper handling of. It turns on
ISPRINT and BACKSLASH as well.
*/

char *
Perl_pv_uni_display(pTHX_ SV *dsv, const U8 *spv, STRLEN len, STRLEN pvlim,
                          UV flags)
@@ -4770,6 +4776,14 @@ Perl_pv_uni_display(pTHX_ SV *dsv, const U8 *spv, STRLEN len, STRLEN pvlim,
            break;
        }

        /* The minus is unambiguously the range indicator within a UTF-8 tr///
         * operand */
        if (UNLIKELY(flags & UNI_DISPLAY_TR_ && *s == ILLEGAL_UTF8_BYTE)) {
            sv_catpvs(dsv, "-");
            next_len = 1;
            continue;
        }

        u = utf8_to_uvchr_buf(s, e, &next_len);
        assert(next_len > 0);
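
The hunk above can be illustrated outside of Perl with a minimal, standalone sketch. It is not the code from utf8.c: every `sketch_`/`SKETCH_` name is hypothetical, the decoder is a simplified stand-in for `utf8_to_uvchr_buf`, and the sentinel value 0xC1 is an assumption chosen only because that byte cannot occur in well-formed UTF-8 (the real value of `ILLEGAL_UTF8_BYTE` is defined in Perl's headers and is not shown in this diff).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; NOT Perl's real definitions. */
#define SKETCH_DISPLAY_TR    0x0008u
#define SKETCH_SENTINEL_BYTE 0xC1u  /* assumed: never valid in well-formed UTF-8 */

/* Decode one sequence at s (s < e). Returns its length, or 0 if malformed.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t sketch_decode(const unsigned char *s, const unsigned char *e,
                            unsigned long *cp)
{
    size_t len;
    if (*s < 0x80)                { *cp = *s;        len = 1; }
    else if ((*s & 0xE0) == 0xC0) { *cp = *s & 0x1F; len = 2; }
    else if ((*s & 0xF0) == 0xE0) { *cp = *s & 0x0F; len = 3; }
    else if ((*s & 0xF8) == 0xF0) { *cp = *s & 0x07; len = 4; }
    else return 0;
    if ((size_t)(e - s) < len) return 0;          /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;      /* not a continuation byte */
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return len;
}

/* Write a printable rendering of src[0..len) into out (capacity cap). */
static void sketch_display(char *out, size_t cap,
                           const unsigned char *src, size_t len,
                           unsigned flags)
{
    const unsigned char *s = src, *e = src + len;
    size_t used = 0, next_len = 0;
    assert(cap > 0);
    for (; s < e && used + 16 < cap; s += next_len) {
        unsigned long cp;
        /* Check for the sentinel BEFORE decoding, as the hunk does:
         * inside a tr/// operand it stands for the range minus. */
        if ((flags & SKETCH_DISPLAY_TR) && *s == SKETCH_SENTINEL_BYTE) {
            out[used++] = '-';
            next_len = 1;
            continue;
        }
        next_len = sketch_decode(s, e, &cp);
        if (next_len == 0) {                      /* malformed: show a marker */
            out[used++] = '?';
            next_len = 1;
        } else if (cp >= 0x20 && cp < 0x7F) {     /* printable ASCII */
            out[used++] = (char)cp;
        } else {
            used += (size_t)snprintf(out + used, cap - used, "\\x{%lx}", cp);
        }
    }
    out[used] = '\0';
}
```

In this sketch, the three bytes `'a'`, 0xC1, `'z'` render as `a-z` when the flag is set and as `a?z` when it is not, which is the kind of difference the commit is after: the sentinel is consumed before the decoder ever sees it, so it is never reported as a malformation.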

3 changes: 3 additions & 0 deletions utf8.h
@@ -1318,6 +1318,9 @@ point's representation.
#define UNI_DISPLAY_BACKSLASH   0x0002
#define UNI_DISPLAY_BACKSPACE   0x0004  /* Allow \b when also
                                           UNI_DISPLAY_BACKSLASH */
#define UNI_DISPLAY_TR_         (  0x0008                  \
                                 |UNI_DISPLAY_ISPRINT      \
                                 |UNI_DISPLAY_BACKSLASH)
#define UNI_DISPLAY_QQ          ( UNI_DISPLAY_ISPRINT      \
                                 |UNI_DISPLAY_BACKSLASH    \
                                 |UNI_DISPLAY_BACKSPACE)
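
Since UNI_DISPLAY_TR_ is a composite mask rather than a single bit, it may help to see how such a mask behaves under the two common tests. The sketch below uses hypothetical `SK_` names that mirror the values in this hunk; the value 0x0001 for the ISPRINT bit is an assumption, as that definition lies outside the lines shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the flags in the hunk; not Perl's header. */
#define SK_ISPRINT   0x0001u  /* assumed value; not visible in this diff */
#define SK_BACKSLASH 0x0002u
#define SK_BACKSPACE 0x0004u
#define SK_TR_BIT    0x0008u  /* the one bit unique to the tr/// flag */
#define SK_TR        (SK_TR_BIT | SK_ISPRINT | SK_BACKSLASH)
#define SK_QQ        (SK_ISPRINT | SK_BACKSLASH | SK_BACKSPACE)

/* True if flags shares at least one bit with mask. */
static bool sk_any_bit(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* True only if every bit of mask is set in flags. */
static bool sk_all_bits(unsigned flags, unsigned mask)
{
    return (flags & mask) == mask;
}
```

Under these assumptions, `sk_any_bit(SK_QQ, SK_TR)` is true because the two masks share the ISPRINT and BACKSLASH bits, while `sk_all_bits(SK_QQ, SK_TR)` is false. A test of the form `flags & UNI_DISPLAY_TR_` is of the first kind; in the utf8.c hunk it is paired with the comparison against `ILLEGAL_UTF8_BYTE`, a byte that should not appear in any other input.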