Sanitizer built-ins document #244

otherdaniel · 2024-11-29T14:49:45Z

This is meant as a starting point for our built-ins. It's a "mostly free-form" document, in that it's just text with one line per element, plus markdown-style headings. The idea is to classify elements and attributes into groups.

I started out putting everything into "other", and then moving them into better defined groups. The idea is to work down the "other" list until it's empty.

I copied all the elements from the "proposed allow lists" doc into the "harmless" category. Will do the same for attributes.

Source(s):

Elements & attributes lists from Chrome. (Which may include legacy elements no longer supported.)
"Proposed allow lists -- Sanitizer API - 2024-03" document prepared by Frederik.

mozfreddyb · 2024-12-13T08:28:55Z

FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

otherdaniel · 2024-12-13T15:03:10Z

> FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

About now would be good, so I renamed it. :)

I added more "sections" with spec links, also for attributes. I think that should cover all of HTML, while SVG + MathML coverage is still a bit of a mess. There are a lot more leftover ("other") attributes than there were with elements.

Changes so far are:

I copied the per-element attributes from the HTML spec, for any element that I expect to be default-allowed. (It's a manual process, so I was trying to save myself some time.)
I also copied global HTML attributes + aria attributes from the spec(s), including spec links.
I removed all attributes that are used locally somewhere from the global list. I'm not sure this is quite correct.
All of this was manual, so I wouldn't be surprised if there are some omissions somewhere. I'm unsure how to do QA here.

annevk

Thanks for making this list. Very useful to have something based on the HTML standard. Here's an initial very conservative safelist for "default":

html
head
title
All of Sections (body, article, ...)
All of Grouping Content (p, hr, ...)
Most of Text-level Semantics:
- For a, I would omit the attributes target, referrerpolicy, download, and ping
All of Edits (ins, del)
All of Tabular Data
SVG & MathML: TBD
Global attributes:
- dir
- lang
- title
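
As a rough illustration, the safelist above could be written down as a configuration dictionary. This is a sketch only: the key names (`elements`, `attributes`) and the per-element shape are assumptions modeled on the Sanitizer API configuration format, the element lists are abbreviated to the examples named in the comment (tabular data is left out for brevity), and `build_default_safelist` is a hypothetical helper. `A_ATTRIBUTES` is the set of `a`-specific attributes from the HTML standard.

```python
# Attributes the HTML standard defines on <a> (besides global attributes).
A_ATTRIBUTES = ["href", "target", "download", "ping", "rel",
                "hreflang", "type", "referrerpolicy"]

# Attributes the comment above suggests omitting from the default.
A_OMITTED = {"target", "referrerpolicy", "download", "ping"}


def build_default_safelist():
    """Return a conservative safelist as a plain dict (hypothetical shape)."""
    a_allowed = [name for name in A_ATTRIBUTES if name not in A_OMITTED]
    return {
        "elements": [
            "html", "head", "title",
            "body", "article",   # Sections (abbreviated)
            "p", "hr",           # Grouping content (abbreviated)
            {"name": "a", "attributes": a_allowed},  # Text-level semantics
            "ins", "del",        # Edits
        ],
        # Global attributes allowed on every element.
        "attributes": ["dir", "lang", "title"],
    }


safelist = build_default_safelist()
```

With these omissions, `a` keeps only `href`, `rel`, `hreflang`, and `type` as element-specific attributes.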

mozfreddyb · 2024-12-18T09:43:06Z

👍 to what @annevk says, that we should build upon the HTML spec rather than casting the widest net.

As written in #245, I could see us being a bit more iterative by using his relatively small list for now and discussing additions individually as they come up (which they are bound to anyway).

otherdaniel · 2024-12-18T16:27:21Z

Updated list according to Anne's suggestion.
Changed the list format a little, and added a python script that turns it into JSON.
Included this and baseline config from the spec.
Moved the builtin files (json, text, script) to a builtins/ directory.
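
The actual list syntax and conversion script live in the builtins/ directory and are not shown in this thread. The following is only a minimal sketch of the idea, assuming a hypothetical format in which `#`-prefixed lines start a group and `- ` lines each name one element or attribute; `parse_builtins_list` is an illustrative name, not the script's real entry point.

```python
import json


def parse_builtins_list(text):
    """Parse a heading-plus-items text list into {group: [names]}."""
    groups = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # A markdown-style heading opens a new group.
            current = line.lstrip("#").strip()
            groups.setdefault(current, [])
        elif line.startswith("- "):
            if current is None:
                raise ValueError(f"item before any heading: {line!r}")
            groups[current].append(line[2:].strip())
        # Any other line is treated as free-form commentary and ignored.
    return groups


EXAMPLE = """\
## Edits
- ins
- del

## Global attributes
- dir
- lang
- title
"""

as_json = json.dumps(parse_builtins_list(EXAMPLE), indent=2)
```

Keeping the source as a plain list and generating the JSON means reviewers can discuss one line per element, while the spec consumes the generated file.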

mozfreddyb · 2024-12-19T12:13:58Z

builtins/safe-baseline-configuration.json

+    "ondragenter",
+    "ondragleave",
+    "ondragover",
+    "ondragstart",
+    "ondrop",
+    "ondurationchange",
+    "onemptied",
+    "onend",
+    "onended",
+    "onerror",
+    "onfocus",
+    "onfocusin",
+    "onfocusout",
+    "onformdata",
+    "ongotpointercapture",
+    "onhashchange",
+    "oninput",
+    "oninvalid",
+    "onkeydown",
+    "onkeypress",
+    "onkeyup",
+    "onlanguagechange",
+    "onload",
+    "onloadeddata",
+    "onloadedmetadata",
+    "onloadstart",
+    "onlostpointercapture",
+    "onmessage",
+    "onmessageerror",
+    "onmousedown",
+    "onmouseenter",
+    "onmouseleave",
+    "onmousemove",
+    "onmouseout",
+    "onmouseover",
+    "onmouseup",
+    "onmousewheel",
+    "onmove",
+    "onoffline",
+    "ononline",
+    "onorientationchange",
+    "onoverscroll",
+    "onpagehide",
+    "onpageshow",
+    "onpaste",
+    "onpause",
+    "onplay",
+    "onplaying",
+    "onpointercancel",
+    "onpointerdown",
+    "onpointerenter",
+    "onpointerleave",
+    "onpointermove",
+    "onpointerout",
+    "onpointerover",
+    "onpointerrawupdate",
+    "onpointerup",
+    "onpopstate",
+    "onprogress",
+    "onratechange",
+    "onrepeat",
+    "onreset",
+    "onresize",
+    "onresolve",
+    "onscroll",
+    "onscrollend",
+    "onscrollsnapchange",
+    "onscrollsnapchanging",
+    "onsearch",
+    "onsecuritypolicyviolation",
+    "onseeked",
+    "onseeking",
+    "onselect",
+    "onselectionchange",
+    "onselectstart",
+    "onshow",
+    "onslotchange",
+    "onstalled",
+    "onstorage",
+    "onsubmit",
+    "onsuspend",
+    "ontimeupdate",
+    "ontimezonechange",
+    "ontoggle",
+    "ontouchcancel",
+    "ontouchend",
+    "ontouchmove",
+    "ontouchstart",
+    "ontransitionend",
+    "onunload",
+    "onvalidationstatuschange",
+    "onvolumechange",
+    "onwaiting",
+    "onwebkitanimationend",
+    "onwebkitanimationiteration",
+    "onwebkitanimationstart",
+    "onwebkitfullscreenchange",
+    "onwebkitfullscreenerror",
+    "onwebkittransitionend",
+    "onwheel"


What should we do here? In spec purity terms, I believe we should stick to those in the HTML standard, make a big note that many engines support non-standardized event handlers, and add those as a hint or such?
But in reality, I can see this going wrong.

@evilpie: How would we best identify the list of supported event handler attributes in Gecko?

We should probably just check if an attribute is a https://html.spec.whatwg.org/#event-handler-content-attributes. We could then maybe non-normatively list all of them (they're also in an index in HTML). Implementations can do roughly the same thing they do for Trusted Types.

In Gecko, Trusted Types currently uses the EventNameList.h.
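
A minimal sketch of that check, assuming the implementation has some list of event handler content attribute names available. The three names below are taken from the removed list above; the set and function names are hypothetical. Checking membership rather than an `on` prefix leaves unrelated attributes that merely start with `on` intact.

```python
# Illustrative subset only; a real list would come from the HTML standard's
# index or from the engine's own table of event names.
EVENT_HANDLER_CONTENT_ATTRIBUTES = frozenset({"onerror", "onfocus", "onload"})


def is_event_handler_content_attribute(name):
    """Return True if `name` is a known event handler content attribute."""
    # Lowercasing is a defensive assumption; the HTML parser already
    # lowercases attribute names on HTML elements.
    return name.lower() in EVENT_HANDLER_CONTENT_ATTRIBUTES


def strip_event_handlers(attributes):
    """Return a copy of an attribute dict without event handler attributes."""
    return {
        name: value
        for name, value in attributes.items()
        if not is_event_handler_content_attribute(name)
    }


cleaned = strip_event_handlers(
    {"href": "/docs", "onload": "steal()", "ONERROR": "x()", "onward": "1"}
)
```

Here `onward` survives because it is not in the list, while `onload` and `ONERROR` are dropped.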

I've now removed the list of event handlers, instead adding a rule to remove event-handler-content-attributes. I'm iterating over those, as if they were a list. Not sure if that's legitimate.

I've also added a note and a script that merges in a copy of the event handlers, so it's easier to see what this does. This should make it easy to modify, and to -- eventually -- just use a list directly derived from the HTML spec text.

Unfortunately, the preview doesn't run the scripts, so that particular bit isn't easy to review.
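
The merge script itself is not shown in this thread. As a hedged sketch of what such a step might do, assume the baseline configuration is a JSON object with a `removeAttributes` array (an assumption based on the diff above). The function name and the sample data are made up for illustration.

```python
import json


def merge_event_handlers(baseline, event_handlers):
    """Return a copy of `baseline` whose removeAttributes also lists handlers."""
    merged = dict(baseline)
    existing = list(baseline.get("removeAttributes", []))
    seen = set(existing)
    for name in event_handlers:
        if name not in seen:  # avoid duplicates, keep a stable order
            existing.append(name)
            seen.add(name)
    merged["removeAttributes"] = existing
    return merged


# Sample data for illustration only, not the real baseline configuration.
baseline = {"removeElements": ["script"], "removeAttributes": ["onload"]}
handlers = ["onerror", "onload", "onfocus"]

effective = merge_event_handlers(baseline, handlers)
rendered = json.dumps(effective, indent=2)
```

The input configuration is left unmodified, so the hand-maintained file and the generated "effective" view stay separate.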

I think iterating over them is okay. We might have to revisit this when upstreaming.

mozfreddyb · 2025-01-13T08:36:58Z

index.bs

+<span class="marker">Note:</span> The [=remove unsafe=] algorithm specifies
+to additionally remove any [=event handler content attributes=], as defined
+in [[HTML]].
+If a [=user agent=] defines extensions to the [[HTML]] spec with additional
+[=event handler content attributes=], it is its responsibility to decide how
+to handle them. Using the current [=event handler content attributes=] list,
+the safe baseline configuration looks effectively like so:


Nice, this is very important.
nit picking, but is there a higher level of severity than a "note"? Can we do warnings? :D

> Nice, this is very important. nit picking, but is there a higher level of severity than a "note"? Can we do warnings? :D

The tool offers Notes, Issues, Examples, Advisements

HTML supports highly dramatic warnings, with a few usages in the spec, e.g. here.

If we want to use a warning here, I'd just emulate the CSS in the spec directly.

Guess it's best to do what's easiest and not worth nit picking too hard. :) thank you!

You could add a source comment to upgrade this to class=warning when upstreaming.

Using "advisement" with the label "Warning:" now. Also added a comment to use "class=warning" eventually.

There's also a pre-defined CSS style for "annoying-warning", but that sounds.. like too much. :)

I'm happy with this 🥳 thank you!

mozfreddyb · 2025-01-14T08:07:59Z

lgtm 👍

annevk

This seems fine. I'm a bit surprised we block nested documents in baseline though. Did we discuss that and I forgot about it?

otherdaniel · 2025-01-16T11:48:49Z

> This seems fine. I'm a bit surprised we block nested documents in baseline though. Did we discuss that and I forgot about it?

We discussed it at the end of last year; see the minutes of the 2024-12-11 call. At the time, there was a rather large "other" category that we went through. The minutes record "some gut calls during the call", and "no" for frame-related stuff. I do remember these as gut calls indeed, with no one having super strong opinions one way or another.

annevk · 2025-01-16T12:52:41Z

Thanks! Admittedly it's also somewhat hard to do a nested document policy that makes sense, so once there's demand for that we can try to figure out more dedicated syntax I suppose. Or point people towards "unsafe".

otherdaniel · 2025-01-16T14:20:40Z

Landing this. I'll note that the list format makes this really easy to change, spec-wise. :)

This PR was merged into the 6.4 branch.

Discussion
----------

[HtmlSanitizer] fix tests

| Q | A |
| ------------- | --- |
| Branch? | 6.4 |
| Bug fix? | no |
| New feature? | no |
| Deprecations? | no |
| Issues | |
| License | MIT |

The files we used to download are no longer part of the WICG/sanitizer-api repository (see WICG/sanitizer-api#244).

Commits
-------

8f06032 fix tests

otherdaniel force-pushed the lists branch from 9842b1c to dc48e59 Compare December 12, 2024 15:55

otherdaniel changed the title ~~Sanitizer built-ins document [Not ready yet]~~ Sanitizer built-ins document Dec 13, 2024

annevk reviewed Dec 18, 2024

otherdaniel added 5 commits December 18, 2024 16:51

Initial draft.

f499f7d

Add attributes

4ff9fd5

Rework attributes. More standards references.

94a6ef7

Complete ARIA attributes.

c2f6465

Review feedback.

8ba0ee6

otherdaniel force-pushed the lists branch from d5b402d to 8ba0ee6 Compare December 18, 2024 15:52

Try empty -json file to fix preview.

48aebd1

mozfreddyb reviewed Dec 19, 2024

mozfreddyb reviewed Jan 8, 2025

otherdaniel added 3 commits January 8, 2025 16:15

Review feedback: Reference event content handlers; add remove elements

2ea2510

Fix pr-push-yml

d84fe5f

Fix pr-push-yml even harder.

74bfa58

mozfreddyb approved these changes Jan 13, 2025

Turn note into warning/advisement.

4f94295

otherdaniel mentioned this pull request Jan 15, 2025

MathML defaults #250

Merged

mozfreddyb requested a review from annevk January 15, 2025 13:17

otherdaniel mentioned this pull request Jan 15, 2025

Proposed API: MathML, SVG support, plus localname case-handling. #103

Open

annevk approved these changes Jan 16, 2025

otherdaniel merged commit ebdfd9e into WICG:main Jan 16, 2025
2 checks passed

otherdaniel deleted the lists branch January 16, 2025 14:22

xabbuh mentioned this pull request Jan 16, 2025

[HtmlSanitizer] fix tests symfony/symfony#59531

Merged
