Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizer built-ins document #244

Merged
merged 10 commits into from
Jan 16, 2025
Merged

Sanitizer built-ins document #244

merged 10 commits into from
Jan 16, 2025

Conversation

otherdaniel
Copy link
Collaborator

@otherdaniel otherdaniel commented Nov 29, 2024

This is meant as a starting point for our built-ins. It's a "mostly free-form" document, in that it's just text with one line per element, plus markdown-style headings. The idea is to classify elements and attributes into groups.

I started out putting everything into "other", and then moving them into better defined groups. The idea is to work down the "other" list until it's empty.

I copied all the elements from the "proposed allow lists" doc into the "harmless" category. Will do the same for attributes.

Source(s):

  • Elements & attributes lists from Chrome. (Which may include legacy elements no longer supported.)
  • "Proposed allow lists -- Sanitizer API - 2024-03" document prepared by Frederik.

Preview | Diff

@mozfreddyb
Copy link
Collaborator

FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

@otherdaniel otherdaniel changed the title Sanitizer built-ins document [Not ready yet] Sanitizer built-ins document Dec 13, 2024
@otherdaniel
Copy link
Collaborator Author

FYI, this PR is still called "Not ready yet". Let me know when you are seeking review.

About now would be good, so I renamed it. :)

I added more "sections" with spec links, also for attributes. I think that should cover all of HTML; while SVG + MathML coverage is still a bit of a mess. There are a lot more leftover ("other") attributes than there were with elements.

Changes so far are:

  • I copied the per-element attributes from the HTML spec, for any element that I expect to be default-allowed. (It's a manual process, so I was trying to save myself some time.)
  • I also copied global HTML attributes + aria attributes from the spec(s), including spec links.
  • I removed all attributes that are used locally somewhere from the global list. I'm not sure this is quite correct.
  • All of this was manual, so I wouldn't be surprised if there are some omissions somewhere. I'm unsure how to do QA here.

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this list. Very useful to have something based on the HTML standard. Here's an initial very conservative safelist for "default":

  • html
  • head
  • title
  • All of Sections (body, article, ...)
  • All of Grouping Content (p, hr, ...)
  • Most of Text-level Semantics:
    • a attributes target, referrerpolicy, download, and ping I would omit
  • All of Edits (ins, del)
  • All of Tabular Data
  • SVG & MathML: TBD
  • Global attributes:
    • dir
    • lang
    • title

@mozfreddyb
Copy link
Collaborator

👍 to what @annevk says, that we should build upon the HTML spec rather than casting the widest net.

As written in #245, I could see us being a bit more iterative by using his relatively small list for now and discussing additions individually as they come up (which they are bound to anyway).

@otherdaniel
Copy link
Collaborator Author

otherdaniel commented Dec 18, 2024

  • Updated list according to Anne's suggestion.
  • Changed the list format a little, and added a python script that turns it into JSON.
  • Included this and baseline config from the spec.
  • Moved the builtin files (json, text, script) to a builtins/ directory.

Comment on lines 12 to 145
"ondragenter",
"ondragleave",
"ondragover",
"ondragstart",
"ondrop",
"ondurationchange",
"onemptied",
"onend",
"onended",
"onerror",
"onfocus",
"onfocusin",
"onfocusout",
"onformdata",
"ongotpointercapture",
"onhashchange",
"oninput",
"oninvalid",
"onkeydown",
"onkeypress",
"onkeyup",
"onlanguagechange",
"onload",
"onloadeddata",
"onloadedmetadata",
"onloadstart",
"onlostpointercapture",
"onmessage",
"onmessageerror",
"onmousedown",
"onmouseenter",
"onmouseleave",
"onmousemove",
"onmouseout",
"onmouseover",
"onmouseup",
"onmousewheel",
"onmove",
"onoffline",
"ononline",
"onorientationchange",
"onoverscroll",
"onpagehide",
"onpageshow",
"onpaste",
"onpause",
"onplay",
"onplaying",
"onpointercancel",
"onpointerdown",
"onpointerenter",
"onpointerleave",
"onpointermove",
"onpointerout",
"onpointerover",
"onpointerrawupdate",
"onpointerup",
"onpopstate",
"onprogress",
"onratechange",
"onrepeat",
"onreset",
"onresize",
"onresolve",
"onscroll",
"onscrollend",
"onscrollsnapchange",
"onscrollsnapchanging",
"onsearch",
"onsecuritypolicyviolation",
"onseeked",
"onseeking",
"onselect",
"onselectionchange",
"onselectstart",
"onshow",
"onslotchange",
"onstalled",
"onstorage",
"onsubmit",
"onsuspend",
"ontimeupdate",
"ontimezonechange",
"ontoggle",
"ontouchcancel",
"ontouchend",
"ontouchmove",
"ontouchstart",
"ontransitionend",
"onunload",
"onvalidationstatuschange",
"onvolumechange",
"onwaiting",
"onwebkitanimationend",
"onwebkitanimationiteration",
"onwebkitanimationstart",
"onwebkitfullscreenchange",
"onwebkitfullscreenerror",
"onwebkittransitionend",
"onwheel"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do here? In spec purity terms, I believe we should stick to those in the HTML standard and make a big note that many engines support non-standardized and add them as a hint or such?
But In reality, I can see this going wrong.

@evilpie: How would we best identify the list of supported event handler attributes in Gecko?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just check if an attribute is a https://html.spec.whatwg.org/#event-handler-content-attributes. We could then maybe non-normatively list all of them (they're also in an index in HTML). Implementations can do roughly the same thing they do for Trusted Types.

Copy link

@evilpie evilpie Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Gecko, Trusted Types currently uses the EventNameList.h.

Copy link
Collaborator Author

@otherdaniel otherdaniel Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now removed the list of event handlers, instead adding a rules to remove event-handler-content-attributes. I'm iterating over those, as if they were a list. Not sure if that's legitimate.

I've also added a note and a script that merges in a copy of the event handlers, so it's more easy to see what this does. This should make it easy to modify, and to -- eventually -- just use a list directly derived from the HTML spec text.

Unfortunately, the preview doesn't run the scripts, so that particular bit isn't easy to review.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think iterating over them is okay. We might have to revisit this when upstreaming.

index.bs Outdated
Comment on lines 740 to 746
<span class="marker">Note:</span> The [=remove unsafe=] algorithm specifies
to additionally remove any [=event handler content attributes=], as defined
in [[HTML]].
If a [=user agent=] defines extensions to the [[HTML]] spec with additional
[=event handler content attributes=], it is its responsibility to decide how
to handle them. Using the current [=event handler content attributes=] list,
the safe baseline configuration looks effectively like so:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is very important.
nit picking, but is there a higher level of severity than a "note"? Can we do warnings? :D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is very important. nit picking, but is there a higher level of severity than a "note"? Can we do warnings? :D

The tool offers Notes, Issues, Examples, Advisements

HTML suports highly dramatic warnings, with a few usages in the spec, e.g. here.

If we want to use a warning here, I'd just emulate the CSS in the spec directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess it's best to do what's easiest and not worth nit picking too hard. :) thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a source comment to upgrade this to class=warning when upstreaming.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "advisement" + with label "Warning:" now. Also addrf a comment to use "class=warning" eventually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a pre-defined CSS style for "annoying-warning", but that sounds.. like too much. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this 🥳 thank you!

@mozfreddyb
Copy link
Collaborator

lgtm 👍

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine. I'm a bit surprised we block nested documents in baseline though. Did we discuss that and I forgot about it?

@otherdaniel
Copy link
Collaborator Author

This seems fine. I'm a bit surprised we block nested documents in baseline though. Did we discuss that and I forgot about it?

We discussed it at end of last year; see minutes of the 2024-12-11 call. At the time, there was a rather large "other" category that we went through. The minutes record "some gut calls during the call", and "no" for frame-related stuff. I do remember this as gut calls indeed, with noone having super strong opinions one way or another.

@annevk
Copy link
Collaborator

annevk commented Jan 16, 2025

Thanks! Admittedly it's also somewhat hard to do a nested document policy that makes sense, so once there's demand for that we can try to figure out more dedicated syntax I suppose. Or point people towards "unsafe".

@otherdaniel
Copy link
Collaborator Author

Landing this. I'll note that the list format makes this really easy to change, spec-wise. :)

@otherdaniel otherdaniel merged commit ebdfd9e into WICG:main Jan 16, 2025
2 checks passed
@otherdaniel otherdaniel deleted the lists branch January 16, 2025 14:22
chalasr added a commit to symfony/symfony that referenced this pull request Jan 16, 2025
This PR was merged into the 6.4 branch.

Discussion
----------

[HtmlSanitizer] fix tests

| Q             | A
| ------------- | ---
| Branch?       | 6.4
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Issues        |
| License       | MIT

The files we used to download are no longer part of the WICG/sanitizer-api repository (see WICG/sanitizer-api#244).

Commits
-------

8f06032 fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants