How to do client-side HTML sanitization from a string? #10

Open
wader opened this issue Feb 3, 2014 · 9 comments
@wader

wader commented Feb 3, 2014

Hello, should I be able to do safe client-side HTML sanitization from a string using Sanitize.js? Since it seems to operate on an existing DOM tree, should I use innerHTML to create the tree first? This feels unsafe, so I did a small test, and it does seem to be a bad idea.

<html>
<script type="text/javascript" src="sanitize.js"></script>
<script type="text/javascript">
  window.addEventListener("load", function() {
    var cleaner_elm = document.getElementById("cleaner");
    cleaner_elm.innerHTML = "<img src='' onerror='alert()'>";
    var s = new Sanitize({});
    var cleaned_html = s.clean_node(cleaner_elm);
    console.log(cleaned_html);
  });
</script>
<body>
  <div id="cleaner"></div>
</body>
</html>

This will pop up an alert dialog.

@wader
Author

wader commented Feb 5, 2014

I guess the DOM node being cleaned should be detached, or even better, live inside a new document created by document.implementation.createHTMLDocument?

@wader
Author

wader commented Feb 5, 2014

Would this be a safe way of doing it?

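Sanitize.prototype.clean_string = function(string) {
  // Parse in a separate document so nothing loads or runs while cleaning.
  // An explicit title is passed because older browsers require the argument.
  var elm = document.implementation.createHTMLDocument('').body;
  elm.innerHTML = string;
  // Declared with var to avoid leaking an implicit global.
  var cleaned_fragment = this.clean_node(elm);
  // Serialize the cleaned fragment through a second detached body.
  elm = document.implementation.createHTMLDocument('').body;
  elm.appendChild(cleaned_fragment);
  return elm.innerHTML;
};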

@jasonseney

Here's an example I've been using to go from string --> sanitized string:

var input = '<p id="para1"><b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" /></p>';

// Wrap input in a dummy element (required for Sanitize)
var dummyInputNode = document.createElement('div');
dummyInputNode.innerHTML = input;

var scrubber = new Sanitize(this.sanitizeConfig);

// Get a DocumentFragment back after cleaning
var cleanFragment = scrubber.clean_node(dummyInputNode);

// Wrap the fragment in a div in order to generate HTML from fragment
var dummyOutputNode = document.createElement('div');
dummyOutputNode.appendChild(cleanFragment.cloneNode(true));

// Get the html string from inside the div
var cleanHTML = dummyOutputNode.innerHTML;

@wader
Author

wader commented Apr 24, 2014

@jasonseney without document.implementation.createHTMLDocument, it seems some browsers may start loading resources before the DOM tree is cleaned. Might be good to know.
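
A rough sketch of the difference, under some assumptions: a div made with document.createElement belongs to the live page, so assigning innerHTML can start image fetches and fire handlers such as onerror before clean_node ever runs, while a document from createHTMLDocument has no browsing context and, as far as I know, current browsers keep it inert. `parseInInertDocument` is a hypothetical helper name, not part of Sanitize.js, and its optional second parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper, not part of Sanitize.js.
// Parses an HTML string inside a separate document and returns its <body>.
function parseInInertDocument(html, domImplementation) {
  // Fall back to the page's own DOMImplementation when none is injected.
  var impl = domImplementation || document.implementation;
  // A title is passed explicitly because older browsers require the argument.
  var inertDoc = impl.createHTMLDocument('');
  inertDoc.body.innerHTML = html;
  return inertDoc.body;
}
```

The returned body could then be handed to clean_node in place of a div created with document.createElement.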

@jasonseney

@wader - Thank you for the tip. Do you happen to have an example handy that replicates this behavior?

@wader
Author

wader commented Apr 28, 2014

Sure, take a look at http://jsfiddle.net/sYdJy/

@iliakan

iliakan commented Aug 1, 2014

Is there a "recommended" way from the author? I guess there should be.

@iliakan

iliakan commented Aug 1, 2014

P.S. It's not always easy to generate HTML from a string, e.g. you can't just insert a <td>..</td> into <body>. The browser will autocorrect it, so you end up with broken markup.
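
One possible workaround, sketched here as an assumption rather than anything Sanitize.js provides: wrap the string in the parent elements the parser expects before assigning innerHTML, then walk back down to the real container. This is similar in spirit to the wrapper table jQuery uses. `CONTEXT_WRAPPERS`, `wrapForContext`, and `parseInContext` are hypothetical names, the table covers only a few table-related tags, and only the first tag of the string is inspected.

```javascript
// Hypothetical wrapper table; names and entries are illustrative only.
// depth is the number of wrapper elements added around the input.
var CONTEXT_WRAPPERS = {
  td:    { depth: 3, open: '<table><tbody><tr>', close: '</tr></tbody></table>' },
  th:    { depth: 3, open: '<table><tbody><tr>', close: '</tr></tbody></table>' },
  tr:    { depth: 2, open: '<table><tbody>', close: '</tbody></table>' },
  tbody: { depth: 1, open: '<table>', close: '</table>' },
  thead: { depth: 1, open: '<table>', close: '</table>' },
  tfoot: { depth: 1, open: '<table>', close: '</table>' }
};

// Pure string step: pick a wrapper based on the first tag in the input.
function wrapForContext(html) {
  var match = /^\s*<([a-zA-Z][a-zA-Z0-9]*)/.exec(html);
  var tag = match ? match[1].toLowerCase() : '';
  if (!Object.prototype.hasOwnProperty.call(CONTEXT_WRAPPERS, tag)) {
    return { html: html, depth: 0 };
  }
  var wrapper = CONTEXT_WRAPPERS[tag];
  return { html: wrapper.open + html + wrapper.close, depth: wrapper.depth };
}

// Browser-only step: parse the wrapped markup into the given body element
// (ideally one from createHTMLDocument) and descend past the wrappers.
function parseInContext(html, body) {
  var wrapped = wrapForContext(html);
  body.innerHTML = wrapped.html;
  var container = body;
  for (var i = 0; i < wrapped.depth; i++) {
    container = container.firstChild;
  }
  return container; // its children are the nodes parsed from the input
}
```

For a `<td>..</td>` input, the returned container would be the generated `<tr>`, so the cell survives parsing instead of being dropped or rewritten.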

@ninsuo

ninsuo commented Nov 10, 2016

The approach @wader gave works well, even with scripts.
