Skip to content

Commit

Permalink
Fix XSS vulnerability on search results page
Browse files Browse the repository at this point in the history
Pages that are indexed in search results have their entire
contents indexed, including any HTML code snippets. These HTML
snippets would appear in the search results unsanitised, so it
was possible to render arbitrary HTML or run arbitrary scripts:

> ![script being invoked](https://user-images.githubusercontent.com/5111927/230888935-0367b598-eda7-4f67-afb5-799b41684ee3.png)
> ![HTML being rendered](https://user-images.githubusercontent.com/5111927/230888939-f0056edc-6955-4f10-8aee-c93414b1cb69.png)

This is a largely theoretical security issue; to exploit it, an
attacker would need to find a way of committing malicious code
to a page indexed by a site that uses tech-docs-gem (which are
typically not editable by untrusted users). Their code would
also be limited by the relatively short length that's rendered
in the corresponding search result. Nevertheless, the XSS would
then be triggerable by visiting a pre-constructed URL
(`/search/index.html?q=some+search+term`), which users could be
tricked into clicking on through social engineering.

This commit sanitises the HTML before rendering it to the page.
It does so whilst retaining the `<mark data-markjs="true">`
behaviour that highlights the search term in the result:

> ![sanitised HTML with highlights](https://user-images.githubusercontent.com/5111927/230888944-9aaf4920-cddd-43f9-8ef5-17f15785af73.png)

I've used jQuery's `text()` function for sanitisation, as that is
the approach used elsewhere in the project ([1]). Unfortunately
this can't be done in memory, as it requires an existing DOM node
to act upon ([2]). I've therefore had to create a temporary node,
which then gets immediately removed via the `detach` method, which
returns a jQuery object we can then manipulate with `text()` and
`html()` ([3]).

I did consider using native JavaScript (using the same approach as
in Mustache [4]) to avoid this hack, but this itself may contain
bugs and would lead to having two sanitisation approaches to
maintain, so I opted against it. For future reference, the code in
this commit can be swapped out with:

```js
var entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
  '`': '&#x60;',
  '=': '&#x3D;'
};
var sanitizedContent = String(content).replace(/[&<>"'`=\/]/g, function (s) {
  return entityMap[s];
});
```

[1]: https://github.com/alphagov/tech-docs-gem/blob/66cc7ab0a06dc2f1fe89de8cba2270fcf46f6466/lib/assets/javascripts/_modules/search.js#L202-L204
[2]: http://api.jquery.com/text/#text2
[3]: http://api.jquery.com/detach/
[4]: https://github.com/janl/mustache.js/blob/972fd2b27a036888acfcb60d6119317744fac7ee/mustache.js#L60-L75
  • Loading branch information
ChrisBAshton committed Apr 11, 2023
1 parent 7c29396 commit 1d1aca3
Show file tree
Hide file tree
Showing 2 changed files with 9 additions and 2 deletions.
5 changes: 3 additions & 2 deletions lib/assets/javascripts/_modules/search.js
Original file line number Diff line number Diff line change
Expand Up @@ -169,8 +169,9 @@

this.processContent = function processContent (content, query) {
var output
content = '<div>' + content + '</div>'
content = $(content).mark(query)
$('html').append('<div id="tmp-div-for-sanitising-html"></div>')
var sanitizedContent = $('#tmp-div-for-sanitising-html').detach().text(content).html()
content = $('<div>' + sanitizedContent + '</div>').mark(query)

// Split content by sentence.
var sentences = content.html().replace(/(\.+|:|!|\?|\r|\n)("*|'*|\)*|}*|]*)/gm, '|').split('|')
Expand Down
6 changes: 6 additions & 0 deletions spec/javascripts/search-spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -99,5 +99,11 @@ describe('Search', function () {
var expectedResults = ' … This is <mark data-markjs="true">test</mark> sentence one … This is <mark data-markjs="true">test</mark> sentence two … This is <mark data-markjs="true">test</mark> sentence three … This is <mark data-markjs="true">test</mark> sentence four … This is <mark data-markjs="true">test</mark> sentence five … '
expect(processedContent).toEqual(expectedResults)
})

it('sanitises HTML in the search results', function () {
processedContent = module.processContent('It will render multiple `<input>` `<script>alert("uhoh")</script>` and its accompanying suggestions and `aria-live` region.', 'multi region')
var expectedResults = ' … It will render <mark data-markjs="true">multi</mark>ple `&lt;input&gt;` `&lt;script&gt;alert("uhoh")&lt;/script&gt;` and its accompanying suggestions and `aria-live` <mark data-markjs="true">region</mark> … '
expect(processedContent).toEqual(expectedResults)
})
})
})

0 comments on commit 1d1aca3

Please sign in to comment.