From 1d1aca36fc67b38fb0a0b2acac6ee76706b9db68 Mon Sep 17 00:00:00 2001 From: ChrisBAshton Date: Mon, 10 Apr 2023 11:55:41 +0100 Subject: [PATCH] Fix XSS vulnerability on search results page Pages that are indexed in search results have their entire contents indexed, including any HTML code snippets. These HTML snippets would appear in the search results unsanitised, so it was possible to render arbitrary HTML or run arbitrary scripts: > ![script being invoked](https://user-images.githubusercontent.com/5111927/230888935-0367b598-eda7-4f67-afb5-799b41684ee3.png) > ![HTML being rendered](https://user-images.githubusercontent.com/5111927/230888939-f0056edc-6955-4f10-8aee-c93414b1cb69.png) This is a largely theoretical security issue; to exploit it, an attacker would need to find a way of committing malicious code to a page indexed by a site that uses tech-docs-gem (which are typically not editable by untrusted users). Their code would also be limited by the relatively short length that's rendered in the corresponding search result. Nevertheless, the XSS would then be triggerable by visiting a pre-constructed URL (`/search/index.html?q=some+search+term`), which users could be tricked into clicking on through social engineering. This commit sanitises the HTML before rendering it to the page. It does so whilst retaining the `` behaviour that highlights the search term in the result: > ![sanitised HTML with highlights](https://user-images.githubusercontent.com/5111927/230888944-9aaf4920-cddd-43f9-8ef5-17f15785af73.png) I've used jQuery's `text()` function for sanitisation, as that is the approach used elsewhere in the project ([1]). Unfortunately this can't be done in memory, as it requires an existing DOM node to act upon ([2]). I've therefore had to create a temporary node, which then gets immediately removed via the `detach` method, which returns a jQuery object we can then manipulate with `text()` and `html()` ([3]). I did consider using native JavaScript (using the same approach as in Mustache [4]) to avoid this hack, but this itself may contain bugs and would lead to having two sanitisation approaches to maintain, so I opted against it. For future reference, the code in this commit can be swapped out with: ```js var entityMap = { '&': '&', '<': '<', '>': '>', '"': '"', "'": ''', '/': '/', '`': '`', '=': '=' }; var sanitizedContent = String(content).replace(/[&<>"'`=\/]/g, function (s) { return entityMap[s]; }); ``` [1]: https://github.com/alphagov/tech-docs-gem/blob/66cc7ab0a06dc2f1fe89de8cba2270fcf46f6466/lib/assets/javascripts/_modules/search.js#L202-L204 [2]: http://api.jquery.com/text/#text2 [3]: http://api.jquery.com/detach/ [4]: https://github.com/janl/mustache.js/blob/972fd2b27a036888acfcb60d6119317744fac7ee/mustache.js#L60-L75 --- lib/assets/javascripts/_modules/search.js | 5 +++-- spec/javascripts/search-spec.js | 6 ++++++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/lib/assets/javascripts/_modules/search.js b/lib/assets/javascripts/_modules/search.js index 0fa20ecf..c2fd887e 100644 --- a/lib/assets/javascripts/_modules/search.js +++ b/lib/assets/javascripts/_modules/search.js @@ -169,8 +169,9 @@ this.processContent = function processContent (content, query) { var output - content = '
' + content + '
' - content = $(content).mark(query) + $('html').append('
') + var sanitizedContent = $('#tmp-div-for-sanitising-html').detach().text(content).html() + content = $('
' + sanitizedContent + '
').mark(query) // Split content by sentence. var sentences = content.html().replace(/(\.+|:|!|\?|\r|\n)("*|'*|\)*|}*|]*)/gm, '|').split('|') diff --git a/spec/javascripts/search-spec.js b/spec/javascripts/search-spec.js index dbebe14a..3792fa20 100644 --- a/spec/javascripts/search-spec.js +++ b/spec/javascripts/search-spec.js @@ -99,5 +99,11 @@ describe('Search', function () { var expectedResults = ' … This is test sentence one … This is test sentence two … This is test sentence three … This is test sentence four … This is test sentence five … ' expect(processedContent).toEqual(expectedResults) }) + + it('sanitises HTML in the search results', function () { + processedContent = module.processContent('It will render multiple `` `` and its accompanying suggestions and `aria-live` region.', 'multi region') + var expectedResults = ' … It will render multiple `<input>` `<script>alert("uhoh")</script>` and its accompanying suggestions and `aria-live` region … ' + expect(processedContent).toEqual(expectedResults) + }) }) })