innerText should match the rendered appearance #4114

Closed
mattfysh opened this issue Sep 24, 2024 · 1 comment

@mattfysh

import * as cheerio from 'cheerio';

const res = await fetch('https://example.com');
const body = await res.text();

const $ = cheerio.load(body);

console.log(JSON.stringify($('p').text()));

The output is:

This domain is for use in illustrative examples in documents. You may use this\n domain in literature without prior coordination or asking for permission.

The newline and the run of spaces after it are present only in the HTML source. Generally, innerText reflects the rendered text instead: e.g. visit example.com in Chrome, select the same element, then try $0.innerText and you'll get:

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

@fb55
Member

fb55 commented Dec 25, 2024

Both strings are equivalent when output as HTML. Cheerio implements the serialisation part of the HTML spec, which leads to code like this. Changing it would introduce incompatibilities.

You can manually fix whitespace using .replace(/\s+/g, ' ').
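
For illustration, here is a minimal sketch of that workaround. The helper name `collapseWhitespace` is hypothetical and not part of cheerio, the sample string only mimics the shape of the `$('p').text()` output reported above, and the trailing `.trim()` is an extra assumption beyond the suggested `.replace()` call.

```javascript
// Hypothetical helper (not a cheerio API): collapse every run of whitespace
// into a single space, roughly approximating innerText for plain inline text.
function collapseWhitespace(text) {
  // .trim() is an added assumption; the suggestion above only uses .replace().
  return text.replace(/\s+/g, ' ').trim();
}

// Sample input shaped like the reported output: a newline plus indentation
// carried over from the HTML source.
const raw = 'You may use this\n    domain in literature without prior coordination.';

console.log(JSON.stringify(collapseWhitespace(raw)));
// prints "You may use this domain in literature without prior coordination."
```

With cheerio this would be applied as `collapseWhitespace($('p').text())`. Note that `\s` in JavaScript also matches non-breaking spaces, which a browser would not collapse, so this is only an approximation of innerText.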

fb55 closed this as not planned on Dec 25, 2024