What would it take to support UTF16 encoding? #192

kentookura · 2025-01-12T12:36:01Z

As reported by @Trebor-Huang in Trebor-Huang/vscode-forester#11 (comment), it seems that the VSCode LSP client refuses to work with positions in any encoding other than UTF-16. I was wondering if you could say a few words about the trouble you mention here.

Thanks!

favonia · 2025-01-12T13:11:55Z

@kentookura To support UTF-16 efficiently, we need to map UTF-16 code-unit offsets to byte offsets without recomputing the mapping every time a file changes. There are two options: (1) recompute it anyway, which is inefficient (oops), or (2) maintain the mapping in some smart data structure.
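To make option (1) concrete: an LSP `Position` carries a zero-based line and a `character` offset that is counted in UTF-16 code units by default, so a server that indexes its buffer by UTF-8 bytes has to translate. Below is a minimal Python sketch of the naive translation, assuming `\n` line endings only (LSP also allows `\r\n` and `\r`). `utf16_to_byte_offset` is a hypothetical helper, not an existing Forester or LSP-library function.

```python
def utf16_to_byte_offset(text: str, line: int, character: int) -> int:
    """Naively map an LSP (line, UTF-16 character) position to a UTF-8 byte offset.

    Hypothetical helper. Assumes "\n" is the only line separator.
    Rescans the text from the start on every call: O(n) per query.
    """
    byte_offset = 0
    i = 0
    # Skip the lines before the target line, counting their UTF-8 bytes.
    for _ in range(line):
        nl = text.find("\n", i)
        if nl == -1:
            raise ValueError("line out of range")
        byte_offset += len(text[i:nl + 1].encode("utf-8"))
        i = nl + 1
    # Walk the target line one code point at a time, consuming UTF-16 units.
    # Positions past the end of the line are clamped to the line end, as LSP specifies.
    units = 0
    while units < character and i < len(text) and text[i] != "\n":
        ch = text[i]
        units += len(ch.encode("utf-16-le")) // 2
        byte_offset += len(ch.encode("utf-8"))
        i += 1
    return byte_offset
```

Nothing is cached here, so edits never invalidate anything, but every query pays for a scan from the start of the file. That per-query cost is what option (2) is meant to avoid.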

favonia · 2025-01-12T13:13:09Z

For printable ASCII characters, I believe byte offsets and UTF-16 code-unit offsets coincide.
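A quick Python check of that observation. The agreement actually holds for every ASCII code point (U+0000 to U+007F), printable or not: each one is a single UTF-8 byte and a single UTF-16 code unit. Outside ASCII the widths diverge. `widths` is a hypothetical helper for illustration.

```python
def widths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 code units) for a single code point."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le")) // 2

# Every ASCII code point is 1 byte in UTF-8 and 1 code unit in UTF-16.
assert all(widths(chr(c)) == (1, 1) for c in range(128))

for ch in ["é", "→", "😀"]:
    print(f"U+{ord(ch):04X}", widths(ch))
# U+00E9 (2, 1)
# U+2192 (3, 1)
# U+1F600 (4, 2)
```

So a file that is pure ASCII needs no conversion at all, and any non-ASCII code point shifts the byte offset of everything after it on the line.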

TOTBWF · 2025-01-12T17:52:42Z

Ugh, VSCode being a bad citizen yet again...

As for the data structure, I think some sort of rope segmented at points where UTF-16 and UTF-8 offsets disagree ought to work. Nodes further up the tree could then store both the UTF-8 and UTF-16 range that the subtree covers, so we'd get efficient queries in both directions.
