Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use compression to shorten URLs #36

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

j3parker
Copy link
Contributor

@j3parker j3parker commented Aug 21, 2024

Currently the code is put in the URL fragment with URL encoding on key press.

This is very nice but results in long URLs. Really long URLs can be harder to share in some contexts (e.g. Discord is limited to 2000 characters.)

This PR adds support to automatically compress the code. It is backwards compatible with all existing URLs (no links will be broken).

Here's a recent live link from Zulip:

https://live.lean-lang.org/#code=import%20Mathlib.CategoryTheory.Equivalence%0A%0Aopen%20CategoryTheory%0A%0Auniverse%20v%E2%82%81%20v%E2%82%82%20v%E2%82%83%20vm%20u%E2%82%81%20u%E2%82%82%20u%E2%82%83%20u%20um%0Avariable%20%7BB%20%3A%20Type%20u%E2%82%82%20%7D%20%5BCategory.%7Bv%E2%82%82%7D%20B%5D%0Avariable%20%7BM%20%3A%20Type%20vm%20%7D%20%5BCategory.%7Bum%7D%20M%5D%0Aset_option%20linter.longLine%20false%0A%0A%40%5Bext%5D%0Astructure%20Simple%20(F%20%3A%20B%20%E2%A5%A4%20M)%20%20where%0A%20%20pt%20%3A%20M%0A%20%20leg%20c%20%3A%20pt%20%E2%9F%B6%20F.obj%20c%0A%0A%40%5Bext%5D%0Astructure%20SimpleMorphism%20%20%7BF%20%3A%20B%20%E2%A5%A4%20M%7D%20(x%20y%20%3A%20Simple%20F)%20where%0A%20%20hom%20%3A%20x.pt%20%E2%9F%B6%20y.pt%0A%0Avariable%20%7BF%20G%20%3A%20B%20%E2%A5%A4%20M%7D%0Avariable%20(i%3A%20F%20%E2%9F%B6%20G)%0A%0Adef%20fctrObj%20(i%3A%20F%20%E2%9F%B6%20G)%20(w%3A%20Simple%20F)%20%3A%20Simple%20G%20%20where%0A%20%20pt%20%3A%3D%20w.pt%0A%20%20leg%20c%20%3A%3D%20w.leg%20c%20%E2%89%AB%20i.app%20c%0A%0A%0Adef%20fctrMap%20(i%3A%20F%20%E2%9F%B6%20G)%20%7Bx%20y%7D%20(m%20%3A%20SimpleMorphism%20x%20y)%20%3A%20SimpleMorphism%20(fctrObj%20i%20x)%20(fctrObj%20i%20y)%20%3A%3D%0A%20%20%7B%20hom%20%3A%3D%20m.hom%20%7D%20--%20FctrMap%20is%20essentially%20the%20identity%0A%0A--%20I%20essentially%20want%20to%20prove%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%3D%20%20m%2C%20but%20this%20is%20ill-typed%0Adef%20lawBadTypeMismatch%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20m%20%3D%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%20%20%20%20sorry%0A%0A--%20So%20I%20need%20HEq%0Adef%20law%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20HEq%20m%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%20sorry%0A%0Adef%20lawTry%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20HEq%20m%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%0A%20%20heq_of_cast_eq%20hcast%20(by%0A%20%20%20%20apply%20SimpleMorphism.ext%0A%20%20%20%20have%20result%20%3A%20(cast%20hcast%20m).hom%20%20%3D%20(fctrMap%20(%F0%9D%9F%99%20F)%20m).hom%20%3A%3D%0A%20%20%20%20%20%20have%20%3A%20(fctrMap%20(%F0%9D%9F%99%20F)%20m).hom%20%3D%20m.hom%20%3A%3D%20rfl%0A%20%20%20%20%20%20have%20%3A%20(cast%20hcast%20m).hom%20%3D%20m.hom%20%20%3A%3D%20sorry%0A%20%20%20%20%20%20by%0A%20%20%20%20%20%20simp_all%20only%0A%20%20%20%20exact%20result)%0A%0A%20%20where%0A%20%20%20%20%20%20hcast%20%20%7Bx%20y%20%7D%20%3A%20SimpleMorphism%20x%20y%20%3D%20SimpleMorphism%20((fctrObj%20(%F0%9D%9F%99%20F))%20x)%20((fctrObj%20(%F0%9D%9F%99%20F))%20y)%20%3A%3D%20by%0A%20%20%20%20%20%20%20%20have%20idobj%20%7Bx%7D%20%3A%20x%20%3D%20fctrObj%20(%F0%9D%9F%99%20F)%20x%20%3A%3D%20%20by%0A%20%20%20%20%20%20%20%20%20%20apply%20Simple.ext%0A%20%20%20%20%20%20%20%20%20%20%C2%B7%20rfl%0A%20%20%20%20%20%20%20%20%20%20%C2%B7%20apply%20heq_iff_eq.mpr%0A%20%20%20%20%20%20%20%20%20%20%20%20funext%20c%0A%20%20%20%20%20%20%20%20%20%20%20%20have%20%3A%20(fctrObj%20(%F0%9D%9F%99%20F)%20x).leg%20c%20%3D%20x.leg%20c%20%E2%89%AB%20(%F0%9D%9F%99%20F%3A%20F%20%E2%9F%B6%20F).app%20c%20%3A%3D%20rfl%0A%20%20%20%20%20%20%20%20%20%20%20%20simp_all%20only%20%5BCategory.comp_id%2C%20NatTrans.id_app%5D%0A%20%20%20%20%20%20%20%20congr%0A%20%20%20%20%20%20%20%20%C2%B7%20exact%20idobj%0A%20%20%20%20%20%20%20%20%C2%B7%20exact%20idobj%0A

After:

http://live.lean-lang.org/#codez=JYWwDg9gTgLgBAWQIYwBYBtgCMB0BhFAUwHNoBPAFVUPJwFEBHAV2ADcl1CA7AY0ICh+EMNzgEYJclRpQygplzaEoAZ0JxWgQIINgIIINgYIINIOE21M9TQ0xMh+7KMCRZOcAN4AhOAC44FMiJM9AF84AG1xSVkcV1YdEPcAXTskBycXVwRvX391VmMQ8KJSKNcmEBCEJLUYAH1hGGAILjhMLgkoHHQm4gAZYC51ADMONUEAAVDCAA8YKpgoJh4YJih1AGVQMBcACgAxLM9AE0pEAEo4OAB3alX+c7B4HwRblpI4Hiz7uEA38jhdnAgsAArN7jSYzOYLJYrdabTgIaBgVDAFTGNz7HxHRAhbZTOBkLIbcAuXZnK7KATnVAQYw+KY4T4/Mj0mCCeyOZzqVz7ADiBzgxwQQWSqQ5cG2wB8+x+3JOggAJoRBnBBksoAB5IFiiW/b5wGVii4+QlbdQkgmw9S8y7XClwT5eAC8l2Zz04xDe3idF06r3egGsiODAHBIMBgEHyxXK1XIMPiyW6/WuXFkbE0uDGuEIpEouDJs5Gi3wqCI5HGbYq+Ya4HAXNncuqquBvH5h3PVxwKk0p0gHCduAhAC0A9+0ZDgZUcEIKjUbUc6HQ+LQ6mACtnMDk/CHcAAkpPp9wGhwF5ckG04DAIHaoBBWOp6/MY2LAKwbgE3935nEBnJ1wEAAGjgWBMPAaDIuOgbzgO64iHK/AKkq6BIBc7hIHKfgiAgpYoDwqBuMm/ZimmGaEEWJY5nh+Y/nATr3lAj7bK+74/mcnrnOcKjQLIghbmsl67gMhBynAAASjCwZGCEXLheL4dshGFlmpa5tJFEiQwlE0XRDFmp+5yOnA7FQJxYnwYhFCyFJ+KpuaRLEQpZHKVkqnqRWtFjvRb7acxjrPNQDB1IMNQ8EgKi1IQamoEFIVilgcisXAIZbPiREkdmPbTCycWoEgt5wKsKhMOgDxipF8ARcF8Cfr21LnNRLmaR5H4nFVXbPJl2XqD4GluVpjXNVRP59XpUCDOgrWsVlOWdSVHbTZVfbdn1LEGZxcXnDFY1sZsNRHnATQLmN0xIEsuVTgVMCys8ZI3KtM3lecSbSSEBY2Slil4U6yV2WWNGNu575nFMdY/Zqf0kmcZAtgBsU3R27WBnKALAkmT1KU6Lm/T1Sl6VDG2rQlx5ETg6W43FADtuUjSTrHk/j+K+TUwCDAFYU4OAUBU3FgwKOlIIw6tE0dWK6Mg5jgM+u67xOnSboegGoPxj8JLBqGHpDZTfNxSoW07Xt+KFBIxRMjw1JgAzcr/gAcigZmnioOArttoZJHzxtcMQ7N8+Th3HSuiNU17UxHfAvtAvwQA

Benchmarks

I benchmarked this by taking mathlib and concatenating everything with git ls-files | grep '\.lean$' | xargs cat > everything.lean (45mb lol) and then using this script:

const lz = require('lz-string');
const fs = require('fs');

const data = fs.readFileSync('everything.lean').toString();

for(let length = 1; length < 10000; length++) {
        for(let trial = 0; trial < 100; trial++) {
                try {
                       const idx = Math.random()*(data.length - length);
                       const snippet = data.substr(idx, length);
                       const encodedLen = encodeURIComponent(snippet).length;
                       const compressedLen = lz.compressToBase64(snippet).replace(/=*$/, '').length;
                       console.log(`${length},${encodedLen},${compressedLen}`);
                } catch{}
        }
}

and here's what I found:
image

compression is about the same on average until about 30 characters. At about 60 characters it's always better. For longer snippets the gap of course grows quite large.
image

(The reason there's noise in the baseline and why its not 1:1 is that things like whitespace need to be URL encoded)

@@ -7,6 +7,7 @@ import { Menu } from './Navigation'
import { PreferencesContext } from './Popups/Settings'
import { useWindowDimensions } from './utils/window_width'
import LeanLogo from './assets/logo.svg'
import LZString from 'lz-string';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This library has the MIT license.

I considered using CompressionStream with gzip but it only became supported in Safari last year. I figured it would be bad if some live URLs (generated by people with modern browsers) didn't work on browsers that have a non-trivial amount of users left.

I came across lz-stream because it's used in sharplab.io, a C# REPL : https://github.com/ashmind/SharpLab/blob/078c4e2c5a790f6b7203401c2ec63de22913c0d1/source/WebApp/app/features/persistent-state/handlers/url.ts#L1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: sharplab does an extra thing it calls "pre-compression" where it replaces some common substrings with shorter ones (@3, @7 etc.) before sending it to this library: https://github.com/ashmind/SharpLab/blob/main/source/WebApp/app/features/persistent-state/handlers/url/precompressor.ts

I did some quick checks and it wasn't a big deal. If you're interested I could get some actual data using the benchmark process I described in the PR description. The trade-off is extra complexity/.js size for a much smaller reduction in URL size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to further optimise the algorithm, please feel free to do so, I'd happily merge it :)

@@ -286,7 +288,11 @@ function App() {
if (args.code) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "code" argument is still supported so that old URLs continue to work.

Loading a URL like that will cause the URL to be immediately replaced in the address bar with a codez URL (if the code is compressible -- see below.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the docs after I merged your PR; the code argument also has to keep working as is, because Zulip needs it for auto generating links from code blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's cool!

history.replaceState(undefined, undefined!, formatArgs(args))
} else {
let args = {project: _project, url: null, code: fixedEncodeURIComponent(code)}
history.replaceState(undefined, undefined!, formatArgs(args))
const compressed = LZString.compressToBase64(code).replace(/=*$/, '');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trimming trailing = because (A) it meses with the args parsing (B) it's not necessary anyway (these are padding bytes from base64; but the decompression doesn't actually care if they are there)

let args = {project: _project, url: null, code: fixedEncodeURIComponent(code)}
history.replaceState(undefined, undefined!, formatArgs(args))
const compressed = LZString.compressToBase64(code).replace(/=*$/, '');
if(compressed.length < code.length) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this only use compression if its actually a win (see charts in PR description).

For very small character counts it isn't always a win, although it also doesn't really matter anyway.
If you'd like it to always format URLs with codez I can remove this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked in DM; the check here is biased against compression because code is smaller than fixedEncodeURIComponent(code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fa4342d

@j3parker j3parker marked this pull request as ready for review August 21, 2024 19:58
@j3parker
Copy link
Contributor Author

FYI @joneugster -- I see you've been modifying this repo lately but the repo doesn't have anyone set to get added to the PRs automatically.

Copy link
Collaborator

@joneugster joneugster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is great!

@@ -7,6 +7,7 @@ import { Menu } from './Navigation'
import { PreferencesContext } from './Popups/Settings'
import { useWindowDimensions } from './utils/window_width'
import LeanLogo from './assets/logo.svg'
import LZString from 'lz-string';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to further optimise the algorithm, please feel free to do so, I'd happily merge it :)

@@ -286,7 +288,11 @@ function App() {
if (args.code) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the docs after I merged your PR; the code argument also has to keep working as is, because Zulip needs it for auto generating links from code blocks.

@joneugster joneugster merged commit 878bbe3 into leanprover-community:main Aug 22, 2024
2 checks passed
@j3parker j3parker deleted the compressed-urls branch August 22, 2024 12:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants