Progressive download/parsing/indexing of codepoints #16

Open
msiebuhr opened this issue Jul 9, 2012 · 3 comments

msiebuhr commented Jul 9, 2012

The application stalls slow machines/devices quite a lot, so perhaps we should split the index into smaller parts, possibly offered in several chunk sizes, so we can adapt to faster/slower machines and network connections. E.g. naming the files data-FROM%-TO%.json:

# 5% chunks
data-0-5.json
data-5-10.json
…

# 10% chunks
data-0-10.json
data-10-20.json
…

# 25% chunks
data-0-25.json
data-25-50.json
…

# 50% chunks
data-0-50.json
data-50-100.json

Then the client could start out downloading data-0-10.json and parse it. If that takes too long, degrade to 5% chunks, and if it's fast, upgrade to 25% chunks.
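
A minimal sketch of how the client might pick its next chunk, assuming the four chunk sizes listed above. The timing thresholds (`SLOW_MS`, `FAST_MS`) and all function names are hypothetical. The sketch also assumes an upgrade can only happen at an offset where a file of the larger size actually starts (after data-0-10.json there is no data-10-35.json), so it falls back to the largest size that lines up:

```typescript
// Chunk sizes in percent, as proposed in this issue.
const CHUNK_SIZES: readonly number[] = [5, 10, 25, 50];

// Hypothetical thresholds for "too slow" and "fast"; would need tuning.
const SLOW_MS = 500;
const FAST_MS = 100;

// Build a file name following the data-FROM-TO.json scheme.
function chunkFileName(from: number, size: number): string {
  return `data-${from}-${from + size}.json`;
}

// Step one size down if the last chunk was slow, one size up if it was fast.
function desiredSize(current: number, elapsedMs: number): number {
  const i = CHUNK_SIZES.indexOf(current);
  if (i === -1) return current;
  if (elapsedMs > SLOW_MS) return CHUNK_SIZES[Math.max(i - 1, 0)] ?? current;
  if (elapsedMs < FAST_MS) {
    return CHUNK_SIZES[Math.min(i + 1, CHUNK_SIZES.length - 1)] ?? current;
  }
  return current;
}

// Largest size <= wanted for which a file starting at `from` exists.
function alignedSize(from: number, wanted: number): number {
  const fits = CHUNK_SIZES.filter(
    (s) => s <= wanted && from % s === 0 && from + s <= 100
  );
  return fits.length > 0 ? Math.max(...fits) : 5;
}

// Name of the next file to fetch, given the current offset, the size of the
// chunk just processed, and how long download + parse took.
function nextChunk(from: number, current: number, elapsedMs: number): string {
  return chunkFileName(from, alignedSize(from, desiredSize(current, elapsedMs)));
}
```

With these assumptions, a fast client that has just finished data-0-10.json keeps fetching 10% chunks until it reaches an offset where a 25% file starts.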

We'd have to keep some more data lying around (about 2MB per chunk size) and, more difficult, figure out a dynamic download client.

Munter commented Jul 15, 2012

An alternative could be offloading the heavy lifting to web workers to keep the interface responsive.
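
A rough sketch of what the worker side might look like, assuming the heavy lifting is parsing the JSON and grouping records by block. Only the `b` (block) field is known from this thread; the other names (`WorkerScopeLike`, `indexByBlock`, `attachWorker`, the worker file name) are hypothetical. The worker scope is passed in as a parameter so the logic can be exercised outside a browser:

```typescript
// Only the `b` (block) field is known from this thread; other fields are opaque.
interface Codepoint {
  b: string;
  [key: string]: unknown;
}

// Minimal stand-in for the parts of a worker's global scope used here.
interface WorkerScopeLike {
  onmessage: ((event: { data: string }) => void) | null;
  postMessage(message: Map<string, Codepoint[]>): void;
}

// Parse the raw JSON text and group the records by block name.
function indexByBlock(json: string): Map<string, Codepoint[]> {
  const records = JSON.parse(json) as Codepoint[];
  const index = new Map<string, Codepoint[]>();
  for (const record of records) {
    const list = index.get(record.b);
    if (list) {
      list.push(record);
    } else {
      index.set(record.b, [record]);
    }
  }
  return index;
}

// Wire the parser to a worker scope: raw JSON text in, index out.
function attachWorker(scope: WorkerScopeLike): void {
  scope.onmessage = (event) => {
    scope.postMessage(indexByBlock(event.data));
  };
}

// Inside the worker file (hypothetical name index-worker.js):
//   attachWorker(self);
// On the main thread:
//   const worker = new Worker("index-worker.js");
//   worker.onmessage = (e) => { /* use e.data, the Map of blocks */ };
//   worker.postMessage(rawJsonText);
```

The main thread would still do the download in this sketch; the parse and indexing run off the UI thread.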

@msiebuhr

Another way could be to include the most frequently viewed blocks of codepoints in the initial download.

  • misc_pictographs - 12% of all popups, but 1.8% of all codepoints
  • ascii - 11% of all popups, but 0.3% of all codepoints
  • misc_symbols - 7% of all popups, but 0.6% of all codepoints

Together these three blocks account for roughly 30% of all popups but under 3% of all codepoints. Picking them out from the main data set weighs in at 11KB gzipped (66KB plain), which would still be quite a win.

jq -c '[.[] | select(.b == "ascii" or .b == "misc_symbols" or .b == "misc_pictographs")]' data.json | gzip | wc -c
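
The same selection could also be done at build time in the app's own toolchain. A minimal sketch, assuming data.json is an array of records whose `b` field names the block (as the jq filter implies); `CodepointRecord`, `HOT_BLOCKS` and `splitByBlock` are hypothetical names:

```typescript
// Only the `b` (block) field is known from the jq filter; the rest is opaque.
interface CodepointRecord {
  b: string;
  [key: string]: unknown;
}

// The three blocks singled out in this comment.
const HOT_BLOCKS: ReadonlySet<string> = new Set([
  "ascii",
  "misc_symbols",
  "misc_pictographs",
]);

// Split the full data set into the records shipped with the initial download
// and the remainder that can be fetched later.
function splitByBlock(
  records: readonly CodepointRecord[],
  hot: ReadonlySet<string> = HOT_BLOCKS
): { initial: CodepointRecord[]; rest: CodepointRecord[] } {
  const initial: CodepointRecord[] = [];
  const rest: CodepointRecord[] = [];
  for (const record of records) {
    (hot.has(record.b) ? initial : rest).push(record);
  }
  return { initial, rest };
}
```

Writing `initial` and `rest` out as two files would keep the later download from repeating records the client already has.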

@msiebuhr

BTW, misc_pictographs alone would weigh in at 7KB gzipped.

Considering the background image is 11KB compressed and the JS bundle is 45KB, I think we'd be OK with any of the proposed subsets.
