Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: caching part of the emit_dist computation #263

Closed
wants to merge 1 commit into from

Conversation

brian-pane
Copy link

This is a quick test that isn't suitable for the main codebase, but I'm sharing it to get feedback on the general approach: saving previously computed values to reduce the time spent in emit_dist.

The key thing I noticed is that the first half of emit_dist computes a result that depends only on the ltree and the supplied lc value. I.e., for a given ltree, there are only 256 possible combinations. As a simple test, I added in a cache to cover one of the code paths that uses the STATIC_LTREE, and it seemed to help:

Benchmark 1 (62 runs): ./compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.1ms ± 1.21ms    79.3ms … 87.2ms          2 ( 3%)        0%
  peak_rss           26.7MB ± 70.2KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          292M  ±  577K      291M  …  294M           2 ( 3%)        0%
  instructions        612M  ±  279       612M  …  612M           0 ( 0%)        0%
  cache_references    400K  ± 2.39K      397K  …  411K           1 ( 2%)        0%
  cache_misses        305K  ± 5.88K      285K  …  318K           9 (15%)        0%
  branch_misses      3.05M  ± 3.25K     3.04M  … 3.05M           1 ( 2%)        0%
Benchmark 2 (63 runs): ./target/release/examples/compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          79.5ms ± 1.41ms    77.6ms … 86.1ms          4 ( 6%)        ⚡-  1.9% ±  0.6%
  peak_rss           26.7MB ± 73.4KB    26.5MB … 26.7MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          286M  ± 1.12M      284M  …  293M           2 ( 3%)        ⚡-  2.1% ±  0.1%
  instructions        614M  ±  429       614M  …  614M           1 ( 2%)          +  0.2% ±  0.0%
  cache_references    402K  ± 6.69K      397K  …  439K           2 ( 3%)          +  0.6% ±  0.4%
  cache_misses        305K  ± 6.43K      281K  …  320K           8 (13%)          +  0.3% ±  0.7%
  branch_misses      2.86M  ± 6.04K     2.85M  … 2.87M           0 ( 0%)        ⚡-  6.1% ±  0.1%

This patch in its current form produces a small regression for compression levels higher than 1. I could fix that by replacing the cache with a static lookup table to cover the STATIC_LTREE case. Would a caching approach be useful for the code paths that use dynamic trees, though? I don't know enough about how the dynamic trees work.

@brian-pane
Copy link
Author

PR #264 implements the static lookup table approach and seems to perform better.

@brian-pane brian-pane closed this Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants