Optimize for strings without multibyte characters #724

sriedel · 2019-06-16T07:34:51Z

Optimizes finding the character offset for strings that include no multibyte characters.

Note: I'm no expert in string encodings, but my naive assumption is that if a string has as many bytes as it has characters, the requested character offset must equal the supplied byte offset. This assumption should hold for most documentation written in English and encoded as UTF-8.
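The assumption above can be sketched in a few lines of Ruby. This is only an illustration, not the code in the patch: `char_offset` is a hypothetical standalone helper, and the slow path (slicing the leading bytes and counting their characters) is an assumed fallback for strings that do contain multibyte characters.

```ruby
# Hypothetical helper for illustration; not the actual RDoc implementation.
# Converts a byte offset into a character offset within +input+.
def char_offset(input, byte_offset)
  # Fast path: if the string has as many bytes as characters, every
  # character is exactly one byte wide, so the two offsets coincide.
  return byte_offset if input.bytesize == input.length

  # Assumed slow path: take the first +byte_offset+ bytes and count
  # how many characters they contain.
  input.byteslice(0, byte_offset).length
end

ascii     = "hello world"
multibyte = "h\u00e9llo" # "é" takes two bytes in UTF-8

puts char_offset(ascii, 6)     # single-byte text: offset is unchanged
puts char_offset(multibyte, 3) # the first 3 bytes hold 2 characters
```

The fast path only pays off when the comparison itself is cheap, so a real implementation would probably compute it once per input string rather than on every call.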

Motivation: generating ri documentation for the gem crack-0.4.3 took 156.3 seconds on my 6th-generation i7, according to the rdoc output. Profiling the process with rubyspy showed that most of the time was spent in RDoc::Markup::Parser#char_pos.

The output of rdoc with the original char_pos method:

~/.rvm/gems/ruby-2.6.3/gems/crack-0.4.3 $ time rdoc --ri
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
Parsing sources...
100% [22/22]  test/xml_test.rb

Generating RI format into /home/sr/.rdoc...

Files:      22

  Classes:     7 ( 6 undocumented)
  Modules:     2 ( 2 undocumented)
  Constants:   3 ( 2 undocumented)
  Attributes:  1 ( 1 undocumented)
  Methods:    11 (11 undocumented)

  Total:      24 (22 undocumented)
    8.33% documented

  Elapsed: 156.3s

 
real	2m36.989s
user	2m35.967s
sys	0m0.217s

With this change, building the ri documentation for the same gem takes about 2.3 seconds:

~/.rvm/gems/ruby-2.6.3/gems/crack-0.4.3 $ time rdoc --ri 
Parsing sources...
100% [22/22]  test/xml_test.rb

Generating RI format into /home/sr/.rdoc...

  Files:      22

  Classes:     7 ( 6 undocumented)
  Modules:     2 ( 2 undocumented)
  Constants:   3 ( 2 undocumented)
  Attributes:  1 ( 1 undocumented)
  Methods:    11 (11 undocumented)

  Total:      24 (22 undocumented)
    8.33% documented

  Elapsed: 2.3s


real	0m2.798s
user	0m2.661s
sys	0m0.130s

aycabta

I tried this patch on crack-0.4.3 but couldn't reproduce the performance improvement. You should delete the output directory before re-running, because RDoc uses a cache to generate new files.

Optimize for strings without multibyte characters

4ea4d23

aycabta requested changes Jul 9, 2019

nobu added the Feature Request label Feb 8, 2022

st0012 added enhancement and removed Feature Request labels Jul 2, 2024
