cli (tjs idea) #32
base: main
Conversation
It turns out that C is pretty fast (that's single-threaded).

yeah, 65% faster if I read the output correctly 🤣
Even if it's 65% faster, it might still be worth trying to improve. One fairly simple thing would be to change the scoring calculation from int16_t to float. Floats are almost the same speed as int16_t for single operations: sometimes a bit faster, sometimes a bit slower depending on the operation and processor, but very comparable. The difference is that the compiler can do much more optimization for floats using SIMD extensions like SSE and AVX with the right compiler options, although there is some integer support in modern SIMD instruction sets.

Additionally, under strict aliasing the floating point type is distinct from the char and int types used for the other arrays, so the compiler doesn't always have to assume aliasing is going on, which would force values to be re-read from memory instead of using cached results in registers. There is still a lot of potential aliasing, though, and I'm not sure how well the compiler can analyse what's going on, so some manual annotation might improve things further.

Using floats will of course cause a bit more memory pressure and therefore potentially more cache misses, which are slow. But the slab is re-used, and unless you match really long strings, everything fits easily into the cache, so that should not be an issue. Of course there could be other things going on that would make floats slower instead of faster, so benchmarking is needed. For example, it's possible that the integer version already uses whatever SIMD extensions are applicable, or that the opportunity for optimization does not exist the way the code is currently written. But since the algorithm is essentially doing matrix calculations, it might still be possible to rewrite it to take proper advantage of SIMD.
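As a rough illustration of the int16_t-to-float point, here is a hypothetical scoring-style kernel (not fzy's actual code): with `float` data and `restrict`-qualified pointers telling the compiler the arrays don't alias, gcc and clang can typically auto-vectorize the loop with SSE/AVX at `-O2`/`-O3`.

```c
#include <stddef.h>

/* Hypothetical scoring kernel, sketching the int16_t -> float idea.
 * `restrict` promises the rows don't alias, so the compiler may keep
 * values in registers and vectorize the loop with SSE/AVX. */
void score_row(float *restrict out, const float *restrict prev,
               const float *restrict bonus, size_t n, float gap_penalty)
{
    for (size_t i = 0; i < n; i++) {
        float with_gap   = prev[i] - gap_penalty;     /* extend a gap */
        float with_match = prev[i] + bonus[i];        /* take the match bonus */
        out[i] = with_match > with_gap ? with_match : with_gap;
    }
}
```

Whether this actually wins over the int16_t version depends on the processor and compiler, so it would have to be benchmarked, as noted above.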
All of this sounds good. I think you are way more qualified to do this than I am. 😆 But we are not stopping making telescope faster. The idea here was having a pipe that allows us to move the score calculation away from the neovim thread. Another idea we have (I talked about this yesterday with TJ) was bundling fzf-native in telescope core and requiring people to do

What I was thinking is having the store actually in C as a (score, index) tuple (we can't store the actual table reference in C afaik, but we can store the index to the table ref that lies at i in the results table, I think). We could make a heap in C then and maybe sort the first 500~1000 correctly without a performance penalty, because we have seen a factor 10 performance improvement between the C part and Lua (fzy-native vs the fzy Lua version). I just need to mention it. I am not sure that a heap is optimal here; I just know that it ended up being faster than the linked list approach I tried first, and I had built a heap before, so I tried that. And it ended up bringing the time down from
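The (score, index) tuple plus heap idea above could look something like this minimal sketch (hypothetical names, not telescope's actual code; the PR uses a max-heap, while this variant keeps the N best entries in a fixed-capacity min-heap whose root is the worst retained score):

```c
#include <stddef.h>

/* C side stores only a numeric score plus the index into the Lua results
 * table, not the table reference itself. */
typedef struct { float score; int index; } entry_t;

typedef struct {
    entry_t *items;
    size_t   len;
    size_t   cap;   /* e.g. 500-1000, the number of displayed results */
} topk_t;

static void sift_down(topk_t *h, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->len && h->items[l].score < h->items[m].score) m = l;
        if (r < h->len && h->items[r].score < h->items[m].score) m = r;
        if (m == i) return;
        entry_t tmp = h->items[i]; h->items[i] = h->items[m]; h->items[m] = tmp;
        i = m;
    }
}

static void sift_up(topk_t *h, size_t i)
{
    while (i > 0) {
        size_t p = (i - 1) / 2;
        if (h->items[p].score <= h->items[i].score) return;
        entry_t tmp = h->items[i]; h->items[i] = h->items[p]; h->items[p] = tmp;
        i = p;
    }
}

/* Insert if the heap isn't full, otherwise replace the worst retained
 * entry when the new score beats it. */
void topk_push(topk_t *h, float score, int index)
{
    if (h->len < h->cap) {
        h->items[h->len] = (entry_t){ score, index };
        sift_up(h, h->len++);
    } else if (score > h->items[0].score) {
        h->items[0] = (entry_t){ score, index };
        sift_down(h, 0);
    }
}
```

Each push is O(log N) with N the display count, independent of the total number of candidates, which is the property that made the heap beat the linked-list approach mentioned above.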
"I just need to mention it. I am not sure that a heap is optimal here" Optimal is a preallocated heap with offsets and length to objects xor padded objects. The latter looks slower though, since line length can differ alot. The first could be tried, but would be wasteful. We dont know the count of hits, so any estimation might be wrong. Second-optimal are arena/region-allocators, where you just bump the capacity (pointer) to add more stuff inside instead having the malloc overhead on every call. If the memory page is full, the arena allocator takes the next one. The index stores the offset to text chunks. The control structure stores pointer to the indexes. So something like control_block
| |
| |->index0: | offset0 | offset 1 | ... |
|------>index1: | offset0 | offset 1 | ... |
and offset0 -------------> text chunk0 (assume this is a continues memory chunk from region allocator)
offset1 -------------> text chunk1
... .... Then the problem of cache locality boils down to having the lookup the text defined by offsets. I am not familiar with what is stored and how sorting should work, so I cant tell how the control_block would look exactly. Note, that looks like a fundamental redesign. So it should be done later and not in this PR as to limit the scope. Not sure, if adding a arena/region allocator is worth it though. https://github.com/cgaebel/arena_alloc looks good enough for that. |
I was talking about the heap data structure, like I implemented here in this PR (a max-heap, to be more specific). I wasn't talking about allocation. fzf-native actually only calculates the score; telescope sorts it in a data structure (currently a linked list, and we only sort the first n elements, the displayed ones). I was thinking maybe we could improve that and other core elements in telescope. That's all. But that doesn't affect this repository either. This PR is just me playing around with an idea that TJ mentioned. I'm not trying to get this merged anytime soon (I could just merge it), but I haven't figured out what I want to do with it. Still, thanks for your comments :)
assuming you have a file called files, doing fd --hidden -I > files for example.

fzf.h is the "prompt term" here

single threaded

simple multi threaded attempt (add to list/sorting still missing)