Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why use kd-tree rather than HNSW? #23

Open
zxch3n opened this issue Jun 1, 2023 · 11 comments
Open

Why use kd-tree rather than HNSW? #23

zxch3n opened this issue Jun 1, 2023 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@zxch3n
Copy link

zxch3n commented Jun 1, 2023

For high-dimensional data, kd-tree uses O(n) time to do the search. And knowing something about the (Euclidian) distance in one dimension says very little about the distance in the full space.

@jlarmstrongiv
Copy link

I explored HNSW a couple months ago. There are currently two HNSW wasm packages:

@DawChihLiou
Copy link
Contributor

Thanks for opening the issue @zxch3n ! Could you tell me more about HNSW and why is it a better option than kd tree? I don't know much about HNSW yet.

@DawChihLiou DawChihLiou added the enhancement New feature or request label Jun 3, 2023
@DawChihLiou
Copy link
Contributor

I explored HNSW a couple months ago. There are currently two HNSW wasm packages:

@jlarmstrongiv That's awesome! What're your thoughts on HNSW?

@jlarmstrongiv
Copy link

jlarmstrongiv commented Jun 3, 2023

@DawChihLiou I thought it was really neat! Both packages are open-source on github and up to date with hnswlib v0.7.0

Importantly:

Has full support for incremental index construction and updating the elements. Has support for element deletions (by marking them in index). Index is picklable.

The only thing I couldn’t figure out with the wasm version yet was importing/exporting the index to/from a file in the browser. Notes to self:

@zxch3n
Copy link
Author

zxch3n commented Jun 3, 2023

You can see the details about HNSW in its paper https://arxiv.org/abs/1603.09320

@DawChihLiou
Copy link
Contributor

@zxch3n thanks for the suggestion. I'll explore HNSW once Voy is stabilized so we'll be able to compare the benchmarks.

A few more resources:

@jlarmstrongiv
Copy link

jlarmstrongiv commented Jun 5, 2023

For high-dimensional data, kd-tree uses O(n) time to do the search. And knowing something about the (Euclidian) distance in one dimension says very little about the distance in the full space.

Ahh, that answers the question I had when reading the kiddo readme and why their demo and benchmarks only had up to 4D:

Kiddo is ideal for super-fast spatial / geospatial lookups and nearest-neighbour / KNN queries for low-ish numbers of dimensions

The example from #21 has a higher dimensionality of 384, not just 4. Unfortunately, most of my data has a much, much, much higher dimensionality than that. It seems like HNSW is a better fit overall.

What are your thoughts on that @DawChihLiou, since most of the embeddings you have planned have high dimensionality?

@DawChihLiou
Copy link
Contributor

I think HNSW is a better fit and worth exploring too. Currently kiddo is capable of handling higher dimensionality and produce quality results. The vectors in the example are in 768d. I'll work on HNSW once Voy's API is more stabilized.

@jlarmstrongiv
Copy link

I'll work on HNSW once Voy's API is more stabilized

I think HNSW may change some of Voy’s APIs, like the type of SerializedIndex among others. I wish I could contribute to this issue, but I’m much more familiar with React and TypeScript

@DawChihLiou
Copy link
Contributor

DawChihLiou commented Jun 12, 2023

@jlarmstrongiv thanks for your support! I really appreciate it. HNSW will be an internal implementation so it'll most likely to have no effect on the API. SerializedIndex is done to communicate between js and wasm.

@DawChihLiou DawChihLiou self-assigned this Jun 13, 2023
@yeus
Copy link

yeus commented May 10, 2024

hey, I have experimented a little bit with hnsw based vector stores in the browser, check out this link:

https://github.com/xyntopia/vexvault

you can find the relevant part here: https://github.com/Xyntopia/vexvault/blob/vexvault/src/modules/localVectorStore.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

4 participants