
push out neural sparse two-phase algorithm blog #3173

Merged 8 commits on Aug 13, 2024
24 changes: 24 additions & 0 deletions _community_members/congguan.md
@@ -0,0 +1,24 @@
---
name: Cong Guan
short_name: congguan
photo: '/assets/media/community/members/congguan.jpg'
title: 'OpenSearch Community Member: Cong Guan'
primary_title: Cong Guan
breadcrumbs:
  icon: community
  items:
    - title: Community
      url: /community/index.html
    - title: Members
      url: /community/members/index.html
    - title: "Cong Guan's Profile"
      url: '/community/members/cong-guan.html'
github: conggguan
job_title_and_company: 'Software engineer with the OpenSearch Project'
personas:
- author
permalink: '/community/members/cong-guan.html'
redirect_from: '/authors/congguan/'
---

**Cong Guan** is a software engineer with the OpenSearch Project. His primary work involves developing OpenSearch plugins, RAG systems, and various infrastructure components.
122 changes: 122 additions & 0 deletions _posts/2024-08-07-Introducing-a-neural-sparse-two-phase-algorithm.md
@@ -0,0 +1,122 @@
---
layout: post
title: Introducing the neural sparse two-phase algorithm
authors:
- zhichaog
- yych
- congguan
date: 2024-08-13
categories:
- technical-posts
has_science_table: true
meta_keywords: neural sparse search, OpenSearch semantic search, neural sparse two phase processor
meta_description: OpenSearch continues to refine neural sparse search and retrieval with a two-phase algorithm that significantly reduces computational load while maintaining the quality of the final ranking.

excerpt: We are excited to announce the release of a new feature in OpenSearch 2.15, a two-phase search pipeline for neural sparse retrieval. In testing, this feature has achieved significant speed improvements.
featured_blog_post: false
featured_image: false # /assets/media/blog-images/__example__image__name.jpg
---

Neural sparse search is a new, efficient method of semantic retrieval introduced in OpenSearch 2.11. Like dense semantic matching, neural sparse search interprets queries using semantic techniques, allowing it to handle terms that traditional lexical search might not understand. While dense semantic models excel at finding semantically similar results, they sometimes miss specific terms, particularly exact matches. Neural sparse search addresses this by introducing sparse representations, which capture both semantic similarities and specific terms. This dual capability overcomes the limitations of purely semantic matching, makes results easier to explain and present through text matching, and offers a more comprehensive retrieval solution.

Neural sparse search first expands text (either a query or a document) into a larger set of terms, each weighted by its semantic relevance. It then uses Lucene's efficient term vector computation to identify the highest-scoring results. This approach leads to reduced index and memory costs as well as lower computational expenses. For example, while dense encoding using k-NN retrieval increases RAM costs by 7.9% at search time, neural sparse search uses a native Lucene index, avoiding any increase in RAM cost at search time. Moreover, neural sparse search leads to a much smaller index size compared to dense encoding. A document-only model generates an index that is only 10.4% the size of a dense encoding index, and for a bi-encoder, the index size is 7.2% of a dense encoding index.
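
For example, the weighted terms produced by the model are typically stored in a `rank_features` field, which Lucene indexes natively. The following is a minimal mapping sketch; the index and field names (`my-neural-sparse-index`, `passage_embedding`, `passage_text`) are placeholders, not names from this post:

```json
PUT /my-neural-sparse-index
{
  "mappings": {
    "properties": {
      "passage_embedding": {
        "type": "rank_features"
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
```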

Given these advantages, we've continued to refine neural sparse retrieval to make it even more efficient. OpenSearch 2.15 introduced a new feature: the two-phase search pipeline. This pipeline splits the neural sparse query terms into two categories: high-scoring tokens that are more relevant to the search and low-scoring tokens that are less relevant. Initially, the algorithm selects documents using the high-scoring tokens and then recalculates the score for those documents by including both high- and low-scoring tokens. This process significantly reduces computational load while maintaining the quality of the final ranking.

## The two-phase algorithm

The two-phase search algorithm operates in two stages:

1. **Initial phase:** The algorithm uses the tokens produced by model inference to quickly select a set of candidate documents, scoring only the high-scoring tokens from the query. These high-scoring tokens constitute a small portion of the total number of tokens but carry significant weight, or relevance, allowing for rapid identification of potentially relevant documents. This significantly reduces the number of documents that need to be fully scored, thereby lowering computational costs.

2. **Recalculation phase:** The algorithm then recalculates the scores of the candidate documents selected in the first phase, this time including both the high-scoring and low-scoring tokens from the query. Although low-scoring tokens carry less weight individually, they provide valuable information as part of a comprehensive evaluation, particularly when long-tail terms contribute significantly to the overall score. This allows the algorithm to determine final document scores with greater accuracy.

By processing documents in stages, this approach reduces computational overhead while maintaining accuracy. The rapid selection in the first phase enhances efficiency, while the more detailed scoring in the second phase ensures accuracy. Even when handling a large number of long-tail terms, the results remain high quality, with a notable improvement in computational efficiency.
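
To make the split concrete, consider the following hypothetical example (the query and token weights are invented for illustration). Per the processor documentation, the split threshold is the query's maximum token weight multiplied by the prune ratio, so with the default prune ratio of 0.4 the threshold here is 0.4 * 2.0 = 0.8:

```json
{
  "expanded_query_tokens": { "sun": 2.0, "star": 1.4, "bright": 0.5, "sky": 0.2 },
  "prune_ratio": 0.4,
  "split_threshold": 0.8,
  "phase_one_tokens": ["sun", "star"],
  "phase_two_tokens": ["bright", "sky"]
}
```

Phase one retrieves candidates by scoring only `sun` and `star`; phase two then rescores those candidates using all four tokens.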

## Performance metrics

We measured the speed and quality of search results using neural sparse search.

### Test environment

Performance was measured on OpenSearch clusters containing 3 m5.4xlarge nodes using [OpenSearch Benchmark](https://opensearch.org/docs/latest/benchmark/). The tests were conducted with 20 simultaneous clients, 50 warmup iterations, and 200 test iterations.

### Test dataset

For search quality, we tested multiple BEIR datasets and measured the relative quality of the results. The following table presents summary statistics for these datasets.

| Dataset        | BEIR name        | Queries | Corpus | Avg. relevant docs/query |
|----------------|------------------|---------|--------|--------------------------|
| NQ             | `nq`             | 3,452   | 2.68M  | 1.2                      |
| HotpotQA       | `hotpotqa`       | 7,405   | 5.23M  | 2                        |
| DBPedia        | `dbpedia-entity` | 400     | 4.63M  | 38.2                     |
| FEVER          | `fever`          | 6,666   | 5.42M  | 1.2                      |
| Climate-FEVER  | `climate-fever`  | 1,535   | 5.42M  | 3                        |

### p99 latency

The two-phase algorithm maintains the same inference time cost as the existing neural sparse search algorithm. To provide a clearer comparison of acceleration in the search phase, we excluded the inference step from latency calculations because inference is significantly affected by hardware type. The latency benchmark provided in this post uses raw vector search and excludes any additional impact resulting from inference time.

#### Doc-only mode

In doc-only mode, the two-phase processor can significantly decrease query latency, as shown in the following figure.

![Two-Phase Doc Model P99 Latency](/assets/media/blog-images/2024-08-07-Introducing-a-neural-sparse-two-phase-algorithm/two-phase-doc-model-p99-latency.jpg)

The **average latency** was as follows:

* Without the two-phase algorithm: **198 ms**
* With the two-phase algorithm: **124 ms**

Depending on the data distribution, the two-phase processor achieved an **increase in speed ranging from 1.22x to 1.78x**.

#### Bi-encoder mode

In bi-encoder mode, the two-phase algorithm can significantly decrease query latency, as shown in the following figure.

![Two-Phase Bi-Encoder P99 Latency](/assets/media/blog-images/2024-08-07-Introducing-a-neural-sparse-two-phase-algorithm/two-phase-bi-encoder-p99-latency.jpg)

The **average latency** was as follows:

* Without the two-phase algorithm: **617 ms**
* With the two-phase algorithm: **122 ms**

Depending on the data distribution, the two-phase processor achieved an **increase in speed ranging from 4.15x to 6.87x**.

## Try it out

To try the two-phase processor, follow these steps.

### Step 1: Set up a `neural_sparse_two_phase_processor`

First, configure a `neural_sparse_two_phase_processor` with the default parameters:

```json
PUT /_search/pipeline/<custom-pipeline-name>
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "This processor creates a neural sparse two-phase processor, which can speed up neural sparse queries!"
      }
    }
  ]
}
```
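
The processor also accepts optional tuning parameters. The following sketch sets them explicitly to their documented defaults: `prune_ratio` controls how tokens are split into high- and low-scoring groups, `expansion_rate` controls how many phase-one candidates are retrieved relative to the requested result size, `max_window_size` caps the number of documents rescored in phase two, and `enabled` toggles the processor. See the [processor documentation](https://opensearch.org/docs/latest/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/) for the authoritative parameter list:

```json
PUT /_search/pipeline/<custom-pipeline-name>
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "description": "Two-phase processor with explicit tuning parameters",
        "enabled": true,
        "two_phase_parameter": {
          "prune_ratio": 0.4,
          "expansion_rate": 5.0,
          "max_window_size": 10000
        }
      }
    }
  ]
}
```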

### Step 2: Set the default search pipeline to `neural_sparse_two_phase_processor`

Assuming that you already have a neural sparse index, set the index's `index.search.default_pipeline` to the pipeline created in the previous step:

```json
PUT /<your-index-name>/_settings
{
  "index.search.default_pipeline": "<custom-pipeline-name>"
}
```
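
With the default pipeline in place, any neural sparse query against the index is accelerated automatically, with no change to the query itself. For example, a query such as the following runs through the two-phase pipeline as is (the `passage_embedding` field name and the model ID are placeholders for your own setup):

```json
GET /<your-index-name>/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "What is neural sparse search?",
        "model_id": "<your-model-id>"
      }
    }
  }
}
```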

## Next steps

For more information about the two-phase processor, see [Neural sparse query two-phase processor](https://opensearch.org/docs/latest/search-plugins/search-pipelines/neural-sparse-query-two-phase-processor/).

Binary file added assets/media/community/members/congguan.jpg