Skip to content

Latest commit

 

History

History
118 lines (102 loc) · 4.25 KB

25_Boosting_clauses.asciidoc

File metadata and controls

118 lines (102 loc) · 4.25 KB

Boosting query clauses

Of course, the bool query isn’t restricted to combining simple one-word match queries — it can combine any other query, including other bool queries. It is commonly used to fine-tune the relevance _score for each document by combining the scores from several distinct queries.

Imagine that we want to search for documents about "full text search" but we want to give more weight to documents which also mention "Elasticsearch" or "Lucene". By `more weight'' we mean that documents which mention "Elasticsearch" or "Lucene" will receive a higher relevance `_score than those that don’t, which means that they will appear higher in the list of results.

A simple bool query allows us to write this fairly complex query as follows:

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "content": { (1)
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [ (2)
                { "match": { "content": "Elasticsearch" }},
                { "match": { "content": "Lucene"        }}
            ]
        }
    }
}
  1. The content field must contain all of the words full, text and search.

  2. If the content field also contains Elasticsearch or Lucene then the document will receive a higher _score.

The more should clauses that match, the more relevant the document. So far so good.

But what if we want to give more weight to the docs which contain "Lucene" and even more weight to the docs containing "Elasticsearch"?

We can control the relative weight of any query clause by specifying a boost value, which defaults to 1. A boost value greater than 1 increases the relative weight of that clause. So we could rewrite the above query as follows:

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {  (1)
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3 (2)
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2 (3)
                    }
                }}
            ]
        }
    }
}
  1. This clauses use the default boost of 1.

  2. This clause is the most important, as it has the highest boost.

  3. This clause is more important than the default, but not as important as the "Elasticsearch" clause.

The boost parameter is used to increase the relative weight of a clause (with a boost greater than 1) or decrease the relative weight (with a boost between 0 and 1), but the increase or decrease is not linear. In other words, a boost of 2 does not result in double the _score.

Instead, the new score is _normalized after the boost is applied. Each type of query has its own normalization algorithm and the details are beyond the scope of this book. Suffice to say that a higher boost value results in a higher _score.

If you are implementing your own scoring model not based on TF/IDF and you need more control over the boosting process, you can use the {ref}/query-dsl-function-score-query.html#_boost_factor[boost_factor] parameter in the function_score query to perform simple multiplication without the normalization step.

We will discuss other ways of combining queries in the next chapter: [multi-field-search]. But first, let’s take a look at the other important feature of queries: text analysis.