Elasticsearch 2.1: Result window is too large (index.max_result_window)

Elasticsearch Problem Overview


We retrieve information from Elasticsearch 2.1 and allow the user to page through the results. When the user requests a high page number, we get the following error message:

> Result window is too large, from + size must be less than or equal to: [10000] but was [10020]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter

The Elastic documentation says that this limit exists because of high memory consumption, and recommends the scroll API instead:

> Values higher than can consume significant chunks of heap memory per search and per shard executing the search. It’s safest to leave this value as it is an use the scroll api for any deep scrolling

https://www.elastic.co/guide/en/elasticsearch/reference/2.x/breaking_21_search_changes.html#_from_size_limits

The thing is that I do not want to retrieve large data sets. I only want to retrieve a small slice that sits very deep in the result set. The scroll documentation also says:

> Scrolling is not intended for real time user requests

https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-request-scroll.html

This leaves me with some questions:

  1. Would the memory consumption really be lower (and if so, why) if I use the scroll API to scroll up to result 10020 (and disregard everything below 10000) instead of doing a "normal" search request for results 10000-10020?

  2. It seems that the scroll API is not an option for me and that I have to increase "index.max_result_window" instead. Does anyone have any experience with this?

  3. Are there any other options to solve my problem?

Elasticsearch Solutions


Solution 1 - Elasticsearch

If you need deep pagination, one possible solution is to increase the value of max_result_window. You can do this with curl from your shell:

curl -XPUT "http://localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 500000 } }'

I did not notice increased memory usage for values of around 100k.
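For readers who prefer to change the setting from application code, here is a minimal Python sketch of the same request using only the standard library. The host `http://localhost:9200`, the index name `my_index`, and the value `500000` are taken from the curl example above; the helper name `build_max_result_window_request` is hypothetical. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_max_result_window_request(base_url, index, window):
    """Build (but do not send) the PUT request that sets index.max_result_window."""
    body = json.dumps({"index": {"max_result_window": window}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and return the decoded JSON response (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Host and index name are assumptions copied from the curl example.
request = build_max_result_window_request("http://localhost:9200", "my_index", 500000)
# send(request)  # uncomment to apply the setting against a live cluster
```

Keeping the request construction separate from the network call makes it easy to log or review the exact body before raising the limit on a production index.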

Solution 2 - Elasticsearch

The right solution would be to use scrolling.
However, if you want search to return results beyond the first 10,000, you can raise the limit easily with Kibana:

Go to Dev Tools and send the following request to your index (your_index_name), specifying the new maximum result window:

PUT your_index_name/_settings
{ 
  "max_result_window" : 500000 
}

If all goes well, you should see the following success response:

{
  "acknowledged": true
}
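To confirm that the change took effect, you can read the setting back with `GET your_index_name/_settings`. The sketch below pulls the value out of such a response. The sample response is an illustrative assumption about its shape (settings nested under the index name, values returned as strings), not output captured from a real cluster, and the helper name is hypothetical.

```python
DEFAULT_MAX_RESULT_WINDOW = 10000  # the limit quoted in the error message


def read_max_result_window(settings_response, index):
    """Return the effective max_result_window for `index` from a _settings response."""
    index_settings = (
        settings_response.get(index, {}).get("settings", {}).get("index", {})
    )
    # Fall back to the default when the setting was never overridden.
    return int(index_settings.get("max_result_window", DEFAULT_MAX_RESULT_WINDOW))


# Illustrative response shape (an assumption, not real cluster output).
sample_response = {
    "your_index_name": {
        "settings": {"index": {"max_result_window": "500000"}}
    }
}

window = read_max_result_window(sample_response, "your_index_name")
```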

Solution 3 - Elasticsearch

The following pages in the elastic documentation talk about deep paging:

https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/_fetch_phase.html

> Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results (1,000 to 5,000 pages) deep should be perfectly doable. But with big-enough from values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth. For this reason, we strongly advise against deep paging.
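A rough way to see why the cost grows with `from`: as the linked pagination page describes, every shard has to produce its own top `from + size` results, and the coordinating node then sorts all of them and throws away everything except the requested page. The following sketch just does that arithmetic. The shard count of 5 is an assumption chosen for illustration, and the function name is hypothetical.

```python
def deep_paging_cost(from_, size, shards):
    """Estimate how many sorted entries a from/size request produces and discards."""
    per_shard = from_ + size          # each shard must rank this many hits
    gathered = per_shard * shards     # entries the coordinating node has to merge
    return {
        "per_shard": per_shard,
        "gathered": gathered,
        "discarded": gathered - size,  # everything except the page actually returned
    }


# The request from the question (results 10000-10020) on an assumed 5-shard index.
cost = deep_paging_cost(from_=10000, size=20, shards=5)
print(cost)  # {'per_shard': 10020, 'gathered': 50100, 'discarded': 50080}
```

So a page of 20 hits costs roughly 50,000 ranked entries under these assumptions, which is the overhead the limit is meant to cap.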

Solution 4 - Elasticsearch

Use the Scroll API to get more than 10000 results.

https://stackoverflow.com/questions/31327814/scroll-example-in-elasticsearch-nest-api

I have used it like this:

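using System.Collections.Generic;
using System.Linq;
using Nest;

// Customer and IndexAlias are application-specific types from my project.
private static Customer[] GetCustomers(IElasticClient elasticClient)
{
    var customers = new List<Customer>();

    // Open a scan-type scroll. With SearchType.Scan the initial response carries
    // only a scroll ID and no hits, and Size applies per shard.
    var searchResult = elasticClient.Search<Customer>(s => s.Index(IndexAlias.ForCustomers())
                          .Size(10000).SearchType(SearchType.Scan).Scroll("1m"));

    do
    {
        // Fetch the next batch using the most recent scroll ID.
        var result = searchResult;
        searchResult = elasticClient.Scroll<Customer>("1m", result.ScrollId);
        customers.AddRange(searchResult.Documents);
    } while (searchResult.IsValid && searchResult.Documents.Any()); // stop on error or empty batch

    return customers.ToArray();
}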
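The same loop can be sketched in Python without any client library. This version uses a regular scroll rather than the scan search type, so the first response already contains hits. `post(path, body)` is a hypothetical caller-supplied function that sends a JSON POST to Elasticsearch and returns the decoded response; the page size and the `1m` keep-alive are assumptions, and the scroll context is simply left to expire rather than being cleared explicitly.

```python
def scroll_all(post, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit for `query`, fetching `page_size` hits per round trip.

    `post(path, body)` is a hypothetical transport function supplied by the caller.
    """
    # The initial search opens the scroll context and returns the first page.
    response = post(
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": query},
    )
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        yield from hits
        # Always pass the scroll ID from the latest response.
        response = post(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )
```

Because it is a generator, the caller can stop early or process hits as they arrive instead of holding the whole result set in memory.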

Solution 5 - Elasticsearch

If you request more than 10,000 results, memory usage on all data nodes will be very high, because each node has to return more results for every query. With more data and more shards, merging those results becomes inefficient. Elasticsearch also caches the filter context, which costs more memory again. You have to find out by trial and error how much memory you actually use. If you receive many requests in a short time window, run multiple queries to cover more than 10k results and merge them yourself in code, which should take less application memory than increasing the window size.

Solution 6 - Elasticsearch

  1. It does not seem that the scrolling API is an option for me but that I have to increase "index.max_result_window". Does anyone have any experience with this?

--> You can define this value in an index template. A template applies to new indices only, so you either have to delete the old indices after creating the template or wait for new data to be ingested into Elasticsearch.

{
  "order": 1,
  "template": "index_template*",
  "settings": {
    "index.number_of_replicas": "0",
    "index.number_of_shards": "1",
    "index.max_result_window": 2147483647
  }
}

Solution 7 - Elasticsearch

In my case, limiting the results with the from and size parameters of the query removes the error, as we don't need all the results:

GET widgets_development/_search
{
  "from" : 0, 
  "size": 5,
  "query": {
    "bool": {}
  },
  "sort": {
    "col_one": "asc"
  }
}
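If the application exposes page numbers to users, it can help to translate them into `from` and `size` in one place and reject pages that would exceed the window before the request ever reaches Elasticsearch. This is a minimal sketch; the function name is hypothetical, pages are assumed to be 1-based, and the default of 10000 is the limit quoted in the error message.

```python
MAX_RESULT_WINDOW = 10000  # limit quoted in the error message


def page_to_from_size(page, page_size, max_result_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into from/size, enforcing the result window."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    if start + page_size > max_result_window:
        raise ValueError(
            f"from + size must be <= {max_result_window} but was {start + page_size}"
        )
    return {"from": start, "size": page_size}


first_page = page_to_from_size(page=1, page_size=5)     # {'from': 0, 'size': 5}
last_allowed = page_to_from_size(page=500, page_size=20)  # {'from': 9980, 'size': 20}
```

With a page size of 20, page 501 would need `from + size = 10020`, which is exactly the case from the question, so the helper raises instead of sending the request.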

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | Ronald | View Question on Stack Overflow |
| Solution 1 - Elasticsearch | Andrey Morozov | View Answer on Stack Overflow |
| Solution 2 - Elasticsearch | Guy Dubrovski | View Answer on Stack Overflow |
| Solution 3 - Elasticsearch | Ronald | View Answer on Stack Overflow |
| Solution 4 - Elasticsearch | Morten Holmgaard | View Answer on Stack Overflow |
| Solution 5 - Elasticsearch | amritoit | View Answer on Stack Overflow |
| Solution 6 - Elasticsearch | Sindhu | View Answer on Stack Overflow |
| Solution 7 - Elasticsearch | FlimFlam Vir | View Answer on Stack Overflow |