elasticsearch filtering by the size of a field that is an array

Elasticsearch

Elasticsearch Problem Overview


How can I filter documents that have a field which is an array and has more than N elements?

How can I filter documents that have a field which is an empty array?

Is facets the solution? If so, how?

Elasticsearch Solutions


Solution 1 - Elasticsearch

I would have a look at the script filter. The following filter should return only the documents that have at least 10 elements in the fieldname field, which is an array. Keep in mind that this could be expensive depending on how many documents you have in your index.

"filter" : {
    "script" : {
        "script" : "doc['fieldname'].values.length > 10"
    }
}

Regarding the second question: do you really have an empty array there? Or is it just an array field with no value? You can use the missing filter to get documents which have no value for a specific field:

"filter" : {
    "missing" : { "field" : "user" }
}

Otherwise I guess you need to use scripting again, similarly to what I suggested above, just with a different length as input. If the length is constant I'd put it in the params section so that the script will be cached by elasticsearch and reused, since it's always the same:

"filter" : {
    "script" : {
        "script" : "doc['fieldname'].values.length > params.param1"
        "params" : {
            "param1" : 10
        }
    }
}

Solution 2 - Elasticsearch

javanna's answer is correct on Elasticsearch 1.3.x and earlier, since 1.4 the default scripting module has changed to groovy (was mvel).

To answer OP's question.

On Elasticsearch 1.3.x and earlier, use this code:

"filter" : {
    "script" : {
        "script" : "doc['fieldname'].values.length > 10"
    }
}

On Elasticsearch 1.4.x and later, use this code:

"filter" : {
    "script" : {
        "script" : "doc['fieldname'].values.size() > 10"
    }
}

Additionally, on Elasticsearch 1.4.3 and later, you will need to enable the dynamic scripting as it has been disabled by default, because of security issue. See: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-scripting.html

Solution 3 - Elasticsearch

Still posting to here for who stuck same situation with me. Let's say your data look like this:

{
    "_source": {
        "fieldName" : [
            {
                "f1": "value 11",
                "f2": "value 21"
            },
            {
                "f1": "value 12",
                "f2": "value 22"
            }
        ]
    }
}

Then to filter fieldName with length > 1 for example:

"query": {
    "bool" : {
        "must" : {
            "script" : {
                "script" : {
                    "inline": "doc['fieldName.f1'].values.length > 1",
                    "lang": "painless"
                 }
            }
        }
    }
}

The script syntax is as ES 5.4 documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html.

Solution 4 - Elasticsearch

Imho the correct way of filtering arrays by size using scripting is :

"filter" : {
    "script" : {
        "script" : "_source.fieldName.size() > 1"
    }
}

If I do that as @javanna suggests it throws exception groovy.lang.MissingPropertyException: No such property: length for class: java.lang.String

Solution 5 - Elasticsearch

If you have an array of objects that aren't mapped as nested, keep in mind that Elastic will flatten them into:

attachments: [{size: 123}, {size: 456}] --> attachments.size: [123, 456]

So you want to reference your field as doc['attachments.size'].length, not doc['attachments'].length, which is very counter-intuitive.

Same for doc.containsKey(attachments.size).

The .values part is deprecated and no longer needed.

Solution 6 - Elasticsearch

Based on this: https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/RegularImmutableList.java?r=707f3a276d4ea8e9d53621d137febb00cd2128da

And on lisak's answer here.

There is size() function which returns the length of list:

"filter" : {
    "script" : {
        "script" : "doc['fieldname'].values.size() > 10"
    }
}

Solution 7 - Elasticsearch

Easiest way to do this is to "denormalize" your data so that you have a property that contains the count and a boolean if it exists or not. Then you can just search on those properties.

For example:

{
   "id": 31939,
   "hasAttachments": true,
   "attachmentCount": 2,
   "attachments": [
      {
         "type": "Attachment",
         "name": "txt.txt",
         "mimeType": "text/plain"
      },
      {
         "type": "Inline",
         "name": "jpg.jpg",
         "mimeType": "image/jpeg"
      }
   ]  
}

Solution 8 - Elasticsearch

When you need to find documents which contains some field which size/length should be larger then zero @javanna gave correct answer. I only wanted to add if your field is text field and you want to find documents which contains some text in that field you can't use same query. You will need to do something like this:

GET index/_search 
{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "FIELD_NAME": {
                            "gt": 0
                        }
                    }
                }
            ]
        }
    }
}

This is not exact answer to this question because answer already exists but solution for similar problem which I had so maybe somebody will find it useful.

Solution 9 - Elasticsearch

a suggestion about the second question:

> How can I filter documents that have a field which is an empty array?

{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "fieldname"
        }
      }
    }
  }
}

will return docs with empty fieldname: [] arrays. must (rather than must_not will return the opposite).

Solution 10 - Elasticsearch

For version 7+:

"filter": {
    "script": {
        "script": {
          "source": "doc['fieldName.keyword'].length > 10",
          "lang": "painless"
        }
    }
}

Ref. https://medium.com/@felipegirotti/elasticsearch-filter-field-array-more-than-zero-8d52d067d3a0

Solution 11 - Elasticsearch

Here is what worked for me:

GET index/search {
    "query": {
       "bool": {
          "filter" : {
             "script" : {
                "script" : "doc['FieldName'].length > 10"
                }
            }
         }
       }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestioneranView Question on Stackoverflow
Solution 1 - ElasticsearchjavannaView Answer on Stackoverflow
Solution 2 - ElasticsearchMicroAleXView Answer on Stackoverflow
Solution 3 - ElasticsearchThangTDView Answer on Stackoverflow
Solution 4 - ElasticsearchlisakView Answer on Stackoverflow
Solution 5 - Elasticsearchuser1306787View Answer on Stackoverflow
Solution 6 - ElasticsearchtrekuView Answer on Stackoverflow
Solution 7 - ElasticsearchsuperlogicalView Answer on Stackoverflow
Solution 8 - ElasticsearchLuka LopusinaView Answer on Stackoverflow
Solution 9 - ElasticsearchItayBView Answer on Stackoverflow
Solution 10 - ElasticsearchrahulbhondaveView Answer on Stackoverflow
Solution 11 - ElasticsearchGuy GView Answer on Stackoverflow