Remove a field from an Elasticsearch document

Elasticsearch Problem Overview


I need to remove a field in all the documents indexed to Elasticsearch. How can I do it?

Elasticsearch Solutions


Solution 1 - Elasticsearch

What @backtrack said is true, but Elasticsearch offers a very convenient way of doing this that abstracts away the internal complexity of the deletion. Use the update API:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : "ctx._source.remove(\"name_of_field\")"
}'

See the Elasticsearch update API documentation for more details.

Note: As of Elasticsearch 6 you are required to include a content-type header:

-H 'Content-Type: application/json'
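
As a rough illustration of the same request with the header included, here is a minimal Python sketch that only builds the request and does not send it. The helper name build_remove_field_request is hypothetical, and the host, index, type, and ID are the placeholder values from the curl example above.

```python
import json
import urllib.request


def build_remove_field_request(base_url, index, doc_type, doc_id, field):
    """Build (but do not send) an update request that removes one field."""
    # json.dumps(field) yields a double-quoted string literal for the script.
    body = {"script": "ctx._source.remove(%s)" % json.dumps(field)}
    return urllib.request.Request(
        url="%s/%s/%s/%s/_update" % (base_url, index, doc_type, doc_id),
        data=json.dumps(body).encode("utf-8"),
        # Required as of Elasticsearch 6, per the note above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_remove_field_request(
    "http://localhost:9200", "test", "type1", "1", "name_of_field"
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before touching a real cluster.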

Solution 2 - Elasticsearch

Elasticsearch added update_by_query in 2.3. This experimental interface allows you to do the update against all the documents that match a query.

Internally, Elasticsearch does a scan/scroll to collect batches of documents and then updates them much like the bulk update interface. This is faster than doing it manually with your own scan/scroll loop because it avoids the network and serialization overhead. Each record must still be loaded into RAM, modified, and then written.

Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.

Consider setting conflicts=proceed if the cluster has other update traffic. Otherwise the whole job stops at the first version conflict, which happens when a record is updated underneath one of the batches.

Similarly, setting wait_for_completion=false makes the update_by_query run via the tasks interface. Otherwise the job will terminate if the connection is closed.

URL:

http://localhost:9200/INDEX/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed

POST body:

{
  "script": "ctx._source.remove('name_of_field')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "name_of_field"
          }
        }
      ]
    }
  }
}
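
The URL and POST body above can also be assembled programmatically. The following is a minimal Python sketch under the assumption that the request is sent later by whatever HTTP client you prefer. The helper names update_by_query_url and remove_field_body are made up for illustration, and INDEX/TYPE are the same placeholders as in the URL above.

```python
import json
from urllib.parse import urlencode


def update_by_query_url(base_url, path, **params):
    """Return the _update_by_query URL with the given query-string parameters."""
    return "%s/%s/_update_by_query?%s" % (base_url, path, urlencode(params))


def remove_field_body(field):
    """Body that removes `field` from every document in which it exists."""
    return {
        "script": "ctx._source.remove('%s')" % field,
        "query": {"bool": {"must": [{"exists": {"field": field}}]}},
    }


url = update_by_query_url(
    "http://localhost:9200", "INDEX/TYPE",
    wait_for_completion="false",  # run in the background via the tasks interface
    conflicts="proceed",          # keep going past version conflicts
)
payload = json.dumps(remove_field_body("name_of_field"))
```

Restricting the query to documents where the field exists keeps the job from rewriting documents that have nothing to remove.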

As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this to work, you need to enable it by adding script.inline: true to your config file.

Alternatively, upload the Groovy code as a script and use the "script": { "file": "scriptname", "lang": "groovy"} format.

Solution 3 - Elasticsearch

You can use _update_by_query.

Example 1

index: my_index

field: user.email

POST my_index/_update_by_query?conflicts=proceed
{
    "script" : "ctx._source.user.remove('email')",
    "query" : {
        "exists": { "field": "user.email" }
    }
}

Example 2

index: my_index

field: total_items

POST my_index/_update_by_query?conflicts=proceed
{
    "script" : "ctx._source.remove('total_items')",
    "query" : {
        "exists": { "field": "total_items" }
    }
}
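
The two examples differ only in how the script reaches the field: a nested field is removed from its parent object, while a top-level field is removed from ctx._source directly. A small Python sketch of that pattern follows. The helper names are hypothetical, and it assumes that dots in the path always separate nested objects rather than being part of a field name.

```python
def remove_script_for(path):
    """Turn 'user.email' into "ctx._source.user.remove('email')"."""
    *parents, leaf = path.split(".")
    target = ".".join(["ctx._source"] + parents)
    return "%s.remove('%s')" % (target, leaf)


def remove_field_request(path):
    """Request body for _update_by_query that removes the field at `path`."""
    return {
        "script": remove_script_for(path),
        # Only touch documents that actually contain the field.
        "query": {"exists": {"field": path}},
    }


nested = remove_field_request("user.email")
top_level = remove_field_request("total_items")
```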

Solution 4 - Elasticsearch

The previous answers didn't work for me.

I had to add the keyword "inline":

POST /my_index/_update_by_query
{
  "script": {
    "inline": "ctx._source.remove(\"myfield\")"
  },
  "query" : {
      "exists": { "field": "myfield" }
  }
}

Solution 5 - Elasticsearch

By default this is not possible, because Lucene does not currently support it: you can only add or remove whole Lucene documents from Lucene indices. The workaround is to replace the whole document:

  1. Get the first version of your doc
  2. remove the field
  3. push this new version of your doc

This answer is valid for Elasticsearch versions before 5.
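
The three steps above can be sketched in Python as follows. This is an illustration only: fetch and store are hypothetical callables standing in for a GET and an index request against the cluster, and a plain dictionary plays the role of the index so the example runs without Elasticsearch.

```python
import copy


def without_field(source, field):
    """Return a copy of a document's _source with `field` removed."""
    stripped = copy.deepcopy(source)
    stripped.pop(field, None)  # no error if the field is already absent
    return stripped


def rewrite_document(fetch, store, doc_id, field):
    """Fetch a document, drop one field, and store the whole document again."""
    source = fetch(doc_id)                   # 1. get the current version
    stripped = without_field(source, field)  # 2. remove the field
    store(doc_id, stripped)                  # 3. push the new version
    return stripped


# In-memory stand-in for an index, for illustration only.
fake_index = {"1": {"title": "hello", "obsolete": True}}
rewrite_document(fake_index.__getitem__, fake_index.__setitem__, "1", "obsolete")
```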

Solution 6 - Elasticsearch

If you prefer to stick with the bulk API, you can delete fields from documents by providing a script in the update action payload of a bulk API call.

The command is the same as the one described in the official documentation:

curl -s -H "Content-Type: application/x-ndjson"  -H "Accept: application/json; indent=4;" \
     --data-binary   '@es_bulk_edit_data.json'  --request POST \
     "http://YOUR_ELASTICSEARCH_HOST:PORT_NUM/OPTIONAL_INDEX/OPTIONAL_TYPE/_bulk?pretty"

In the request body file, you may need two payloads for the same document: one for creating or updating fields, and the other for deleting fields by script. For example:

// assume you attempt to add one field `artist`, update one field `num_views`,
// and delete one field `useless` in the document with type t1 and ID 123
{"update": {"_type": "t1", "_id": "123"}}
{"doc": {"artist": "new_artist", "num_views": 67}}
{"update": {"_type": "t1", "_id": "123"}}
{"script": {"source": "ctx._source.remove(params.del_field_name)", "lang":"painless", "params":{"del_field_name": "useless"}}}

Note:

  • In the bulk API, a doc section cannot be placed with a script section in the same payload. Elasticsearch seems to refuse to process such a payload and returns a 400 Bad Request error response with the reason message Validation Failed: 1: can't provide both script and doc;. That is why I separate the deletion from all other operations into 2 payloads.

  • These were tested on versions 5.6 and 6.6 and should give the same result in the latest version (v7.10).
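
A request body file like the one above can be generated rather than written by hand. The Python sketch below is one possible way to do it, not an official recipe. The function name bulk_update_body is hypothetical, and the type, ID, and field names are the ones from the example. Each action line is followed by its payload line, and the body ends with a newline as the bulk API expects.

```python
import json

REMOVE_SCRIPT = "ctx._source.remove(params.del_field_name)"


def bulk_update_body(doc_type, doc_id, doc_changes, fields_to_delete):
    """Build an NDJSON bulk body: one doc update, then one script per deleted field."""
    action = {"update": {"_type": doc_type, "_id": doc_id}}
    lines = []
    if doc_changes:
        lines += [action, {"doc": doc_changes}]
    for field in fields_to_delete:
        # doc and script cannot share a payload, so each deletion gets its own action.
        script = {
            "source": REMOVE_SCRIPT,
            "lang": "painless",
            "params": {"del_field_name": field},
        }
        lines += [action, {"script": script}]
    return "".join(json.dumps(line) + "\n" for line in lines)


body = bulk_update_body(
    "t1", "123", {"artist": "new_artist", "num_views": 67}, ["useless"]
)
```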

Solution 7 - Elasticsearch

POST /products/_update/1
{
  "doc" :{
    "price": 12,
    "quantity": 3,
    "in_stock": 6
  }
}

Now if I need to remove "quantity" then:

POST products/_update/1
{
  "script": {
    "source": "ctx._source.remove(\"quantity\")"
  }
}

Solution 8 - Elasticsearch

I would like to add to the previous answers that the size of the index will not change after the field is deleted. You will have to create a new index or use the _reindex API.

curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old-index"
  },
  "dest": {
    "index": "new-index"
  }
}
'
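
If the field is still present in the old index, one option is to drop it during the copy by attaching a script to the reindex request. The Python sketch below builds such a request body. This is a variation on the answer above rather than the author's method, the function name is hypothetical, and the index names are the placeholders from the curl example.

```python
import json


def reindex_without_field(source_index, dest_index, field):
    """Body for POST _reindex that copies documents and drops one field."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            # Runs once per copied document before it is written to dest.
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }


reindex_payload = json.dumps(
    reindex_without_field("old-index", "new-index", "name_of_field"), indent=2
)
```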

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jalal | View Question on Stackoverflow
Solution 1 - Elasticsearch | Vineeth Mohan | View Answer on Stackoverflow
Solution 2 - Elasticsearch | spazm | View Answer on Stackoverflow
Solution 3 - Elasticsearch | Thiago Falcao | View Answer on Stackoverflow
Solution 4 - Elasticsearch | DavidBu | View Answer on Stackoverflow
Solution 5 - Elasticsearch | backtrack | View Answer on Stackoverflow
Solution 6 - Elasticsearch | Han | View Answer on Stackoverflow
Solution 7 - Elasticsearch | Akshay Khule | View Answer on Stackoverflow
Solution 8 - Elasticsearch | Pax Exterminatus | View Answer on Stackoverflow