{{tag>troubleshooting elasticsearch graylog}}

====== Elasticsearch troubleshooting ======

If Graylog stops showing messages in its streams, the cause may be a problem with the Elasticsearch indices.

===== checksum failed =====

The cluster health status is reported as **red**, either in the web UI or via the API:

<code>
# curl -X GET "localhost:9200/_cluster/health?pretty"
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 97,
  "active_shards" : 97,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.9795918367347
}
</code>

Check the shard status on the server with:

<code>
# curl -X GET "localhost:9200/_cat/shards?v"
index       shard prirep state      docs store ip node
...
myindex_154 2     p      UNASSIGNED
...
</code>

Check why the shard is unassigned:

<code>
# curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"
{
  "index" : "myindex_154",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-01-12T13:15:29.713Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "yusko5YtSRSs9QOtVjHutg",
      "node_name" : "yusko5Y",
      "transport_address" : ":9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "Mr0BOW0RQqKdh-iTqDkjBw",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "checksum failed (hardware problem?) : expected=6fb91e47 actual=9406d419 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/Qlh2XztFSIWu65B7nhmktQ/2/index/_3nhz.cfs\") [slice=_3nhz.fdt]))"
            }
          }
        }
      }
    }
  ]
}
</code>

Here every copy of the shard is corrupt, so it cannot be allocated. You can delete the index, but the messages stored in it will be lost!

<code bash>
curl -XDELETE 'localhost:9200/myindex_154'
</code>

Afterwards you may need to recalculate the index ranges (''System > Indices > index set > Maintenance > Recalculate index ranges'') and/or manually rotate the write index (''System > Indices > index set > Maintenance > Rotate active write index'').

===== Tested on =====

  * Graylog 3.3.16
  * Debian 9.13 Stretch

===== Unable to write to elasticsearch =====

Reading data from Elasticsearch (GET) works, but writing (POST) fails. The likely cause is that Elasticsearch has put the indices into read-only mode, which it does when the free disk space on the node runs low (the flood-stage watermark, 95 % disk usage by default). In that case you get this error:

<code>
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}
</code>

If you are running in Docker, you might only see a less helpful log message such as:

<code>
[2022-04-21T13:26:04,269][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ddbAopn] low disk watermark [85%] exceeded on [ddbAopnMTL2VKLZs_zM6bQ][ddbAopn][/usr/share/elasticsearch/data/nodes/0] free: 117.5gb[12.9%], replicas will not be assigned to this node
</code>

Free some disk space, for example by deleting an old index (see [[wiki:graylog_troubleshooting#elasticsearch_nodes_disk_usage_above_low_watermark|Graylog index management]]):

<code bash>
curl -X DELETE -u undefined:$ESPASS "localhost:9200/my-index?pretty"
</code>

and then remove the read-only block:

<code bash>
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
</code>

You can also change the watermark thresholds, e.g.:
<code bash>
curl -X PUT -u undefined:$ESPASS "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}'
</code>

When given as absolute values, the watermarks mean the minimum amount of free disk space left, so ''low'' must be larger than ''high'', which in turn must be larger than ''flood_stage''. See the [[https://www.elastic.co/guide/en/elasticsearch/reference/6.2/disk-allocator.html|disk-based shard allocation documentation]] for more info.

===== snapshot missing exception =====

If listing snapshots fails with an error like ''snapshot_missing_exception'', delete the snapshot repository:

<code bash>
curl -X DELETE -u undefined:$ESPASS "localhost:9200/_snapshot/es_backup?pretty"
</code>

and try listing again. Deleting the repository only unregisters it from the cluster; the snapshot files themselves are not removed.

===== index ... is the write index for the datastream =====

When you try to delete a data stream backing index, e.g.:

<code bash>
curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001?pretty'
</code>

you get:

<code>
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
  },
  "status" : 400
}
</code>

The current write index of a data stream cannot be deleted, so first roll the data stream over to a new write index:

<code bash>
curl -s -X POST "localhost:9200/.logs-deprecation.elasticsearch-default/_rollover"
</code>

and then run the delete command again.

===== Tested on =====

  * Debian 10
  * Elasticsearch Docker container ver. 6.8.16

====== See also ======

  * [[wiki:graylog_troubleshooting|Graylog troubleshooting]]
  * [[wiki:elasticsearch_commands|Elasticsearch commands]]
  * [[wiki:kibana_troubleshooting|Kibana troubleshooting]]

====== References ======

  * https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/
  * https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html#reason-unassigned
  * https://discuss.elastic.co/t/restore-of-elasticsearch-data-fails-with-corruptindexexception-checksum-failed-hardware-problem/261619/3
  * https://stackoverflow.com/questions/50609417/elasticsearch-error-cluster-block-exception-forbidden-12-index-read-only-all
  * https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-rollover-index.html
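
====== Helper functions (sketch) ======

The manual checks on this page can be wrapped in a few shell functions. This is a minimal sketch, not a tested tool. It assumes bash with ''curl'', ''awk'' and ''sed'', and a node at ''localhost:9200'' without authentication. The ''ES_URL'' override variable and all function names are hypothetical. It also assumes backing indices are named ''.ds-<data-stream>-<yyyy.MM.dd>-<generation>'', as in the data stream example.

<code bash>
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the manual steps on this page.
# Assumes an unauthenticated node; ES_URL is a made-up override variable.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read `_cat/shards?h=index,shard,prirep,state` output from stdin and
# print "index shard" for every shard whose state is UNASSIGNED.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2 }'
}

# Ask the allocation explain API why one specific shard is unassigned.
explain_shard() {
  local index="$1" shard="$2" primary="${3:-true}"
  curl -s -X GET "$ES_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": \"$index\", \"shard\": $shard, \"primary\": $primary}"
}

# Remove the read-only block on all indices once disk space has been freed.
clear_read_only_block() {
  curl -s -X PUT "$ES_URL/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.blocks.read_only_allow_delete": null}'
}

# Derive the data stream name from a backing index name by stripping the
# ".ds-" prefix and the "-<yyyy.MM.dd>-<generation>" suffix (assumed format).
data_stream_of() {
  printf '%s\n' "$1" | sed -E 's/^\.ds-//; s/-[0-9]{4}\.[0-9]{2}\.[0-9]{2}-[0-9]{6}$//'
}

# Roll the data stream over to a new write index, then delete the old index.
delete_backing_index() {
  local index="$1" stream
  stream="$(data_stream_of "$index")"
  curl -s -X POST "$ES_URL/$stream/_rollover?pretty"
  curl -s -X DELETE "$ES_URL/$index?pretty"
}

# Example usage (needs a running cluster):
#   curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state" | unassigned_shards
#   explain_shard myindex_154 2
#   delete_backing_index .ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001
</code>

''unassigned_shards'' and ''data_stream_of'' only transform text, so you can check them without a cluster. The other functions just wrap the ''curl'' calls shown in the sections above.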