{{tag>troubleshooting elasticsearch opensearch graylog}}
====== Elasticsearch/Opensearch troubleshooting ======
If Graylog stops showing message streams, it could be an issue with the Elasticsearch/OpenSearch indices.
===== checksum failed =====
The health status will report **red** in the web UI or via the API:
# curl -X GET "localhost:9200/_cluster/health?pretty"
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 97,
"active_shards" : 97,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 98.9795918367347
}
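To narrow down which indices are affected, you can also list only the red ones (an extra check using the ''_cat'' API, not shown in the original output):
curl -X GET "localhost:9200/_cat/indices?v&health=red"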
Check the shard status on the server with:
# curl -X GET "localhost:9200/_cat/shards?v"
index shard prirep state docs store ip node
...
myindex_154 2 p UNASSIGNED
...
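On a cluster with many shards it is easier to grep for the problematic ones (just a convenience, same data as above):
curl -s "localhost:9200/_cat/shards" | grep UNASSIGNED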
Check the reason why the shard is unassigned:
# curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
"index" : "myindex_154",
"shard" : 2,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2022-01-12T13:15:29.713Z",
"last_allocation_status" : "no_valid_shard_copy"
},
"can_allocate" : "no_valid_shard_copy",
"allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions" : [
{
"node_id" : "yusko5YtSRSs9QOtVjHutg",
"node_name" : "yusko5Y",
"transport_address" : ":9300",
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "Mr0BOW0RQqKdh-iTqDkjBw",
"store_exception" : {
"type" : "corrupt_index_exception",
"reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
"caused_by" : {
"type" : "i_o_exception",
"reason" : "failed engine (reason: [merge failed])",
"caused_by" : {
"type" : "corrupt_index_exception",
"reason" : "checksum failed (hardware problem?) : expected=6fb91e47 actual=9406d419 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/Qlh2XztFSIWu65B7nhmktQ/2/index/_3nhz.cfs\") [slice=_3nhz.fdt]))"
}
}
}
}
}
]
}
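As an alternative to deleting the whole index, you can try to force an empty primary shard onto the node with the ''_cluster/reroute'' API. This is a sketch based on the Datadog article in the references; it still discards the data of that one shard, and the index/shard/node values are taken from the output above:
curl -X POST -H "Content-Type: application/json" "localhost:9200/_cluster/reroute?pretty" -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "myindex_154",
        "shard": 2,
        "node": "yusko5Y",
        "accept_data_loss": true
      }
    }
  ]
}'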
You could also simply delete the index, but the messages stored in it will be lost!
curl -XDELETE 'localhost:9200/myindex_154'
Then you might have to recalculate the index ranges (''System > Indices > index set > Maintenance > Recalculate index ranges'') and/or manually rotate the active write index (''System > Indices > index set > Maintenance > Rotate active write index'').
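The recalculation can also be triggered through the Graylog REST API; the URL, port and credentials below are assumptions for a default setup, so check your API browser first:
curl -u admin:yourpassword -H "X-Requested-By: cli" -X POST "http://localhost:9000/api/system/indices/ranges/rebuild"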
===== Tested on =====
* Graylog 3.3.16
* Debian 9.13 Stretch
===== Unable to write to elasticsearch =====
GET requests to Elasticsearch work but POST requests do not. The likely cause is that Elasticsearch has put the indices into read-only mode, which it does when free disk space on the server gets low. In that case you will get this error:
{
"error" : {
"root_cause" : [
{
"type" : "cluster_block_exception",
"reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
}
],
"type" : "cluster_block_exception",
"reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
},
"status" : 403
}
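To confirm that disk space really is the problem, check how full the data nodes are (an extra check via the ''_cat'' API, not part of the original error output):
curl -X GET "localhost:9200/_cat/allocation?v"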
If you are running Elasticsearch in Docker, the container logs might only show a less obvious message like:
[2022-04-21T13:26:04,269][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ddbAopn] low disk watermark [85%] exceeded on [ddbAopnMTL2VKLZs_zM6bQ][ddbAopn][/usr/share/elasticsearch/data/nodes/0] free: 117.5gb[12.9%], replicas will not be assigned to this node
Free some disk space, for example by deleting an old index (see the howto for [[wiki:graylog_troubleshooting#elasticsearch_nodes_disk_usage_above_low_watermark|graylog index management]]):
curl -X DELETE -u undefined:$ESPASS "localhost:9200/my-index?pretty"
and then remove the read-only block:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
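Afterwards the block should be gone; listing the index settings again should no longer show ''read_only_allow_delete'' (replace ''my-index'' with one of your indices):
curl -s "localhost:9200/my-index/_settings?pretty" | grep read_only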
You can also change the watermark thresholds, e.g.:
curl -X PUT -u undefined:$ESPASS "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}'
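Note that transient settings are lost on a full cluster restart; use ''persistent'' instead if you want them to survive. You can verify what is currently applied with:
curl -X GET "localhost:9200/_cluster/settings?pretty"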
[[https://www.elastic.co/guide/en/elasticsearch/reference/6.2/disk-allocator.html|Check]] the docs for more info.
===== snapshot missing exception =====
If you get an error like:
"snapshot_missing_exception"
Delete the snapshot repository:
curl -X DELETE -u undefined:$ESPASS "localhost:9200/_snapshot/es_backup?pretty"
and try listing the snapshots again.
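The registered repositories can be listed with the ''_snapshot'' API (same credentials as in the delete above):
curl -X GET -u undefined:$ESPASS "localhost:9200/_snapshot/_all?pretty"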
===== index ... is the write index for the data stream =====
When trying to delete the index like
curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001?pretty'
you get:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
}
],
"type" : "illegal_argument_exception",
"reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
},
"status" : 400
}
you need to roll over the data stream to a new write index first, e.g.:
curl -s -X POST "localhost:9200/.logs-deprecation.elasticsearch-default/_rollover"
and then run the delete command again.
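You can also list the data stream's backing indices to see which one is now the write index (the last one in the list; works on the Elasticsearch versions that create these ''.ds-'' indices):
curl -s "localhost:9200/_data_stream/.logs-deprecation.elasticsearch-default?pretty"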
===== curl (52) empty reply from server =====
This happened with an OpenSearch Docker Compose installation when running:
curl -u admin:Antekante_1 -XGET "http://localhost:9200/_cluster/health?pretty"
The request would need to use HTTPS with the certificate file passed to curl, but if you are just testing, the easiest fix is to disable SSL. Add the following line to the OpenSearch service's environment in ''docker-compose.yml'':
- plugins.security.ssl.http.enabled=false
and rerun
docker-compose up -d
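If you would rather keep SSL enabled, you can instead tell curl to skip certificate verification for a quick test (testing only, ''-k'' disables verification):
curl -k -u admin:Antekante_1 -XGET "https://localhost:9200/_cluster/health?pretty"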
===== Tested on =====
* Debian 10
* Elasticsearch Docker container ver. 6.8.16
====== See also ======
* [[wiki:graylog_troubleshooting|Graylog troubleshooting]]
* [[wiki:elasticsearch_commands|Elasticsearch commands]]
* [[wiki:kibana_troubleshooting|Kibana troubleshooting]]
====== References ======
* https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/
* https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html#reason-unassigned
* https://discuss.elastic.co/t/restore-of-elasticsearch-data-fails-with-corruptindexexception-checksum-failed-hardware-problem/261619/3
* https://stackoverflow.com/questions/50609417/elasticsearch-error-cluster-block-exception-forbidden-12-index-read-only-all
* https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-rollover-index.html