If Graylog stops showing message streams, it could be an issue with the Elasticsearch indices.
The cluster health status will be reported as red in the web UI or via the API:
# curl -X GET "localhost:9200/_cluster/health?pretty"
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 97,
"active_shards" : 97,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 98.9795918367347
}
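To see which indices are affected you can list only the red ones (assuming Elasticsearch on the default localhost:9200 as in the commands above):
# curl -X GET "localhost:9200/_cat/indices?v&health=red"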
Check the shard status on the server with:
# curl -X GET "localhost:9200/_cat/shards?v" index shard prirep state docs store ip node ... myindex_154 2 p UNASSIGNED ...
Check the reason for the unassigned shard:
# curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
"index" : "myindex_154",
"shard" : 2,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2022-01-12T13:15:29.713Z",
"last_allocation_status" : "no_valid_shard_copy"
},
"can_allocate" : "no_valid_shard_copy",
"allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions" : [
{
"node_id" : "yusko5YtSRSs9QOtVjHutg",
"node_name" : "yusko5Y",
"transport_address" : "<some_public_ip>:9300",
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "Mr0BOW0RQqKdh-iTqDkjBw",
"store_exception" : {
"type" : "corrupt_index_exception",
"reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
"caused_by" : {
"type" : "i_o_exception",
"reason" : "failed engine (reason: [merge failed])",
"caused_by" : {
"type" : "corrupt_index_exception",
"reason" : "checksum failed (hardware problem?) : expected=6fb91e47 actual=9406d419 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/Qlh2XztFSIWu65B7nhmktQ/2/index/_3nhz.cfs\") [slice=_3nhz.fdt]))"
}
}
}
}
}
]
}
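If the explanation does not indicate corruption but only that allocation failed and gave up (e.g. reason ALLOCATION_FAILED), you can ask Elasticsearch to retry the allocation before resorting to deleting anything; this will not help with the corrupt shard copy shown above:
# curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"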
You could delete the index, but the messages stored in it will be lost!
curl -XDELETE 'localhost:9200/myindex_154'
Afterwards you might have to recalculate the index ranges in the Graylog web UI (System > Indices > index set > Maintenance > Recalculate index ranges) and/or manually rotate the active write index (System > Indices > index set > Maintenance > Rotate active write index).
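The index range recalculation can also be triggered through the Graylog REST API; a sketch, assuming the API is reachable on localhost:9000 and admin:$GLPASS credentials (both placeholders; the X-Requested-By header is required by Graylog for state-changing requests):
curl -u admin:$GLPASS -H "X-Requested-By: cli" -X POST "http://localhost:9000/api/system/indices/ranges/rebuild"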
If GET requests to Elasticsearch work but POST requests do not, the issue could be that Elasticsearch has put the indices into read-only mode. It does this when free disk space on the server gets low. In that case you'll get an error like:
{
"error" : {
"root_cause" : [
{
"type" : "cluster_block_exception",
"reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
}
],
"type" : "cluster_block_exception",
"reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
},
"status" : 403
}
If you are running in Docker you might only see a less useful log message like:
[2022-04-21T13:26:04,269][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ddbAopn] low disk watermark [85%] exceeded on [ddbAopnMTL2VKLZs_zM6bQ][ddbAopn][/usr/share/elasticsearch/data/nodes/0] free: 117.5gb[12.9%], replicas will not be assigned to this node
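To confirm how much disk space Elasticsearch thinks each node has, check the allocation stats, which include used and free disk per node:
curl -X GET "localhost:9200/_cat/allocation?v"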
Free some disk space, for example by deleting an old index (see the howto for Graylog index management):
curl -X DELETE -u undefined:$ESPASS "localhost:9200/my-index?pretty"
and then remove the read-only block:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
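To verify the block is gone, check that the setting no longer appears in the index settings (empty output means it was removed):
curl -s "localhost:9200/_all/_settings?flat_settings=true&pretty" | grep read_only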
You can also change the watermark thresholds, e.g.:
curl -X PUT -u undefined:$ESPASS "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}'
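To verify the new values took effect, read the cluster settings back (add the same -u credentials as above if authentication is enabled):
curl -X GET "localhost:9200/_cluster/settings?pretty"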
Check the docs for more info.
If you get an error like:
"snapshot_missing_exception"
Delete the snapshot repository:
curl -X DELETE -u undefined:$ESPASS "localhost:9200/_snapshot/es_backup?pretty"
and try listing again.
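To list the registered snapshot repositories afterwards (again, add -u credentials if needed):
curl -X GET "localhost:9200/_snapshot?pretty"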
When trying to delete an index like
curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001?pretty'
you get
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
}
],
"type" : "illegal_argument_exception",
"reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted"
},
"status" : 400
}
you need to roll the data stream over to a new write index first, e.g.
curl -s -X POST "localhost:9200/.logs-deprecation.elasticsearch-default/_rollover"
and then run the delete command again.
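You can check which backing index is now the write index of the data stream (typically the last backing index listed in the response) with:
curl -s -X GET "localhost:9200/_data_stream/.logs-deprecation.elasticsearch-default?pretty"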
This happened with an OpenSearch docker compose installation when trying:
curl -u admin:Antekante_1 -XGET "http://localhost:9200/_cluster/health?pretty"
With the security plugin enabled the request needs HTTPS and the certificate file in the command, but if you are just testing, the easiest fix is to disable SSL. Add the following line in docker-compose.yml
- plugins.security.ssl.http.enabled=false
and rerun
docker-compose up -d
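For context, the setting goes under the OpenSearch service's environment list in docker-compose.yml; a minimal sketch (the service name, image and the other environment entries are assumptions, keep whatever your compose file already has):
  opensearch:
    image: opensearchproject/opensearch:latest
    environment:
      - discovery.type=single-node
      - plugins.security.ssl.http.enabled=false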