If Graylog stops showing message streams, it could be an issue with the Elasticsearch indices.
The cluster health status will report red in the web UI or via the API:
# curl -X GET "localhost:9200/_cluster/health?pretty" { "cluster_name" : "elasticsearch", "status" : "red", "timed_out" : false, "number_of_nodes" : 1, "number_of_data_nodes" : 1, "active_primary_shards" : 97, "active_shards" : 97, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 98.9795918367347 }
Check the shard status on the server with:
# curl -X GET "localhost:9200/_cat/shards?v" index shard prirep state docs store ip node ... myindex_154 2 p UNASSIGNED ...
Check the reason for the unassigned shard:
# curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "myindex_154",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-01-12T13:15:29.713Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "yusko5YtSRSs9QOtVjHutg",
      "node_name" : "yusko5Y",
      "transport_address" : "<some_public_ip>:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "Mr0BOW0RQqKdh-iTqDkjBw",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "checksum failed (hardware problem?) : expected=6fb91e47 actual=9406d419 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/Qlh2XztFSIWu65B7nhmktQ/2/index/_3nhz.cfs\") [slice=_3nhz.fdt]))"
            }
          }
        }
      }
    }
  ]
}
You could delete the index, but some messages will be lost!
curl -XDELETE 'localhost:9200/myindex_154'
Then you might have to recalculate the index ranges (System > Indices > index set > Maintenance > Recalculate index ranges) and/or manually rotate the active write index (System > Indices > index set > Maintenance > Rotate active write index).
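If only this one shard is corrupt and you would rather sacrifice just its data instead of the whole index, Elasticsearch also has a reroute command that force-allocates an empty primary. This is only a sketch using the index, shard number and node name from the allocation explain output above, and it permanently discards whatever was stored in that shard:
curl -X POST -H "Content-Type: application/json" "localhost:9200/_cluster/reroute?pretty" -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "myindex_154",
        "shard": 2,
        "node": "yusko5Y",
        "accept_data_loss": true
      }
    }
  ]
}'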
If GET requests to Elasticsearch work but POST requests do not, the issue could be that the indices were put into read-only mode. Elasticsearch does this if the free space on the server starts getting low. In that case you'll get this error:
[4:39 PM] { "error" : { "root_cause" : [ { "type" : "cluster_block_exception", "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];" } ], "type" : "cluster_block_exception", "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];" }, "status" : 403 }
If you are running in Docker, you might see a less useful message like:
[2022-04-21T13:26:04,269][INFO ][o.e.c.r.a.DiskThresholdMonitor] [ddbAopn] low disk watermark [85%] exceeded on [ddbAopnMTL2VKLZs_zM6bQ][ddbAopn][/usr/share/elasticsearch/data/nodes/0] free: 117.5gb[12.9%], replicas will not be assigned to this node
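To see how much disk space the data nodes actually have left, you can check the allocation stats:
curl -X GET "localhost:9200/_cat/allocation?v"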
Free some disk space, for example by deleting an old index (see the howto for Graylog index management):
curl -X DELETE -u undefined:$ESPASS "localhost:9200/my-index?pretty"
and then remove the read-only block:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
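To confirm the block is gone, you can check the index settings again; once the block has been removed this should return nothing:
curl -s -X GET "localhost:9200/_all/_settings?pretty" | grep read_only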
You can also change the watermark thresholds, e.g.:
curl -X PUT -u undefined:$ESPASS "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}'
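You can verify that the transient settings took effect with:
curl -X GET "localhost:9200/_cluster/settings?pretty"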
Check the Elasticsearch disk-based shard allocation docs for more info.
If you get an error like:
"snapshot_missing_exception"
Delete the snapshot repository:
curl -X DELETE -u undefined:$ESPASS "localhost:9200/_snapshot/es_backup?pretty"
and try listing again.
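To see which snapshot repositories are registered in the first place (for example before and after the delete), you can list them:
curl -X GET -u undefined:$ESPASS "localhost:9200/_snapshot?pretty"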
When trying to delete an index like
curl -XDELETE 'localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001?pretty'
you get
{ "error" : { "root_cause" : [ { "type" : "illegal_argument_exception", "reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted" } ], "type" : "illegal_argument_exception", "reason" : "index [.ds-.logs-deprecation.elasticsearch-default-2022.11.15-000001] is the write index for data stream [.logs-deprecation.elasticsearch-default] and cannot be deleted" }, "status" : 400 }
you need to roll the data stream over to a new write index first, e.g.
curl -s -X POST "localhost:9200/.logs-deprecation.elasticsearch-default/_rollover"
and then run the delete command again.
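If you want to check which backing index is now the write index before deleting the old one, you can inspect the data stream (name taken from the example above); the last index in the list is the current write index:
curl -X GET "localhost:9200/_data_stream/.logs-deprecation.elasticsearch-default?pretty"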
With an OpenSearch Docker Compose installation, the following command failed:
curl -u admin:Antekante_1 -XGET "http://localhost:9200/_cluster/health?pretty"
The command needs the certificate file, but if you are just testing, the easiest option is to disable SSL for the HTTP layer. Add the following line to docker-compose.yml:
- plugins.security.ssl.http.enabled=false
and rerun
docker-compose up -d
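For context, the setting goes under the environment section of the OpenSearch service in docker-compose.yml, roughly like this (service name, image tag and the other environment entries are assumptions, not taken from the actual setup):
services:
  opensearch:
    image: opensearchproject/opensearch:latest
    environment:
      - discovery.type=single-node
      # testing only: disable TLS on the HTTP layer so plain http curl works
      - plugins.security.ssl.http.enabled=false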