Elasticsearch troubleshooting

If Graylog stops showing messages in its streams, the cause may be a problem with the Elasticsearch indices.

checksum failed

The cluster health status is reported as red in the web UI or by the Elasticsearch API:

# curl -X GET "localhost:9200/_cluster/health?pretty"

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 97,
  "active_shards" : 97,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.9795918367347
}
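For scripted checks, the status field can be pulled out of the health response shown above. The following is a minimal sketch, not part of the original procedure: the function name health_status is hypothetical, and it parses the JSON with sed instead of a real JSON parser, which is only adequate for this flat response.

```shell
# Hypothetical helper: print the "status" value (green, yellow or red)
# from a _cluster/health response read on standard input.
# Assumption: the response is the flat JSON object shown above.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage on the server (requires a reachable Elasticsearch node):
#   curl -s -X GET "localhost:9200/_cluster/health?pretty" | health_status
```

A wrapper script could compare the printed value with "red" and raise an alert before users notice missing messages.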

Check the shard status on the server with:

# curl -X GET "localhost:9200/_cat/shards?v"

index              shard prirep state         docs   store ip              node
...
myindex_154        2     p      UNASSIGNED                                 
...
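On a node with many indices the shard listing is long, so it can help to filter it down to the unassigned rows. This is a small sketch under the assumption that the columns appear in the order shown above (index, shard, prirep, state); the function name unassigned_shards is hypothetical.

```shell
# Hypothetical helper: print index, shard number and primary/replica flag
# for every shard whose state column (4th field) is UNASSIGNED.
# Assumption: input is the default _cat/shards column layout shown above.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage on the server (requires a reachable Elasticsearch node):
#   curl -s -X GET "localhost:9200/_cat/shards?v" | unassigned_shards
```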

Check why the shard is unassigned:

# curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

{
  "index" : "myindex_154",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-01-12T13:15:29.713Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "yusko5YtSRSs9QOtVjHutg",
      "node_name" : "yusko5Y",
      "transport_address" : "<some_public_ip>:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "Mr0BOW0RQqKdh-iTqDkjBw",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "checksum failed (hardware problem?) : expected=6fb91e47 actual=9406d419 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/var/lib/elasticsearch/nodes/0/indices/Qlh2XztFSIWu65B7nhmktQ/2/index/_3nhz.cfs\") [slice=_3nhz.fdt]))"
            }
          }
        }
      }
    }
  ]
}
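Called without a request body, the allocation explain API reports on an unassigned shard of its own choosing. When several shards are unassigned, a specific one can be requested by sending the index name, shard number and primary flag as a JSON body. The sketch below only builds that body; the function name explain_body is hypothetical, and the values are the ones from the example output above.

```shell
# Hypothetical helper: build the JSON request body for
# _cluster/allocation/explain from an index name, a shard number
# and a primary flag (true or false).
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Usage on the server (requires a reachable Elasticsearch node):
#   curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
#        -H 'Content-Type: application/json' \
#        -d "$(explain_body myindex_154 2 true)"
```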

If the shard cannot be recovered, you can delete the index, but the messages stored in it will be lost!

# curl -X DELETE "localhost:9200/myindex_154"

Afterwards you may need to recalculate the index ranges (System > Indices > index set > Maintenance > Recalculate index ranges) and/or manually rotate the active write index (System > Indices > index set > Maintenance > Rotate active write index).
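Because deleting an index cannot be undone, the delete command above can be wrapped in a small guard. The following is only a sketch: the function delete_index and the variables ES_URL and DRY_RUN are hypothetical names introduced here. By default it prints the command instead of running it, and it refuses empty names, wildcards, comma-separated lists and _all.

```shell
# Assumed defaults, not taken from the original page.
ES_URL="${ES_URL:-localhost:9200}"  # Elasticsearch address used above
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the command

# Hypothetical helper: delete exactly one named index.
delete_index() {
  index="$1"
  # Refuse anything that could match more than one index.
  case "$index" in
    ''|*'*'*|*','*|_all)
      echo "refusing to delete '$index'" >&2
      return 1
      ;;
  esac
  if [ "$DRY_RUN" = "1" ]; then
    # Dry run: show what would be executed.
    echo "curl -X DELETE \"$ES_URL/$index\""
  else
    curl -s -X DELETE "$ES_URL/$index"
  fi
}
```

Running delete_index myindex_154 first shows the command; setting DRY_RUN=0 for the next call performs the deletion.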

Tested on

Graylog 3.3.16
Debian 9.13 Stretch
