Logstash setup

Download and install

wget https://artifacts.opensearch.org/logstash/logstash-oss-with-opensearch-output-plugin-8.9.0-linux-x64.tar.gz
tar xf logstash-oss-with-opensearch-output-plugin-8.9.0-linux-x64.tar.gz
cd logstash-8.9.0/ && bin/logstash-plugin install logstash-output-opensearch
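
Before wiring anything into a service, you can sanity-check the install and a pipeline file with Logstash's built-in config test. A minimal example, using the config path this page uses later:

bin/logstash --version
bin/logstash -t -f /etc/logstash.d/nginx_logs.conf   # -t = --config.test_and_exit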

Examples

nginx logs using regular indices

input {
  file {
    path => "/var/log/nginx/nginx_logs*_access.log"
  }
}

filter {
    grok {
      patterns_dir => "/etc/logstash.d/patterns"
      match => { "message" => "%{NGINX_ACCESS}" }
      remove_field => ["message"]
    }

    useragent {
      source => "user_agent"
      target => "useragent"
      remove_field => "user_agent"
    }
}

output {
  opensearch {
    hosts       => "https://{{ opensearch_host }}:9200"
    user        => "logstash"
    password    => "mypassword"
    index       => "logstash-nginx-access-logs-${HOSTNAME}"
    manage_template => true
    template_overwrite => true
    template => "/etc/logstash.d/nginx_access_index_map.json"
    ssl_certificate_verification => false
  }
}

You can also use multiple file inputs like so:

input {
  file {
    path => [
      "/var/log/nginx/nginx_logs*_access.log",
      "/var/log/nginx/some_other_web*_access.log"
    ]
  }
}
...

Above we're using a grok pattern named NGINX_ACCESS, stored in the patterns_dir directory (/etc/logstash.d/patterns). Example pattern:

METHOD (OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT)
CACHED (HIT|MISS|BYPASS|EXPIRED|STALE|UPDATING|REVALIDATED)
NGINX_ACCESS "%{HTTPDATE:time_local}" client=%{IP:client} country=%{GREEDYDATA:country} method=%{METHOD:method} request="%{METHOD} %{URIPATHPARAM:request} HTTP/%{BASE16FLOAT:http_version}" request_length=%{INT:request_length} status=%{INT:status} bytes_sent=%{INT:bytes_sent} body_bytes_sent=%{INT:body_bytes_sent} referer=(%{URI:referer}|-) user_agent=%{GREEDYDATA:user_agent} upstream_addr=(%{HOSTPORT:upstream_addr}|-) upstream_status=(%{INT:upstream_status}|-) request_time=(%{ISO8601_SECOND:request_time}|-) upstream_response_time=(%{ISO8601_SECOND:upstream_response_time}|-) upstream_connect_time=(%{ISO8601_SECOND:upstream_connect_time}|-) upstream_header_time=(%{ISO8601_SECOND:upstream_header_time}|-) upstream_cache_status=(%{CACHED:upstream_cache_status}|-) is_bot=%{INT:is_bot} cookie_mbbauth_present=(%{GREEDYDATA:cookie_mbbauth_present}|-)

To test the pattern, use http://grokconstructor.appspot.com: copy a few log lines there and adjust the pattern above until you get a match.
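
For reference, a log_format along these lines on the nginx side would produce matching log lines. This is only a sketch derived from the fields in the pattern above; the $geoip_country_code, $is_bot and $cookie_mbbauth_present variables are assumptions (geoip module / custom map variables), so adjust them to your setup:

# nginx.conf (sketch) - log_format matching the NGINX_ACCESS grok pattern
log_format opensearch '"$time_local" client=$remote_addr country=$geoip_country_code '
                      'method=$request_method request="$request" '
                      'request_length=$request_length status=$status '
                      'bytes_sent=$bytes_sent body_bytes_sent=$body_bytes_sent '
                      'referer=$http_referer user_agent=$http_user_agent '
                      'upstream_addr=$upstream_addr upstream_status=$upstream_status '
                      'request_time=$request_time upstream_response_time=$upstream_response_time '
                      'upstream_connect_time=$upstream_connect_time upstream_header_time=$upstream_header_time '
                      'upstream_cache_status=$upstream_cache_status is_bot=$is_bot '
                      'cookie_mbbauth_present=$cookie_mbbauth_present';

access_log /var/log/nginx/nginx_logs_example_access.log opensearch;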

manage_template => true creates the index template in OpenSearch (Elasticsearch) automatically; just make sure the logstash user has permissions to create indices with the specified names. The template itself is described in template => "/etc/logstash.d/nginx_access_index_map.json", and its mappings must match the fields produced by the grok pattern above, i.e.:

{
  "version" : 50001,
  "template" : "logstash-nginx-access*",
  "settings" : {
    "index" : {
      "refresh_interval" : "5s"
    }
  },
  "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "client": {
          "type" : "ip"
        },
        "country" : {
          "type" : "keyword"
        },
        "method" : {
          "type" : "keyword"
        },
        "request" : {
          "type" : "keyword"
        },
        "request_length" : {
          "type" : "integer"
        },
        "status" : {
          "type" : "integer"
        },
        "bytes_sent": {
          "type" : "integer"
        },
        "body_bytes_sent": {
          "type" : "integer"
        },
        "referer" : {
          "type" : "keyword"
        },
        "useragent" : {
          "dynamic" : true,
          "properties" : {
            "device" : {
               "properties" : {
                 "name" : {
                   "type" : "keyword"
                 }
               }
            },
            "name" : {
              "type" : "keyword"
            },
            "os" : {
              "properties" : {
                "name" : {
                  "type" : "keyword"
                },
                "version" : {
                  "type" : "keyword"
                },
                "full" : {
                  "type" : "keyword"
                }
              }
            },
            "version" : {
              "type" : "keyword"
            }
          }
        },
        "upstream_addr" : {
          "type" : "keyword"
        },
        "upstream_status" : {
          "type" : "keyword"
        },
        "request_time" : {
          "type" : "float"
        }
      }
  },
  "aliases" : {}
}
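
If you prefer to upload or inspect the template yourself rather than letting Logstash manage it, the same JSON file can be pushed via the legacy template API. Host, user and file path are the example values from this page; the template name (logstash-nginx-access) is arbitrary:

curl -k -u logstash:mypassword -H 'Content-Type: application/json' \
  -X PUT "https://{{ opensearch_host }}:9200/_template/logstash-nginx-access" \
  -d @/etc/logstash.d/nginx_access_index_map.json

curl -k -u logstash:mypassword "https://{{ opensearch_host }}:9200/_template/logstash-nginx-access?pretty"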

${HOSTNAME} is an environment variable which must be defined (via .bashrc, via the systemd unit that starts the logstash service, etc.):

# /etc/systemd/system/logstash.service
[Unit]
Description=Massage various logs and forward them to opensearch/elasticsearch
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/logstash -f /etc/logstash.d/nginx_logs.conf
Environment="HOSTNAME={{ hostname }}"
Restart=on-failure

[Install]
WantedBy=multi-user.target
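
After creating the unit file, reload systemd and start the service; following the journal is the quickest way to check that events are being shipped:

systemctl daemon-reload
systemctl enable --now logstash.service
journalctl -u logstash.service -f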

The referer=(%{URI:referer}|-) construct means the referer may be empty (-) in this case.

nginx logs using datastreams

This is arguably the better approach: data streams automatically roll over to new backing indices.

Change the logstash output section to this:

...
output {
  opensearch {
    hosts       => "https://{{ opensearch_host }}:9200"
    user        => "logstash"
    password    => "mypassword"
    ssl_certificate_verification => false
    action      => "create" 
    index       => "whatever-pattern-abc"
  }
}
...

index needs to be set to the name of the data stream you defined in OpenSearch, and the action => "create" directive must be added (data streams only accept create operations).

In OpenSearch you need to create a data stream, but first you need to create an index template. The UI steps are below; a REST API equivalent is sketched after the list.

1. Go to Index Management > Templates > Create template

2. Add a template name, select type "Data streams" and fill in the Time field (@timestamp in this example). The index pattern should match a name the logstash user has write permissions for; this will be the name of the data stream used later.

3. You can add an alias if you want; replicas are set to 0 here to save some space.

4. In field mappings you need to map the fields sent by Logstash. The easiest way is to copy the JSON from an existing index into the JSON editor, for example an index created by Logstash in the regular-index setup above.

5. When creating the data stream, its name must match the index pattern from step 2, but it doesn't have to be identical (here it should be "whatever-pattern-abc" to match the logstash config). The rest should be autofilled.
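
The same setup can be done over the REST API instead of the UI. This is only a sketch using the example names from this page; the credentials are placeholders and the mappings are trimmed to the timestamp field (paste your full mappings as in step 4):

# Steps 2-4: index template of type "data stream"
curl -k -u admin:admin -H 'Content-Type: application/json' \
  -X PUT "https://{{ opensearch_host }}:9200/_index_template/whatever-pattern" \
  -d '{
    "index_patterns": ["whatever-pattern-*"],
    "data_stream": { "timestamp_field": { "name": "@timestamp" } },
    "template": {
      "settings": { "number_of_replicas": 0 },
      "mappings": { "properties": { "@timestamp": { "type": "date" } } }
    }
  }'

# Step 5: create the data stream itself; the name must match the pattern above
curl -k -u admin:admin -X PUT "https://{{ opensearch_host }}:9200/_data_stream/whatever-pattern-abc"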
