Importing IIS logs into Elasticsearch with Logstash

Logstash is a tool for processing log files that aims to make it easy to import files of varying formats and write the data out to external systems (other formats, databases, etc.). Logstash is made by the same company as Elasticsearch:

http://logstash.net/

Logstash makes it very easy to process text-based logs and import the data into Elasticsearch, so we can easily search it and explore it with Kibana. This post will describe how to get IIS set up to write logs in the format we want, and how to configure Logstash to process them into Elasticsearch.

A Logstash limitation

While Logstash is pretty good at processing logs, it does have one limitation that arises when working with IIS. IIS provides the option to have a different log directory for each website, but it re-uses the same filenames within those directories. That normally isn't a problem, and is in fact desirable in many ways; however, Logstash can't process these files because its file input can't handle the same filename appearing in different directories. I can't fully explain the root cause, but there is an issue raised in their Jira instance describing the problem:

https://logstash.jira.com/browse/LOGSTASH-1584

So what this means is that we'll need to configure IIS to use a single log file for the entire server; if we ever need per-site logs we can always split them back out with more advanced tooling later on.

Configuring IIS to use a single log directory

Changing IIS to use a single log file system-wide is pretty easy, and is done through the Logging panel within IIS Manager at the server level.

Once you have the Logging panel open, change the "One log file per" drop-down to Server so that logging is server-wide, and update the W3C fields to include all of the available fields.
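
If you prefer to script that change instead of clicking through IIS Manager, appcmd can flip the same switch. This is only a sketch - the section and attribute names here are my reading of the applicationHost.config schema, so verify them against your IIS version before relying on it:

$ C:\Windows\System32\inetsrv\appcmd.exe set config /section:system.applicationHost/log /centralLogFileMode:CentralW3C

The extra W3C fields can then be enabled the same way through the central W3C log file settings, or simply ticked in the Logging panel.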

With that done we can switch over and get Logstash installed and configured to run on the server. Note that with this change made we don't need to configure logging for the individual sites, since they'll all use this server-level configuration.

Installing Logstash and configuring it for IIS

Logstash is a Java-based application, so first you need a current version of the Java runtime installed. Search Google for "jre8" and navigate to the current download page for the Java SE Runtime Environment 8 installation for Windows x64. Work through the installation and then set the JAVA_HOME environment variable to the installation path.
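
For example, from an elevated PowerShell prompt (the JRE folder name below is just a placeholder - use whatever path the installer actually created):

$ [Environment]::SetEnvironmentVariable("JAVA_HOME", "C:\Program Files\Java\jre1.8.0_25", "Machine")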

Once Java is ready to go, download the latest version of Logstash to the server and extract it into the C:\Program Files\Elastic directory.
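
If you grabbed the zip and have a recent PowerShell (5.0 or later), the extraction can be scripted too; the version number in the archive name below is just an example:

$ Expand-Archive .\logstash-1.4.2.zip -DestinationPath "C:\Program Files\Elastic"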

Once it's there we need to make one optional change to Logstash's run script, and then set up our config file to import the IIS logs. First, let's review Logstash's run script to see if we need to change the memory allocated to it. To do so, open up the logstash.bat script in an editor and review the heap settings it hands to Java.
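
If you just want a quick look at what's there, searching the launch scripts for heap-related settings works too. The variable names differ between Logstash versions, so treat this as a rough check rather than gospel:

$ Select-String -Path "C:\Program Files\Elastic\logstash-1.4.2\bin\*.bat" -Pattern "MEM|HEAP|Xmx|Xms"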

Once you have the run script squared away, we need to provide Logstash with a config file that tells it how to process our logs. To do this we'll create a new folder called conf in the logstash directory, and then create a file in it that Logstash will read when it starts up. For IIS we'll create one, as sketched below.
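
This assumes the same install path as before and a file named iis.conf - the exact name doesn't matter, since Logstash will read every file in the conf folder we point it at:

$ New-Item -ItemType Directory "C:\Program Files\Elastic\logstash-1.4.2\conf"
$ New-Item -ItemType File "C:\Program Files\Elastic\logstash-1.4.2\conf\iis.conf"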

That file contains the following:

################################################################
## This file was built with the help of this tutorial:
##   https://adammills.wordpress.com/2014/02/21/logstash-and-iis/
##
## The full logstash docs are here: http://logstash.net/docs/1.4.2/
#

## We have IIS configured to use a single log file for all sites
#   because logstash can't handle parsing files in different
#   directories if they have the same name.
#
input {  
  file {
    type => "iis-w3c"
    path => "C:/inetpub/logs/LogFiles/W3SVC*/*.log"
  }

}

filter {  
  ## Ignore the comments that IIS will add to the start of the W3C logs
  #
  if [message] =~ "^#" {
    drop {}
  }

  grok {
    ## Very helpful site for building these statements:
    #   http://grokdebug.herokuapp.com/
    #
    # This is configured to parse out every field of IIS's W3C format when
    #   every field is included in the logs
    #
    match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:serviceName} %{WORD:serverName} %{IP:serverIP} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
  }

  ## Set the Event Timestamp from the log
  #
  date {
    match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Etc/UTC"
  }

  ## If the log record has a value for 'bytesSent', then add a new field
  #   to the event that converts it to kilobytes
  #
  if [bytesSent] {
    ruby {
      code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
    }
  }


  ## Do the same conversion for the bytes received value
  #
  if [bytesReceived] {
    ruby {
      code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
    }
  }

  ## Perform some mutations on the records to prep them for Elastic
  #
  mutate {
    ## Convert some fields from strings to integers
    #
    convert => ["bytesSent", "integer"]
    convert => ["bytesReceived", "integer"]
    convert => ["timetaken", "integer"]

    ## Create a new field for the reverse DNS lookup below
    #
    add_field => { "clientHostname" => "%{clientIP}" }

    ## Finally remove the original log_timestamp field since the event will
    #   have the proper date on it
    #
    remove_field => [ "log_timestamp"]
  }


  ## Do a reverse lookup on the client IP to get their hostname.
  #
  dns {
    ## Now that we've copied the clientIP into a new field we can
    #   simply replace it here using a reverse lookup
    #
    action => "replace"
    reverse => ["clientHostname"]
  }

  ## Parse out the user agent
  #
  useragent {
    ## Read from the userAgent field the grok pattern above creates
    #
    source => "userAgent"
    prefix => "browser"
  }

}

## We're only going to output these records to Elasticsearch so configure
#   that.
#
output {  
  elasticsearch {
    embedded => false
    host => "localhost"
    port => 9200
    protocol => "http"
    #
    ## Log records into month-based indexes
    #
    index => "%{type}-%{+YYYY.MM}"
  }

  ## stdout included just for testing
  #
  #stdout {codec => rubydebug}
}

The main sections of that file tell Logstash where to pull the log files from (the input), how to process each line it encounters (the filters), and finally where to write each processed record (the output). In our case we're pulling the IIS logs in from files, massaging them to add some extra fields and do a reverse DNS lookup, and finally writing them to the local Elasticsearch instance using month-based indexes.

Now that we have Logstash configured, it's time to get it set up as a Windows service. Unlike Elasticsearch itself, Logstash doesn't include a script to install a service, so we'll use another tool to get this done - the Non-Sucking Service Manager (NSSM). To get started, download NSSM and extract its 64-bit executable into the bin folder of the logstash directory.
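
Something like this, assuming NSSM 2.24 and the same Logstash location used earlier (adjust both paths to match your downloads):

$ Copy-Item .\nssm-2.24\win64\nssm.exe "C:\Program Files\Elastic\logstash-1.4.2\bin"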

Once that's in place, open a PowerShell window and switch into that bin directory. From there, run this command to get NSSM ready to add a service for us:

$ .\nssm.exe install Logstash

When the NSSM window opens, configure the attributes of the service, primarily which script to run when the service starts up (for us that's the logstash.bat script in the bin directory).
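
If you need to review or adjust those attributes later, NSSM can also set them from the command line once the service exists. The paths below assume the install location used throughout this post, and the arguments mirror the standalone test command shown a little further down:

$ .\nssm.exe set Logstash Application "C:\Program Files\Elastic\logstash-1.4.2\bin\logstash.bat"
$ .\nssm.exe set Logstash AppDirectory "C:\Program Files\Elastic\logstash-1.4.2\bin"
$ .\nssm.exe set Logstash AppParameters "agent -f ..\conf"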

When you're satisfied with the service configuration, click the button to install it and that's it. You can then check the services list to confirm that it was installed and is running properly.
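
You can do that check (and start the service if needed) from the same PowerShell window:

$ Get-Service Logstash
$ Start-Service Logstash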

Note: it's a good idea to make sure your Logstash config is working properly before you install the service, and you can run Logstash in standalone mode from the command line to check it. Just make sure you update the output section of your config file first (uncomment the stdout line), and then run it like this:

$ .\logstash.bat agent -f ..\conf

As long as you configured it to output to stdout you should see the log messages as they're processed.

Validating things are working

Using Logstash to process IIS logs can be a little funny at times because IIS doesn't immediately write log entries out to disk. However, if you kick off enough requests and wait a bit you should see them appear, and Logstash should then write them into our Elasticsearch instance.
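
If you'd rather check from the command line, a couple of quick queries against Elasticsearch will confirm that documents are landing. This assumes Elasticsearch is listening on its default port and the month-based index naming from the config above (so index names like iis-w3c-2015.01):

# List the indexes Logstash has created
$ Invoke-RestMethod "http://localhost:9200/_cat/indices/iis-w3c-*?v"

# ...or just count how many log records have been indexed so far
$ Invoke-RestMethod "http://localhost:9200/iis-w3c-*/_count"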

If you see that then you're all set!
