Skip to content
This repository has been archived by the owner on Mar 18, 2024. It is now read-only.

Latest commit

 

History

History
306 lines (229 loc) · 8.33 KB

README.md

File metadata and controls

306 lines (229 loc) · 8.33 KB

thumper

A poor-man's watcher, an alert system which watches for user-defined events happening in an elk stack.

Runtime configuration

thumper's runtime parameters are specified either on the command-line, in the environment, or in a configuration file. These parameters will include things like the elasticsearch address, api keys for pagerduty, etc...

Alert configuration

Another configuration file (or set of configuration files) is also used, this one being required and containing all data processing which should be performed. A single "alert" encompasses all of the following:

  • A time interval to perform the alert check at
  • A search to be performed against elasticsearch
  • A lua script which processes the search results and decides what actions to take

thumper runs with one or more alerts defined in its configuration, each one operating independant of the others.

The file(s)

Alert configuration is defined in one or more yaml files. Each file contains an array of alerts, like so

# thumper.yml
- name: alert_foo
  # other alert parameters

- name: alert_bar
  # other alert parameters

> thumper -a thumper.yaml

OR they could be defined in separate files within the same directory, with that directory being passed into thumper:

# thumper.d/foo.yml
- name: alert_foo
  # other alert parameters

- name: alert_foo2
  # other alert parameters

and

# thumper.d/bar.yml
- name: alert_bar
  # other alert parameters

- name: alert_bar2
  # other alert parameters

> thumper -a thumper.d

Alert document

A single alert has the following fields in its document (all are required):

- name: something_unique
  interval: "5 * * * *"
  search_index: # see the search subsection
  search_type:  # see the search subsection
  search:       # see the search subsection
  process:      # see the process subsection

name

This is an arbitrary string to identify the alert. It must be unique amongst all of the defined alerts.

interval

A cron-style interval string describing when the search should be run and have the process run on the results.

search

The search which should be performed against elasticsearch. The results are simply held onto for the process step, nothing else is done with them at this point.

search_index: logstash-{{.Format "2006.01.02"}}
search_type: logs
# conveniently, json is valid yaml
search: {
        "query": {
            "query_string": {
                "query":"severity:fatal"
            }
        }
}

See the query dsl docs for more on how to formulate query objects. See the query string docs for more on how to formulate query strings.

In the above examples you can see that the search_index fields uses a go template to generate a date specific index. All three fields (search_index, search_type, and search) can have go templating applied. See the alert context subsection for more information on what fields/methods are available to use.

process

Once the search is performed the results are kept in the context, which is then passed into this step. The process lua script then checks these results against whatever conditions are desired, and may optionally return a list of actions to take. See the alert context section for all available fields in ctx.

process:
    lua_file: ./foo-process.yml

OR

process:
    lua_inline: |
        if ctx.HitCount > 10 then
            return {
                {
                    type = "log",
                    message = "got " .. ctx.HitCount .. " hits",
                }
            }
        end
        -- To indicate no actions, you can return an empty table, nil, or simply
        -- don't return at all
        return {}
actions

The table returned by process is a list of actions which should be taken. Each action has a type and subsequent fields based on that type.

log

Simply logs an INFO message to the console. Useful if you're testing an alert and don't want to set up any real actions yet

{
    type = "log",
    message = "Performing action for alert " .. ctx.Name,
}
http

Create and execute an http command. A warning is logged if anything except a 2xx response code is returned.

Example:

{
    type = "http",
    method = "POST", -- optional, defaults to GET
    url = "http://example.com/some/endpoint?ARG1=foo",
    headers = { -- optional
        "X-FOO" = "something",
    },
    body = "some body for " .. ctx.Name, -- optional
}
pagerduty

Triggers an event in pagerduty. The --pagerduty-key param must be set in the runtime configuration in order to use this action type.

Example:

{
    type = "pagerduty",

    -- optional, defaults to alert's name, used to de-duplicate triggers on
    -- pagerduty's end
    incident_key = "something",

    -- While it's possible to use templated terms in here, it makes the most
    -- sense to have this be a static key and use the details dict for dynamic
    -- data
    description = "A short message about the error",

    -- Optional table of extra contextural data about this alert
    details = {
        foo = ctx.Some.Data,
        bar = "baz",
    },
}
opsgenie

Triggers an alert in OpsGenie. The --opsgenie-key param must be set in the runtime configuration in order to use this action type.

Example:

{
    type = "opsgenie",

    -- required alert message
    message = "what the alert is",

    -- optional, defaults to alert's name, used to de-duplicate triggers on
    -- opsgenie's end
    alias = "something"

    -- see opsgenie api create alert documention for the rest of the valid
    -- optional parameters
    -- https://docs.opsgenie.com/docs/alert-api#section-create-alert
}

Alert context

Through its lifecycle each alert has a context object attached to it. The results from the search step are included in it, as well as other data. Here is a description of the available data in the context, as well as how to use it

NOTE THAT THE CONTEXT IS READ-ONLY IN ALL CASES

Context fields

{
    Name      string // The alert's name
    StartedTS uint64 // The timestamp the alert started at

    // The following are filled in by the search step
    TookMS      uint64  // Time search took to complete, in milliseconds
    HitCount    uint64  // The total number of documents matched
    HitMaxScore float64 // The maximum score of all the documents matched

    // Array of actual documents matched. Keep in mind that unless you manually
    // define a limit in your search query this will be capped at 10 by
    // elasticsearch. Usually HitCount is the important data point anyway
    Hits []{
        Index  string  // The index the hit came from
        Type   string  // The type the document is
        ID     string  // The unique id of the document
        Score  float64 // The document's score relative to the query
        Source object  // The actual document
    }

    // If an aggregation was defined in the search query, the results will be
    // set here
    Aggregations object
}

In lua

Within lua scripts the context is made available as a global variable called ctx. Fields on it are directly addressable using the above names, for example ctx.HitCount and ctx.Hits[1].ID.

In go template

In some areas go templates, provided by the template/text package, are used to add some dynamic capabilities to otherwise static configuration fields. In these places the context is made available as the root object. For example, {{.HitCount}}.

In addition to the fields defined above, the root template object also has some methods on it which may be helpful for working with dates. All methods defined on go's time.Time object are available. For example, to format a string into the logstash index for the current day:

logstash-{{.Format "2006.01.02"}}

And to do the same, but for yesterday:

logstash-{{(.AddDate 0 0 -1).Format "2006.01.02"}}

Go's system of date format strings is a bit unique (aka weird), read more about it here