Docs
zix99 committed Jan 3, 2025
1 parent 57d1cfc commit c0d3d62
Showing 10 changed files with 156 additions and 27 deletions.
1 change: 1 addition & 0 deletions README.md
Expand Up @@ -20,6 +20,7 @@ See [rare.zdyn.net](https://rare.zdyn.net) or the [docs/ folder](docs/) for the
## Features

* Multiple summary formats including: filter (like grep), histogram, bar graphs, tables, heatmaps, reduce, and numerical analysis
* Parse using regex (`-m`) or dissect tokenizer (`-d`)
* File glob expansions (eg `/var/log/*` or `/var/log/*/*.log`) and `-R`
* Optional gzip decompression (with `-z`)
* Following `-f` or re-open following `-F` (use `--poll` to poll, and `--tail` to tail)
2 changes: 1 addition & 1 deletion cmd/helpers/extractorBuilder.go
Expand Up @@ -211,7 +211,7 @@ func getExtractorFlags() []cli.Flag {
Name: "ignore-case",
Aliases: []string{"I"},
Category: cliCategoryMatching,
Usage: "Augment regex to be case insensitive",
Usage: "Augment matcher to be case insensitive",
},
&cli.IntFlag{
Name: "batch",
32 changes: 24 additions & 8 deletions docs/cli-help.md
Expand Up @@ -67,6 +67,8 @@ Filter incoming results with search criteria, and output raw matches

**--batch-buffer**="": Specifies how many batches to read-ahead. Impacts memory usage, can improve performance (default: 6)

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])

**--follow, -f**: Read appended data as file grows
Expand All @@ -75,7 +77,7 @@ Filter incoming results with search criteria, and output raw matches

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--line, -l**: Output source file and line number

Expand Down Expand Up @@ -113,6 +115,8 @@ Summarize results by extracting them to a histogram

**--csv, -o**="": Write final results to csv. Use - to output to stdout

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extra, -x**: Alias for -b --percentage

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])
Expand All @@ -123,7 +127,7 @@ Summarize results by extracting them to a histogram

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -167,6 +171,8 @@ Create a 2D heatmap of extracted data

**--delim**="": Character to tabulate on. Use {$} helper by default (default: \x00)

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])

**--follow, -f**: Read appended data as file grows
Expand All @@ -175,7 +181,7 @@ Create a 2D heatmap of extracted data

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -223,6 +229,8 @@ Create rows of sparkline graphs

**--delim**="": Character to tabulate on. Use {$} helper by default (default: \x00)

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])

**--follow, -f**: Read appended data as file grows
Expand All @@ -231,7 +239,7 @@ Create rows of sparkline graphs

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -273,6 +281,8 @@ Create a bargraph of the given 1 or 2 dimension data

**--csv, -o**="": Write final results to csv. Use - to output to stdout

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])

**--follow, -f**: Read appended data as file grows
Expand All @@ -281,7 +291,7 @@ Create a bargraph of the given 1 or 2 dimension data

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -317,6 +327,8 @@ Numerical analysis on a set of filtered data

**--batch-buffer**="": Specifies how many batches to read-ahead. Impacts memory usage, can improve performance (default: 6)

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extra, -x**: Displays extra analysis on the data (Requires more memory and cpu)

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])
Expand All @@ -327,7 +339,7 @@ Numerical analysis on a set of filtered data

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -367,6 +379,8 @@ Create a 2D summarizing table of extracted data

**--delim**="": Character to tabulate on. Use {$} helper by default (default: \x00)

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extra, -x**: Display row and column totals

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{0}])
Expand All @@ -377,7 +391,7 @@ Create a 2D summarizing table of extracted data

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--match, -m**="": Regex to create match groups to summarize on (default: .*)

Expand Down Expand Up @@ -421,6 +435,8 @@ Aggregate the results of a query based on an expression, pulling customized summ

**--csv, -o**="": Write final results to csv. Use - to output to stdout

**--dissect, -d**="": Dissect expression to create match groups to summarize on

**--extract, -e**="": Expression that will generate the key to group by. Specify multiple times for multi-dimensions or use {$} helper (default: [{@}])

**--follow, -f**: Read appended data as file grows
Expand All @@ -431,7 +447,7 @@ Aggregate the results of a query based on an expression, pulling customized summ

**--ignore, -i**="": Ignore a match given a truthy expression (Can have multiple)

**--ignore-case, -I**: Augment regex to be case insensitive
**--ignore-case, -I**: Augment matcher to be case insensitive

**--initial**="": Specify the default initial value for any accumulators that don't specify (default: 0)

1 change: 1 addition & 0 deletions docs/index.md
Expand Up @@ -18,6 +18,7 @@ Supports various CLI-based graphing and metric formats (filter (grep-like), hist
## Features

* Multiple summary formats including: filter (like grep), histogram, bar graphs, tables, heatmaps, sparklines, reduce, and numerical analysis
* Parse using regex (`-m`) or dissect tokenizer (`-d`)
* File glob expansions (eg `/var/log/*` or `/var/log/*/*.log`) and `-R`
* Optional gzip decompression (with `-z`)
* Following `-f` or re-open following `-F` (use `--poll` to poll, and `--tail` to tail)
68 changes: 68 additions & 0 deletions docs/usage/dissect.md
@@ -0,0 +1,68 @@
# Dissect Syntax

*Dissect* is a simple token-based search algorithm, and can
be up to 10x faster than regex (and 40% faster than PCRE).

It works by searching for constant delimiters in a string
and extracting the text between the delimiters as named keys.

*rare* implements a subset of the full dissect algorithm.

**Syntax Example:**
```
prefix %{name} : %{value} - %{?ignored}
```

## Syntax

- Anything in a `%{}` is a variable token.
- A blank token, or a token that starts with `?` is skipped. eg `%{}` or `%{?skipped}`
- Tokens are extracted by both name and index (in the order they appear).
- Index `{0}` is the full match, including the delimiters
- Patterns don't need to match the entire line

## Examples

### Simple

```
prefix %{name} : %{value}
```

Will match:
```
prefix bob : 123
```

And will extract two keys:
```
name=bob
value=123
```

### Nginx Logs

As a simple example, to parse nginx logs that look like:

```
104.238.185.46 - - [19/Aug/2019:02:26:25 +0000] "GET / HTTP/1.1" 200 546 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/98 Safari/537.4 (StatusCake)"
```

The following dissect expression can be used:

```
%{ip} - - [%{timestamp}] "%{verb} %{path} HTTP/%{?http-version}" %{status} %{size} "-" "%{useragent}"
```

Which, as JSON, will return the extracted keys (the output below is from a different log line in the same format):
```json
{
"timestamp": "12/Dec/2019:17:54:13 +0000",
"verb": "POST",
"path": "/temtel.php",
"status": 404,
"size": 571,
"useragent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36",
"ip": "203.113.174.104"
}
```
10 changes: 5 additions & 5 deletions docs/usage/examples.md
Expand Up @@ -183,10 +183,10 @@ Matched: 1,035,666 / 1,035,666 (R: 8; C: 61)
**NOTE:** For stacking (`-s`), the results will be color-coded (not shown here)

```sh
$ rare bars -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year}" -e "{2}" testdata/*
$ rare bars -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year}" -e "{2}" -s testdata/*

| 200 | 206 | 301 | 304 | 400 | 404 | 405 | 408
2019 ||||||||||||||||||||||||||||||||||||||| 3,741,444
2020 ||||||||||||||||||||||||||||||||||||||||||||||||| 4,631,884
Matched: 8,373,328 / 8,383,717
0 200 1 206 2 301 3 304 4 400 5 404 6 405 7 408
2019 000000000555555555555555555555555555555 3,742,444
2020 0000000000000000004455555555555555555555555555555 4,631,884
Matched: 8,374,328 / 8,384,811
```
4 changes: 2 additions & 2 deletions docs/usage/expressions.md
Expand Up @@ -16,7 +16,7 @@ The basic syntax structure is as follows:
* Characters can be escaped with `\`, including `\{` or `\n`
* Expressions are surrounded by `{}`.
* An integer in an expression denotes a matched value from the regex (or other input) eg. `{2}`. The entire match will always be `{0}`
* A string in an expression is a special key or a named regex group eg. `{src}` or `{group1}`
* A string in an expression is a special key or a named regex/dissect group eg. `{src}` or `{group1}`
* When an expression has space(s), the first literal will be the name of a helper function.
From there, the logic is nested. eg `{coalesce {4} {3} notfound}`
* Quotes in an argument create a single argument eg. `{coalesce {4} {3} "not found"}`
Expand Down Expand Up @@ -59,7 +59,7 @@ rare histo \
-b access.log
```

The above parses the method `{1}`, url `{2}`, status `{3}`, and response size `{4}` in the regex.
The above parses the method `{1}`, url `{2}`, status `{3}`, and response size `{4}` in the matcher.

It extracts the `<method> <url> <bytesize bucketed to 10k>`. It will ignore `-i` if response size `{4}` is less-than `1024*1024` (1MB).

55 changes: 48 additions & 7 deletions docs/usage/extractor.md
Expand Up @@ -3,14 +3,54 @@
The main component of *rare* is the extractor (or matcher). There are
three fundamental concepts around the parser:

* Each line of an input (separated by `\n`) is matched to a regex
* A regex is used to parse a line into a match (and optionally, groups)
* Each line of an input (separated by `\n`) is matched to a matcher
* A matcher is used to parse a line into a match (and optionally, groups)
* An expression (see: [expression](expressions.md)) is used to format an
output from a regex group
* Optionally, one or more ignore filter can be applied to silent matches
output from the matched groups
* Optionally, one or more ignore expressions can be applied to silence matches
that satisfy a truthy-comparison

## Decomposing a Filter
## Matcher Types

If no matcher is specified, by default, the entire line is always matched
and passed-through to the expression-stage.

Only one matcher type can be specified at a time.

### Regex

A regex expression is specified with `--match` or `-m`, and follows common
[regex syntax](regexp.md).

When matching a regex, groups are extracted by both index and
name (when a group is named).

Set ignore-case with `-I` or `--ignore-case`.

**Example:**

```bash
rare filter -m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+)' access.log
```

### Dissect

A dissect expression is specified with `--dissect` or `-d`, and follows
[dissect syntax](dissect.md).

Like regex, groups are extracted by both index and name.

Set ignore-case with `-I` or `--ignore-case`.

**Example:**

```bash
rare filter -d 'HTTP/1.1" %{code} %{size}' -e '{code}' access.log
```

## Examples

### Decomposing a Matcher

The most primitive way to use rare is to filter lines in an input. We'll
be using an example nginx log for our example.
Expand All @@ -34,7 +74,7 @@ If you want it to only output the matched portion, you can add `-e "{0}"`
Lastly, let's say we want to ignore all paths that equal "/". We could do that by adding
an ignore pattern: `-i {eq {1} /}`

## Histograms
### Histograms

Histograms are like filters, but rather than outputting every match, they
create an aggregated count based on the extracted expression.
Expand All @@ -48,4 +88,5 @@ rare histogram -m '"(\w{3,4}) ([A-Za-z0-9/.@_-]+)' -e '{1} {2}' -b access.log

## See Also

* [Regular Expressions](regexp.md)
* [Regular Expressions](regexp.md)
* [Examples](examples.md)
9 changes: 5 additions & 4 deletions docs/usage/overview.md
Expand Up @@ -23,11 +23,11 @@ Read more at:

## Extraction (Matching)

Extraction is denoted with `-m` (match) and is the process of reading a line in
a file or set of files and parsing it with a regular expression into the
match-groups denoted by the regex.
Extraction is denoted with `-m` (regex) or `-d` (dissect) and is the process of reading
a line in a file or set of files and parsing it with a regex or dissect expression into the
match-groups denoted by the matcher.

If the regex doesn't match, the line is discarded (a non-match)
If the expression doesn't match, the line is discarded (a non-match)

These match groups are then fed into the next stage, the expression.

Expand Down Expand Up @@ -62,6 +62,7 @@ Aggregator types:
* `histogram` will count instances of the extracted key
* `table` will count the key in 2 dimensions
* `heatmap` will generate a 2D visualization using colored blocks to denote value
* `sparkline` will generate a 2D visualization with the results being a sparkline
* `bargraph` will create either a stacked or non-stacked bargraph based on 2 dimensions
* `analyze` will use the key as a numeric value and compute mean/median/mode/stddev/percentiles
* `reduce` allows evaluating data using expressions, and grouping/sorting the output
1 change: 1 addition & 0 deletions mkdocs.yml
Expand Up @@ -19,6 +19,7 @@ nav:
- JSON: usage/json.md
- Funcs File: usage/funcsfile.md
- Regular Expressions: usage/regexp.md
- Dissect Expressions: usage/dissect.md
- CLI Docs: cli-help.md
- Benchmarks: benchmarks.md
- Contributing: contributing.md