Update bayes_expiry.md (#748)
* Update bayes_expiry.md

* Rework Bayes expiry global settings description

---------

Co-authored-by: moisseev <[email protected]>
dragoangel and moisseev authored May 18, 2024
1 parent bb86cba commit ccdb08d
Showing 1 changed file with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion doc/modules/bayes_expiry.md
@@ -9,7 +9,7 @@ The `Bayes expiry` module provides intelligent expiration of statistical tokens

## Module configuration

The configuration settings for the `bayes expiry` module should be incorporated into the appropriate `classifier` section, such as the `local.d/classifier-bayes.conf `file. Additionally, as the `Bayes expiry` module necessitates the use of the new statistics schema, it is imperative to enable it within the classifier configuration:
For the `bayes expiry` module, classifier-related configuration settings should be incorporated into the corresponding `classifier` section, such as the `local.d/classifier-bayes.conf` file. Additionally, as the `Bayes expiry` module necessitates the use of the new statistics schema, it is imperative to enable it within the classifier configuration:

```hcl
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
@@ -29,6 +29,21 @@ expire = 8640000;
#lazy = true; # Before 2.0
```

To modify the global settings of the `bayes expiry` module, you can configure them in either the `local.d/bayes_expiry.conf` or `override.d/bayes_expiry.conf` file.

The following settings are valid:
- **interval**: time interval in seconds between each run of the expiry step on the controller. Default is `60`.
- **count**: the number of keys to check during each expiry step. The module uses a cursor-based iterator, so each step continues from where the previous one stopped. Default is `1000`. Consider increasing it if your Redis instance accumulates too many persistent keys, which suggests that tokens are being learned faster than the module can process them.
- **epsilon_common**: a comparison tolerance used to determine if a token is considered `common`. Tokens with a difference between spam and ham relative frequencies not greater than this value are classified as `common`. Default is `0.01`.
- **common_ttl**: the initial TTL for `common` tokens. Default is `10 * 86400`, equivalent to 10 days.
- **significant_factor**: the threshold for token significance. Tokens with a relative frequency greater than this value are considered `significant`; otherwise, they are `insignificant`. Default is `3.0 / 4.0`, which corresponds to 75%.

Configuration example:
```hcl
interval = 90;
count = 15000;
```
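
The way `epsilon_common` and `significant_factor` sort tokens into categories can be illustrated with a small Python sketch. It follows only the descriptions of the settings above and is not the module's actual code; the function name `classify_token` and the handling of a token with no occurrences are assumptions made for illustration.

```python
def classify_token(ham: int, spam: int,
                   epsilon_common: float = 0.01,
                   significant_factor: float = 3.0 / 4.0) -> str:
    """Classify a token from its ham/spam occurrence counts (illustrative only)."""
    total = ham + spam
    if total == 0:
        # Assumption: with no occurrences the relative frequencies are
        # undefined, so this sketch treats the token as common.
        return "common"

    ham_freq = ham / total
    spam_freq = spam / total

    # Common: spam and ham relative frequencies differ by no more than epsilon_common.
    if abs(spam_freq - ham_freq) <= epsilon_common:
        return "common"

    # Significant: one of the relative frequencies exceeds significant_factor.
    if max(ham_freq, spam_freq) > significant_factor:
        return "significant"

    return "insignificant"


if __name__ == "__main__":
    for ham, spam in [(50, 50), (90, 10), (60, 40)]:
        print(ham, spam, classify_token(ham, spam))
```

With the default settings, a token seen equally often in ham and spam is `common`, a token seen in ham 90% of the time is `significant`, and a 60/40 split is `insignificant`.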

## Principles of operation

The `bayes expiry` module performs an expiry step every minute. During each step, it examines the frequency of approximately 1000 statistical tokens and adjusts their TTLs if needed. The duration of a full iteration varies based on the number of tokens; for example, a full cycle for 10 million tokens takes approximately one week to complete. Once the `bayes expiry` module finishes a full iteration, it starts over again.
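
The one-week figure can be checked with a rough estimate. The Python sketch below assumes each step checks exactly `count` keys (the text says "approximately") and ignores the time a step itself takes; `full_cycle_seconds` is a hypothetical helper, not part of Rspamd.

```python
import math


def full_cycle_seconds(total_tokens: int, count: int = 1000, interval: int = 60) -> int:
    """Estimate the duration of one full expiry iteration, in seconds."""
    # Number of expiry steps needed to visit every token once.
    steps = math.ceil(total_tokens / count)
    # One step runs per interval.
    return steps * interval


if __name__ == "__main__":
    default_cycle = full_cycle_seconds(10_000_000)
    print(f"defaults: {default_cycle / 86400:.2f} days")

    # Same token count with the values from the configuration example.
    tuned_cycle = full_cycle_seconds(10_000_000, count=15000, interval=90)
    print(f"interval=90, count=15000: {tuned_cycle / 3600:.1f} hours")
```

Under these assumptions, 10 million tokens at the defaults take 600000 seconds (just under 7 days), while `interval = 90` with `count = 15000` brings the estimate down to roughly 17 hours.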