Merge pull request #27 from charl/master
Minor formatting and typo changes.
Nathaniel Cook committed Nov 16, 2015
2 parents 38f6521 + a08feb9 commit 62ed6f1
Showing 1 changed file with 3 additions and 10 deletions.
13 changes: 3 additions & 10 deletions README.md
@@ -34,16 +34,9 @@ learning algorithm adept at detecting anomalies in non Gaussian data.

Using standard deviations or 3-sigma algorithms is a very common way of detecting anomalies in metric data.
These techniques assume the data they operate on follows a Gaussian distribution (http://en.wikipedia.org/wiki/Normal_distribution).
Unfortunately, much metric data is not Gaussian. Take CPU usage for example. Servers performing work from a queue tend to be nearly idle,
then spike to 100% CPU usage and then drop back down. Most of the time the server is either near 0% utilization
or near 100%, following a bimodal distribution (http://en.wikipedia.org/wiki/Bimodal_distribution). If CPU usage
were Gaussian then the CPU would spend most of the time around 50% utilized and rarely 10% or 90% utilized.

The MGOF algorithm assumes no distribution of the data. Rather, the way is detects anomalies is to calculate the
distribution for different windows of time, then compare each of those distributions to the distribution of the window in question
using a simple chi-squared test (http://en.wikipedia.org/wiki/Chi-squared_test).

In summary, the MGOF algorithm is well suited for data collected from systems and applications because it doesn't assume a distribution
of the data.
Unfortunately, much metric data is not Gaussian. Take CPU usage for example. Servers performing work from a queue tend to be nearly idle, then spike to 100% CPU usage and then drop back down. Most of the time the server is either near 0% utilization or near 100%, following a bimodal distribution (http://en.wikipedia.org/wiki/Bimodal_distribution). If CPU usage were Gaussian then the CPU would spend most of the time around 50% utilized and rarely 10% or 90% utilized.

The MGOF algorithm assumes no distribution of the data. Rather, the way it detects anomalies is to calculate the distribution for different windows of time, then compare each of those distributions to the distribution of the window in question using a simple chi-squared test (http://en.wikipedia.org/wiki/Chi-squared_test).

In summary, the MGOF algorithm is well suited for data collected from systems and applications because it doesn't assume a distribution of the data.
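
The window comparison described above can be illustrated with a rough Python sketch. It is not the project's implementation: the bin count, the value range, the add-one smoothing, the 0.05 significance level, the rule "anomalous only if the window differs from every earlier window", and all function names are assumptions made for this example.

```python
# Simplified sketch: compare a window's histogram to earlier windows with a
# Pearson chi-squared statistic. All names and parameters are hypothetical.

BINS = 4                  # assumed number of equal-width histogram bins
LOW, HIGH = 0.0, 100.0    # assumed value range (CPU percent)
CRITICAL = 7.815          # chi-squared critical value, 3 degrees of freedom (BINS - 1), p = 0.05


def bin_counts(window):
    """Count how many values of the window fall into each equal-width bin."""
    counts = [0] * BINS
    width = (HIGH - LOW) / BINS
    for value in window:
        index = int((value - LOW) / width)
        counts[max(0, min(index, BINS - 1))] += 1  # clamp out-of-range values
    return counts


def chi_squared(window, reference):
    """Pearson statistic of `window` against the distribution of `reference`."""
    observed = bin_counts(window)
    ref_counts = bin_counts(reference)
    stat = 0.0
    for o, r in zip(observed, ref_counts):
        # Add-one smoothing (an assumption) keeps empty reference bins from
        # producing a zero expected count.
        p = (r + 1) / (len(reference) + BINS)
        expected = len(window) * p
        stat += (o - expected) ** 2 / expected
    return stat


def is_anomalous(window, history):
    """Flag the window only if it differs from every earlier window."""
    if not history:
        return False
    return all(chi_squared(window, past) > CRITICAL for past in history)


# Earlier windows from a queue worker: mostly near 0% or near 100%.
history = [
    [1, 0, 99, 100, 2, 98, 0, 100, 3, 97],
    [0, 100, 1, 99, 2, 98, 100, 0, 3, 96],
]
spiky = [2, 97, 0, 100, 1, 99, 100, 3, 0, 98]        # same bimodal shape
steady = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51]    # pinned near 50%

print(is_anomalous(spiky, history))   # False
print(is_anomalous(steady, history))  # True
```

In this toy data the `steady` window has roughly the same mean as the earlier windows (about 50), so a check based only on the mean would have little to go on, while the shape of the distribution is clearly different.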
