earthgecko/crucible

Crucible

Crucible is a refinement and feedback suite for algorithm testing. It was designed for building anomaly detection algorithms, but it is simple enough that it can probably be extended to work with your particular domain. It evolved out of a need to rapidly generate standardized feedback while iterating on anomaly detection algorithms.

How it works

Crucible takes its library of timeseries in /data and tests every algorithm in algorithms.py against each of them. It builds each timeseries datapoint by datapoint and runs each algorithm at every step, as a way of simulating a production environment. For every anomaly an algorithm detects, Crucible draws a red dot at the x value where the anomaly occurred. It then saves each graph to disk in /results for you to check, grouped by algorithm-timeseries.
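
The replay loop described above can be sketched roughly as follows. This is a minimal illustration and not Crucible's actual code: the names replay and three_sigma, the min_points parameter, and the assumption that an algorithm takes the series seen so far and returns True or False are all hypothetical. Plotting and saving to /results are left out.

```python
def replay(timeseries, algorithm, min_points=3):
    """Feed the series to `algorithm` one datapoint at a time.

    `timeseries` is a list of [timestamp, datapoint] pairs. Returns the
    timestamps at which the algorithm flagged the newest datapoint.
    (Hypothetical helper, not part of Crucible.)
    """
    anomalies = []
    for end in range(min_points, len(timeseries) + 1):
        window = timeseries[:end]  # everything "seen so far"
        if algorithm(window):
            anomalies.append(window[-1][0])
    return anomalies


def three_sigma(window):
    """Hypothetical detector: flag the newest value if it lies more than
    three standard deviations from the mean of the earlier values."""
    values = [value for _, value in window[:-1]]
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return abs(window[-1][1] - mean) > 3 * variance ** 0.5
```

Calling replay(series, three_sigma) on a series with a single large spike should return the timestamp of that spike, which is where a red dot would go on the graph.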

To be as fast as possible, Crucible launches a new process for each timeseries.

If you want to add an algorithm, simply create it in algorithms.py and add it to settings.py as well so Crucible can find it. Crucible comes loaded with a bunch of stock algorithms from an early Skyline release, but it's designed for you to write your own and test them.
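
As a rough sketch of what adding an algorithm might look like: the function signature (a list of [timestamp, datapoint] pairs in, True or False out) and the ALGORITHMS list name are assumptions made for this example, so check algorithms.py and settings.py for the real conventions. above_median_multiple and run_registered are hypothetical names.

```python
from statistics import median


def above_median_multiple(timeseries, factor=5.0):
    """Hypothetical algorithm: flag the newest datapoint if it exceeds
    `factor` times the median of all earlier datapoints."""
    earlier = [value for _, value in timeseries[:-1]]
    if not earlier:
        return False  # nothing to compare against yet
    return timeseries[-1][1] > factor * median(earlier)


# Hypothetical settings.py entry: the algorithms to run, listed by name.
ALGORITHMS = ["above_median_multiple"]


def run_registered(timeseries, names, namespace):
    """Look up each registered name in `namespace` and run it on the series."""
    return {name: namespace[name](timeseries) for name in names}
```

Registering algorithms by name keeps the list of what runs in one place, so an experimental algorithm can be switched on or off without touching the code that defines it.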

Dependencies

Standard Python data science suite. Everything is listed in algorithms.py.

  1. Install numpy, scipy, pandas, patsy, statsmodels, matplotlib.

  2. You may have trouble with SciPy. If you're on a Mac, try:

  • sudo port install gcc48
  • sudo ln -s /opt/local/bin/gfortran-mp-4.8 /opt/local/bin/gfortran
  • sudo pip install scipy

On Debian, apt-get works well for NumPy and SciPy. On CentOS, yum should do the trick. If not, hit the Googles, yo.

Instructions

Just call python src/crucible.py. Then check the /results folder for the results. Happy algorithming!

To add a timeseries:

Create a JSON array of the form [[timestamp, datapoint], [timestamp, datapoint]]. Put it in the /data folder. Done.
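
If your data lives in Python already, a small helper along these lines can write it out as an array of [timestamp, datapoint] pairs. save_timeseries is a hypothetical name, and casting timestamps to integers and datapoints to floats is an assumption about what the algorithms expect.

```python
import json


def save_timeseries(points, path):
    """Write (timestamp, datapoint) pairs to `path` as a JSON array of
    [timestamp, datapoint] lists and return the list that was written.
    (Hypothetical helper, not part of Crucible.)"""
    series = [[int(timestamp), float(value)] for timestamp, value in points]
    with open(path, "w") as handle:
        json.dump(series, handle)
    return series
```

For example, save_timeseries(points, "data/my_metric.json") would drop the file next to the bundled timeseries; the file name here is made up.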

Graphite integration:

There's a small tool to easily grab Graphite data and analyze it. Just call python utils/graphite-grab.py "your_graphite.com/render/?from=-24hour&target=your.metric&format=json" and the script will grab Graphite data, format it, and put it into /data for you.
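
For a sense of what the formatting step involves: Graphite's JSON render output is a list of targets, each with a datapoints list of [value, timestamp] pairs, which is the reverse of the [timestamp, datapoint] order used in /data. The sketch below shows one way to do that conversion. It is not the code in utils/graphite-grab.py; graphite_to_crucible is a hypothetical name, and taking only the first target and dropping null values are assumptions.

```python
import json


def graphite_to_crucible(payload):
    """Convert a Graphite JSON render response (as a string) into a list of
    [timestamp, datapoint] pairs, using the first target only and skipping
    datapoints whose value is null. (Hypothetical helper.)"""
    rendered = json.loads(payload)
    datapoints = rendered[0]["datapoints"]  # Graphite order: [value, timestamp]
    return [[timestamp, value] for value, timestamp in datapoints if value is not None]
```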

Contributions

It would be fantastic to have a robust library of canonical timeseries data. Please, if you have a timeseries that you think a good anomaly detection algorithm should be able to handle, share the love and add the timeseries to the suite!
