Skip to content

seandollinger/Anomaly-Practice-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

Isolation Forest Anomaly Detection Overview This script demonstrates how to use the Isolation Forest algorithm to detect anomalies in a dataset. The Isolation Forest is an unsupervised learning algorithm particularly well-suited for identifying outliers or anomalous data points in high-dimensional datasets.

Requirements To run this script, you need the following Python libraries:

numpy matplotlib scikit-learn You can install these packages using pip:

bash pip install numpy matplotlib scikit-learn Code Explanation

  1. Importing Libraries The necessary libraries are imported at the beginning of the script:

python import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import IsolationForest

  1. Generating Sample Data The data used in this example is generated using a normal distribution. The data is then reshaped to fit the input format required by the Isolation Forest model:

python data = np.random.normal(scale=0.5, size=1000) data = data.reshape(-1, 1) 3. Initializing the Isolation Forest Model The Isolation Forest model is initialized with a contamination parameter, which represents the proportion of outliers in the dataset. In this example, the contamination is set to 0.5 (5%):

python clf = IsolationForest(contamination=0.05) 4. Fitting the Model and Predicting Anomalies The model is fitted to the data, and predictions are made to identify anomalies:

python predictions = clf.fit_predict(data) In the predictions, anomalies are labeled as -1, and normal data points are labeled as 1.

  1. Identifying Anomalous Data Points The indices of the anomalous data points are extracted using the following code:

python anomoly_indices = np.where(predictions == -1)[0] 6. Visualizing the Results The data is visualized using matplotlib. The normal data points are plotted as a line, and the anomalies are highlighted using red scatter points:

python plt.plot(range(len(data)), data) plt.scatter(anomoly_indices, data[anomoly_indices], c='r') plt.show()

Ensure you have the required packages installed. Copy the script into a Python file, e.g., isolation_forest_example.py. Run the script: bash

python isolation_forest_example.py The script will generate a plot displaying the dataset with anomalies highlighted in red.

Conclusion This example demonstrates how to use the Isolation Forest algorithm to detect outliers in a dataset. The script is a simple introduction to anomaly detection with Isolation Forest and can be adapted to more complex datasets and use cases.

About

Practice ML for Anomaly detection

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published