Skip to content

Latest commit

 

History

History
65 lines (50 loc) · 2.46 KB

README.md

File metadata and controls

65 lines (50 loc) · 2.46 KB

Isolation Forest Anomaly Detection Overview This script demonstrates how to use the Isolation Forest algorithm to detect anomalies in a dataset. The Isolation Forest is an unsupervised learning algorithm particularly well-suited for identifying outliers or anomalous data points in high-dimensional datasets.

Requirements To run this script, you need the following Python libraries:

numpy matplotlib scikit-learn You can install these packages using pip:

bash pip install numpy matplotlib scikit-learn Code Explanation

  1. Importing Libraries The necessary libraries are imported at the beginning of the script:

python import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import IsolationForest

  1. Generating Sample Data The data used in this example is generated using a normal distribution. The data is then reshaped to fit the input format required by the Isolation Forest model:

python data = np.random.normal(scale=0.5, size=1000) data = data.reshape(-1, 1) 3. Initializing the Isolation Forest Model The Isolation Forest model is initialized with a contamination parameter, which represents the proportion of outliers in the dataset. In this example, the contamination is set to 0.5 (5%):

python clf = IsolationForest(contamination=0.05) 4. Fitting the Model and Predicting Anomalies The model is fitted to the data, and predictions are made to identify anomalies:

python predictions = clf.fit_predict(data) In the predictions, anomalies are labeled as -1, and normal data points are labeled as 1.

  1. Identifying Anomalous Data Points The indices of the anomalous data points are extracted using the following code:

python anomoly_indices = np.where(predictions == -1)[0] 6. Visualizing the Results The data is visualized using matplotlib. The normal data points are plotted as a line, and the anomalies are highlighted using red scatter points:

python plt.plot(range(len(data)), data) plt.scatter(anomoly_indices, data[anomoly_indices], c='r') plt.show()

Ensure you have the required packages installed. Copy the script into a Python file, e.g., isolation_forest_example.py. Run the script: bash

python isolation_forest_example.py The script will generate a plot displaying the dataset with anomalies highlighted in red.

Conclusion This example demonstrates how to use the Isolation Forest algorithm to detect outliers in a dataset. The script is a simple introduction to anomaly detection with Isolation Forest and can be adapted to more complex datasets and use cases.