Isolation Forest Anomaly Detection Overview This script demonstrates how to use the Isolation Forest algorithm to detect anomalies in a dataset. The Isolation Forest is an unsupervised learning algorithm particularly well-suited for identifying outliers or anomalous data points in high-dimensional datasets.
Requirements To run this script, you need the following Python libraries:
numpy matplotlib scikit-learn You can install these packages using pip:
bash pip install numpy matplotlib scikit-learn Code Explanation
- Importing Libraries The necessary libraries are imported at the beginning of the script:
python import numpy as np import matplotlib.pyplot as plt from sklearn.ensemble import IsolationForest
- Generating Sample Data The data used in this example is generated using a normal distribution. The data is then reshaped to fit the input format required by the Isolation Forest model:
python data = np.random.normal(scale=0.5, size=1000) data = data.reshape(-1, 1) 3. Initializing the Isolation Forest Model The Isolation Forest model is initialized with a contamination parameter, which represents the proportion of outliers in the dataset. In this example, the contamination is set to 0.5 (5%):
python clf = IsolationForest(contamination=0.05) 4. Fitting the Model and Predicting Anomalies The model is fitted to the data, and predictions are made to identify anomalies:
python predictions = clf.fit_predict(data) In the predictions, anomalies are labeled as -1, and normal data points are labeled as 1.
- Identifying Anomalous Data Points The indices of the anomalous data points are extracted using the following code:
python anomoly_indices = np.where(predictions == -1)[0] 6. Visualizing the Results The data is visualized using matplotlib. The normal data points are plotted as a line, and the anomalies are highlighted using red scatter points:
python plt.plot(range(len(data)), data) plt.scatter(anomoly_indices, data[anomoly_indices], c='r') plt.show()
Ensure you have the required packages installed. Copy the script into a Python file, e.g., isolation_forest_example.py. Run the script: bash
python isolation_forest_example.py The script will generate a plot displaying the dataset with anomalies highlighted in red.
Conclusion This example demonstrates how to use the Isolation Forest algorithm to detect outliers in a dataset. The script is a simple introduction to anomaly detection with Isolation Forest and can be adapted to more complex datasets and use cases.