Merge branch 'yzhao062:master' into dep_kpca
tam17aki authored Nov 14, 2023
2 parents 302f377 + b95b82a commit bbe2c4d
Showing 31 changed files with 1,078 additions and 284 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/testing-cron.yml
@@ -28,7 +28,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements_ci.txt
pip install -r docs/requirements.txt
pip install pytest
pip install coverage
pip install coveralls
2 changes: 1 addition & 1 deletion .github/workflows/testing.yml
@@ -33,7 +33,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements_ci.txt
pip install -r docs/requirements.txt
pip install pytest
pip install coverage
pip install coveralls
22 changes: 22 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,22 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
37 changes: 0 additions & 37 deletions .travis.yml

This file was deleted.

3 changes: 3 additions & 0 deletions CHANGES.txt
@@ -177,3 +177,6 @@ v<1.0.8>, <03/08/2023> -- Improve clone compatibility (#471).
v<1.0.8>, <03/08/2023> -- Add QMCD detector (#452).
v<1.0.8>, <03/08/2023> -- Optimized ECDF and drop Statsmodels dependency (#467).
v<1.0.9>, <03/19/2023> -- Hot fix for errors in ECOD and COPOD due to the issue of scipy.
v<1.1.0>, <06/19/2023> -- Further integration of PyThresh.
v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
v<1.1.1>, <10/24/2023> -- Add deep isolation forest (#506)
70 changes: 64 additions & 6 deletions README.rst
@@ -58,7 +58,7 @@ Python Outlier Detection (PyOD)

-----

**News**: We just released a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
**News**: We have the most comprehensive, 45-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

**For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
@@ -70,7 +70,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic researches and
the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and
commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -156,6 +156,7 @@ NeurIPS 2022 paper `ADBench: Anomaly Detection Benchmark Paper <https://www.andr
* `ADBench Benchmark <#adbench-benchmark>`_
* `Model Save & Load <#model-save--load>`_
* `Fast Train with SUOD <#fast-train-with-suod>`_
* `Thresholding Outlier Scores <#thresholding-outlier-scores>`_
* `Implemented Algorithms <#implemented-algorithms>`_
* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_
* `How to Contribute <#how-to-contribute>`_
@@ -198,9 +199,10 @@ Alternatively, you could clone and run setup.py file:
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.20.0
* scikit_learn>=0.22.0
* six


**Optional Dependencies (see details below)**\ :

* combo (optional, required for models/combination.py and FeatureBagging)
@@ -327,7 +329,25 @@ and `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_e
clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
verbose=False)
----

Thresholding Outlier Scores
^^^^^^^^^^^^^^^^^^^^^^^^^^^

A more data-driven approach can be taken when setting the contamination level.
Using a thresholding method replaces guessing an arbitrary value
with tested techniques for separating inliers and outliers. Refer to
`PyThresh <https://github.com/KulikDM/pythresh>`_ for
a more in-depth look at thresholding.


.. code-block:: python

    from pyod.models.knn import KNN
    from pyod.models.thresholds import FILTER

    # Set the outlier detection and thresholding methods
    clf = KNN(contamination=FILTER())

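To illustrate why a data-driven cut-off can be preferable to a guessed contamination value, here is a minimal standard-library sketch. It is not how PyThresh or the ``FILTER`` thresholder works internally: the helper names ``fixed_rate_labels`` and ``iqr_labels`` and the score values are made up for illustration, and the simple IQR rule (flag scores above Q3 + 1.5 x IQR) stands in for a real thresholding method.

```python
import statistics


def fixed_rate_labels(scores, contamination):
    """Label the top `contamination` fraction of scores as outliers (1).

    Hypothetical helper: mimics choosing a cut-off from a guessed rate.
    Ties at the cut-off are all labelled as outliers.
    """
    n_outliers = round(len(scores) * contamination)
    if n_outliers == 0:
        return [0] * len(scores)
    cutoff = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def iqr_labels(scores):
    """Label scores above Q3 + 1.5 * IQR as outliers (1).

    Hypothetical helper: the cut-off is derived from the scores themselves.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [1 if s > cutoff else 0 for s in scores]


# Made-up outlier scores: nine similar values and one clear outlier.
scores = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.2, 0.85, 1.15, 6.0]

guessed = fixed_rate_labels(scores, contamination=0.3)
data_driven = iqr_labels(scores)

print("outliers with guessed rate 0.3:", sum(guessed))
print("outliers with IQR cut-off:", sum(data_driven))
```

With a guessed rate of 0.3 the fixed rule is forced to flag three points, while the cut-off derived from the scores flags only the one value that stands apart.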
----
@@ -337,7 +357,7 @@ and `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_e
Implemented Algorithms
^^^^^^^^^^^^^^^^^^^^^^

PyOD toolkit consists of three major functional groups:
PyOD toolkit consists of four major functional groups:

**(i) Individual Detection Algorithms** :

@@ -373,6 +393,7 @@ Proximity-Based SOD Subspace Outlier Detection
Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 [#Xu2023Deep]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
@@ -411,8 +432,43 @@ Combination Median Simple combination by taking the median o
Combination majority Vote Simple combination by taking the majority vote of the labels (weights can be used) 2015 [#Aggarwal2015Theoretical]_
=================== ================ ===================================================================================================== ===== ========================================


**(iii) Utility Functions**:
**(iii) Outlier Detection Score Thresholding Methods**:

================================== ================ ================================================================ ====================================================================================================================
Type Abbr Algorithm Documentation
================================== ================ ================================================================ ====================================================================================================================
Kernel-Based AUCP Area Under Curve Percentage `AUCP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.AUCP>`_
Statistical Moment-Based BOOT Bootstrapping `BOOT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.BOOT>`_
Normality-Based CHAU Chauvenet's Criterion `CHAU <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CHAU>`_
Linear Model CLF Trained Linear Classifier `CLF <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLF>`_
Cluster-Based CLUST Clustering Based `CLUST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLUST>`_
Kernel-Based CPD Change Point Detection `CPD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CPD>`_
Transformation-Based DECOMP Decomposition `DECOMP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DECOMP>`_
Normality-Based DSN Distance Shift from Normal `DSN <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DSN>`_
Curve-Based EB Elliptical Boundary `EB <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.EB>`_
Kernel-Based FGD Fixed Gradient Descent `FGD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FGD>`_
Filter-Based FILTER Filtering Based `FILTER <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FILTER>`_
Curve-Based FWFM Full Width at Full Minimum `FWFM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FWFM>`_
Statistical Test-Based GESD Generalized Extreme Studentized Deviate `GESD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.GESD>`_
Filter-Based HIST Histogram Based `HIST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.HIST>`_
Quantile-Based IQR Inter-Quartile Region `IQR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.IQR>`_
Statistical Moment-Based KARCH Karcher mean (Riemannian Center of Mass) `KARCH <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.KARCH>`_
Statistical Moment-Based MAD Median Absolute Deviation `MAD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MAD>`_
Statistical Test-Based MCST Monte Carlo Shapiro Tests `MCST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MCST>`_
Ensembles-Based META Meta-model Trained Classifier `META <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.META>`_
Transformation-Based MOLL Friedrichs' Mollifier `MOLL <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MOLL>`_
Statistical Test-Based MTT Modified Thompson Tau Test `MTT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MTT>`_
Linear Model OCSVM One-Class Support Vector Machine `OCSVM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.OCSVM>`_
Quantile-Based QMCD Quasi-Monte Carlo Discrepancy `QMCD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.QMCD>`_
Linear Model REGR Regression Based `REGR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.REGR>`_
Neural Networks VAE Variational Autoencoder `VAE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.VAE>`_
Curve-Based WIND Topological Winding Number `WIND <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.WIND>`_
Transformation-Based YJ Yeo-Johnson Transformation `YJ <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.YJ>`_
Normality-Based ZSCORE Z-score `ZSCORE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.ZSCORE>`_
================================== ================ ================================================================ ====================================================================================================================


**(iv) Utility Functions**:

=================== ====================== ===================================================================================================================================================== ======================================================================================================================================
Type Name Function Documentation
@@ -630,6 +686,8 @@ Reference
.. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.
.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.
4 changes: 2 additions & 2 deletions docs/about.rst
@@ -5,10 +5,10 @@ About us
Core Development Team
---------------------

Yue Zhao (Ph.D. Student @ Carnegie Mellon University):
Yue Zhao (Assistant Professor @ USC, Ph.D. @ CMU):

- Initialized the project in 2017
- `Homepage <https://www.andrew.cmu.edu/user/yuezhao2/>`_
- `Homepage <https://viterbi-web.usc.edu/~yzhao010/>`_
- `LinkedIn (Yue Zhao) <https://www.linkedin.com/in/yzhao062/>`_

Zain Nasrullah (Data Scientist at RBC; MSc in Computer Science from University of Toronto):
39 changes: 39 additions & 0 deletions docs/example.rst
@@ -191,6 +191,45 @@ please navigate to **"/notebooks/Model Combination.ipynb"**
Combination by AOM ROC:0.9257, precision @ rank n:0.4844
Combination by MOA ROC:0.9263, precision @ rank n:0.4688

Thresholding Example
--------------------


Full example: `threshold_example.py <https://github.com/yzhao062/Pyod/blob/master/examples/threshold_example.py>`_

1. Import models

.. code-block:: python

    from pyod.models.knn import KNN  # kNN detector
    from pyod.models.thresholds import FILTER  # Filter thresholder
    from pyod.utils.data import generate_data  # synthetic data generator

2. Generate sample data with :func:`pyod.utils.data.generate_data`:

.. code-block:: python

    contamination = 0.1  # proportion of outliers
    n_train = 200  # number of training points
    n_test = 100  # number of testing points

    X_train, X_test, y_train, y_test = generate_data(
        n_train=n_train, n_test=n_test, contamination=contamination)

3. Initialize a :class:`pyod.models.knn.KNN` detector with the FILTER
   thresholder, fit the model, and read the predictions.

.. code-block:: python

    # train kNN detector and apply FILTER thresholding
    clf_name = 'KNN'
    clf = KNN(contamination=FILTER())
    clf.fit(X_train)

    # get the prediction labels and outlier scores of the training data
    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
    y_train_scores = clf.decision_scores_  # raw outlier scores

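Because the generated data comes with ground-truth labels, a natural follow-up is to compare the thresholded labels against them. The sketch below uses only the standard library and made-up label lists rather than real PyOD output; the helper name ``precision_recall`` is hypothetical and is not part of PyOD.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the outlier class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels standing in for y_train and y_train_pred.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In practice the two lists would be replaced by ``y_train`` and ``y_train_pred`` from the steps above.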
.. rubric:: References

.. bibliography::