Commit

Merge branch 'yzhao062:development' into development
tam17aki authored Nov 14, 2023
2 parents 680dcf3 + d26a1d0 commit 7b1d49c
Showing 54 changed files with 1,746 additions and 517 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/testing-cron.yml
@@ -15,7 +15,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.7", "3.8", "3.9", "3.10"]

steps:
@@ -28,7 +28,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements_ci.txt
pip install -r docs/requirements.txt
pip install pytest
pip install coverage
pip install coveralls
4 changes: 2 additions & 2 deletions .github/workflows/testing.yml
@@ -20,7 +20,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.7", "3.8", "3.9", "3.10"]

steps:
@@ -33,7 +33,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements_ci.txt
pip install -r docs/requirements.txt
pip install pytest
pip install coverage
pip install coveralls
22 changes: 22 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,22 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
37 changes: 0 additions & 37 deletions .travis.yml

This file was deleted.

7 changes: 7 additions & 0 deletions CHANGES.txt
@@ -173,3 +173,10 @@ v<1.0.5>, <09/14/2022> -- Add ALAD.
v<1.0.6>, <09/23/2022> -- Update ADBench benchmark for NeurIPS 2022.
v<1.0.6>, <10/23/2022> -- ADD KPCA.
v<1.0.7>, <12/14/2022> -- Enable automatic thresholding by pythresh (#454).
v<1.0.8>, <03/08/2023> -- Improve clone compatibility (#471).
v<1.0.8>, <03/08/2023> -- Add QMCD detector (#452).
v<1.0.8>, <03/08/2023> -- Optimize ECDF and drop Statsmodels dependency (#467).
v<1.0.9>, <03/19/2023> -- Hot fix for errors in ECOD and COPOD due to the issue of scipy.
v<1.1.0>, <06/19/2023> -- Further integration of PyThresh.
v<1.1.1>, <07/03/2023> -- Bump up sklearn requirement and some hot fixes.
v<1.1.1>, <10/24/2023> -- Add deep isolation forest (#506).
74 changes: 67 additions & 7 deletions README.rst
@@ -58,7 +58,7 @@ Python Outlier Detection (PyOD)

-----

**News**: We just released a 45-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_.
**News**: We have a 45-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-neurips-adbench.pdf>`_, the most comprehensive to date.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 57 benchmark datasets.

**For time-series outlier detection**, please use `TODS <https://github.com/datamllab/tods>`_.
@@ -70,7 +70,7 @@ multivariate data. This exciting yet challenging field is commonly referred as
or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic researches and
the latest ECOD and DIF (TKDE 2022 and 2023). Since 2017, PyOD has been successfully used in numerous academic research projects and
commercial products with more than `10 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
@@ -156,6 +156,7 @@ NeurIPS 2022 paper `ADBench: Anomaly Detection Benchmark Paper <https://www.andr
* `ADBench Benchmark <#adbench-benchmark>`_
* `Model Save & Load <#model-save--load>`_
* `Fast Train with SUOD <#fast-train-with-suod>`_
* `Thresholding Outlier Scores <#thresholding-outlier-scores>`_
* `Implemented Algorithms <#implemented-algorithms>`_
* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_
* `How to Contribute <#how-to-contribute>`_
@@ -198,9 +199,9 @@ Alternatively, you could clone and run setup.py file:
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.20.0
* scikit_learn>=0.22.0
* six
* statsmodels


**Optional Dependencies (see details below)**\ :

@@ -328,7 +329,25 @@ and `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_e
clf = SUOD(base_estimators=detector_list, n_jobs=2, combination='average',
verbose=False)
----

Thresholding Outlier Scores
^^^^^^^^^^^^^^^^^^^^^^^^^^^

A more data-driven approach can be taken when setting the contamination level.
By using a thresholding method, guessing an arbitrary value can be replaced
with tested techniques for separating inliers and outliers. Refer to
`PyThresh <https://github.com/KulikDM/pythresh>`_ for
a more in-depth look at thresholding.


.. code-block:: python

    from pyod.models.knn import KNN
    from pyod.models.thresholds import FILTER

    # Set the outlier detection and thresholding methods
    clf = KNN(contamination=FILTER())

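
To make the idea concrete, the following is a minimal, standard-library-only sketch (not PyOD's or PyThresh's implementation) that contrasts a fixed contamination fraction with a simple data-driven cutoff. The helper names ``fixed_contamination_labels`` and ``iqr_labels``, the toy scores, and the choice of the Q3 + 1.5 * IQR rule are all assumptions made for illustration.

```python
import statistics


def fixed_contamination_labels(scores, contamination=0.1):
    """Label the top `contamination` fraction of scores as outliers (1)."""
    n_outliers = max(1, round(len(scores) * contamination))
    # The cutoff is the n_outliers-th largest score.
    cutoff = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def iqr_labels(scores):
    """Label scores above Q3 + 1.5 * IQR as outliers (1)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [1 if s > cutoff else 0 for s in scores]


# Hypothetical outlier scores: 18 ordinary points and 2 clear outliers.
scores = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0, 0.9, 1.2,
          1.1, 1.0, 0.9, 1.0, 1.1, 1.2, 0.8, 1.0, 9.5, 12.0]

fixed = fixed_contamination_labels(scores, contamination=0.25)
adaptive = iqr_labels(scores)
print(sum(fixed), sum(adaptive))
```

On this toy data the guessed fraction of 0.25 labels five points, three of them ordinary, while the IQR cutoff labels only the two extreme scores.
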
----
@@ -338,7 +357,7 @@ and `SUOD example <https://github.com/yzhao062/pyod/blob/master/examples/suod_e
Implemented Algorithms
^^^^^^^^^^^^^^^^^^^^^^

PyOD toolkit consists of three major functional groups:
PyOD toolkit consists of four major functional groups:

**(i) Individual Detection Algorithms** :

@@ -351,6 +370,7 @@ Probabilistic FastABOD Fast Angle-Based Outlier Detection usin
Probabilistic COPOD COPOD: Copula-Based Outlier Detection 2020 [#Li2020COPOD]_
Probabilistic MAD Median Absolute Deviation (MAD) 1993 [#Iglewicz1993How]_
Probabilistic SOS Stochastic Outlier Selection 2012 [#Janssens2012Stochastic]_
Probabilistic QMCD Quasi-Monte Carlo Discrepancy outlier detection 2001 [#Fang2001Wrap]_
Probabilistic KDE Outlier Detection with Kernel Density Functions 2007 [#Latecki2007Outlier]_
Probabilistic Sampling Rapid distance-based outlier detection via sampling 2013 [#Sugiyama2013Rapid]_
Probabilistic GMM Probabilistic Mixture Modeling for Outlier Analysis [#Aggarwal2015Outlier]_ [Ch.2]
@@ -373,6 +393,7 @@ Proximity-Based SOD Subspace Outlier Detection
Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
Outlier Ensembles DIF Deep Isolation Forest for Anomaly Detection 2023 [#Xu2023Deep]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
@@ -411,8 +432,43 @@ Combination Median Simple combination by taking the median o
Combination majority Vote Simple combination by taking the majority vote of the labels (weights can be used) 2015 [#Aggarwal2015Theoretical]_
=================== ================ ===================================================================================================== ===== ========================================
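
As a rough illustration of the two combination rules named above, the sketch below computes a per-sample median of detector scores and an unweighted majority vote of detector labels using only the standard library. It is not the PyOD implementation: the function names and toy inputs are hypothetical, and the optional weights mentioned in the table are omitted.

```python
import statistics


def median_combination(score_matrix):
    """Combine scores per sample; each row holds one score per detector."""
    return [statistics.median(row) for row in score_matrix]


def majority_vote(label_matrix):
    """Return 1 for a sample when more than half of the detectors label it 1."""
    return [1 if sum(row) * 2 > len(row) else 0 for row in label_matrix]


# Hypothetical outputs of three detectors.
toy_scores = [[0.1, 0.2, 0.9], [0.8, 0.7, 0.95], [0.3, 0.1, 0.2]]
toy_labels = [[0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1]]

combined_scores = median_combination(toy_scores)
combined_labels = majority_vote(toy_labels)
print(combined_scores, combined_labels)
```
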


**(iii) Utility Functions**:
**(iii) Outlier Detection Score Thresholding Methods**:

==========================  ========  ==========================================  ========================================================================================================
Type                        Abbr      Algorithm                                   Documentation
==========================  ========  ==========================================  ========================================================================================================
Kernel-Based                AUCP      Area Under Curve Percentage                 `AUCP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.AUCP>`_
Statistical Moment-Based    BOOT      Bootstrapping                               `BOOT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.BOOT>`_
Normality-Based             CHAU      Chauvenet's Criterion                       `CHAU <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CHAU>`_
Linear Model                CLF       Trained Linear Classifier                   `CLF <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLF>`_
Cluster-Based               CLUST     Clustering Based                            `CLUST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CLUST>`_
Kernel-Based                CPD       Change Point Detection                      `CPD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.CPD>`_
Transformation-Based        DECOMP    Decomposition                               `DECOMP <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DECOMP>`_
Normality-Based             DSN       Distance Shift from Normal                  `DSN <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.DSN>`_
Curve-Based                 EB        Elliptical Boundary                         `EB <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.EB>`_
Kernel-Based                FGD       Fixed Gradient Descent                      `FGD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FGD>`_
Filter-Based                FILTER    Filtering Based                             `FILTER <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FILTER>`_
Curve-Based                 FWFM      Full Width at Full Minimum                  `FWFM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.FWFM>`_
Statistical Test-Based      GESD      Generalized Extreme Studentized Deviate     `GESD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.GESD>`_
Filter-Based                HIST      Histogram Based                             `HIST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.HIST>`_
Quantile-Based              IQR       Inter-Quartile Region                       `IQR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.IQR>`_
Statistical Moment-Based    KARCH     Karcher mean (Riemannian Center of Mass)    `KARCH <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.KARCH>`_
Statistical Moment-Based    MAD       Median Absolute Deviation                   `MAD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MAD>`_
Statistical Test-Based      MCST      Monte Carlo Shapiro Tests                   `MCST <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MCST>`_
Ensembles-Based             META      Meta-model Trained Classifier               `META <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.META>`_
Transformation-Based        MOLL      Friedrichs' Mollifier                       `MOLL <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MOLL>`_
Statistical Test-Based      MTT       Modified Thompson Tau Test                  `MTT <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.MTT>`_
Linear Model                OCSVM     One-Class Support Vector Machine            `OCSVM <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.OCSVM>`_
Quantile-Based              QMCD      Quasi-Monte Carlo Discrepancy               `QMCD <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.QMCD>`_
Linear Model                REGR      Regression Based                            `REGR <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.REGR>`_
Neural Networks             VAE       Variational Autoencoder                     `VAE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.VAE>`_
Curve-Based                 WIND      Topological Winding Number                  `WIND <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.WIND>`_
Transformation-Based        YJ        Yeo-Johnson Transformation                  `YJ <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.YJ>`_
Normality-Based             ZSCORE    Z-score                                     `ZSCORE <https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.thresholds.ZSCORE>`_
==========================  ========  ==========================================  ========================================================================================================
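
As a rough illustration of how two of the simpler rules in this table act on a list of outlier scores, the sketch below implements a z-score cutoff and a median-absolute-deviation cutoff with the standard library. The function names, the cutoff of 3, the 1.4826 scaling constant, and the toy scores are assumptions for illustration; the linked PyThresh-based implementations may differ in their details.

```python
import statistics


def zscore_labels(scores, cutoff=3.0):
    """Label a score 1 when it lies more than `cutoff` std devs above the mean."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return [0] * len(scores)
    return [1 if (s - mean) / std > cutoff else 0 for s in scores]


def mad_labels(scores, cutoff=3.0):
    """Label a score 1 when it lies more than `cutoff` scaled MADs above the median."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0:
        return [0] * len(scores)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [1 if (s - med) / scale > cutoff else 0 for s in scores]


# Hypothetical outlier scores: ten ordinary points and two large ones.
toy_scores = [0.2, 0.3, 0.25, 0.35, 0.3, 0.28, 0.32, 0.27, 0.31, 0.29, 2.5, 3.0]

z_labels = zscore_labels(toy_scores)
m_labels = mad_labels(toy_scores)
print(sum(z_labels), sum(m_labels))
```

On this toy data the two large scores inflate the mean and standard deviation enough that the z-score rule flags nothing, while the median-based rule flags both, which is one reason the choice of thresholding method matters.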


**(iv) Utility Functions**:

=================== ====================== ===================================================================================================================================================== ======================================================================================================================================
Type Name Function Documentation
@@ -566,6 +622,8 @@ Reference
.. [#Cook1977Detection] Cook, R.D., 1977. Detection of influential observation in linear regression. Technometrics, 19(1), pp.15-18.
.. [#Fang2001Wrap] Fang, K.T. and Ma, C.X., 2001. Wrap-around L2-discrepancy of random sampling, Latin hypercube and uniform designs. Journal of complexity, 17(4), pp.608-624.
.. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\ , pp.59-63.
.. [#Goodge2022Lunar] Goodge, A., Hooi, B., Ng, S.K. and Ng, W.S., 2022, June. Lunar: Unifying local outlier detection methods via graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
@@ -628,6 +686,8 @@ Reference
.. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.
.. [#Xu2023Deep] Xu, H., Pang, G., Wang, Y., Wang, Y., 2023. Deep isolation forest for anomaly detection. *IEEE Transactions on Knowledge and Data Engineering*.
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.
53 changes: 0 additions & 53 deletions appveyor.yml

This file was deleted.
