This project implements the functional highest density region boxplot technique (Hyndman and Shang, 2009).
When you have functional data (i.e. a set of curves), you will want to answer some questions such as:
- What is the mode curve?
- Can I draw a confidence interval?
- Are there any outlier curves?
This module allows you to do this:
import othdrplot
algo = othdrplot.ProcessHighDensityRegionAlgorithm(
    processSample, reducedComponents, reducedDistribution, [0.8, 0.5]
)
algo.run()
algo.drawOutlierTrajectories()
algo.draw()
The output is the following figure:
When a multivariate sample is given, the HighDensityRegionAlgorithm plots the high density regions, i.e. the regions of highest density that contain a given fraction of the population.
import openturns as ot
# Estimate the distribution
myks = ot.KernelSmoothing()
distribution = myks.build(sample)
# Create the HDR algorithm
algo = othdrplot.HighDensityRegionAlgorithm(sample, distribution)
algo.run()
algo.draw()
The output is the following figure:
The dependencies are:
- Python >= 2.7 or >= 3.3
- numpy >= 0.10
- matplotlib >= 1.5.3
- OpenTURNS >= 1.16
Using the latest Python version is preferred.
To install from pip:
pip install othdrplot
To install from github:
git clone git@github.com:mbaudin47/othdrplot.git
cd othdrplot
python setup.py install
A short introduction to the algorithm is provided in the Introduction to high density region plots.
Several examples are available in the doc directory.
- a MatrixPlot example
- a HDRAlgorithm 2D example
- a HDRAlgorithm 3D example
- a ProcessHDR in 2D on the El-Nino data
- a ProcessHDR in 3D on the El-Nino data
- a ProcessHDR on logistic case
- a ProcessHDR on free fall case
- Rob J Hyndman and Han Lin Shang. Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19:29-45, 2009.
Three classes are provided:
- HighDensityRegionAlgorithm: An algorithm to draw the density of a multivariate sample.
- ProcessHighDensityRegionAlgorithm: An algorithm to compute and draw the density of a multivariate process sample.
- KarhunenLoeveDimensionReductionAlgorithm: Simplifies the dimension reduction with Karhunen-Loève decomposition.
This is an algorithm to draw the density of a multivariate sample.
- Computes the minimum level set associated with the sample.
- Plots the required minimum level sets and the outliers.
- Computes and draws the inliers and the outliers, based on the MatrixPlot.
The main ingredient is the distribution of the sample, which is required.
The basic method to estimate this distribution is kernel smoothing, but any other method can be used, such as a Gaussian mixture.
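The minimum level set idea can be illustrated without the library. The following is a minimal sketch, not the othdrplot implementation: it assumes a one-dimensional sample, a hand-written Gaussian kernel density estimate with Silverman's rule of thumb for the bandwidth, and hypothetical helper names (`gaussian_kde_pdf`, `hdr_threshold`). For a coverage fraction alpha, the density threshold is taken as the empirical (1 - alpha) quantile of the density values at the sample points. The 0.9 coverage used to flag outliers is an arbitrary choice for this example.

```python
import math
import random
import statistics


def gaussian_kde_pdf(sample, bandwidth):
    """Return a function x -> Gaussian kernel density estimate built on sample."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return pdf


def hdr_threshold(densities, alpha):
    """Density level such that about a fraction alpha of the points lie on or above it."""
    ordered = sorted(densities)
    # Index of the empirical (1 - alpha) quantile of the density values.
    index = int(math.floor((1.0 - alpha) * len(ordered)))
    index = min(max(index, 0), len(ordered) - 1)
    return ordered[index]


random.seed(0)
# 200 standard normal points plus one planted outlier at 8.0.
sample = [random.gauss(0.0, 1.0) for _ in range(200)] + [8.0]

# Silverman's rule of thumb (an assumption of this sketch, not a library default).
bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
pdf = gaussian_kde_pdf(sample, bandwidth)
densities = [pdf(x) for x in sample]

# Thresholds of the 50 %, 80 % and 90 % high density regions.
level_50 = hdr_threshold(densities, 0.5)
level_80 = hdr_threshold(densities, 0.8)
level_90 = hdr_threshold(densities, 0.9)

# Points below the 90 % threshold are treated as outliers in this example.
inliers = [x for x, d in zip(sample, densities) if d >= level_90]
outliers = [x for x, d in zip(sample, densities) if d < level_90]
```

A smaller coverage fraction gives a higher density threshold, so the 50 % region is nested inside the 80 % region, which is nested inside the 90 % region.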
This is an algorithm to draw the density of a process sample.
- Plots the trajectories in the physical space.
- Plots the projection of the trajectories in the reduced space, based on the HighDensityRegionAlgorithm.
The main ingredients are the dimension reduction method and the method to estimate the density in the reduced space.
In the current implementation, the dimension reduction is based on the Karhunen-Loève decomposition (but other methods can be used). The method to estimate the density in the reduced space can be the kernel smoothing estimator or any other density estimation method (e.g. a Gaussian mixture).
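The two ingredients can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the curves are discretized on a uniform grid, the Karhunen-Loève reduction is replaced by a single principal component found by power iteration, the kernel bandwidth is fixed by hand, and the names `leading_component` and `kde_at` are hypothetical. The curve with the highest density in the reduced space plays the role of the mode curve, and curves below the 90 % threshold (an arbitrary choice) are flagged as outliers.

```python
import math
import random


def leading_component(curves, iterations=50):
    """Center the curves and return (mean, unit vector, scores) for the first mode."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]
    vector = [1.0 / math.sqrt(m)] * m  # constant start vector
    for _ in range(iterations):
        # Apply the empirical covariance: C v = (1/n) * X^T (X v), then normalize.
        scores = [sum(row[j] * vector[j] for j in range(m)) for row in centered]
        image = [sum(scores[i] * centered[i][j] for i in range(n)) / n for j in range(m)]
        norm = math.sqrt(sum(v * v for v in image))
        vector = [v / norm for v in image]
    scores = [sum(row[j] * vector[j] for j in range(m)) for row in centered]
    return mean, vector, scores


def kde_at(points, bandwidth):
    """Gaussian kernel density estimate evaluated at each of the points."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in points)
        for x in points
    ]


random.seed(1)
grid = [k / 19 for k in range(20)]
# 30 curves with amplitude near 1, plus one planted outlier with amplitude 3.
amplitudes = [random.gauss(1.0, 0.2) for _ in range(30)] + [3.0]
curves = [
    [a * math.sin(math.pi * t) + random.gauss(0.0, 0.01) for t in grid]
    for a in amplitudes
]

# Step 1: dimension reduction (one score per curve).
mean, vector, scores = leading_component(curves)

# Step 2: density estimation in the reduced space.
densities = kde_at(scores, bandwidth=0.5)

mode_index = densities.index(max(densities))  # curve with the highest density
ordered = sorted(densities)
threshold = ordered[int(0.1 * len(ordered))]  # keeps about 90 % of the curves
outlier_indices = [i for i, d in enumerate(densities) if d < threshold]
```

With more than one retained component, the same density threshold logic applies to the multivariate scores. Only the density estimator changes.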