Analyzers: Kolmogorov Smirnov Test for two samples
Suppose we have two samples: the first, X_1,...,X_m of size m, has distribution function F(x), and the second, Y_1,...,Y_n of size n, has distribution function G(x). We want to test whether F = G. Let F_m(x) and G_n(x) be the corresponding empirical distribution functions, and define the statistic D_nm:
D_nm = sup_x |F_m(x) - G_n(x)|
The hypothesis F = G is rejected at level a if sqrt(n * m / (n + m)) * D_nm > c(a), where the value of c(a) for each level a comes from the table below:

| a | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.001 |
| --- | --- | --- | --- | --- | --- | --- |
| c(a) | 1.22 | 1.36 | 1.48 | 1.63 | 1.73 | 1.95 |

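The tabulated values agree with the standard large-sample approximation c(a) = sqrt(-ln(a / 2) / 2). This page does not state that closed form, so treat it as an assumption here. The sketch below, with the hypothetical helper name `critical_value`, recomputes the table as a consistency check.

```python
import math


def critical_value(alpha: float) -> float:
    """Asymptotic critical value c(alpha) = sqrt(-0.5 * ln(alpha / 2)).

    Assumption: this closed form is the usual large-sample approximation;
    it is not taken from this page, only compared against its table.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.sqrt(-0.5 * math.log(alpha / 2.0))


# Values copied from the table above, used only for comparison.
TABLE = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63, 0.005: 1.73, 0.001: 1.95}

for alpha, tabulated in TABLE.items():
    print(f"a={alpha:<6} formula={critical_value(alpha):.3f} table={tabulated}")
```

Because the approximation is asymptotic, the thresholds are most trustworthy when both samples are reasonably large.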
The purpose of the test is to check whether two data sets differ enough to conclude that they come from different distributions.
An example from http://www.physics.csbsju.edu/stats/KS-test.html: to test whether a drug really works, we compare the "control group" results with the "treated group" results using the KS test. If the two samples do not pass the test (the hypothesis F = G is rejected), the difference is significant and the drug seems to work.
Two data sets of doubles (they may contain different numbers of elements).
Boolean value: true if, at the given level, the two data sets appear to come from the same distribution (the hypothesis F = G is not rejected).
The algorithm is straightforward and follows from the mathematical definition. First, calculate D_nm, which requires the supremum of |F_m(x) - G_n(x)| over all x. Evaluating the supremum directly from the definition looks complicated, but because both empirical distribution functions are step functions it reduces to a maximum over the sample points. The computational formulas are given on slide 19 of [2].
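For reference, one standard way to write the computational form is shown below. It assumes no ties between observations (continuous F and G) and uses X_(1) <= ... <= X_(m) for the ordered first sample. The superscripted names D^+ and D^- are notation introduced here and may differ from slide 19 of [2], which this sketch does not reproduce.

```latex
D^{+}_{nm} = \max_{1 \le r \le m} \left( \frac{r}{m} - G_n\!\left(X_{(r)}\right) \right), \qquad
D^{-}_{nm} = \max_{1 \le r \le m} \left( G_n\!\left(X_{(r)}\right) - \frac{r-1}{m} \right), \qquad
D_{nm} = \max\left( D^{+}_{nm},\, D^{-}_{nm} \right)
```

Here D^+ picks up the largest excess of F_m over G_n, which is reached right after a jump of F_m, and D^- the largest excess of G_n over F_m, which is reached just before such a jump.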
Then multiply the supremum by the square root of (n * m / (n + m)) and compare the product to c(a). The outcome of that comparison is the final result: the hypothesis F = G is rejected if the scaled value exceeds c(a).
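The steps above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the names `ks_statistic` and `same_distribution` are hypothetical, and the supremum is found by evaluating both empirical distribution functions at every observed point, which is equivalent for step functions and also tolerates ties.

```python
import math
from bisect import bisect_right

# Critical values c(a), copied from the table on this page.
CRITICAL_VALUES = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48,
                   0.01: 1.63, 0.005: 1.73, 0.001: 1.95}


def ks_statistic(xs, ys):
    """Return sup_x |F_m(x) - G_n(x)| for two non-empty lists of numbers."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that change
    # only at observed points, so the supremum is reached at one of them.
    for point in xs + ys:
        f = bisect_right(xs, point) / m  # F_m(point): share of xs <= point
        g = bisect_right(ys, point) / n  # G_n(point): share of ys <= point
        d = max(d, abs(f - g))
    return d


def same_distribution(xs, ys, alpha=0.05):
    """Return True if the hypothesis F = G is NOT rejected at level alpha."""
    if alpha not in CRITICAL_VALUES:
        raise ValueError(f"unsupported level: {alpha}")
    m, n = len(xs), len(ys)
    scaled = math.sqrt(n * m / (n + m)) * ks_statistic(xs, ys)
    return scaled <= CRITICAL_VALUES[alpha]
```

A call such as `same_distribution(control, treated, alpha=0.05)` returns the Boolean described above. Sorting once and using binary search keeps the cost at O((n + m) log(n + m)).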
[2] https://compscicenter.ru/media/slides/math_stat_2013_spring/2013_04_10_math_stat_2013_spring.pdf