
View source: R/cont_ks_distribution.R

Computes the complementary cdf $P(D_n \ge q) \equiv P(D_n > q)$ at a fixed $q \in [0, 1]$ for the one-sample two-sided Kolmogorov-Smirnov statistic $D_n$, for a given sample size $n$, when the cdf $F(x)$ under the null hypothesis is continuous.

```
cont_ks_c_cdf(q, n)
```

| Argument | Description |
|---|---|
| `q` | numeric value between 0 and 1 at which the complementary cdf is evaluated |
| `n` | the sample size |

Given a random sample $\{X_1, \ldots, X_n\}$ of size `n` with an empirical cdf $F_n(x)$, the two-sided Kolmogorov-Smirnov goodness-of-fit statistic is defined as $D_n = \sup_x |F_n(x) - F(x)|$, where $F(x)$ is the cdf of a prespecified theoretical distribution under the null hypothesis $H_0$ that $\{X_1, \ldots, X_n\}$ comes from $F(x)$.
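To make the definition concrete, the sketch below computes $D_n$ directly from a sample, using the fact that for continuous $F$ the supremum is attained at an order statistic, where $F_n$ jumps from $(i-1)/n$ to $i/n$. The helper `ks_stat` is hypothetical (it is not part of KSgeneral), and the standard normal null cdf is an illustrative assumption; the result is cross-checked against the statistic reported by `stats::ks.test`.

```r
# Hypothetical helper (not part of KSgeneral): two-sided KS statistic
# D_n = sup_x |F_n(x) - F(x)| for a continuous null cdf F.
# For continuous F the supremum is attained at an order statistic x_(i),
# where F_n jumps from (i - 1)/n to i/n.
ks_stat <- function(x, cdf = pnorm) {
  n <- length(x)
  u <- cdf(sort(x))               # F(x_(1)) <= ... <= F(x_(n))
  i <- seq_len(n)
  d_plus  <- max(i / n - u)       # largest amount by which F_n exceeds F
  d_minus <- max(u - (i - 1) / n) # largest amount by which F exceeds F_n
  max(d_plus, d_minus)
}

# Illustrative assumption: H0 states that the data come from N(0, 1)
set.seed(1)
x <- rnorm(50)
d_n <- ks_stat(x)

# Cross-check against the statistic reported by stats::ks.test
d_ref <- unname(stats::ks.test(x, "pnorm")$statistic)
print(c(manual = d_n, ks.test = d_ref))
```

Since each jump of $F_n$ has height $1/n$, the statistic always satisfies $D_n \ge 1/(2n)$.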

The function `cont_ks_c_cdf` implements the FFT-based algorithm proposed by Moscovich and Nadler (2017) to compute the complementary cdf $P(D_n \ge q)$ at a value $q$ when $F(x)$ is continuous.
This algorithm has a total worst-case run time of order $O(n^2 \log n)$, which makes it more efficient and numerically stable than the algorithm proposed by Marsaglia et al. (2003).
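For intuition about the quantity being computed, a rough Monte Carlo estimate of $P(D_n \ge q)$ can be obtained by simulating samples under $H_0$. This is a swapped-in technique, not the FFT-based algorithm of Moscovich and Nadler (2017); the helper `mc_ks_c_cdf` and the default number of replications are hypothetical choices for illustration. Because $D_n$ is distribution-free when $F$ is continuous, Uniform(0, 1) samples suffice.

```r
# Hypothetical Monte Carlo sketch (NOT the FFT-based algorithm used by
# cont_ks_c_cdf): estimates P(D_n >= q) by simulation under H0.
# D_n is distribution-free for continuous F, so uniform samples suffice.
mc_ks_c_cdf <- function(q, n, reps = 20000) {
  i <- seq_len(n)
  d <- replicate(reps, {
    u <- sort(runif(n))               # F(X_(i)) under H0
    max(i / n - u, u - (i - 1) / n)   # realised value of D_n
  })
  mean(d >= q)                        # proportion of simulated D_n >= q
}

set.seed(2020)
est <- mc_ks_c_cdf(0.05, 100)
se  <- sqrt(est * (1 - est) / 20000)  # binomial standard error of the estimate
print(c(estimate = est, std.error = se))
```

Unlike an exact method, such an estimate carries sampling error of order $1/\sqrt{\text{reps}}$, so it is only a sanity check, e.g. against `KSgeneral::cont_ks_c_cdf(0.05, 100)`.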
The latter is used by many existing packages computing the cdf of $D_n$, e.g., the function `ks.test` in the package stats and the function `ks.test` in the package dgof.
More precisely, in these packages the exact p-value $P(D_n \ge q)$ is computed only for $q = d_n$, where $d_n$ is the value of the KS test statistic computed from a user-provided sample $\{x_1, \ldots, x_n\}$.
Another limitation of these `ks.test` functions is that the sample size should be less than 100, and the computation time is of order $O(n^3)$.
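The restriction to $q = d_n$ can be seen in a short sketch: `stats::ks.test` returns a single exact p-value tied to the observed statistic, whereas `cont_ks_c_cdf` accepts an arbitrary `q`. The uniform sample and the grid of `q` values below are illustrative assumptions, and the KSgeneral call is guarded so the snippet also runs when that package is not installed.

```r
# Sketch: stats::ks.test reports the exact p-value P(D_n >= q) only at
# q = d_n, the statistic of the supplied sample (illustrative data).
set.seed(42)
x <- runif(40)
res <- stats::ks.test(x, "punif", exact = TRUE)
d_n <- unname(res$statistic)
p_exact <- res$p.value            # P(D_40 >= d_n) under H0
print(c(d_n = d_n, p_value = p_exact))

# cont_ks_c_cdf(q, n) can be evaluated at d_n and at any other q in [0, 1];
# skipped if KSgeneral is not installed.
if (requireNamespace("KSgeneral", quietly = TRUE)) {
  q_grid <- c(d_n, 0.10, 0.20)
  print(sapply(q_grid, function(q) KSgeneral::cont_ks_c_cdf(q, length(x))))
}
```

At $q = d_n$ the two values are expected to agree up to numerical precision, since both are exact computations of the same probability.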
In contrast, the function `cont_ks_c_cdf` provides results with at least 10 correct digits after the decimal point for sample sizes $n$ up to 100000, with a computation time of 16 seconds on a machine with a 2.5 GHz Intel Core i5 processor and 4 GB RAM running Mac OS X Yosemite.
For $n > 100000$, results of similar accuracy can still be computed, but at a higher computation cost.
See Dimitrova, Kaishev, Tan (2020), Appendix C for further details and examples.

Numeric value corresponding to $P(D_n \ge q)$.

Based on the C++ code available at https://github.com/mosco/crossing-probability developed by Moscovich and Nadler (2017). See also Dimitrova, Kaishev, Tan (2020) for more details.

Dimitrova D.S., Kaishev V.K., Tan S. (2020). "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, **95**(10), 1-42. doi:10.18637/jss.v095.i10.

Marsaglia G., Tsang W.W., Wang J. (2003). "Evaluating Kolmogorov's Distribution". Journal of Statistical Software, **8**(18), 1-4.

Moscovich A., Nadler B. (2017). "Fast Calculation of Boundary Crossing Probabilities for Poisson Processes". Statistics and Probability Letters, **123**, 177-182.

```
## Compute the value for P(D_{100} >= 0.05)
KSgeneral::cont_ks_c_cdf(0.05, 100)
## Compute P(D_{n} >= q)
## for n = 100, q = 1/500, 2/500, ..., 500/500
## and then plot the corresponding values against q
n <- 100
q <- 1:500/500
plot(q, sapply(q, function(x) KSgeneral::cont_ks_c_cdf(x, n)), type='l')
## Compute P(D_{n} >= q) for n = 141, nq^{2} = 2.1 as shown
## in Table 18 of Dimitrova, Kaishev, Tan (2020)
KSgeneral::cont_ks_c_cdf(sqrt(2.1/141), 141)
```
