- Hands-On Ensemble Learning with R
- Prabhanjan Narayanachar Tattar
- 2025-04-04 16:30:55
Permutation test
Suppose that we have two processes, A and B, whose variances are unknown but assumed to be equal. Three independent observations from process A result in yields of 18, 20, and 22, while three independent observations from process B give yields of 24, 26, and 28. Under the assumption that the yield follows a normal distribution, we would like to test whether the means of processes A and B are the same. This is a suitable case for applying the t-test, since the number of observations is small. An application of the t.test function shows that the two means differ from each other, and this intuitively appears to be the case.
Now, the assumption under the null hypothesis is that the means are equal, and that the variance is unknown and assumed to be equal under the two processes. Consequently, we have a genuine reason to believe that the observations from process A might well have occurred in process B too, and vice versa. We can therefore swap an observation from process B with one from process A, and recompute the t-test. The process can be repeated for every possible reallocation of the pooled observations to the two samples. In general, if we have m observations from population 1 and n observations from population 2, we can have

$$\binom{m+n}{m} = \frac{(m+n)!}{m!\,n!}$$

different samples and as many tests. An overall test can be based on such permutation samples, and such tests are called permutation tests.
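To make the counting concrete, the following is a minimal sketch in base R that enumerates every reallocation of the six yields and recomputes the equal-variance t statistic for each one. The variable names (pooled, idx, t_stats) are illustrative choices, not taken from the perm package.

```r
# Observations from the example: process A and process B
x <- c(18, 20, 22)
y <- c(24, 26, 28)
pooled <- c(x, y)
m <- length(x); n <- length(y)

# Number of distinct ways to choose which m of the m + n values belong to A
n_splits <- choose(m + n, m)
n_splits  # 20

# Each column of idx lists the positions assigned to process A in one split;
# the first column (1, 2, 3) is the split that was actually observed
idx <- combn(m + n, m)

# Recompute the equal-variance t statistic for every split
t_stats <- apply(idx, 2, function(i) {
  unname(t.test(pooled[i], pooled[-i], var.equal = TRUE)$statistic)
})

# How many splits are at least as extreme as the observed one?
sum(abs(t_stats) >= abs(t_stats[1]) - 1e-9)
```

Only the observed split and its mirror image (A and B exchanged) reach the observed magnitude of the t statistic, which is the idea that the permutation test turns into a p-value.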
For process A and B observations, we will first apply the t-test and then the permutation test. The t.test function is available in the core stats package, and the permutation t-test is taken from the perm package:
> library(perm)
> x <- c(18,20,22); y <- c(24,26,28)
> t.test(x,y,var.equal = TRUE)

	Two Sample t-test

data:  x and y
t = -3.6742346, df = 4, p-value = 0.02131164
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.533915871  -1.466084129
sample estimates:
mean of x mean of y 
       20        26 
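The t statistic and p-value in this output can be reproduced by hand from the pooled-variance formula. The sketch below uses only base R; the names sp2, t_stat, and dof are illustrative.

```r
x <- c(18, 20, 22)
y <- c(24, 26, 28)
n1 <- length(x); n2 <- length(y)

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Two-sample t statistic and its degrees of freedom
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof <- n1 + n2 - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

c(t = t_stat, df = dof, p.value = p_value)
```

Both samples have variance 4, so the pooled variance is also 4, and the statistic is -6 divided by the square root of 8/3.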
The small p-value (about 0.021) suggests that the means of processes A and B are not equal. We now apply the permutation test permTS from the perm package:
> permTS(x,y)

	Exact Permutation Test (network algorithm)

data:  x and y
p-value = 0.1
alternative hypothesis: true mean x - mean y is not equal to 0
sample estimates:
mean x - mean y 
             -6 
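The reported p-value can be checked without the perm package by brute-force enumeration. This is a minimal sketch that assumes the two-sided p-value is the share of splits whose absolute difference in means is at least as large as the observed one. permTS may use a different two-sided rule internally, although the two agree for this symmetric example.

```r
x <- c(18, 20, 22)
y <- c(24, 26, 28)
pooled <- c(x, y)
m <- length(x)

# Observed difference in means (the estimate reported in the output above)
obs <- mean(x) - mean(y)

# Difference in means for every way of relabelling m values as process A
idx <- combn(length(pooled), m)
diffs <- apply(idx, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))

# Two-sided p-value: share of splits at least as extreme as the observed one
# (the small tolerance guards against floating-point ties)
p_exact <- mean(abs(diffs) >= abs(obs) - 1e-12)
p_exact  # 2 of the 20 splits qualify, giving 0.1
```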
The p-value is now 0.1, so at the 5% significance level the permutation test does not reject the hypothesis that the two process means are equal. Does this mean that the permutation test will always lead to this conclusion, contradicting the t-test? The answer is given in the next code segment:
> x2 <- c(16,18,20,22); y2 <- c(24,26,28,30)
> t.test(x2,y2,var.equal = TRUE)

	Two Sample t-test

data:  x2 and y2
t = -4.3817805, df = 6, p-value = 0.004659215
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.46742939  -3.53257061
sample estimates:
mean of x mean of y 
       19        27 

> permTS(x2,y2)

	Exact Permutation Test (network algorithm)

data:  x2 and y2
p-value = 0.02857143
alternative hypothesis: true mean x2 - mean y2 is not equal to 0
sample estimates:
mean x2 - mean y2 
               -8 
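The two permutation p-values, 0.1 and 0.02857143, equal 2/20 and 2/70. A plausible reading is that in both data sets the observed split is the most extreme one possible, so the p-value is set by the number of possible splits. The sketch below computes that lower bound. The helper min_p is a hypothetical name introduced here, and it assumes equal sample sizes and distinct values, so that only the observed split and its mirror image are most extreme.

```r
# Smallest two-sided p-value an exact permutation test can report when the
# observed split and its mirror image are the only most-extreme arrangements
# (assumes equal sample sizes and no tied observations)
min_p <- function(m, n) 2 / choose(m + n, m)

min_p(3, 3)  # 2 / 20 = 0.1
min_p(4, 4)  # 2 / 70, about 0.0286
```

With three observations per process, no data set can give a two-sided permutation p-value below 0.1, so rejection at the 5% level is impossible. With four observations per process it becomes possible.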