Iris

Iris is probably the most famous classification dataset. The great statistician Sir R. A. Fisher popularized the dataset, which he used for classifying the three types of iris plants based on length and width measurements of their petals and sepals. Fisher used this dataset to pioneer the invention of the statistical classifier linear discriminant analysis. Since there are three species of iris, we converted this into a binary classification problem, separated the dataset, and created a formula as seen here:

> data("iris")
> ir2 <- iris
> ir2$Species <- ifelse(ir2$Species=="setosa","S","NS")
> ir2$Species <- as.factor(ir2$Species)
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(ir2),replace = TRUE,prob=c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> ir2_Train <- ir2[Train_Test=="Train",]
> ir2_TestX <- within(ir2[Train_Test=="Test",],rm(Species))
> ir2_TestY <- ir2[Train_Test=="Test","Species"]
> ir2_Formula <- as.formula("Species~.")