- Hands-On Ensemble Learning with R
- Prabhanjan Narayanachar Tattar
- 2025-04-04 16:30:55
Waveform
This dataset is an example of a simulation study. It has twenty-one input (independent) variables and a class variable named classes. The data is generated using the mlbench.waveform function from the mlbench R package. For more details, refer to the following link: ftp://ftp.ics.uci.edu/pub/machine-learning-databases. We will simulate 5,000 observations for this dataset. As mentioned earlier, the set.seed function guarantees reproducibility. Since we are solving binary classification problems, we will reduce the three classes generated by the waveform function to two (classes 1 and 2 are merged into class 1, and class 3 becomes class 2), and then partition the data into training and testing parts for model building and testing purposes:
> library(mlbench)
> set.seed(123)
> Waveform <- mlbench.waveform(5000)
> table(Waveform$classes)
   1    2    3 
1687 1718 1595 
> Waveform$classes <- ifelse(Waveform$classes!=3,1,2)
> Waveform_DF <- data.frame(cbind(Waveform$x,Waveform$classes)) # Data Frame
> names(Waveform_DF) <- c(paste0("X",".",1:21),"Classes")
> Waveform_DF$Classes <- as.factor(Waveform_DF$Classes)
> table(Waveform_DF$Classes)
   1    2 
3405 1595 
The R function mlbench.waveform creates a new object of the mlbench class. Since this object consists of two sub-parts, x and classes, we convert it into a data.frame after some further manipulation. The cbind function binds the two objects x (a matrix) and classes (a numeric vector, once it has been recoded with ifelse) into a single matrix. The data.frame function then converts the matrix object into a data frame, which is the class desired for the rest of the program.
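The conversion steps described above can be tried on a small toy object before running them on the simulated data. The following is a minimal sketch in base R only; the objects x, classes, bound, and toy_df are made-up stand-ins for Waveform$x, Waveform$classes, and Waveform_DF, and are not part of the original program.

```r
# Toy stand-ins for Waveform$x (a matrix) and Waveform$classes (a vector).
# The values are invented purely for illustration.
x <- matrix(1:6, nrow = 3, ncol = 2)
classes <- c(1, 2, 1)

# cbind() on a matrix and a vector returns a matrix with one extra column.
bound <- cbind(x, classes)
is.matrix(bound)
dim(bound)

# data.frame() converts the matrix; the columns are then renamed and the
# class column is turned into a factor, mirroring the steps in the text.
toy_df <- data.frame(bound)
names(toy_df) <- c(paste0("X", ".", 1:2), "Classes")
toy_df$Classes <- as.factor(toy_df$Classes)

class(toy_df)
str(toy_df)
```

Converting Classes to a factor matters because most R classification functions treat a numeric response as a regression target.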
After partitioning the data, we will create the required formula for the waveform dataset:
> set.seed(12345)
> Train_Test <- sample(c("Train","Test"),nrow(Waveform_DF),replace = TRUE,
+   prob = c(0.7,0.3))
> head(Train_Test)
[1] "Test"  "Test"  "Test"  "Test"  "Train" "Train"
> Waveform_DF_Train <- Waveform_DF[Train_Test=="Train",]
> Waveform_DF_TestX <- within(Waveform_DF[Train_Test=="Test",],rm(Classes))
> Waveform_DF_TestY <- Waveform_DF[Train_Test=="Test","Classes"]
> Waveform_DF_Formula <- as.formula("Classes~.")
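A few quick checks can confirm that a partition of this kind behaves as intended. The sketch below repeats the same steps on a small made-up data frame (toy_df and the other toy_ objects are hypothetical and not part of the original program). Note that sample() with prob assigns each row to "Train" with probability 0.7 independently, so the realized split is only approximately 70/30.

```r
set.seed(12345)

# Made-up data frame standing in for Waveform_DF (illustration only).
toy_df <- data.frame(X.1 = rnorm(1000),
                     X.2 = rnorm(1000),
                     Classes = factor(sample(1:2, 1000, replace = TRUE)))

# Label each row as Train or Test, as in the text.
Train_Test <- sample(c("Train", "Test"), nrow(toy_df), replace = TRUE,
                     prob = c(0.7, 0.3))

toy_train  <- toy_df[Train_Test == "Train", ]
toy_testX  <- within(toy_df[Train_Test == "Test", ], rm(Classes))
toy_testY  <- toy_df[Train_Test == "Test", "Classes"]
toy_formula <- as.formula("Classes~.")

# Realized proportions: close to, but not exactly, 0.3 and 0.7.
prop.table(table(Train_Test))

# Every row lands in exactly one part, and the test covariates
# no longer carry the response column.
nrow(toy_train) + nrow(toy_testX) == nrow(toy_df)
names(toy_testX)

# The dot in the formula expands to all remaining columns of the data.
attr(terms(toy_formula, data = toy_train), "term.labels")
```

Keeping the test covariates and the test labels in separate objects makes it harder to leak the response into a prediction call by accident.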