- Hands-On Ensemble Learning with R
- Prabhanjan Narayanachar Tattar
- 2025-04-04 16:30:55
Multishapes
The multishapes dataset from the factoextra package consists of three variables: x, y, and shape. It contains several shapes, each forming a cluster: two concentric circles, two parallel rectangles/beds, and one cluster of points at the bottom-right. Outliers are also scattered across the plot. Some brief R code gives a useful display:
> library(factoextra)
> data("multishapes")
> names(multishapes)
[1] "x"     "y"     "shape"
> table(multishapes$shape)
  1   2   3   4   5   6 
400 400 100 100  50  50 
> plot(multishapes[,1],multishapes[,2],col=multishapes[,3])

Figure 2: Finding shapes or groups
Because this is a synthetic dataset, it includes a column named shape that records the group each point belongs to. In true clustering problems, we will have neither a cluster group indicator nor the visualization luxury of only two variables. Later in this book, we will see how ensemble clustering techniques help overcome the problems of deciding the number of clusters and the consistency of cluster membership.
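To make the point about the missing group indicator concrete, the following is a minimal sketch rather than the book's own method. It hides the shape column, runs a single k-means fit, and only afterwards compares the result with the known shapes. The choice of k-means, the value centers = 5, and the seed are assumptions made purely for illustration.

```r
library(factoextra)   # assumed to be installed; provides the multishapes data
data("multishapes")

set.seed(123)                        # k-means starts from random centres
xy <- multishapes[, c("x", "y")]     # drop the label, as in a real problem

# A single clustering run with an assumed number of clusters
km <- kmeans(xy, centers = 5, nstart = 25)

# Cross-tabulate the known shapes against the k-means cluster labels
tab <- table(shape = multishapes$shape, cluster = km$cluster)
print(tab)
```

Rows of the table whose counts are spread over several columns are shapes that this single run did not recover as one cluster. Changing centers or the seed is a quick way to see how the memberships can shift between runs.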
Although it doesn't happen often, it is frustrating when fine-tuning parameters, fitting different models, and other tricks all fail to produce a useful working model. The culprit is often an outlier. A single outlier can wreak havoc on an otherwise useful model, so detecting outliers is of paramount importance. Until now, parametric and nonparametric outlier detection has been a matter of deep expertise, and in complex scenarios the identification can seem an insurmountable task. The ensemble outlier framework makes it possible to reach a consensus on whether an observation is an outlier. The board stiffness dataset will be used to illustrate this, and we will see how an outlier is pinned down in the conclusion of this book.
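The idea of reaching a consensus can be sketched in base R. This is an illustration under stated assumptions, not the ensemble outlier framework developed later in the book. The data are simulated (not the board stiffness measurements), and the three detectors, their thresholds, and the two-out-of-three voting rule are all hypothetical choices.

```r
set.seed(1)
# Simulated two-variable data with one planted outlier in row 51
# (hypothetical data, not the board stiffness measurements)
X <- rbind(matrix(rnorm(100), ncol = 2), c(8, 8))

# Detector 1: any coordinate more than 3 standard deviations from its mean
z_flag <- apply(abs(scale(X)) > 3, 1, any)

# Detector 2: any coordinate outside the boxplot fences (1.5 * IQR)
iqr_mat <- apply(X, 2, function(v) {
  q <- quantile(v, c(0.25, 0.75))
  v < q[1] - 1.5 * diff(q) | v > q[2] + 1.5 * diff(q)
})
iqr_flag <- apply(iqr_mat, 1, any)

# Detector 3: squared Mahalanobis distance beyond the 99.9% chi-square quantile
d2 <- mahalanobis(X, colMeans(X), cov(X))
md_flag <- d2 > qchisq(0.999, df = ncol(X))

# Consensus: flag a row when at least two of the three detectors agree
votes <- z_flag + iqr_flag + md_flag
consensus <- which(votes >= 2)
print(consensus)
```

Requiring agreement between detectors means an observation flagged by only one rule, for example a borderline point just outside the boxplot fences, is not declared an outlier on that evidence alone.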