Summary

Machine learning (ML) consists of constructing models that convert data into knowledge that can be used to make decisions. Some of these models rely on complicated mathematical concepts to understand the data. Scikit-learn is an open source Python library meant to facilitate applying these models to data problems without requiring much complex mathematical knowledge.

This chapter explained the key steps of preprocessing your input data, from separating the features from the target, to dealing with messy data and rescaling the values of the data. All these steps should be performed before jumping into training a model, as they help to improve both the training times and the performance of the models.
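As a reminder, the preprocessing steps above can be sketched in a few lines. This is a minimal illustration rather than the chapter's exact procedure: the toy dataset and its column names are hypothetical, and mean imputation and min-max rescaling are just one reasonable choice for each step.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy dataset; the column names are illustrative only.
data = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 51.0],
    "income": [30000.0, 45000.0, 52000.0, np.nan],
    "purchased": [0, 1, 0, 1],
})

# 1. Separate the features from the target.
X = data.drop(columns="purchased")
Y = data["purchased"]

# 2. Deal with messy data: here, replace missing values with the column mean
#    (an assumption; other strategies may suit other datasets better).
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# 3. Rescale the values so that every feature lies between 0 and 1.
X_scaled = MinMaxScaler().fit_transform(X_imputed)

print(X_scaled)
```

In a real project, the imputer and the scaler would normally be fitted on the training set only and then applied to any held-out data.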

Next, the different components of the scikit-learn API were explained: the estimator, the predictor, and the transformer. Finally, this chapter covered the difference between supervised and unsupervised learning and introduced the most popular algorithms of each type.
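The three API components can be seen side by side in a short sketch. The data below is made up for illustration, and StandardScaler and LogisticRegression are simply convenient stand-ins for a transformer and a supervised model. They are not necessarily the classes used elsewhere in this chapter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix and target, for illustration only.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 120.0], [4.0, 100.0]])
Y = np.array([0, 0, 1, 1])

# Estimator: any object that learns from data through fit().
scaler = StandardScaler()
scaler.fit(X)

# Transformer: an estimator that can also modify data through transform().
X_std = scaler.transform(X)

# Predictor: an estimator that can make predictions through predict()
# and report how well it does through score().
model = LogisticRegression()
model.fit(X_std, Y)
predictions = model.predict(X_std)
accuracy = model.score(X_std, Y)

print(predictions, accuracy)
```

Because every class exposes the same small set of methods, swapping one algorithm for another usually requires changing only the line that creates the object.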

With all of this in mind, the next chapter will detail the process of implementing an unsupervised algorithm on a real-life dataset.