Working with Data in OpenCV and Python

Now that we have whetted our appetite for machine learning, it is time to delve a little deeper into the different parts that make up a typical machine learning system.

Far too often, you hear someone throw around the phrase "just apply machine learning to your data!" as if that will instantly solve all your problems. As you can imagine, the reality is much more intricate. I will admit that nowadays it is incredibly easy to build your own machine learning system simply by cutting and pasting a few lines of code from the internet. However, in order to build a system that is truly powerful and effective, it is essential to have a firm grasp of the underlying concepts and an intimate knowledge of the strengths and weaknesses of each method. So don't worry if you don't consider yourself a machine learning expert just yet. Good things take time.

Earlier, I described machine learning as a subfield of artificial intelligence. This might be true, mainly for historical reasons, but most often machine learning is simply about making sense of data. It might therefore be more suitable to think of machine learning as a subfield of data science, where we build mathematical models to help us understand data.

Hence, this chapter is all about data. We want to learn how data fits in with machine learning, and how to work with data using the tools of our choice: OpenCV and Python.

Specifically, we want to address the following questions:

  • What does a typical machine learning workflow look like, and where does data come into play?
  • What are training data and test data, and what are they good for?
  • How do I load, store, edit, and visualize data with OpenCV and Python?
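
As a small preview of the second question, here is a minimal sketch of what separating training data from test data might look like. It uses only NumPy, and the helper name split_train_test, its parameters test_fraction and seed, and the toy array are all hypothetical choices made for illustration; they are not OpenCV functions, and the split ratio is an arbitrary assumption.

```python
import numpy as np


def split_train_test(data, test_fraction=0.2, seed=42):
    """Shuffle the rows of `data` and split them into two disjoint sets.

    Hypothetical helper for illustration only; the 80/20 default is an
    assumption, not a rule.
    """
    rng = np.random.default_rng(seed)
    # Random ordering of the row indices, so the split is not biased
    # by the order in which the samples were stored.
    indices = rng.permutation(len(data))
    n_test = int(round(len(data) * test_fraction))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return data[train_idx], data[test_idx]


# Toy data set: 10 samples (rows) with 2 features (columns) each.
data = np.arange(20, dtype=np.float32).reshape(10, 2)
train, test = split_train_test(data)
print(train.shape, test.shape)
```

The idea is that a model would be fitted on train only, while test is held back to check how well it does on samples it has not seen before.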