Preparing the data

Well then, what are we waiting for? We have a series of numbers representing each movie review, with their corresponding label, indicating (1) for positive or (0) for negative. This sounds like a classic structured dataset, so why not start feeding it to a network? Well, it's not that simple. Earlier, we mentioned that neural networks have a very specific diet. They are almost exclusively Tensor-vores, and so feeding them a list of integers won't do us much good. Instead, we must represent our dataset as a tensor of n-dimensions before we attempt to pass it on to our network for training. At the moment, you will notice that each of our movie reviews is represented by a separate list of integers. Naturally, each of these lists are of different sizes, as some reviews are smaller than others. Our network, on the other hand, requires the input features to be of the same size. Hence, we have to find a way to pad our reviews so that each of them represents a vector of the same length.