The initializer parameter

When we created the initial values for our weights and biases (that is, model parameters), we used random numbers, but limited them to the values of -0.005 to +0.005. If you go back and review some of the graphs of the cost functions, you see that it took 2,000 epochs before the cost function began to decline. This is because the initial values were not in the right range and it took 2,000 epochs to get to the correct magnitude. Fortunately, we do not have to worry about how to set these parameters in the mxnet library because this parameter controls how the weights and biases are initialized before training.