Stepwise regression

The examples we have seen so far all had one independent variable and one dependent variable. Such examples are useful for illustrating the basic concepts of regression analysis. Real-world scenarios, however, are more complex, and the outcome is usually affected by multiple factors. For example, an employee's salary depends on factors such as skill set, ability to learn new tools and technologies, years of experience, past projects, ability to play multiple roles, and location. As you can imagine, some of these factors contribute more than others to determining the outcome (salary, in this case).

When we perform regression analysis on a dataset that contains many factors, we can often build a more accurate model by selecting only the factors that contribute most significantly to the outcome, rather than using all of them. Stepwise regression is a method that automates this selection of independent variables.
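To make "automated selection" concrete, here is a minimal sketch of one common stepwise strategy, greedy forward selection, written with NumPy only. It is an illustration under stated assumptions, not the method described later in this chapter: the selection criterion (relative reduction in residual sum of squares) and the stopping threshold `min_gain` are hypothetical choices. Practical implementations often use p-values, F-tests, or information criteria such as AIC instead. The data is synthetic, and only two of the five columns actually influence `y`.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, min_gain=0.05):
    """Greedy forward selection (hypothetical criterion): repeatedly add the column
    that most reduces RSS, and stop when the relative reduction drops below min_gain."""
    selected = []
    remaining = list(range(X.shape[1]))
    current = float(((y - y.mean()) ** 2).sum())  # RSS of the intercept-only model
    while remaining:
        # Score every candidate by the RSS of the model that includes it.
        scores = {j: rss(X[:, selected + [j]], y) for j in remaining}
        best = min(scores, key=scores.get)
        if (current - scores[best]) / current < min_gain:
            break  # the best remaining variable no longer improves the fit enough
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Synthetic data: only columns 0 and 2 actually drive y.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
print(forward_select(X, y))
```

On this synthetic data the procedure should keep columns 0 and 2 and discard the three noise columns. On real data the result depends heavily on the criterion and threshold chosen.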

Consider the following regression function:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n$$

Here there are $n$ input variables, $x_1$ through $x_n$, each with its own weight or coefficient ($\beta_1$ through $\beta_n$), plus an intercept term $\beta_0$. The goal of stepwise regression is to shortlist the variables that are most important for building an accurate model. Stepwise regression can be done with two approaches, which are covered in the following sections.
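As a sketch of the regression function above, the following NumPy snippet fits all $n$ coefficients at once by ordinary least squares on synthetic data. The data, the true coefficients, and the choice of `np.linalg.lstsq` are illustrative assumptions. Two of the four variables are given a true weight of zero, so their estimated coefficients should come out close to zero. Variables like these are the kind that stepwise regression aims to leave out.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 500, 4
X = rng.normal(size=(n_samples, n_features))   # columns play the role of x1..x4
true_beta = np.array([1.5, 0.0, -2.0, 0.0])    # x2 and x4 are irrelevant by construction
y = 0.7 + X @ true_beta + rng.normal(scale=0.3, size=n_samples)  # beta0 = 0.7 plus noise

# Design matrix with a leading column of ones, so beta[0] estimates the intercept beta0.
A = np.column_stack([np.ones(n_samples), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

for i, b in enumerate(beta):
    print(f"beta{i} = {b:+.3f}")
```

With clean synthetic data like this, the irrelevant coefficients are easy to spot. With real, correlated features the picture is usually less clear, which is why a systematic selection procedure is useful.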