Estimating the impact of an action

When a business decision consists of choosing between different options, the solution requires estimating the impact of each of them. This chapter shows how machine learning techniques predict future events depending on the options, and how we can measure the accuracy.

Business problems require estimating future events

If we have to choose between different options, we estimate the impact of each alternative and choose the best. In order to illustrate this, the example is a big supermarket that plans to start selling a new item, and the business decision consists of choosing its price.

In order to choose the best price, the company needs to know:

  • The price options
  • The impact of each price option on the item sales
  • The impact of each price option on the sales of other items

The ideal solution is to maximize the impact of the overall revenue for the short and long term. Regarding the item itself, if its price is too high, the company won't sell it, missing a potential profit. On the other hand, if the price is too low, the company will sell different units without making good revenue.

In addition, the price of a new item will have an impact on the sales of similar items. For instance, if the supermarket is selling a new cereal, the sales of all the other cereal products will be affected. If the new price is too low, some of the customers purchasing the other items will want to save money and consequently purchase the new item. In this way, some customers will spend less money and the overall revenue will decrease. Conversely, if the new item is overpriced, the customers might perceive that the other items are too cheap and consequently that their quality is lower.

There are different effects, and one option is to define a minimum and maximum price of the item as a first step, in order to avoid negative effects on the sales of the related items. Then, we can choose the new price, maximizing the revenue of the item itself.

Gathering the data to learn from

Let's assume that we have already defined the minimum and maximum price for the new item. The goal is to use the data in order to discover the information that allows us to maximize the revenue of the item. The revenue depends on:

  • The price of the item
  • The units that will be sold in the next month

In order to maximize the revenue, we want to estimate it depending on the price, and pick the price that maximizes it. If we can estimate the sales volume depending on the price, we can consequently estimate the revenue.

The data displays information of past transactions, which include:

  • Item ID
  • Date
  • Number of units that have been sold during the day
  • Price of the item

There are also other data mapping the item that include some features. In order to simplify the problem, the features are all categorical, so they display categories instead of numbers. Examples of features are as follows:

  • Department
  • Product
  • Brand
  • Other categorical features that define the item

We don't have any data about the sales of the new item, so we need to estimate the customer behavior using data about some similar items. We're assuming that:

  • The future customers' behavior is similar to the past
  • The customers' behavior is similar across similar items
  • The sales of the new item are not affected by the fact that it's new to the market

As we want to estimate the sales volume of the new item, the starting point is the sales volume of similar items. For each item, we extract its transactions in the last month and we compute:

  • The total number of units in the last month
  • The most common price in the last month

In addition, for each item, we have the data defining its features. The data about each item is the starting point to estimate the revenue.

Predicting future outcomes using supervised learning

In our problem, the target of machine learning algorithms is to forecast the sales volume of an item depending on its price. The branch of techniques that learn from the data to forecast a future event is called supervised learning. The starting point of the algorithms is a training set of data that consists of objects whose event is already known. The algorithms identify a relationship between the data that describes the object and the event. Then, they build a model that defines this relationship and use the model for forecasting the event on other objects. The difference between supervised and unsupervised learning is that supervised learning techniques use training with known events whereas unsupervised learning techniques identify patterns that were hidden.

For instance, we have a new item whose price can either be $2, $3, or $4. In order to have the optimal price, we need to estimate the future revenues.

The data displays the sales volume of any item depending on its price and features. The approach for estimating the sales volume of a new item, depending on its price, is to use the sales volume of a defined number (k) of items that are the most similar. For each price, the steps are as follows:

  1. Define which are the k most similar items, given the features and the price of the new item.
  2. Define how to use the data of the similar items to estimate the sales volume of the new item.

In order to identify the most similar items, we have to decide what is similar and how similar it is. In order to do that, we can define a way to measure the similarity between any two items depending on the features and on the price. The similarity can be measured through a distance function, taking into account these features:

  • Price difference
  • Same product
  • Same brand
  • Same department
  • Other similar features

An easy way is to measure the distance as the sum of dissimilarities between the features. For instance, a very simple dissimilarity can be the price difference plus the number of categorical features that display different values. A slightly more advantageous way is to give a different weight to each feature on the basis of its relevancy. For instance, two items that do not belong to the same department are very dissimilar, whereas two items that are of the same product but of different brands are very similar.

After defining the distance function, we want to identify the k most similar objects depending on the price. For each price point, we define an item with the features of the new item and the chosen price. Then, for each item in the supermarket, we compute the distance between the item and the new item. In this way, we can pick the k items whose distance is the lowest.

After having identified the k most similar items, we need to determine how to use this information in order to estimate the new volume. A simple method is to compute the average between the sales volumes of the k items. A more advanced approach is to give more importance to more similar items.

The techniques that estimate a future event depending on the past data are called supervised learning techniques. The algorithm that has been illustrated is the k-nearest neighbors (KNN) algorithm, and it's one of the most basic supervised learning techniques.