- Applied Supervised Learning with R
- Karthik Ramasubramanian Jojo Moolayil
- 2021-06-11 13:22:31
Defining the Problem Statement
Recall the bank marketing data we explored in Chapter 1, R for Advanced Analytics: a dataset that captures the telemarketing campaigns a bank conducted to attract customers.
A large multinational bank is running a marketing campaign to reach its growth target by attracting customers to its bank deposits. So far, the campaign has been ineffective in winning customers, and the marketing team wants to understand how it can be improved to meet the growth target.
We can reframe the problem from the business stakeholders' perspective and try to see what kind of solution would best fit here.
Problem-Designing Artifacts
Just as there are several frameworks, templates, and artifacts for software engineering and other industrial projects, data science and business analytics projects can also be represented effectively using industry-standard artifacts. Some popular choices come from consulting giants such as McKinsey and BCG, and from decision sciences giants such as Mu Sigma. We will use a popular framework based on the Minto Pyramid Principle called Situation-Complication-Question (SCQ) analysis.
Let's try defining the problem statement in the following construct:
- Situation: Define the current situation. We can simplify this by answering the question—what happened?
A large multinational bank is running a marketing campaign to reach its growth target by attracting customers to its bank deposits. So far, the campaign has been ineffective in winning customers, and the marketing team wants to understand how it can be improved to meet the growth target.
In the previous section, we saw a hypothetical business problem framed for the banking data's use case. Though the real situation might differ, the use case we are trying to solve is a valid one. Representing the problem statement in the format shown in the previous section gives us a clear area to focus on and solve. This completes the first step in the life cycle of a typical data science use case. The second step is data gathering, which we explored in the previous chapter. We will refer to the same dataset, provided by the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Bank%20Marketing.
Note
[Moro et al., 2014] S. Moro, P. Cortez, and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.
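As a rough illustration of the data-gathering step, the following sketch reads a semicolon-delimited copy of the bank marketing data and reports the share of each campaign outcome. The helper names `load_bank_data` and `outcome_share`, the file path, and the outcome column name `y` are assumptions made for this example. The four rows are made up so that the snippet runs without a download; they are not taken from the real dataset.

```r
# Minimal sketch: read a semicolon-delimited copy of the bank marketing data
# and summarize the campaign outcome column.
# Assumptions (not from the text): the helper names, the file path, and that
# the outcome column is named "y" with values "yes"/"no".
load_bank_data <- function(path) {
  read.csv(path, sep = ";", header = TRUE, stringsAsFactors = FALSE)
}

# Share of contacts per outcome value, as proportions that sum to 1.
outcome_share <- function(df, outcome = "y") {
  prop.table(table(df[[outcome]]))
}

# Tiny made-up sample so the sketch runs without downloading anything;
# these rows are illustrative only and are not taken from the real dataset.
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age;job;y",
  "30;admin.;no",
  "45;technician;no",
  "52;retired;yes",
  "38;services;no"
), sample_path)

bank_sample <- load_bank_data(sample_path)
print(outcome_share(bank_sample))
```

To try this on the actual data, replace `sample_path` with the location of a file downloaded from the repository. A low share of positive outcomes would be consistent with the ineffective campaign described in the problem statement.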
This brings us to the final step: exploratory data analysis (EDA). In this use case, we want to understand the various factors behind the campaign's poor performance. Before we delve into the actual exercise, let's take a moment to understand the concept of EDA in a more intuitive way.