Identify and Quantify the Problem
The first and most critical step in any machine learning project is identifying a specific business problem that ML can solve and quantifying its potential impact. Many ML projects fail not because of technical limitations but because they target vague, poorly defined problems. A well-defined problem statement with a clear financial impact estimate provides the focus and justification needed for a successful ML initiative.
Defining the Problem Clearly
Start by articulating the business problem in plain language, without any reference to ML or technology. Good problem definitions describe a specific, measurable business outcome you want to improve. For example: "We lose 15 percent of our customers within the first 90 days, costing us approximately $500,000 in annual revenue." Bad problem definitions are vague: "We want to use AI to improve our business."
A well-defined problem for ML typically involves prediction (what will happen?), classification (what category does this belong to?), recommendation (what should we suggest?) or optimization (what is the best option?). If your problem does not fit these patterns, ML may not be the right solution.
Quantifying the Business Impact
Before investing in ML, estimate the financial value of solving the problem. This forces you to be specific about what success looks like and provides a benchmark for evaluating whether the ML solution delivers sufficient return on investment.
Calculate the impact using this framework:
- Current cost of the problem: How much revenue, profit or efficiency is lost because this problem is unsolved?
- Realistic improvement: What percentage improvement could an ML model realistically achieve? Be conservative: a 10-20 percent improvement is more realistic than 80 percent.
- Value of improvement: Multiply the current cost by the realistic improvement percentage to estimate the value.
- Implementation cost: Estimate the cost of data preparation, model development, deployment and ongoing maintenance.
- Net value: Subtract implementation cost from the value of improvement to determine if the project is worthwhile.
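The framework above reduces to two lines of arithmetic, sketched here in Python. The $500,000 current cost reuses the churn example from earlier in this article. The 15 percent improvement (the midpoint of the conservative range) and the $60,000 implementation cost are illustrative assumptions, not benchmarks, and the class and field names are hypothetical. The sketch also assumes the cost and the value cover the same period, here the first year.

```python
from dataclasses import dataclass


@dataclass
class ImpactEstimate:
    """Back-of-the-envelope impact estimate; all figures cover the same period."""

    current_cost: float           # revenue, profit or efficiency lost to the problem
    realistic_improvement: float  # fraction, e.g. 0.15 for 15 percent
    implementation_cost: float    # data prep + development + deployment + maintenance

    @property
    def value_of_improvement(self) -> float:
        # Value of improvement = current cost x realistic improvement
        return self.current_cost * self.realistic_improvement

    @property
    def net_value(self) -> float:
        # Net value = value of improvement - implementation cost
        return self.value_of_improvement - self.implementation_cost


# Assumed inputs for illustration only.
estimate = ImpactEstimate(
    current_cost=500_000,
    realistic_improvement=0.15,
    implementation_cost=60_000,
)

print(f"Value of improvement: ${estimate.value_of_improvement:,.0f}")  # $75,000
print(f"Net value: ${estimate.net_value:,.0f}")                        # $15,000
```

With these assumed inputs the project clears its cost, but only narrowly. Rerunning the estimate at 10 percent improvement gives a negative net value, which is the kind of sensitivity worth surfacing before committing resources.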
Evaluating ML Suitability
Not every business problem is suited for ML. Evaluate whether your problem meets these criteria:
First, is the problem data-rich? ML requires substantial historical data to learn patterns. If you have fewer than a few hundred examples of the outcome you want to predict, ML may not work effectively.
Second, is the problem repetitive and scalable? ML shines when the same type of decision needs to be made thousands or millions of times. A one-time strategic decision is better served by human analysis than an ML model.
Third, can the outcome be measured? You need a clear, measurable definition of success (a conversion, a churn event, a purchase amount) to train and evaluate an ML model. If the outcome is subjective or unmeasurable, ML cannot be effectively applied.
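The three criteria can be turned into a rough screening checklist. The sketch below is one possible way to do that. The function name and the numeric thresholds are assumptions: 300 stands in for "a few hundred" examples and 1,000 for "thousands" of decisions, and both should be adjusted to your context.

```python
def assess_ml_suitability(
    n_outcome_examples: int,
    decisions_per_year: int,
    outcome_is_measurable: bool,
    min_examples: int = 300,     # assumed reading of "a few hundred"
    min_decisions: int = 1_000,  # assumed reading of "thousands"
) -> list[str]:
    """Return a list of concerns; an empty list means all three checks pass."""
    concerns = []
    if n_outcome_examples < min_examples:
        concerns.append(
            f"Not data-rich: {n_outcome_examples} examples of the outcome "
            f"(fewer than {min_examples})."
        )
    if decisions_per_year < min_decisions:
        concerns.append(
            f"Not repetitive enough: {decisions_per_year} decisions per year "
            f"(fewer than {min_decisions})."
        )
    if not outcome_is_measurable:
        concerns.append("No clear, measurable definition of success.")
    return concerns


# Hypothetical problem: many decisions and a measurable outcome, but little history.
concerns = assess_ml_suitability(
    n_outcome_examples=120,
    decisions_per_year=50_000,
    outcome_is_measurable=True,
)
for concern in concerns:
    print(concern)
```

A non-empty result does not rule ML out. It flags which gap to close first, for example by collecting more outcome data.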
Problem Prioritization
If you have identified multiple potential ML problems, prioritize them using criteria similar to your experiment prioritization: business impact, data availability, implementation feasibility and time to value. Start with the problem that offers the highest impact with the most available data and the simplest implementation path.
Share your problem definition and impact estimate with stakeholders to build alignment and support. A clearly quantified problem with executive buy-in is much more likely to receive the resources and organizational support needed for success.
Common Mistakes in Problem Definition
The most common mistake is starting with a technology ("Let's use deep learning") rather than a problem ("Our customer churn is too high"). Another frequent error is choosing problems that are technically interesting but have limited business value. Always work backward from business impact to technical solution. Align ML problem selection with your North Star Metric and overall growth strategy.
Frequently Asked Questions
How specific should the problem definition be?
As specific as possible. "Reduce customer churn" is too broad. "Predict which customers are likely to cancel their subscription within the next 30 days so we can intervene with targeted retention offers" is specific enough to design a solution. The specificity of your problem definition directly determines the quality of your ML approach.
What if the problem is clear but we do not have data?
Acknowledge the data gap early and create a plan to fill it. This may involve implementing new tracking, integrating additional data sources or running a data collection period before starting the ML project. Do not skip this reality check, as data availability is the most common constraint in ML projects.
How long should the problem identification phase take?
Spend 1-2 weeks on problem identification and quantification. This includes stakeholder interviews, data availability assessment and impact estimation. It may seem like a lot of time upfront, but a well-defined problem saves months of wasted effort on poorly scoped ML projects. Use your data strategy to identify the highest-value problems systematically.
