The Right Tool for the Job: Linear Regression, Machine Learning and Ice Cream

August 2, 2018

I'm frequently asked by clients about machine learning (ML) and AI and whether they should be leveraging them at their company. The questions range from the general ("What's the deal with AI?") to the more pointed ("Why are we doing linear regression and not machine learning for our study?"). It's natural: there is a lot of buzz about companies using ML to solve problems or various start-ups using AI to...wait for it..."disrupt" one industry or another.


My response is generally to simply ask what our client's goals are. More than any consideration of data or scalability, their answer is usually the biggest determinant of which modelling techniques are most appropriate. If they want to know which media tactic delivers the highest ROI or how well a Q4 TV campaign performed relative to their summer copy, then almost certainly some form of linear regression is the best bet. If, on the other hand, they want to classify their customers based on purchase habits or predict sales volume on a daily or weekly basis, machine learning is a great candidate. Other considerations, of course, play into model choice as well. Machine learning models, for example, generally demand very large quantities of data, something not all companies may be able to provide.


The decision on which methodology to use largely comes down to whether the client wants to explain what happened or predict what will happen in the future. So even though you may want to say you are using the most advanced modelling technique, it is more important that you use the right tool for the job. Simply put, what does the client want to learn from the study, and how might the learnings be used strategically by the organization?


So why do we have to decide between explaining and predicting? It's not so much a matter of choosing one or the other as it is choosing a point on a continuum between the two. On the explanatory end we have linear regression, with easily interpretable coefficients and testable hypotheses. On the predictive end we have constructs like bagged and boosted decision trees, support vector machines, and neural networks (it's at this end of the continuum that fun terms like "black box models" quickly emerge). In the middle we'll find techniques like ridge and lasso regression, where interpretation of the model terms is still possible but can get quite tricky.
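To make the middle of that continuum concrete, here's a minimal sketch assuming scikit-learn and NumPy, with entirely invented data: lasso shrinks weak coefficients all the way to zero, which helps with variable selection but biases the surviving estimates, so reading them takes some care.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Hypothetical setup: ten candidate sales drivers, only two of which matter.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 10))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 1.0, n)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# OLS assigns a small, noisy coefficient to every column; lasso zeroes most
# of them out, but the surviving coefficients are shrunk toward zero.
print(np.round(ols.coef_, 2))
print(np.round(lasso.coef_, 2))
```

The zeroed-out columns make the lasso model easier to scan, but the shrinkage means the remaining numbers no longer read as clean "all else equal" effects the way OLS coefficients do.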


[Figure: spectrum from explanatory power to predictive power, running from ordinary least squares and weighted least squares through ridge, lasso, and MARS to random forests, support vector machines, XGBoost, and neural networks]


With a linear regression the explanatory power arises from the fact that we're essentially seeking the best approximation of reality with the data on hand (e.g., a model that explains why sales went up or down depending on levels of marketing and economic factors). Here our best fit is that which, on average, minimizes the difference between actual observed values and what our model said they would be given all of our inputs. The key point here is linear regression is retrospective and looks to approximate something that already happened.
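"Best fit" here has a precise meaning: among all candidate lines, ordinary least squares picks the one that minimizes the sum of squared residuals. A tiny sketch with made-up numbers (assuming NumPy):

```python
import numpy as np

# Four made-up observations of an input and an outcome.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def sse(slope, intercept):
    # Sum of squared differences between actuals and the line's predictions.
    return float(((y - (slope * x + intercept)) ** 2).sum())

# np.polyfit solves exactly this minimization in closed form.
slope, intercept = np.polyfit(x, y, 1)

# Any nearby line does no better than the OLS solution.
assert sse(slope, intercept) <= sse(slope + 0.1, intercept)
print(round(slope, 2), round(intercept, 2))  # prints: 1.94 0.15
```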


A predictive model, on the other hand, is naturally forward-looking with the goal of accurately forecasting new values that haven't yet been witnessed. One of the most counter-intuitive aspects here is that it is possible to accurately predict what will happen without being able to clearly explain why.


An example: Consider a highly seasonal business -- an ice cream company perhaps -- whose sales are likely impacted by weather. We would be reasonable in suggesting ice cream sales would climb as temperature rises. When building a linear regression we would include temperature as a variable along with media and perhaps an interaction between the two. In other words, we’re allowing for the possibility that the effectiveness of our media might be affected by good or bad weather. What we’re doing is testing a specific hypothesis about weather and determining if it's valid. We'd get a coefficient from our model (essentially a number which represents how much we’d expect sales to change when weather or media change) and would report something along the lines of "all else equal, for every 1 degree increase in temperature we'd expect an associated Y increase in ice cream sales".
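That hypothesis test can be sketched in a few lines. This is a toy illustration with simulated data (assuming NumPy; the coefficients, spend levels, and noise are all invented), fitting sales on temperature, media, and their interaction:

```python
import numpy as np

# Simulated weekly data: temperature (F), media spend, ice cream sales.
rng = np.random.default_rng(42)
n = 120
temp = rng.uniform(30.0, 100.0, n)
media = rng.uniform(0.0, 50.0, n)
# Assumed ground truth: media works a bit better in warm weather (interaction).
sales = 200.0 + 3.0 * temp + 2.0 * media + 0.05 * temp * media + rng.normal(0.0, 20.0, n)

# Design matrix: intercept, main effects, and the temperature x media interaction.
X = np.column_stack([np.ones(n), temp, media, temp * media])
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)

# coefs[1] reads as: "all else equal, a 1-degree rise in temperature is
# associated with roughly a coefs[1] change in sales (at zero media spend)."
print(dict(zip(["intercept", "temp", "media", "temp_x_media"], np.round(coefs, 2))))
```

The single interaction coefficient is the regression's answer to "does weather change how well media works": one number, with a sign and a magnitude we can report and test.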


The ML approach would be a little different. Sadly, machine learning doesn't mean we'll be working with talking robots, but rather that our algorithm can learn relationships that aren't explicitly specified by the analyst. In our ice cream/weather example, a tree-based model like a random forest may detect decision (i.e., inflection) points at specific and/or multiple temperature or precipitation levels. That is, the importance of temperature for predicting sales may be minimal up until a certain point, then spike at a certain threshold, then flatten, spike again, or even decline at a higher temperature. With ML we wouldn't get just one coefficient for weather but many, and this relationship may reveal that rising temperature doesn't have a strictly positive impact on ice cream sales.
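Here's what that might look like with simulated data. This sketch assumes scikit-learn, and the piecewise "truth" (flat when cold, rising through the 70s and 80s, declining past roughly 95 degrees) is invented for illustration; note the forest is never told where the thresholds are.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 600
temp = rng.uniform(30.0, 105.0, n)

def true_sales(t):
    # Hypothetical piecewise relationship the model must discover on its own.
    if t < 70.0:
        return 50.0                      # too cold; temperature barely matters
    if t <= 95.0:
        return 50.0 + 8.0 * (t - 70.0)   # sales climb with the heat
    return 250.0 - 5.0 * (t - 95.0)      # extreme heat: sales fall off

sales = np.array([true_sales(t) for t in temp]) + rng.normal(0.0, 10.0, n)

# No equation is specified; the forest infers the thresholds from the data.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(temp.reshape(-1, 1), sales)

cold, peak, hot = (float(forest.predict([[t]])[0]) for t in (40.0, 94.0, 104.0))
print(round(cold), round(peak), round(hot))
```

The predictions recover the flat-climb-decline shape without any coefficient ever being written down, which is exactly the trade: flexibility in what the model can capture, at the cost of a single tidy number to report.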



Let's suppose such a relationship does appear in our fictional ice cream model. The ML model output shows a non-linear impact of temperature with impact levelling off and even declining around 95-100 degrees Fahrenheit. Further, let's say this model is delivering highly accurate predictions against our holdout data. Does this relationship make sense?
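"Accurate against holdout data" just means the model predicts well on observations it never saw during fitting. A quick sketch of that check, assuming scikit-learn and an invented piecewise temperature effect:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)
n = 800
temp = rng.uniform(30.0, 105.0, n)

def true_sales(t):
    # Invented non-linear truth: impact levels off and declines past ~95F.
    if t < 70.0:
        return 50.0
    if t <= 95.0:
        return 50.0 + 8.0 * (t - 70.0)
    return 250.0 - 5.0 * (t - 95.0)

sales = np.array([true_sales(t) for t in temp]) + rng.normal(0.0, 10.0, n)

# Hold out 25% of the observations: accuracy is judged on data the model
# never saw, which guards against simply memorizing the training set.
X_train, X_test, y_train, y_test = train_test_split(
    temp.reshape(-1, 1), sales, test_size=0.25, random_state=0
)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
holdout_r2 = r2_score(y_test, forest.predict(X_test))
print(round(holdout_r2, 2))
```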


To help rationalize the results we could consider that temperature isn't likely to matter much when it's 30-40 degrees as ice cream loses its appeal if you are too cold. But when it rises to 80 degrees a frozen treat may sound like a great idea. What about 90 degrees? Sure, maybe even a little more so. What about 100 degrees? Perhaps people would pass on ice cream as it'll just melt right away and cause a big mess?

These are plausible explanations for the relationship we're seeing but does that last one hold water? Do people just stop buying ice cream because they think it'll melt? Perhaps. It could also be that when it gets that hot people simply stay home in the A/C and don't venture out to the store or the park. It could also be any number of other factors.


The question then becomes, from a practical standpoint, whether it benefits us any further to determine what those factors are. Would we derive any strategic benefit from knowing why temperature doesn't have a positive impact at 100 degrees or is it enough to know that relationship exists as it enables the ice cream merchant to respond accordingly, perhaps by adjusting media support or reducing shipments to avoid overstocks?


There could be any number of economic or behavioral factors that influence buying habits at higher temperatures, and knowing what they are may not actually empower us to respond more strategically or even improve the predictive accuracy of our model.


This brings us back to the initial question we ask when planning a project: what are we looking to learn, and how (or whether) can the learnings be used by the organization? Depending on the answer we might build a linear regression or start boosting some decision trees. It could be that we'll do both.

First, we figure out the job at hand, then we choose the right tool.


-Brandon Rude, Consultant