About econometrics, statistics and data science: the great divide in forecasting

I attended, and was privileged to speak at, the International Symposium on Forecasting (ISF), May 20-22 in Santander, Spain. As a supply chain person with a strong interest in forecasting, I was curious to get a deeper understanding of where the academics are and where they are heading. A main conclusion I took home was that there seems to be a great divide between three groups: the econometrists, the statisticians, and the data scientists.

By Bram Desmet

 

ISF is the yearly conference organized by the International Institute of Forecasters (IIF), a global community of approximately 350 forecast academics and practitioners. The three groups attending the same conference treat comparable problems, but they speak different languages and seem to be wary of each other. I believe the three can be combined, and in what follows I explain how they can help create value for the supply chain.

Econometrists seem to be on a quest for causality and dynamics. They try to model the interactions of a complex system. An example could be predicting the sales, 10 to 15 years out, of a new molecule still in development at a pharmaceutical company. Another could be predicting the peaks in electricity consumption in fast-developing economies like India. Because the changing context is key to arriving at a forecast, econometrists model and forecast the drivers and their interactions. The errors in these models are large. Econometrists are aware of this, and seem to be continuously adapting their models to reduce the uncertainty, bit by bit.

In supply chain forecasting, most software packages still rely on extrapolation techniques. In short, they analyze historical sales for repeatable patterns and extrapolate these into the future. There is a huge variety of techniques available, and it’s clear from the conference that expanding the list is still a favorite pastime of the academics. They look at trends, seasonality, autocorrelation, … and recombine these in different fashions. I do notice, however, that this field is increasingly switching to statistics. Modelling of promotions, to give just one example, requires the addition of external information like ‘when did we have a promotion, what type of promotion was it, how long did it run, how did we advertise, …’. A statistical technique like regression is a good fit for including this type of external information.
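To make the difference concrete, here is a minimal sketch of a regression that adds a promotion flag to a simple trend model. Everything in it is an assumption made for illustration: the weekly sales figures are made up, the names `promo` and `forecast` are hypothetical, and a real promotion model would also need the promotion type, duration and advertising information mentioned above.

```python
import numpy as np

# Hypothetical weekly data, invented for illustration (not from the article).
weeks = np.arange(12)
promo = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1])  # 1 = promotion ran that week
sales = 100.0 + 2.0 * weeks + 30.0 * promo               # noise-free on purpose

# Pure extrapolation: fit a trend on the sales history only.
X_trend = np.column_stack([np.ones(len(weeks)), weeks])
coef_trend, *_ = np.linalg.lstsq(X_trend, sales, rcond=None)

# Regression with external information: intercept, trend and promotion flag.
X_promo = np.column_stack([np.ones(len(weeks)), weeks, promo])
coef_promo, *_ = np.linalg.lstsq(X_promo, sales, rcond=None)


def forecast(week, promo_flag):
    """Forecast sales for a future week, given whether a promotion is planned."""
    return float(coef_promo @ np.array([1.0, week, promo_flag]))


# Largest in-sample error of each model.
err_trend = np.abs(X_trend @ coef_trend - sales).max()
err_promo = np.abs(X_promo @ coef_promo - sales).max()
```

In this toy setting the trend-only model cannot tell promotion weeks from normal weeks, so its error stays large, while the regression recovers the promotion uplift and can answer a "what if we run a promotion in week 12?" question through `forecast(12, 1)`.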

Strange guys

If you listen to the small talk in the breaks, you’ll hear the econometrists accuse the statisticians of assuming they can build the perfect model, on which they then rely too heavily. The statisticians, in turn, accuse the econometrists of making fuzzy models that are too uncertain to be of any practical value.

Whilst the two are quarreling, there’s a new species which is gently but firmly infiltrating the field. Their mission is to make sure we forget about econometrics and statistics. That’s old stuff that has been around for decades. The new and exciting stuff is ‘big data analytics’. I estimate the share of people at the conference with this background to be around 5 to 10 percent, still clearly a minority. I saw some signs of an ‘intruder alert’: “Who are these strange guys?” They look different, they behave differently and they speak a different language. The general attitude was one of “we are non-believers”: stick with the old and the known. The results, however, were promising. The application of neural networks to forecasting, for instance, seems to be gradually gaining ground.

It’s strange to notice that at a truly global conference, with people of many different nationalities, this is what divides the group. Not language, religion or political preferences … no, the academic discipline from which they come.

Reunite

So what’s in it for the supply chain?

In supply chain forecasting, we’ve been overly reliant on extrapolation methods, and we’ve been ignoring the effect of external data for too long. As the statisticians move into the field of including external data, econometrics and statistics will make a beautiful marriage. It’s one of the reasons we at Solventure are sponsoring a PhD on Leading Indicator Forecasting. Some say it’s the holy grail. I believe there’s an opportunity when you bring together things that have been kept separate before.

Big data analytics will help in improving short-term forecasts. Sensing what’s going on in social media and translating that into events with associated probabilities seems to be a perfect fit for data science techniques. It’s also a topic where econometrics and statistics are a bit weaker in handling the variability. I’m not saying that something like neural networks won’t add value for a mid-term forecast. I’m rather arguing that there is more value to be added in the short term. At Solventure we had a master’s thesis on ‘hype detection in retail’ based on ‘social media input’. A set of 100 million tweets from leading markets such as the UK and the US proved to have predictive value for markets in Western Europe.

These two evolutions will change the way we forecast in the supply chain. Whereas over the last 30 years the competition has been about ‘who has the best technique’, I strongly believe that over the next 30 years it will be about ‘who has the best information’ and knows how to draw value from it.

Bram Desmet is Adjunct Professor of Operations & Supply Chain at Vlerick Leuven Gent Management School and Managing Director at Solventure (bram_desmet@solventure.eu).