
One thing is certain when working with data, particularly in new and emerging markets: there will be errors and inconsistencies, some discoverable and some hidden.
The point is to identify the discoverable errors and account for them to the extent possible. You also have to come to terms with the fact that some errors and data holes will be unidentifiable and will remain hidden. While it is always true in analytics that “garbage in, garbage out”, there are methods to reduce the impact of data discrepancies. Data problems do not necessarily mean your analytic results will be inaccurate. Indeed, the value of analytic models increases when market data is less than complete. That is why you build a "model" in the first place: if you had perfect information, there would be no need to model it.
One example of working with incomplete data is AP's recent work in Russia. A significant portion of the retail sector in Russia is not well tracked within syndicated sources. Yet Russia remains a strong, growing market where companies need to evaluate increased advertising and promotional spending to remain competitive. Faced with this challenge, we work with our clients to identify alternative sources of data, such as direct retail data, shipment data, and tracking of representative retailers, to gauge sales activity and price points. By using multiple data sources, we are frequently able to triangulate our way to better information, checking poor data and filling in missing data through cross-verification from several sources. This approach can be powerful because most of our clients' competitors will be unable or unwilling to do similar work to identify and validate data, and therefore AP clients can enjoy a significant competitive advantage in promising emerging markets.
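As a rough illustration of the triangulation idea, the sketch below combines several estimates of the same quantity, ignores sources with no reading, and flags any source that strays too far from the consensus. It is not AP's actual method: the function name, the source names, the sales figures, and the 15% tolerance are all hypothetical and chosen only for illustration.

```python
from statistics import median


def triangulate(estimates, tolerance=0.15):
    """Combine per-source estimates of one quantity (illustrative only).

    estimates: dict mapping source name -> value, or None when missing.
    Returns (consensus, flagged): consensus is the median of the values
    that are present, and flagged lists the sources whose value differs
    from that median by more than `tolerance` (a fraction of the median).
    """
    available = {s: v for s, v in estimates.items() if v is not None}
    if not available:
        return None, []
    consensus = median(available.values())
    flagged = sorted(
        s for s, v in available.items()
        if consensus and abs(v - consensus) / abs(consensus) > tolerance
    )
    return consensus, flagged


# Hypothetical weekly unit sales for one product, by data source.
weekly_units = {
    "syndicated": None,       # not tracked by the syndicated source
    "retailer_direct": 1180,
    "shipments": 1250,
    "store_panel": 1900,      # disagrees sharply with the others
}

consensus, flagged = triangulate(weekly_units)
print(consensus, flagged)
```

The median is used here because a single bad source moves it less than it would move a mean. A flagged source is a prompt for the detective work described in this article, not an automatic verdict that the source is wrong.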
Data is imperfect, and will continue to be so. The deeper you dive into new data streams, the more you risk encountering incomplete, inaccurate, double-counted, and overlapping data. There is no quick and easy solution to dirty data. It takes experience, common sense, and genuine curiosity to do the detective work to untangle a bird’s nest of data. Having said that, if you do the hard work to identify and cleanse new data, you will likely reveal opportunities for competitive advantage.