Serious marketers, innovators and quants can't find a better — or more controversial — case study in the problems and pathologies of predictive analytics than America's down-to-the-wire presidential campaign.
There's not a business in the world today that shouldn't be reexamining its own data-driven marketing research and customer analytics practices in light of issues raised by presidential polling. Predicting the next President of the United States is, after all, at least as interesting and insightful a marketing challenge as forecasting commercial success for a new tablet or automobile.
Have polls truly oversampled Democrats relative to Republicans and independents? Are voter turnout models appropriately calibrated to prior elections? Do pollsters allow measurable political or partisan biases to taint their methodologies? Which demographic segments are most susceptible to changing their minds and their votes? How should one weight dedicated state polls in the context of national surveys?
When quantitative prognosticators like New York Times' 538 blogger Nate Silver argue that, in fact, oversampling accusations are overstated or that it's more statistically sensible to benchmark turnout projections against the 2008 presidential election than the 2010 congressional election, business leaders committed to data-driven decision making would be wise to look at the underlying reasons. Do they reflect probabilistic "best practice"? Or do they say more about the nuances and idiosyncrasies that make expert model builders expert?
Consider, for example, the oversampling conundrum. What represents a better predictive statistical model for a prognosticating pollster: a 35% representation of Democrats that faithfully reflects the 2008 presidential election proportions? Or a 30% level from the 2010 congressional election? Similarly, how should pollsters credibly fine-tune their surveys if the proportion of self-described "independents" jumps from 12% to 18%? What portion of that shift should be attributed to the survey methodology itself versus an actual structural shift in the number of independents?
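To make the stakes of that judgment call concrete, here is a minimal sketch, in Python with entirely hypothetical respondents and helper names, of how reweighting the very same raw responses to a 2008-like versus a 2010-like party mix shifts the headline number:

```python
# Hypothetical sketch: neither the numbers nor the helpers reflect any
# actual pollster's methodology.
from collections import Counter

def party_weights(sample, target_shares):
    """Per-respondent weights that make the weighted sample match
    target_shares (party ID -> assumed share of the electorate)."""
    counts = Counter(party for party, _ in sample)
    n = len(sample)
    return [target_shares[party] / (counts[party] / n)
            for party, _ in sample]

def weighted_support(sample, weights, candidate):
    """Weighted share of respondents preferring `candidate`."""
    favor = sum(w for (_, choice), w in zip(sample, weights)
                if choice == candidate)
    return favor / sum(weights)

# A made-up 10-person sample of (party ID, preferred candidate):
# 40% Democrats, 30% Republicans, 30% independents.
sample = ([("D", "A")] * 4 + [("R", "B")] * 3 +
          [("I", "A")] + [("I", "B")] * 2)

# Identical raw responses, benchmarked against two electorates:
for label, target in [("2008-like", {"D": 0.35, "R": 0.35, "I": 0.30}),
                      ("2010-like", {"D": 0.30, "R": 0.35, "I": 0.35})]:
    w = party_weights(sample, target)
    print(label, round(weighted_support(sample, w, "A"), 3))
```

With this toy sample, the 2008-like mix puts candidate A's support at 45%, while the 2010-like mix drops it to roughly 41.7% — same data, different assumption, different headline.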
While there's no shortage of excellent statistical techniques for finessing sample sizes and shifts, the unavoidable truth is that the best answers to these questions are usually judgment calls. Anyone who's read Silver's book on analytics (I reviewed it for Fortune) would likely agree he has a sure grasp of "best practice" statistical modeling. By the same token, anyone following the intense — and intensifying — arguments around oversampling would likely come to the not unreasonable conclusion that there's zero consensus around "best practice" and quite a bit of legitimate disagreement around how much to weight the past.
These are exactly — and I mean "precisely" — the same kinds of debates and disagreements I hear in organizations invested in "Big Data" and next generation analytics. Arguments around best practice oftentimes reflect partisan interests — interdivisional and/or interdepartmental rather than Democrat or Republican — and virtually every innovator I know displays confirmation bias when it comes to weighting past data in predicting future performance. If the innovation naturally extends the trend lines, the data is welcomed; if the innovation challenges or kinks those curves, well, it's an innovation so what matters is the future, not the past.
The politics of innovation can be every bit as biased and bitter as presidential politics. What makes these presidential debates particularly appealing — and useful — is that they offer two qualities most data-driven analytics enterprises lack: greater competition and greater transparency.
The presidential polls have organizations like Gallup, Pew, Rasmussen, the New York Times, Washington Post/ABC, Frank Luntz, etc., all competing with one another — often on both a state and national level — to provide the best and most timely information they can. What's more, their underlying sampling methodologies (questions, sample sizes, composition, etc.) are made increasingly transparent and accessible. The quantitative connoisseur can sample these samplings and survey these surveys to come to their own conclusions about what kind of best practices — and biases — co-exist.
More harshly put, everything you find frustrating, confusing and borderline dishonest about presidential polling is likely to be either mirrored or reinvented in your own organizations. But chances are things will be even worse because, usually, there's far less competition and transparency around the market research polls, surveys and interpretations you do. The proprietary nature of most enterprise analytics makes it far too easy for pathological "oversampling" and confirmation bias to take root rather than become a source of public challenge and debate.
While I have no idea who'll win America's presidential election, I'm confident that, whatever the outcome, the pollsters will be revisiting and revising their most cherished assumptions and methodologies. They'll be reviewing and refining best practice and their judgment calls in light of the useful — and useless — insights their data provided.
The 2012 campaign offers a master class in how to identify and exploit the risks and rewards of data-driven market research. Let's see how many enterprises and entrepreneurs learn from their mistakes and successes.