Companies that wish to take full advantage of their data must build strong, new, and different organizational capabilities. There is a lot to do, and data scientists are front and center. Good ones are rare. And critically, the difference between a great one and a good one is like the difference between lightning and a lightning bug.
A good one can help you find relationships in vast quantities of disparate data — often important insights that you would not have gotten in any other way. Great data scientists, on the other hand, develop new insights about the larger world. They certainly use data to develop those insights, but that is not the point.
Over the years I've had the privilege of working with dozens, maybe hundreds, of good statisticians, analysts, and data scientists. And a few great ones. Great data scientists bring four mutually reinforcing traits to bear that even the good ones can't.
1. A sense of wonder. Recently, many have noted that curiosity is the number one trait of a data scientist. That should go without saying. Good data scientists must be curious, just as scientists in any discipline must be.
But the great ones take this trait to an extreme. They have a sense of wonder about the world and are happiest when they discover how something works or why it works that way! They look for those explanations in data, and in anything else that will help. For example, great data scientists are interested in many things and develop networks of people whose perspectives differ from their own. So much the better to explore the world, and a mass of disparate data, from many angles!
2. A certain quantitative knack. Great data scientists simply see things that others don't. For example, I happened to chat with a summer intern (who now uses his analytical prowess as head of a media company) on his second day at an investment bank. His boss had given him a stack of things to read, and in scanning through, he spotted an error in a returns calculation. It took him about an hour to verify the error and determine the correction.
What's important here is that thousands of others did not see the error. It was obvious to him, but not to anyone else. And this was a top-tier investment bank. Presumably, at least a few good analysts read the same material and did not spot it.
Mathematics has turned out to provide a convenient, amazingly effective language (the physicist Eugene Wigner wrote of its "unreasonable effectiveness") for describing the real world. The great data scientist taps into that language intuitively and easily in ways that even good data scientists cannot.
3. Persistence. The great data scientists are persistent, and in many ways. The intern in the vignette above made his discovery at a glance and confirmed it in an hour. It rarely works out that way. I believe it was Jeff Hooper, then at the great Bell Labs, who noted that "Data do not give up their secrets easily. They must be tortured to confess."
This is a really big deal. Even under the best of circumstances, too many data are poorly defined or simply wrong, and most turn out to be irrelevant to the problem at hand. Sifting through such noisy data is arduous, frustrating work. Even good data scientists may move on to the next problem. Great data scientists stick with it.
Great data scientists also persist in making themselves heard. Dealing with a recalcitrant bureaucracy can be even more frustrating than dealing with noisy data. Continuing the vignette from above, I told the intern that he was in for a long summer. He would almost certainly spend it defending his discovery. Whichever group made the error would take great offense and might even attack him personally. Others would react with glee as they celebrated the ignorance of their peers. And he'd be caught in the middle.
4. Finally, technical skills. The ability to access and analyze data using the newest methods is obviously important. But I'm less concerned about that than about the ability to bring statistical rigor to bear. At the risk of oversimplifying, there are two kinds of analyses: descriptive and predictive. Descriptive analyses are tough enough. But the really profitable analyses involve prediction, which is inherently uncertain.
Great data scientists embrace uncertainty. They recognize when a prediction rests on solid foundations and when it is merely wishful thinking. They are simply outstanding at spelling out the stakes: here's what has to go right for the prediction to hold; here's what will really foul it up; and here are the unknowns that will keep me awake at night. They can often quantify the uncertainty, and they are good at suggesting simple experiments to confirm or refute assumptions, reduce uncertainty, and explore the next set of questions.
To be clear, this ability is not "that certain quantitative knack." It is trained, sophisticated, disciplined inferential horsepower, practiced and honed by both success and failure.
Great data scientists are truly special. They're the Derek Jeters, the Michael Jordans, the Mikhail Baryshnikovs, and the Julia Robertses of the data space. If you're serious about big data and advanced analytics, you need to find one or two, build around them, and craft an environment that helps them do their thing.