Data scientists are a rare breed. It's not easy to find the right blend of skills in information technology, statistics and business. That said, more and more are beginning to emerge, having built up their profiles through a mix of experience and training. We're still a long way, however, from closing the gap between supply and demand in the talent market.
Data scientists think data. They look for it, have enough programming knowledge to extract it, and know how to clean it by removing noisy data elements and filling in missing values. They can increase its information value by combining it with other data. They know how to analyse it using statistics and data analysis tools or languages such as R, SAS and SPSS, and they know when they have discovered new facts by interpreting their results in light of their business understanding. And, of course, they know how and when to use visualization tools and infographic techniques to present their findings to others.
These are some of the technical capabilities that data scientists have, although some may argue over the details. A given data scientist may be stronger in some of these areas than in others, but that need not matter, because they can call on supporting experts when they need help.
But there's more. The data scientist role is a business technology one: it straddles the boundary between business and technology and extends well into both. So, in addition to these technical skills, a number of behavioural traits are extremely useful. The difference between a good data scientist and a great one may well lie in their ability to use certain soft skills to make sure their work eventually translates into corporate earnings. Some of these traits are described below.
Having the tools and the requisite subject matter expertise does not, on its own, lead to useful outcomes. Combining knowledge with new ideas does. The power of a tool really lies in the hands of its user. A good data scientist knows all the right techniques, but a great data scientist will creatively figure out ways to combine and mine the obvious along with the not-so-obvious. They are able, and willing, to think in ways that others have not. That leads them to try new things, which increases the chance of reaching a new result.
Great data scientists know how to communicate and work effectively with professionals from various functions and at various levels of the organization. They need to seek out data, make the case for resources, enlist the help of technology and business experts, present their findings in understandable language and, finally, convince business leaders of those findings.
Many statistical methods involve forming a hypothesis and then testing whether the data supports it. Leading business questions, or even deep domain knowledge, can sometimes bias the analysis towards a presupposed outcome, whether knowingly or unconsciously. A great data scientist knows both the science and the business domain, but remains thoroughly unbiased when choosing methods and analysing the results.
It's easy to get lost in perfecting a technique, pursuing ever deeper levels of refinement. A great data scientist loves the craft, but is mindful that the work serves a business case, has to follow a plan, and is bound by timelines and budgets. They work with intensity of purpose, continually figuring out which next step will lead to a result. They know when to stop spending time and adding complexity, and instead produce an outcome that is good enough to work with.
Raw data by itself is just that: data. Contrary to what some might claim, it doesn't actually say anything. It's the analysis of it that says something, and sometimes even that is not enough. Every outcome at every stage of analysis needs to be questioned in different ways. It takes curiosity about the data, and about the business actions observed in it, to stay motivated enough to keep wrangling it until a new truth emerges.
A great data scientist knows how to work well with others. They may be knowledgeable in more than one field, but they will always need to collaborate with colleagues from other disciplines in order to achieve their results.
If at first you don't succeed, try and try again. Everyone's heard that. Data science requires patience, creativity and persistence: the ability to try and try again. Methods may be exact, but useful results are never guaranteed. Things can go wrong, or the best methods may not be the ones first selected. The path followed may lead to a blind alley, in which case a fresh start may be required. A great data scientist knows this and keeps plodding along, step by step, iteration by iteration, until they find what they were looking for.
This blog is listed under Data & Information Management Community