Data scientists are now among the most highly sought-after professionals, and they are being called on to work more closely than ever with enterprise strategists to predict emerging trends, optimize outcomes, and create entirely new kinds of business value.
To learn more about modern data scientists, how they operate, and why a new level of business analysis professional certification has been created by The Open Group, we are joined by Martin Fleming, Vice President, Chief Analytics Officer, and Chief Economist at IBM; Maureen Norton, IBM Global Data Scientist Professional Lead, Distinguished Market Intelligence Professional, and author of Analytics Across the Enterprise; and George Stark, Distinguished Engineer for IT Operations Analytics at IBM. The panel is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: We are now characterizing the data scientist as a profession. Why have we elevated the role to this level, Martin?
Fleming: The benefits we have from the technology that’s now available allow us to bring together the more traditional skills in the space of mathematics and statistics with computer science and data engineering. The technology wasn't as useful just 18 months ago. It’s all about the very rapid pace of change in technology.
Gardner: Data scientists used to be behind-the-scenes people; sneakers, beards, white lab coats, if you will. What's changed to now make them more prominent?
Norton: Today’s data scientists are consulting with the major leaders in each corporation and enterprise. They are consultants to them. So they are not in the back room, mulling around in the data anymore. They're taking the insights they're able to glean and support with facts and using them to provide recommendations and to provide insights into the business.
Gardner: Most companies now recognize that being data-driven is an imperative. They can’t succeed in today's world without being data-driven. But many have a hard time getting there. It's easier said than done. How can the data scientist as a professional close that gap?
Stark: The biggest drawback in integration of data sources is having disparate data systems. The financial system is always separate from the operational system, which is separate from the human resources (HR) system. And you need to combine those and make sure they're all in the same units, in the same timeframe, and all combined in a way that can answer two questions. You have to answer, “So what?” And you have to answer, “What if?” And that’s really the challenge of data science.
Gardner: An awful lot still has to go on behind the scenes before you get to the point where the “a-ha” moments and the strategic inputs take place.
Martin, how will the nature of work change now that the data scientist as a profession is arriving – and probably just at the right time?
Fleming: The insights that data scientists provide allow organizations to understand where the opportunities are to improve productivity, and how they can help make workers more effective and productive and create more value. This enhances the role of the individual employees. And it’s that value creation, the integration of the data that George talked about, and the use of analytic tools that's driving fundamental changes across many organizations.
Captain of the data team
Gardner: Is there any standardization as to how the data scientist is being organized within companies? Do they typically report to a certain C-suite executive or another? Has that settled out yet? Or are we still in a period of churn as to where the data scientist, as a professional, fits in?
Norton: We're still seeing a fair amount of churn. Different organizing approaches have been tried. For example, the centralized center of excellence that supports other business units across a company has a lot of believers and followers.
The economies of scale in that approach help. It’s difficult to find one person with all of the skills you might need. I’m describing the role of consultant to the presidents of companies. Sometimes you can’t find all of that in one individual -- but you can build teams that have complementary skills. We like to say that data science is a team sport.
Gardner: George, are we focusing the new data scientist certification on the group or the individual? Have we progressed from the individual to the group yet?
Stark: I don’t believe we are there yet. We’re still certifying at the individual level. But as Maureen said, and as Martin alluded to, the group approach has a large effect on how you get certified and what kinds of solutions you come up with.
Gardner: Does the certification lead to defining the managerial side of this group, with the data scientist certified in organizing that group or office in a methodical, proven way?
Fleming: The certification we are announcing focuses not only on the technical skills of a data scientist, but also on project management and project leadership. So as data scientists progress through their careers, the more senior folks are certainly in a position to take on significant leadership and management roles.
And we are seeing over time, as George referenced, a structure beginning to appear. First in the technology industry, and over time, we’ll see it in other industries. But the technology firms whose names we are all familiar with are the ones who have really taken the lead in putting the structure together.
Gardner: How has the “day in the life” of the typical data scientist changed in the last 10 years?
Stark: It’s scary to say, but I have been a data scientist for 30 years. I began writing my own Fortran 77 code to integrate datasets to do eigenvalues and eigenvectors and build models that would discriminate among key objects and allow us to predict what something was.
The difference today is that I can do that in an afternoon. We have the tools, datasets, and all the capabilities with visualization tools, SPSS, IBM Watson, and Tableau. Things that used to take me months now take a day and a half. It’s incredible, the change.
Gardner: Do you as a modern data scientist find yourself interpreting what the data science can do for the business people? Or are you interpreting what the business people need, and bringing that back to the data scientists? Or perhaps both?
Collaboration is key
Stark: It’s absolutely both. I was recently with a client, and we told them, “Here are some things we can do today.” And they said, “Well, what I really need is something that does this.” And I said, “Oh, well, we can do that. Here’s how we would do it.” And we showed them the roadmap. So it’s both. I will take that information back to my team and say, “Hey, we now need to build this.”
Gardner: Is there still a language, culture, or organizational divide? It seems to me that you’re talking apples and oranges when it comes to business requirements and what the data and technology can produce. How can we create a Rosetta Stone effect here?
Norton: In the certification, we reinforce that data scientists have to understand the business problems. Everything begins from that.
Knowing how to ask the right questions, scope the problem, and then translate [is essential]. You have to look at the available data and make some inferences to come up with insights and a solution. It's increasingly important that you begin with the problem. You don't begin with your solution and say, “I have this many things I can work with.” It's more like, “How are we going to solve this and draw on the innovation and creativity of the team?”
Gardner: I have been around long enough to remember when the notion of a chief information officer (CIO) was new and fresh. There are some similarities to what I remember from those conversations in what I’m hearing now. Should we think about the data scientist as a “chief” something, at the same level as a chief technology officer (CTO) or a CIO?
Chief Data Officer defined
Fleming: There are certainly a number of organizations that have roles such as mine, where we've combined economics and analytics. Amazon has done it on a larger scale, given the nature of their business, with supply chains, pricing, and recommendation engines. But other firms in the technology industry have as well.
We have found that there are still three separate needs, if you will. There is an infrastructure need that CIO teams are focused on. There are significant data governance and management needs that typically chief data officers (CDOs) are focused on. And there are substantial analytics capabilities that typically chief analytics officers (CAOs) are focused on.
It's certainly possible in many organizations to combine those roles. But in an organization the size of IBM, and other large entities, it's very difficult because of the complexity and requirements across those three different functional areas to have that all embodied in a single individual.
Gardner: In that spectrum you just laid out – analytics, data, and systems -- where does The Open Group process for a certified data scientist fit in?
Fleming: It's really on the analytics side. A lot of what CDOs do is data engineering, creating data platforms. At IBM, we use the term Watson Data Platform because it's built on a certain technology that's in the public cloud. But that's an entirely separate challenge from being able to create the analytics tools and deliver the business insights and business value that Maureen and George referred to.
Gardner: I should think this is also going to be of pertinent interest to government agencies, to nonprofits, to quasi-public-private organizations, alliances, and so forth.
Given that this has societal-level impacts, what should we think about in improving the data scientists’ career path? Do we have the means of delivering the individuals needed from our current educational tracks? How do education and certification relate to each other?
Academic avenues to certification
Fleming: A number of universities have over the past three or four years launched programs for a master’s degree in data science. We are now seeing the first graduates of those programs, and we are recruiting and hiring.
I think this will be the first year that we bring in folks who have completed a master’s in data science program. As we all know, universities change very slowly. It's the early days, but demand will continue to grow. We have barely scratched the surface in terms of the kinds of positions and roles across different industries.
That growth in demand will cause many university programs to grow and expand to feed that career track. It takes 15 years to create a profession, so we are in the early days of this.
Norton: With the new certification, we are doing outreach to universities because several of them have master’s in data analytics programs. They do significant capstone-type projects, with real clients and real data, to solve real problems.
We want to provide a path for them into certification so that students can earn, for example, their first project profile, or experience profile, while they are still in school.
Gardner: George, on the organic side -- inside of companies where people find a variety of tracks to data scientist -- where do the prospects come from? How does organic development of a data scientist professional happen inside of companies?
Stark: At IBM, in our group, Global Services, in particular, we've developed a training program with a set of badges. They get rewarded for achievement in various levels of education. But you still need to have projects you've done with the techniques you've learned through education to get to certification.
Having education is not enough. You have to apply it to get certified.
Gardner: This is a great career path, and there is tremendous demand in the market. It also strikes me as a very fulfilling and rewarding career path. What sorts of impacts can these individuals have?
Fleming: Businesses have traditionally been managed through a profit-and-loss statement, an income statement, for the most part. There are, of course, other data sources -- but they’re largely independent of each other. These include sales opportunity information in a CRM system, supply chain information in ERP systems, and financial information portrayed in an income statement. These get the most rigorous attention, shall we say.
We're now in a position to create much richer views of the activity businesses are engaged in. We can integrate across more datasets now, including human resources data. In addition, machine learning (ML) and artificial intelligence (AI) are predictive by nature. We are in a position not only to bring the data together and provide a richer view of what's transpiring at any point in time, but also to generate a better view of where businesses are moving to.
It may be about defining a sought-after destination, or there may be a need to close gaps. But understanding where the business is headed in the next 3, 6, 9, and 12 months is a significant value-creation opportunity.
Gardner: Are we then thinking about a data scientist as someone who can help define what the new, best business initiatives should be? Rather than finding those through intuition, or gut instinct, or the highest paid person's opinion, can we use the systems to tell us where our next product should come from?
Pioneers of insight
Norton: That's certainly the direction we are headed. We will have systems that augment that kind of decision-making. I view data scientists as pioneers. They're able to go into big data, dark data, and a lot of different places and push the boundaries to come out with insights that can inform in ways that were not possible before.
It’s a very rewarding career path because there is so much value and promise that a data scientist can bring. They will solve problems that hadn't been addressed before.
It's a very exciting career path. We’re excited to be launching the certification program to help data scientists gain a clear path and to make sure they can demonstrate the right skills.
Gardner: George, is this one of the better ways to change the world in the next 30 years?
Stark: I think so. If we can get more people to do data science and understand its value, I'd be really happy. It's been fun for 30 years for me. I have had a great time.
Gardner: What comes next on the technology side that will empower the data scientists of tomorrow? We hear about things like quantum computing, distributed ledger, and other new capabilities on the horizon.
Future forecast: clouds
Fleming: In the immediate future, new benefits are largely coming because we have both public cloud and private cloud in a hybrid structure, which brings the data, compute, and the APIs together in one place. And that allows for the kind of tools and capabilities that are necessary to significantly improve the performance and productivity of organizations.
Blockchain is making enormous progress and very quickly. It's essentially a data management and storage improvement, but then that opens up the opportunity for further ML and AI applications to be built on top of it. That’s moving very quickly.
Quantum computing is further down the road. But it will change the nature of computing. It's going to take some time to get there, but it nonetheless is very important and is part of what we are looking at over the horizon.
Gardner: Maureen, what do you see on the technology side as most interesting in terms of where things could lead in the next few years for data science?
Norton: The continued evolution of AI is pushing boundaries. One of the really interesting areas is the emphasis on transparency and ethics, to make sure that the systems are not introducing or perpetuating a bias. There is some really exciting work going on in that area that will be fun to watch going forward.
Gardner: The data scientist needs to consider not just what can be done, but what should be done. Is that governance angle brought into the certification process now, or is it something that will come later?
Stark: It's brought into the certification now when we ask how things were validated and how the modules got implemented in the environment. That’s one of the things that data scientists need to answer as part of the certification. We also believe that in the future we are going to need some sort of code of ethics, and some methods for detecting, analyzing, and measuring bias. Those things don't exist today, and they will have to.
Gardner: Do you have any examples of data scientists doing work that's new, novel, and exciting?
Rock star potential
Fleming: We have a team led by a very intelligent and aggressive young woman who has put together a significant product recommendation tool for IBM. Folks familiar with IBM know it has a large number of products and offerings. In any given client situation the seller wants to be able to recommend to the client the offering that's most useful to the client’s situation.
And our recommendation engines can now make those recommendations to the sellers. It really hasn't existed in the past and is now creating enormous value -- not only for the clients but for IBM as well.
Gardner: Maureen, do any examples jump to mind that illustrate the potential for the data scientist?
Norton: We wrote a book, Analytics Across the Enterprise, to explain examples across nine different business units. There have been some great examples in terms of finance, sales, marketing, and supply chain.
Gardner: Any use-case scenario come to mind where the certification may have been useful?
Norton: Certification would have been useful to an individual in the past because it helps map out how to become the best practitioner you can be. We have three different levels of certification going up to the thought leader. It's designed to help that professional grow within it.
Stark: A young man who works for me in Brazil built a model for one of our manufacturing clients that identifies problematic infrastructure components and recommends actions to take on those components. And when the client implemented the model, they saw a 60 percent reduction in certain incidents and a 40,000-hour-a-month increase in availability for their supply chain. And we didn't have a certification for him then -- but we will have now.
Gardner: So really big improvement. It shows that being a data scientist means you're impactful and it puts you in the limelight.
Stark: And it was pretty spectacular because the CIO for that company stood up in front of his whole company -- and in front of a group of analysts -- and called him out as the data scientist that solved this problem for their company. So, yeah, he was a rock star for a couple days.
Gardner: For those folks who might be more intrigued with a career path toward certification as a data scientist, where might they go for more information? What are the next steps when it comes to the process through The Open Group, with IBM, and the industry at large?
Where to begin
Norton: The Open Group officially launched this in January, so anyone can go to The Open Group website and check under certifications. They will be able to read the information about how to apply. Some companies are accredited, and others can get accredited for running a version of the certification themselves.
IBM recently went through the certification process. We have built an internal process that matches with The Open Group. People can apply either directly to The Open Group or, if they happen to be within IBM or one of the other companies who will certify, they can apply that way and get the equivalent of it being from The Open Group.