If you are a Data Scientist or Data Analyst, or one of the various other specialists focused on deriving valuable insights from data, it goes without saying that data is the lifeblood of your analytic world – the raw material essential to your craft. But what do you do if the data doesn’t exist?
There are of course many sources of data – from traditional transaction systems to social media and increasingly the "Internet of Things" (IoT). In this blog post, I will focus on IoT and will return to social media and other sources of data in a future post.
So you are faced with an analytic challenge, but the data you ideally want doesn’t currently exist. I don’t simply mean it’s not currently available for you to use – for example, that your company doesn’t currently capture it – I mean it really doesn’t exist, anywhere. What do you do?
Enter the Internet of Things. In simple terms, we can consider the Internet to have gone through four phases (so far):
- The Connectivity Phase. This was the beginning of the Internet – the first (almost) ubiquitous, universal platform the world has ever known, one that everyone could connect to. The objective was simply to connect and make our data available – to "have a website" so that we were visible to the world.
- The Process Phase. Having connected ourselves to the world and shared our data (mostly product information and company details), we then wanted to actually do something. Phase 2 of the Internet enabled interaction – two-way communication. We could not only see product information but also actually place orders. The migration of manual, paper-based processes to the Internet began, ushering in what became known as eBusiness.
- The Social Phase. The past few years of the Internet have extended beyond connecting companies and processes to connecting people – the Social Media phase. Now we could connect, communicate, share and interact at a personal level.
- The "Thing" Phase. Having connected data, processes and people, the fourth phase of Internet evolution that we are now experiencing is the connecting of every "thing" else. And I do mean "everything"!
It is projected that by 2020 there will be over 50 billion things connected to the Internet (there are an estimated 10 to 12 billion today). Personally, I believe this is a gross underestimation! We are connecting everything from our phones, cars and televisions to cameras, watches and clothing. We are even connecting ourselves – our very bodies. Indeed, everything about our activities and the objects around us is being connected to the Internet. IoT is accelerating at an unimaginable rate, and it's projected to have an impact that will be over 35 times greater than that of the Internet itself.
It is the connection of these Things that holds the key to our missing data. The remedy is simple: If the data doesn’t currently exist, connect something that can detect and deliver the data you need. Let me provide a few examples to indicate what I mean.
- One of the most difficult and costly aspects of healthcare is patient adherence; for example, taking medications as prescribed. Patients are notoriously bad at following their doctor’s instructions (e.g., completing a course of treatment or taking medications on time), and they are equally bad at honestly and accurately informing their doctor of their activities. How can doctors provide optimum care when they have no way of truly knowing what their patients are doing? The data simply does not exist, but what if the medications themselves could provide accurate data? Proteus Digital Health (http://www.proteus.com) manufactures a range of Digital Medicine products, including sensor-enabled pills that indicate when the patient has swallowed them. These, along with a range of software applications, can inform patients (reminding them to take their medications) and keep their doctor informed. The data made available by these digital medicines can also provide invaluable raw material for Data Scientists to conduct wide-ranging and highly accurate research.
- In addition to knowing when their packages would arrive, Federal Express customers needed to know whether each package had remained within certain transportation tolerances, such as allowable ranges for temperature, humidity and vibration. FedEx didn’t have the data to provide to its customers (or to itself for analysis purposes), so it resolved to create that data by building a suite of sensor modules that could be placed inside packages to record these transportation vital statistics. This is now a fast-growing and profitable revenue line for FedEx known as SenseAware.
- In some cases, it is not only the Data Scientists who are missing the data – what if we want to push new data to our customers? Take, for example, product information. In 2015, Diageo showcased its iconic Johnnie Walker Blue Label whisky using a new Smart Bottle label manufactured by ThinFilm. These Near Field Communication (NFC) labels are capable of providing never-before-available data to the company (inventory levels, location at a per-bottle level, whether the bottle has been opened or not, consumption rates etc.) as well as pushing new data to customers. For example, if the bottle is unopened, the customer’s smartphone app may present them with a promotional offer, but once it has been opened, the app may instead provide interesting cocktail recipes.
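To make the package example a little more concrete, here is a minimal Python sketch of how recorded sensor readings might be checked against allowable ranges. The `Reading` class, the field names, the tolerance values and the sample data are all illustrative assumptions of mine; none of them are taken from SenseAware or any other real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One hypothetical sensor sample recorded inside a package."""
    minute: int            # minutes since pickup
    temperature_c: float   # degrees Celsius
    humidity_pct: float    # relative humidity, percent


# Hypothetical allowable ranges (low, high) for each measured field.
TOLERANCES = {
    "temperature_c": (2.0, 8.0),
    "humidity_pct": (20.0, 60.0),
}


def find_excursions(readings, tolerances=TOLERANCES):
    """Return (minute, field, value) for every value outside its range."""
    excursions = []
    for reading in readings:
        for field, (low, high) in tolerances.items():
            value = getattr(reading, field)
            if not low <= value <= high:
                excursions.append((reading.minute, field, value))
    return excursions


# Made-up sample data: one temperature excursion and one humidity excursion.
readings = [
    Reading(minute=0, temperature_c=4.5, humidity_pct=40.0),
    Reading(minute=30, temperature_c=9.1, humidity_pct=42.0),
    Reading(minute=60, temperature_c=5.0, humidity_pct=65.0),
]

for minute, field, value in find_excursions(readings):
    print(f"minute {minute}: {field} = {value} is out of range")
```

Once the sensor exists, the analysis itself can be this simple. The hard part was creating the data in the first place.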
The key for those involved in the data analysis world is that the data you need won’t always be available. It is therefore as important to understand how you can obtain it as it is to know how you can analyze it. There are a myriad of sensors available today – and more being invented all the time. Here are just a few that you probably have on your smartphone alone:
- Temperature sensor
- Light sensor
- Voice sensor (microphone)
- Fingerprint sensor
- Touch sensor
As a Data Scientist, you may not be directly responsible for building the technologies or embedding the sensors in your company's products or processes, but you do need to understand the range of options available to obtain the data you need. Your company entrusts you to understand how to use data effectively. You can and should, therefore, be actively involved in helping your organization understand where and how data can be captured so that you can obtain the data necessary to provide the analytic insights that will drive valuable business decisions going forward.
Rick Hutley is the Program Director and Clinical Professor of Analytics at the University of the Pacific, a frequent keynote speaker and a former Vice President of Innovation at Cisco Systems. You can find more information at: http://www.stratathought.com/#news