The Internet of Things (IoT) is touted as the next big thing—the new horizon, but an area fraught with technical and legal challenges. Wasn’t this the case for “big data” several years ago? Put these two topics together and what do you get? Well, more of a synergy than a mess . . . an explanation is in order.
In order to get our minds around this topic, we will need to define a few things. First, we will define big data and then IoT, and then we will be able to sew everything together. Big data is generally data that is too voluminous, complex, or fast-moving for conventional methods to accommodate. When we discuss big data, however, we are typically referring to solutions to big data, that is, the unconventional methods that do accommodate such voluminous, complex, or fast-moving data. These big data solutions, drawn from the field of data science, help grapple with the IoT challenge. From here on, we will use the term "data science" in place of "big data." Before getting to how data science comes to the rescue, however, we have to define IoT.
The simplest definition is a three-part phrase: (1) devices using a (2) gateway to (3) talk over the Internet. IoT is much broader than one might expect; it is a mile wide and a mile deep, and the requirements, and hence the problems, depend heavily on the industry vertical. These verticals include transportation, health care, environmental monitoring, energy management, building management, media, and more. To compound this, standards traverse these verticals at each of the three levels of the IoT definition, and there are dozens of standards, solution providers, and guiding principles at each level.
How Big Data Solutions Help
With our definition of IoT in hand, we can begin to look at how data science comes to the rescue. Data science is an interdisciplinary field including statistics, data mining, predictive analytics, information technology, and others. The high-level purpose is to gain insight from data. However, IoT is likely to focus on particular areas within data science given the unique types of data involved and the speed at which the data moves. Let’s take a look at a few examples to help conceptualize.
First, imagine a busy Los Angeles freeway interchange where hundreds of cars are traveling, merging, speeding up, and slowing down as each tries to make its way to its destination. Imagine trying to track the movement of each car, making sure they do not collide and ensuring that the traffic moves in a smooth and predictable manner. This would seem a daunting task. However, even this example does not suffice. Instead, imagine poking a beehive with a stick: thousands of individual bees swirl and spin around the hive in panic, each with its own predetermined role. This is a better representation of the challenge posed by IoT.
Data science includes tools that accommodate fast-moving, streaming data. Streaming data is unique in that quick assessments and decisions have to be made in real time and based on relatively small amounts of data at a time. Some of the more interesting methods are below.
Association Rules and Sequence Analysis
Association rules are used to identify useful groupings of events that often occur together. Sequence analysis, used to identify the order in which events occur, is also informative, and is often used in conjunction with association rules to further refine a rule set.
With association rules and sequence analysis, we can identify what groupings and sequence of events lead to others. These tools are used in what many know as “market basket analysis” (though our use is quite different). For example, retailers want to predict what products shoppers choose. If shoppers buy hamburger meat and buns, they may then buy ketchup. Here, the purchase of hamburger and buns is an association with a sequential purchase of ketchup. The order in which the shopper buys hamburger meat and buns may not matter, but when these two purchases are made together, rule sets may show that shoppers will often follow, sequentially, with the purchase of ketchup. Why not display ketchup next to the buns? Or, at the meat counter?
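The market-basket idea can be made concrete with a minimal sketch. The basket contents, support, and confidence thresholds below are invented for illustration; real rule-mining tools (such as Apriori implementations) handle far larger data and multi-item rules.

```python
from itertools import combinations
from collections import Counter

def association_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Mine simple one-to-one association rules from a list of baskets.

    Support: fraction of all baskets containing both items.
    Confidence of "a => b": fraction of baskets with a that also contain b.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules

# Illustrative shopping baskets.
baskets = [
    {"hamburger", "buns", "ketchup"},
    {"hamburger", "buns", "ketchup"},
    {"hamburger", "buns"},
    {"buns", "milk"},
    {"hamburger", "ketchup"},
]
for ante, cons, sup, conf in association_rules(baskets):
    print(f"{ante} => {cons}  support={sup:.2f} confidence={conf:.2f}")
```

On this toy data, the rule "hamburger => ketchup" surfaces with high confidence, which is exactly the kind of finding that motivates displaying ketchup near the meat counter.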
In the IoT context, if a device is talking to several other devices over the Internet, we may want to have historical data at hand so we know what is supposed to happen. When working with many devices, their exchange of data can be analyzed to see what associations and sequences occur. Do certain devices tend to communicate with others often, thereby forming an association? Are there important sequences in the communication? This can be important in establishing “normal” communication behavior. Then, when, where, and how anomalous communication occurs may point to a device that is no longer communicating correctly.
Time Series Analysis and Forecasting
Time series analysis is the study of how events relate to each other over specific time intervals. Time series forecasting attempts to predict events based on time series analysis. Similar to sequence analysis, time series forecasting considers data as it relates to successive time intervals or time periods and predicts what might happen based on previously encountered event timings. However, time series analysis takes into account how much time has elapsed between events, while sequence analysis only concerns itself with the fact that one event follows another. In the example above, the shopper may take two minutes or ten minutes to buy ketchup. A time series analysis would take this into account, while a sequence analysis would not.
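The distinction can be shown in a few lines. In this toy sketch (the events and timestamps are invented), two shopping sessions have the identical event sequence, so a sequence analysis cannot tell them apart, but comparing the elapsed time between events can.

```python
# Each session is a list of (event, timestamp-in-seconds) pairs.
session_a = [("hamburger", 0), ("buns", 30), ("ketchup", 150)]   # ketchup ~2 min later
session_b = [("hamburger", 0), ("buns", 45), ("ketchup", 645)]   # ketchup ~10 min later

def event_order(session):
    """Sequence-analysis view: only the order of events matters."""
    return [event for event, _ in session]

def event_gaps(session):
    """Time-series view: elapsed time between successive events."""
    times = [t for _, t in session]
    return [later - earlier for earlier, later in zip(times, times[1:])]

print(event_order(session_a) == event_order(session_b))  # True: same sequence
print(event_gaps(session_a))                             # [30, 120]
print(event_gaps(session_b))                             # [45, 600]
```

The two sessions are indistinguishable by order alone; only the gap measurements reveal the difference in shopper behavior.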
Careful timing and synchronization can be critical in an IoT context; sequence is important but timing between the events is key. For example, many cars are now equipped with an automatic braking feature to stop the car before a collision with an obstacle can occur. To detect the obstacle and brake, each device in a time sequence must do its job at the appropriate time interval.
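A minimal sketch of such a timing check follows. The stage names and millisecond deadlines are hypothetical, not taken from any real vehicle system; the point is that a log can be checked for both correct sequence and acceptable intervals.

```python
# Hypothetical deadlines for an automatic-braking pipeline, in milliseconds
# measured from the moment the obstacle first appears. Illustrative only.
DEADLINES_MS = {"sense": 20, "classify": 50, "decide": 80, "brake": 120}

def check_pipeline(log):
    """Return a list of problems: out-of-order stages or missed deadlines."""
    problems = []
    expected_order = list(DEADLINES_MS)
    observed_order = [stage for stage, _ in log]
    if observed_order != expected_order:
        problems.append(f"out-of-order stages: {observed_order}")
    for stage, t in log:
        if t > DEADLINES_MS.get(stage, float("inf")):
            problems.append(f"{stage} at {t} ms exceeded {DEADLINES_MS[stage]} ms deadline")
    return problems

on_time = [("sense", 12), ("classify", 40), ("decide", 70), ("brake", 110)]
too_slow = [("sense", 12), ("classify", 40), ("decide", 95), ("brake", 140)]
print(check_pipeline(on_time))   # no problems
print(check_pipeline(too_slow))  # decide and brake both miss their deadlines
```

Note that the second log has a perfectly correct sequence; only the timing analysis reveals that braking would occur too late.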
Anomaly Detection
Anomaly detection identifies activity that deviates from established patterns and should be flagged as problematic. Using this technique, data scientists first characterize what "normal" activity looks like. When something does not fit the pattern, it suggests something is wrong and might require attention. Anomaly detection is often informed by association/sequence rule sets and time series forecasting, or by any number of other data science algorithms working together to establish what needs attention.
Anomaly detection is often used to detect fraud and cyberthreats, which often follow abnormal patterns. Statistical deviations in device-to-device communication or in computational activity may suggest nefarious activity is in play. Remedial measures may be needed to properly address the potential threat. These three approaches are but a few to make sure the aggravated beehive is kept under control.
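A common statistical approach is to flag a device whose current behavior sits many standard deviations from its own history. The device names and hourly message counts below are invented; the technique (a simple z-score test) is one of many that could be used.

```python
import statistics

def anomalous_devices(baseline, current, threshold=3.0):
    """Flag devices whose current message rate deviates from their history.

    `baseline` maps device id -> past hourly message counts (assumed to have
    nonzero variance); `current` maps device id -> this hour's count.
    A device is flagged when its z-score exceeds `threshold`.
    """
    flagged = {}
    for device, counts in baseline.items():
        mean = statistics.mean(counts)
        stdev = statistics.stdev(counts)
        z = abs(current[device] - mean) / stdev
        if z > threshold:
            flagged[device] = round(z, 1)
    return flagged

# Illustrative traffic history for two devices.
baseline = {
    "thermostat-7": [98, 102, 100, 99, 101, 100],
    "camera-3":     [40, 42, 38, 41, 39, 40],
}
current = {"thermostat-7": 101, "camera-3": 400}  # camera suddenly very chatty
print(anomalous_devices(baseline, current))       # only camera-3 is flagged
```

The thermostat's small fluctuation goes unflagged, while the camera's sudden burst of traffic, which might indicate compromise or malfunction, stands out sharply.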
Accommodating Security and Privacy
Security and privacy are important aspects of a well-run commercial system. For many, however, security and privacy are a necessary evil and often low on the list of priorities. Moreover, in IoT systems, where tightly coupled devices communicate at lightning speed across complex networks, speed and bandwidth are vital, and security and privacy often compete with both.
When security measures are added to any electronic process, time and bandwidth are affected. For example, encrypting data can significantly slow execution. Encryption also increases the amount of memory required to execute the process and the amount of data that must be transmitted, consuming bandwidth.
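The overhead is easy to see even in a toy. The XOR "cipher" below is deliberately trivial and NOT secure; it is used only to show that encryption transforms every byte (extra computation) and requires key material to accompany the data (extra memory and transmission).

```python
import secrets

def xor_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher' -- NOT secure; for illustration only.
    Every byte must be processed, so work grows with message size.
    Applying it twice with the same key restores the original."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

message = b"temperature=21.5C"
key = secrets.token_bytes(16)

ciphertext = xor_encrypt(message, key)
recovered = xor_encrypt(ciphertext, key)
assert recovered == message

# Bytes the device must hold and move grow from the message alone to the
# message plus key material; real protocols add still more (IVs, auth tags,
# handshakes).
print(len(message), len(ciphertext) + len(key))  # prints: 17 33
```

Multiply that per-message cost across thousands of chattering devices and the tension between security and throughput becomes clear.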
Analysis overhead is also increased when private data must be identified and properly handled. This is especially true when healthcare and financial data are being pushed through any number of devices and systems at hyperspeed.
In adapting big data tools to IoT, the following factors will remain important:
Establish Uniform, Open Standards
In order to bring so many technologies and standards together into a single functioning whole, it will be necessary to expose these standards so they can be examined as broadly as possible. Confidential, proprietary standards can frustrate efforts to maintain an open environment; this complicates device interoperability and cross-team communication. Often, nondisclosure agreements are used to balance open discussion and confidentiality in these contexts, but this approach can become unwieldy and unrealistic. So, open, public standards should be adopted and fully leveraged wherever possible.
Use of Cloud Computing
Analysis of data in an IoT application can occur on the device or at any point along the data's path. However, where many devices are communicating and capability depends on aggregating their information, use of a cloud to receive and analyze the data will likely be necessary. As many in the technology and law community know, cloud computing introduces its own legal issues. Data ownership, records retention, and privacy laws are a few examples. The IoT community should expect use of the cloud to remain a central component of IoT applications.
Keep It Simple
Often, there are very elegant and refined ways to optimize a process. In IoT, there are bound to be such approaches. However, very often in data science, more complex approaches are abandoned for simple and straightforward techniques simply because they are easy to understand. Many lawyers are not technical and must understand what is happening to assist clients. Complexity introduces confusion and unknown potential error. As such, IoT designs should err on the side of simplicity.
Mix of Hardware and Software Solutions
In order to complete an end product in IoT, it will be much more the rule than the exception that different hardware and software tools will be used. Collaborative development environments will need to be leveraged so that the depth and breadth of these tools can be brought together. This will introduce new challenges involving trade secrets, security, and interoperability.
IoT and big data are inextricably intertwined. Data science solutions will remain central to the success of IoT applications, and legal issues will certainly play a leading role in keeping commercial interests viable and information governance principles met. Open standards and close attention to interoperable efficiency will help meet these goals.