MAY-JUN 2018


COVER STORY | INTECH MAY/JUNE 2018

… is adding context or information about the data as attributes of a time range. This could be data stored in another source, for example, the periods of time defined by a batch stage or asset state in a manufacturing execution system (MES) or computerized maintenance management system. The context could be within the time series data itself, defined by when a reading is above or below a certain threshold. Or it could simply be time periods of interest, for example, when a signal "looks like this," with context created to define when a shape or pattern is present in a signal.

In each of these cases, context is added to identify the time periods of interest. Once identified, these time periods can be combined to create a new set of time periods describing an exact, multidimensional data set for analysis (figure 3).

Figure 3. Using Seeq capsules, engineers can combine time periods to create a new set of time periods describing an exact, multidimensional data set for analysis.

With new big data capabilities, there need not be any bounds to the depth or number of "stacked" layers required, up to 15 or more sequential layers of criteria in some cases. With most analytics efforts requiring integration of data from five to seven different sources, this is a critical advantage over current approaches.

With unlike data types, in particular time series and relational data sources, advanced analytics can get off to a slow start by requiring extensive manual mapping of data types, not to mention data cleansing and other aspects of data preparation. But with recent innovations, underlying big data technologies provide this type of data connectivity, alignment, and mapping to accelerate the definition and modeling of complex operations. What was once the month-long job of programmers and application programming interfaces (APIs) can now be a feature any process engineer can implement in minutes.

Delivering self-service

In the early stages, big data meant programmers writing code to map the analytics of a large data set to a cluster of compute nodes, and then to reduce the output from the nodes into a consolidated summary. The MapReduce algorithm, which defined this programming model, was published by Google in 2004 and became the basis for Hadoop, which was later commercialized by vendors such as Hortonworks. At the same time, Google did not expose the MapReduce API to users as the interface to its search engine. Instead, it presented the algorithm's functionality as a simple web page where any customer could search for whatever they wanted by typing a query in plain English.

This approach to wrapping complex functionality in easy-to-use interfaces is a common experience in our lives as consumers, and the same approach is now being adopted by analytics offerings for engineers in process manufacturing. For example, the ability to "search like Google" across all the tags in a historian or other big data storage system is now available in some advanced analytics software. Other capabilities that make big data innovations more easily accessible are delivered in a similar way. This enables engineers to work at an application level, with productivity, empowerment, interaction, and ease-of-use benefits. The ability to transform complex data science programming into features easily used by engineers is a critical capability of these advanced analytics offerings.

Although there has been much excitement about data scientists and their role in improving production outcomes, such as the Harvard Business Review's "Sexiest Job of the Century" article back in 2012, more recent articles and anecdotes from end users tell a different story. The issue is that while data scientists know their algorithms, they do not know plant processes and context. There has been a more recent spate of articles on the need for data translators or data liaisons between data science and engineering teams. But all of this can be avoided if vendors simply close the gap and bring data science innovation to engineers by creating features that enable self-service, advanced analytics for engineers and other subject-matter experts (figure 4).

The strategy cannot end with engineers, however, because self-service is what engineers have been doing for 30 years with spreadsheets. Therefore, the new generation of advanced analytics for big data must empower the teams and networks of employees that rely on production and operations insights within the organization. If that sounds like fancy language for dashboards and reports, there is a critical difference. The key change is maintaining a connection between the analysis that is created and the underlying data set, so users can click through and get to the underlying data. These advanced analytics offerings produce not just pictures of data in visualizations, but also provide access to the analytics and sources that generated the outputs. Engineers, teams, managers, and organizations can therefore …
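The idea of turning conditions into time periods and then "stacking" layers of criteria, as described above, can be sketched in plain Python. This is a hedged illustration of the general technique, not Seeq's actual API; the function names, the integer timestamps, and the sample data are all invented for the example:

```python
def periods_where(samples, predicate):
    """Turn a list of (timestamp, value) samples into closed intervals
    [start, end] over which predicate(value) holds contiguously --
    e.g., 'reading above a certain threshold'."""
    periods, start, prev_t = [], None, None
    for t, v in samples:
        if predicate(v):
            if start is None:
                start = t
            prev_t = t
        elif start is not None:
            periods.append((start, prev_t))
            start = None
    if start is not None:
        periods.append((start, prev_t))
    return periods

def intersect(a, b):
    """Intersect two lists of intervals -- combining two layers of
    criteria into one multidimensional slice of the data."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                out.append((s, e))
    return out

# Layer 1: periods where temperature exceeds 75.
samples = [(0, 70), (1, 80), (2, 82), (3, 60), (4, 90)]
hot = periods_where(samples, lambda v: v > 75)      # [(1, 2), (4, 4)]

# Layer 2: a batch stage from an MES, given directly as an interval.
stage = [(0, 3)]

# Stacked result: hot readings during that batch stage.
result = intersect(hot, stage)                      # [(1, 2)]
```

Because `intersect` takes and returns the same interval representation, its output can be fed back in with a third condition, a fourth, and so on, which is how many sequential layers of criteria can be stacked without special handling.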

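The map-then-reduce model the article describes — map analytics over partitions of a large data set, then consolidate the node outputs into a summary — can be illustrated with a minimal single-process sketch in Python. The "cluster" here is simulated in one process, and the function names and sample data are illustrative, not Hadoop's API:

```python
from collections import defaultdict
from functools import reduce

def map_phase(records, mapper):
    # Each "node" maps its share of the records to (key, value) pairs.
    return [pair for record in records for pair in mapper(record)]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups, reducer):
    # Consolidate each key's values into a single summary value.
    return {k: reduce(reducer, vs) for k, vs in groups.items()}

# Example: per tag, count historian readings above a limit.
readings = [("temp", 81), ("temp", 70), ("press", 99), ("temp", 85)]
mapper = lambda r: [(r[0], 1)] if r[1] > 75 else []
counts = reduce_phase(shuffle(map_phase(readings, mapper)),
                      lambda a, b: a + b)
# counts == {"temp": 2, "press": 1}
```

The point of the article stands out even in this toy: the three phases are generic plumbing, and all the domain knowledge lives in the small `mapper` and `reducer` functions — exactly the part a well-designed interface can hide behind a search box or a menu.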