Solve Industrial IoT Ingestion and Data Processing

Balamurugan Balakreshnan
5 min read · Jul 20, 2019

One challenge I still see is how to solve ingestion and data processing for industrial IoT in heavy equipment, manufacturing, and other industries. You might ask: this is already solved with IoT platforms, so why not use one? The catch is that most IoT platforms are designed for devices with only a few sensors, such as home and consumer devices or simple medical devices, or for newer devices that have modern hardware and the power to process data themselves. Most systems work well when data is streamed (data in motion) and needs no complex processing, such as looking up historical data to make a business decision.

Here is the reality of industrial IoT. Machines in the commercial space produce more and more data as technology keeps advancing. They are often located in places that do not have internet or network coverage all the time, and their traffic might be restricted because of bandwidth constraints or network issues. On top of that, how can we retrofit existing machines or devices to collect data and send it to the cloud for processing? These machines sometimes stream data as they collect it, and sometimes send a burst of bulk data after being offline for a while. Processing streaming data on a stream is not a problem; many technologies do that well. The issue is that streaming technologies struggle when the business logic needs a historical lookup, because speed is always a constraint when processing historical data. A streaming system also starts to slow down when its input suddenly increases, as with the burst of data that arrives when equipment or devices come back online after being offline.
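To make the stream-versus-burst behaviour concrete, here is a minimal Python sketch of a device-side store-and-forward buffer. It is only an illustration: the class name, the `send_batch` callback, and the `max_batch` limit are hypothetical, and retries and on-disk persistence are left out.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Reading = Dict[str, float]


class StoreAndForwardBuffer:
    """Hypothetical device-side buffer: stream when online, queue when offline."""

    def __init__(self, send_batch: Callable[[List[Reading]], None], max_batch: int = 500):
        self.send_batch = send_batch  # hypothetical uplink callback (assumption)
        self.max_batch = max_batch    # cap on readings per upload call (assumption)
        self.pending: Deque[Reading] = deque()
        self.online = True

    def record(self, reading: Reading) -> None:
        """Queue a reading, then send right away if the link is up."""
        self.pending.append(reading)
        if self.online:
            self.flush()

    def flush(self) -> None:
        """Drain the queue in arrival order, at most max_batch readings per call."""
        while self.pending:
            size = min(self.max_batch, len(self.pending))
            batch = [self.pending.popleft() for _ in range(size)]
            self.send_batch(batch)

    def set_online(self, online: bool) -> None:
        """Update the link state; coming back online triggers the burst upload."""
        self.online = online
        if online:
            self.flush()
```

While the link is up, every reading goes out as a batch of one, which is the streaming case. After an outage, everything queued in the meantime is sent in back-to-back batches, which is the burst that the ingestion side has to absorb.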

Yes, we can have a huge, expensive database, but looking up a million rows to run some business logic, or going back to reprocess a million rows, becomes a challenge as the system scales. For a few instances, most systems work and are also economical. But large-scale systems, with devices and machines that produce a million messages a day, need to go back and look up historical data to make sense of the current data, and that matters to a lot of industrial use cases. It is not just querying time series and displaying charts; it could be other kinds of processing. Yes, you can also use a historian database and dump all the data into it, but as the concurrency and the volume of each query increase, the historian becomes very expensive and slows down.

In recent years, time series databases such as OpenTSDB with HBase or KairosDB with Cassandra have tried to solve this problem. The big challenge is the skill needed to maintain such systems, and they are expensive to run even though the software is open source. I thought open source would be less expensive. So I did some research on current technologies and the various options, and here is what I found as a promising alternative.

Architecture for Industrial IoT

I chose PaaS as my selection criterion, and I was also looking for the simplest solution. The data collected from the devices or machines can be streamed, or pushed in bursts (batches), into Event Hubs, which can be scaled independently. Azure Data Explorer has built-in data ingestion from Event Hubs, so there is no need to use another system or write code: just configure which event hub to pull the data from, and it is saved in Azure Data Explorer. Once the data is there, downstream systems can use it for display or trending, or use Kusto queries to run business logic and push the results to downstream data storage, a data lake, or other systems as needed. Azure Data Explorer can also scale horizontally as needed when the device or machine count increases. We can also use Azure Data Explorer as temporary or operational storage for time series data, so that downstream applications can process the data and store it for other downstream processing.

Why Azure Data Explorer? Let's go back to the problem: we need to run business logic per device or machine, for a group of them, or for all of them, and we want to do it in a decent time frame so the system returns to its current state when lots of batch data comes through the network after equipment has been offline. We need a query engine that can pull historical data fast and can also do time series processing: time-based calculations, interpolation, statistical functions, and other scientific processing. This matters for industrial IoT, where we need to retrofit older machines and devices that collect large volumes of data, ingest batch or burst data, and bring the system back to current, instead of taking hours or days to catch up on an influx of data that arrives once in a while, all while keeping the cost down.
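A rough back-of-the-envelope calculation shows why catching up can take hours. The sketch below is plain Python with made-up parameter names and numbers. It assumes every device goes offline for the same period, and that live data keeps arriving at the normal rate while the backlog drains.

```python
def catch_up_hours(offline_hours: float, device_rate_per_s: float,
                   devices: int, capacity_per_s: float) -> float:
    """Hours needed to drain an offline backlog while live data keeps arriving."""
    live_rate = device_rate_per_s * devices  # messages per second in normal operation
    if capacity_per_s <= live_rate:
        raise ValueError("capacity must exceed the live rate or the backlog never drains")
    backlog = live_rate * offline_hours * 3600  # messages queued while offline
    spare = capacity_per_s - live_rate          # capacity left over for the backlog
    return backlog / spare / 3600


# Illustrative numbers only: 100 devices at 10 messages/s each, offline for 6 hours,
# processed by a pipeline that can handle 4000 messages/s.
hours = catch_up_hours(offline_hours=6, device_rate_per_s=10, devices=100, capacity_per_s=4000)
print(hours)  # 2.0
```

Only the spare capacity above the live rate works on the backlog, so a pipeline sized close to its normal load can take far longer than the outage itself to become current again.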

Azure Data Explorer gives us the option to store large volumes of time series data and a fast way to query it. It goes beyond querying to data processing such as aggregates, windowing, and machine learning on time series data. It also provides options for storing time series metadata and joining it as needed, and it handles concurrent queries. Most of the data is manipulated in memory, so the queries are very performant.
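For readers who have not worked with time series processing, here is a small Python sketch of two of the operations mentioned in this article: windowed aggregates and interpolation. It is not Kusto and not how Azure Data Explorer implements them; in Azure Data Explorer this kind of logic would be written as a Kusto query. The function names and sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, sensor value)


def window_average(points: List[Point], window_s: int) -> Dict[int, float]:
    """Average value per fixed (tumbling) window, keyed by window start time."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


def interpolate(points: List[Point], ts: int) -> float:
    """Linearly interpolate a value at ts between the two surrounding samples."""
    ordered = sorted(points)
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        if t0 <= ts <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    raise ValueError("ts is outside the sampled range")


# Made-up readings with a gap between 60 s and 120 s.
readings = [(0, 10.0), (30, 20.0), (60, 40.0), (120, 60.0)]
print(window_average(readings, 60))  # {0: 15.0, 60: 40.0, 120: 60.0}
print(interpolate(readings, 90))     # 50.0
```

The point of a time series engine is that it runs this kind of calculation over very large numbers of rows per device without the data having to be pulled out into application code first.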

This can break the barrier of having a system that is cost effective but also processes data as it arrives, like streaming. When we get bulk data from devices that went offline and came back, the ability to process those workloads in time and keep the system processing data without delay is something amazing. I know that no system is perfect or infinite, but what I have seen so far, and what I have heard from other users, looks very promising. I believe this might be a breakthrough for large-volume industrial IoT use cases.

If you are curious like me, please take a look at Azure Data Explorer.

Please do post your experience and share your story as well.
