AI-based Coresets is a highly efficient set of algorithms and a highly relevant method for IoT network traffic anomaly detection.
As we discussed in our previous blog post, it is imperative to discover the first signs of anomalous activity, resulting from various types of malfunctions and/or misconfigurations, before any damage to business operations and key services can take place.
Identifying and mitigating such threats requires a technology that can process huge amounts of data and detect these anomalies. The problem is that “traditional” anomaly detection algorithms are not suited to the challenge at hand: processing huge amounts of data, finding the anomalies, and doing so quickly and at low cost.
Most existing solutions have one or more of the following shortcomings:
- They rely on heuristics without provable guarantees on their performance, whether in running time or, more often, in the quality of anomaly detection.
- They have provable guarantees but are totally impractical, usually in terms of running time.
- They do not support modern big data computation models such as streaming of unbounded data with very limited memory, distributed computation on the cloud, or parallel computation on hardware such as GPUs.
When it comes to anomaly detection, the situation is worse. Simple distributions that can be learned with classic solutions such as PCA become hard to learn when the input may include outliers that are not part of the model and must be identified and excluded; this turns the problem into an NP-hard one. In practice, this means the available approach is to run an exhaustive search over all possible sets of outliers and choose the one that minimizes the fitting function after exclusion. Such a search takes so long to execute that it is infeasible.
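To make the cost of exhaustive search concrete, here is a minimal Python sketch. It assumes a deliberately simple fitting function (sum of squared distances to the mean) and a known number of outliers k; the function names and the sample readings are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb

def fit_cost(points):
    """Sum of squared distances to the mean (a stand-in fitting function)."""
    mean = sum(points) / len(points)
    return sum((p - mean) ** 2 for p in points)

def best_outlier_set(points, k):
    """Try every size-k subset as the outlier set and keep the one
    whose exclusion gives the lowest fitting cost on the remaining points."""
    best_cost, best_outliers = float("inf"), ()
    for outlier_idx in combinations(range(len(points)), k):
        excluded = set(outlier_idx)
        inliers = [p for i, p in enumerate(points) if i not in excluded]
        cost = fit_cost(inliers)
        if cost < best_cost:
            best_cost, best_outliers = cost, outlier_idx
    return best_outliers, best_cost

# Hypothetical sensor readings with two obvious outliers.
readings = [10.1, 9.8, 10.3, 55.0, 10.0, 9.9, -30.0, 10.2]
outliers, cost = best_outlier_set(readings, k=2)
print("outlier indices:", outliers)

# Number of candidate outlier sets for 10 outliers among a million points.
print("candidate sets:", comb(1_000_000, 10))
```

With eight readings the loop checks only 28 candidate sets, but the count grows combinatorially with the input size, which is why this approach does not scale to network-sized data.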
How AI-based Coresets Technology Works
AI-based Coresets is a powerful technique that enables the use of smaller sets of data instead of larger ones without compromising the quality of the output. The approach is based on a highly efficient set of algorithms that reduce the input data on the one hand, and execute anomaly detection routines on the reduced dataset (the coreset) on the other. The technology can be applied to fixed big datasets, to distributed data, or to streaming data, which is a highly relevant feature for network traffic anomaly detection.
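The idea of replacing a large dataset with a small weighted one can be sketched in a few lines of Python. The sketch below is not the specific construction behind the solution described here; it assumes one well-known approach, importance sampling for the sum-of-squared-distances cost, in which points far from the mean are sampled more often and each sampled point carries a weight that compensates for its sampling probability. All names and the synthetic data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def build_coreset(points, m, seed=0):
    """Sample m weighted points. Each point is drawn with a probability that
    mixes a uniform term with its share of the total squared distance to the
    mean, and gets weight 1 / (m * probability) so weighted sums stay unbiased."""
    rng = random.Random(seed)
    n = len(points)
    center = mean_point(points)
    d2 = [sq_dist(p, center) for p in points]
    total = sum(d2)
    if total == 0:
        probs = [1.0 / n] * n  # all points identical: fall back to uniform
    else:
        probs = [0.5 / n + 0.5 * d / total for d in d2]
    picked = rng.choices(range(n), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in picked]

def weighted_cost(weighted_points, center):
    """Weighted sum of squared distances to a candidate center."""
    return sum(w * sq_dist(p, center) for p, w in weighted_points)

# Synthetic 2-D data standing in for per-device traffic features.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
coreset = build_coreset(data, m=500)

center = mean_point(data)
full_cost = sum(sq_dist(p, center) for p in data)
approx_cost = weighted_cost(coreset, center)
print("points:", len(data), "coreset size:", len(coreset))
print("full cost:", round(full_cost, 1), "coreset cost:", round(approx_cost, 1))
```

The two printed costs should be close, although this sampled version only approximates the full cost with high probability. Any downstream routine that accepts weighted points can then run on the 500 weighted samples instead of the 5,000 originals.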
Mitigating Security Risk With Coreset-Based AI
Most cyber security threats to IoT networks can be effectively mitigated, and even prevented, using the proposed solution: an AI-based anomaly detection service that uses coresets.
Here are a few examples.
- Denial-of-service attack mitigation: any change in the behavior of existing network nodes, such as an increase in the amount of data transmitted or in transmission frequency, would automatically be detected by the solution as an anomaly. The solution would then initiate the required action (such as “disconnect device from network”), to be carried out automatically by the operator’s device management systems, thus mitigating potential damage by neutralizing (disconnecting) the captured (or replicated) nodes. The uniqueness of the coresets-based approach is that it lets the solution monitor such changes in real time across massive networks with millions of devices.
- Communication hijacking / man-in-the-middle attack mitigation: the solution would compare current incoming and outgoing messaging against past ‘normality’, detecting the first signs of minor changes that result from the link being hijacked by a malicious attacker. Detecting these changes accurately and early would enable the operator to take the required actions before real damage can take place. The uniqueness of the coresets-based approach has two main aspects: computationally, it makes these comparisons feasible across countless potential ‘hijacking junctions’; and it uses the relevant AI detection model (unsupervised AI) to perform the comparisons and detect this type of anomalous activity.
Note that these examples do not necessarily rely on analyzing the actual transmitted data (which in many cases may be encrypted, even in the operator cloud), but on analyzing the communication metadata. Where the actual transmitted data is also available, the solution can detect attacks that focus specifically on sending falsified information (for example, a captured smart meter node that sends wrong readings as part of fraudulent activity).
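As a rough illustration of comparing current metadata against past ‘normality’, the Python sketch below flags devices whose latest transmitted volume deviates strongly from their own history. It is a simplified stand-in, not the detection model used by the solution: the z-score rule, the threshold of 3, the device identifiers, and the byte counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_anomalous_devices(history, current, threshold=3.0):
    """Return the ids of devices whose latest value lies more than
    `threshold` standard deviations from their own historical mean.

    history: device id -> list of past bytes sent per interval
    current: device id -> bytes sent in the latest interval
    """
    flagged = []
    for device_id, past in history.items():
        mu = mean(past)
        sigma = stdev(past) or 1e-9  # guard against a perfectly flat history
        z_score = (current[device_id] - mu) / sigma
        if abs(z_score) > threshold:
            flagged.append(device_id)
    return flagged

# Hypothetical metadata: bytes sent per reporting interval.
history = {
    "meter-001": [120, 118, 125, 122, 119],
    "meter-002": [100, 98, 103, 101, 99],
}
current = {"meter-001": 121, "meter-002": 950}

flagged = flag_anomalous_devices(history, current)
print("devices to review:", flagged)
```

Only metadata (volume per interval) is used here, so the check works even when payloads are encrypted. In a deployment, the flagged list would feed whatever action the operator's device management system supports.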
To summarize, the ability to accurately reduce the data in real time (coresets) makes it possible to use any proven off-the-shelf detection model (AI, unsupervised machine learning, PCA, clustering…), so the same solution can easily be used to detect almost any type of cyber threat across different heterogeneous IoT networks. It is the only feasible solution for large-scale, cellular-based IoT deployments.