TECHNOLOGY

Introduction

The state of the art in anomaly detection is based on models that describe what constitutes “normal” behavior (i.e., the expected distribution of the data) and try to identify exceptions, or outliers, in the data. These models are usually based on classic distributions, such as the Gaussian distribution, or on Mahalanobis distances, and are fitted with corresponding AI optimization techniques such as principal component analysis (PCA) or mixtures of Gaussians. Modern solutions may also use deep learning to detect the outliers.
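As an illustration of the classic, distribution-based approach (not of ShieldIOT's method), the minimal sketch below fits a single Gaussian to the data and flags points whose Mahalanobis distance from the mean exceeds a cutoff. The function name `mahalanobis_outliers` and the cutoff of 3.0 are illustrative assumptions.

```python
import numpy as np

def mahalanobis_outliers(X, threshold=3.0):
    """Flag rows of X whose Mahalanobis distance from the sample mean exceeds threshold."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a near-singular covariance
    diff = X - mu
    # Squared Mahalanobis distance of every row: diff_i^T * cov_inv * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(d2) > threshold

# Hypothetical data: 1,000 "normal" samples with 3 features plus two injected anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 3))
anomalies = np.array([[8.0, 8.0, 8.0], [-9.0, 0.0, 7.0]])
X = np.vstack([normal, anomalies])
flags = mahalanobis_outliers(X)
print(flags[-2:], flags[:1000].mean())
```

Both injected points are flagged, but with three Gaussian features a cutoff of 3.0 also flags a few percent of the normal samples, a small-scale view of the false-positive problem listed below.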

Limitations of Current Solutions

Most existing anomaly detection solutions have one or more of the following shortcomings:

  • High false positive levels: most existing anomaly detection algorithms rely on heuristics without provable guarantees on their performance, in either running time or, especially, quality of anomaly detection
  • Impractical running time: existing anomaly detection algorithms that do have provable guarantees are often totally impractical, usually in terms of running time
  • Not built for streaming and distributed computing: most existing anomaly detection algorithms do not support modern big data computation models such as streaming of unbounded data with very limited memory, edge computing, distributed computation, or parallel computation on GPUs.
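To make the streaming constraint concrete, here is a minimal sketch, assuming a one-dimensional stream (for example, packets per second), of a detector whose memory stays constant however long the stream runs: it keeps only a running mean and variance (Welford's algorithm) and flags values far from them. The class `StreamingZScore` and its `threshold` and `warmup` parameters are hypothetical; this is a generic baseline, not the coreset approach described on this page.

```python
import math

class StreamingZScore:
    """Constant-memory anomaly flagging for an unbounded 1-D stream (Welford's algorithm)."""

    def __init__(self, threshold=4.0, warmup=30):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.threshold = threshold
        self.warmup = warmup  # values seen before any flag is raised

    def update(self, x):
        """Score x against the statistics seen so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's numerically stable update of mean and m2
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Hypothetical stream: a steady signal followed by one spike.
stream = [10.0 + (i % 5) * 0.1 for i in range(200)] + [50.0]
detector = StreamingZScore()
flags = [detector.update(x) for x in stream]
print(flags[-1])
```

The sketch folds every value, including flagged ones, into the running statistics; a production detector might exclude flagged values so that a burst of anomalies does not shift the baseline.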

Coreset AI Anomaly Detection

AI-based coresets are a powerful technique that enables the use of small datasets instead of large ones without compromising the quality of the output. The approach relies on a highly efficient set of algorithms that, on the one hand, reduce the input data and, on the other hand, execute anomaly detection routines on the reduced dataset (the coreset). The technology has been applied to fixed big datasets, to distributed data, and to streaming data; the latter is highly relevant for network traffic anomaly detection.
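The following is a minimal sketch of the idea, assuming the simplest clustering objective (the 1-mean cost, i.e., the sum of squared distances to a single center) and standard sensitivity (importance) sampling from the coreset literature; it is not ShieldIOT's construction. Each point is sampled with probability proportional to an upper bound on its influence on the cost and receives an inverse-probability weight, so the weighted cost of the small sample is an unbiased estimate of the cost of the full data. The function names `one_mean_coreset` and `cost` are hypothetical.

```python
import numpy as np

def one_mean_coreset(P, m, rng):
    """Sensitivity-sampling coreset of m weighted points for the cost sum ||p - c||^2."""
    n = len(P)
    mu = P.mean(axis=0)
    d2 = ((P - mu) ** 2).sum(axis=1)
    # Upper bound on each point's sensitivity, normalised so the probabilities sum to 1.
    q = 0.5 * (d2 / d2.sum() + 1.0 / n)
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])  # inverse-probability weights keep the estimate unbiased
    return P[idx], weights

def cost(points, center, weights=None):
    """(Weighted) sum of squared distances to center: the unchanged objective."""
    d2 = ((points - center) ** 2).sum(axis=1)
    return d2.sum() if weights is None else (weights * d2).sum()

rng = np.random.default_rng(1)
P = rng.normal(size=(100_000, 5))           # hypothetical full dataset
C, w = one_mean_coreset(P, m=2_000, rng=rng)
query = np.full(5, 0.5)                      # any candidate center
full = cost(P, query)
approx = cost(C, query, w)
print(f"full={full:.0f} coreset={approx:.0f} rel.err={abs(approx - full) / full:.3f}")
```

The same objective function is evaluated, unchanged, on both the full data and the 2,000-point coreset, which mirrors the idea of running existing routines on the reduced dataset. A formal coreset guarantee (a 1 ± ε approximation for every center at once) requires a sample size derived from the analysis, which this sketch does not attempt.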

Intellectual Property

ShieldIOT technology is based on over 12 years of academic research at MIT and the University of Haifa, and is backed by over 35 academic papers and multiple patents. The philosophy behind AI-based coresets is that running existing algorithms on the reduced dataset provably yields approximately the same result as running them on the original big data. This field-proven technique allows us to reduce the data by an unprecedented order of magnitude: from n to log(n)! This is done not by designing a new algorithm for solving the problem, but rather by running the existing algorithms on the reduced dataset.

Tree Construction for Generating Coresets from Data Streams

Unlike compression techniques such as zip or MP4, our coreset performs data reduction rather than mere compression of the input, in the sense that it is problem-dependent. Our coreset, optimized for detecting network anomalies, removes most of the noise, so anomaly detection algorithms run on it are both more effective and faster.
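For data streams, a standard way to build coresets is the merge-and-reduce tree: incoming points are buffered into leaves of m points, and whenever two coresets sit at the same tree level they are merged and reduced back to m weighted points, so only about log(n/m) coresets are held in memory at once. The sketch below shows this generic scheme; whether ShieldIOT's tree follows it exactly is an assumption. Weighted sensitivity sampling for the 1-mean cost stands in for the problem-dependent reduce step, and `reduce_weighted`, `MergeReduceTree`, and the leaf size are hypothetical.

```python
import numpy as np

def reduce_weighted(points, weights, m, rng):
    """Shrink a weighted set to m weighted points by sensitivity sampling (1-mean cost)."""
    if len(points) <= m:
        return points, weights
    total = weights.sum()
    mu = np.average(points, axis=0, weights=weights)
    wd2 = weights * ((points - mu) ** 2).sum(axis=1)
    if wd2.sum() == 0.0:  # all points coincide: one point carrying the total weight is exact
        return points[:1], np.array([total])
    q = 0.5 * (wd2 / wd2.sum() + weights / total)  # sampling probabilities, sum to 1
    idx = rng.choice(len(points), size=m, replace=True, p=q)
    return points[idx], weights[idx] / (m * q[idx])

class MergeReduceTree:
    """Streaming coreset via merge-and-reduce: at most one bucket per tree level."""

    def __init__(self, m, rng):
        self.m = m
        self.rng = rng
        self.buffer = []   # raw points of the current, not yet full, leaf
        self.levels = {}   # tree level -> (points, weights)

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.m:
            pts = np.array(self.buffer)
            self.buffer = []
            self._carry(pts, np.ones(len(pts)), level=0)

    def _carry(self, pts, w, level):
        # Like incrementing a binary counter: merge equal-level buckets and push upward.
        while level in self.levels:
            other_pts, other_w = self.levels.pop(level)
            pts, w = reduce_weighted(np.vstack([pts, other_pts]),
                                     np.concatenate([w, other_w]), self.m, self.rng)
            level += 1
        self.levels[level] = (pts, w)

    def coreset(self):
        """Union of all stored buckets plus the partial leaf, as weighted points."""
        parts = list(self.levels.values())
        if self.buffer:
            parts.append((np.array(self.buffer), np.ones(len(self.buffer))))
        return np.vstack([p for p, _ in parts]), np.concatenate([w for _, w in parts])

rng = np.random.default_rng(2)
stream = rng.normal(size=(50_000, 5))  # hypothetical stream, processed one point at a time
tree = MergeReduceTree(m=1_000, rng=rng)
for x in stream:
    tree.insert(x)
pts, w = tree.coreset()
query = np.full(5, 0.5)
full = ((stream - query) ** 2).sum()
approx = (w * ((pts - query) ** 2).sum(axis=1)).sum()
print(len(pts), "stored points; relative error", abs(approx - full) / full)
```

Each reduce step only ever touches 2m points, so memory stays bounded while the stream grows; swapping in a different reduce function is what makes the resulting coreset problem-dependent.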

Research Results

Statistical research was undertaken to compare the error rate (false positives) of ShieldIOT AI with that of RANSAC, a state-of-the-art anomaly detection algorithm (data: security camera, 114 features per data sample).

ShieldIOT AI vs. RANSAC

The results show that even with a very small number of samples (2,500), ShieldIOT achieved more than 5x better accuracy, and that its accuracy is almost identical to that of naively running on the entire dataset (which is, of course, not feasible in large IoT networks or on memory-constrained edge devices).

The proven ability to reduce the data to a coreset without losing important, relevant information is at the heart of the innovation of this AI-based anomaly detection solution. Since the coreset is small compared to the original big data, more anomaly detection algorithms can be run in a given time. Thus, effective data reduction significantly shortens execution time, allowing faster detection of, and response to, any anomaly in network traffic.

Selected Publications

Refer to these publications for more information about Coreset AI anomaly detection technology.

Data Reduction for Weighted and Outlier-resistant Clustering

Published by Dan Feldman and Leonard J. Schulman.

Coreset for k-means of Streaming Lines

Published by Dan Feldman and Yair Marom.