QT-EWMA: QuantTree for change detection in multivariate datastreams
Abstract We introduce QuantTree Exponentially Weighted Moving Average (QT-EWMA), an effective online and nonparametric change-detection algorithm for multivariate datastreams. We model the initial data distribution by a QuantTree histogram [Boracchi et al. 2018] , and define a novel statistic based on the Exponential Weighted Moving Average (EWMA) chart. The properties of QuantTree guarantee that the distribution of our statistic is independent from the data distribution, enabling us to define thresholds to control the expected time before a false alarm, or Average Run Length (ARL0). We also introduce QT-EWMA-update, a variant specifically designed to cope with small training sets. Our experiments demonstrate that QT-EWMA and QT-EWMA-update are most powerful sequential monitoring schemes for multivariate datastream that can control ARL0.
Fig.1: The EWMA statistic of QT-EWMA enables change detection in multivariate datastreams (here the datstream is scalar for the illustration sake). QT-EWMA QT-EWMA is a nonparametric online change-detection test based on QuantTree [Boracchi et al. 2018] , and is designed to monitor multivariate datastreams. In particular, QT-EWMA models the initial data distribution from a training set by constructing a QuantTree histogram. Then, QT-EWMA monitors the proportion of samples from the datastream falling in each bin of the histogram by Exponentially Weighted Moving Average (EWMA) statistics. Finally, the QT-EWMA statistic measures the overall deviation of these EWMA statistics from their expected values and a change is reported when this statistic exceeds a threshold. Thanks to the properties of QuantTree, the distribution of the QT-EWMA statistic does not depend on the data distribution, so we can compute thresholds that control the ARL0 on any datastream using Monte Carlo simulations. QT-EWMA-update When the training set is small, a QuantTree histogram cannot adequately model the initial distribution since the estimated bin probabilities might be distant from the actual ones. To overcome this issue, we propose QT-EWMA-update, a modified QT-EWMA where we incrementally update the estimated bin probabilities during monitoring using the incoming data. The properties of QuantTree guarantee that also the distribution of the QT-EWMA-update statistic does not depend on the data distribution, thus we can use the same procedure designed for QT-EWMA to compute thresholds controlling the ARL0 on any datastream.
The main advantages of QT-EWMA and QT-EWMA-update:
Our code is available for download here
|
References [Frittoli et al. 2022] Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree [Frittoli et al. 2021] Change Detection in Multivariate Datastreams Controlling False Alarms [Boracchi et al. 2018] QuantTree: Histograms for Change Detection in Multivariate Data Streams |