QT-EWMA: QuantTree for change detection in multivariate datastreams


Abstract

We introduce QuantTree Exponentially Weighted Moving Average (QT-EWMA), an effective online and nonparametric change-detection algorithm for multivariate datastreams. We model the initial data distribution by a QuantTree histogram [Boracchi et al. 2018] , and define a novel statistic based on the Exponential Weighted Moving Average (EWMA) chart. The properties of QuantTree guarantee that the distribution of our statistic is independent from the data distribution, enabling us to define thresholds to control the expected time before a false alarm, or Average Run Length (ARL0). We also introduce QT-EWMA-update, a variant specifically designed to cope with small training sets. Our experiments demonstrate that QT-EWMA and QT-EWMA-update are most powerful sequential monitoring schemes for multivariate datastream that can control ARL0.

Class Distribution Monitoring

Fig.1: The EWMA statistic of QT-EWMA enables change detection in multivariate datastreams (here the datstream is scalar for the illustration sake).

QT-EWMA

QT-EWMA is a nonparametric online change-detection test based on QuantTree [Boracchi et al. 2018] , and is designed to monitor multivariate datastreams. In particular, QT-EWMA models the initial data distribution from a training set by constructing a QuantTree histogram. Then, QT-EWMA monitors the proportion of samples from the datastream falling in each bin of the histogram by Exponentially Weighted Moving Average (EWMA) statistics. Finally, the QT-EWMA statistic measures the overall deviation of these EWMA statistics from their expected values and a change is reported when this statistic exceeds a threshold. Thanks to the properties of QuantTree, the distribution of the QT-EWMA statistic does not depend on the data distribution, so we can compute thresholds that control the ARL0 on any datastream using Monte Carlo simulations.

QT-EWMA-update

When the training set is small, a QuantTree histogram cannot adequately model the initial distribution since the estimated bin probabilities might be distant from the actual ones. To overcome this issue, we propose QT-EWMA-update, a modified QT-EWMA where we incrementally update the estimated bin probabilities during monitoring using the incoming data. The properties of QuantTree guarantee that also the distribution of the QT-EWMA-update statistic does not depend on the data distribution, thus we can use the same procedure designed for QT-EWMA to compute thresholds controlling the ARL0 on any datastream.

The main advantages of QT-EWMA and QT-EWMA-update:

  • both QT-EWMA and QT-EWMA-update can monitor any multivariate datastream since their detection thresholds do not depend on the data distribution nor on the data dimension;
  • both QT-EWMA and QT-EWMA-update can effectively control false alarms by maintaining a target ARL0, namely the expected time before a false alarm, which can be set before monitoring;
  • QT-EWMA-update can promptly detect distribution changes even when the training set available before monitoring is extremely small.

Our code is available for download here


References

[Frittoli et al. 2022] Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree
Luca Frittoli, Diego Carrera, Giacomo Boracchi
IEEE Transactions on Knowledge and Data Engineering 2022, (Preprint),

[Frittoli et al. 2021] Change Detection in Multivariate Datastreams Controlling False Alarms
Luca Frittoli, Diego Carrera, Giacomo Boracchi
Proceedings of the European Conference on Machine Learning (ECML) 2021
(Preprint), (Code), (Supplementary Material), (Poster)

[Boracchi et al. 2018] QuantTree: Histograms for Change Detection in Multivariate Data Streams
Giacomo Boracchi, Diego Carrera, Cristiano Cervellera, Danilo Maccio'

International Conference on Machine Learning (ICML) 2018
(Paper), (Source Code), (Bibtex).