Software: Least Squares Anomaly Detection
Least Squares Anomaly Detection is a flexible, fast, probabilistic method for calculating outlier scores on test data, given training examples of inliers. The model is controlled by two parameters: sigma (a kernel length scale, controlling how 'smooth' the result should be) and rho (a regularisation parameter, which controls the sensitivity to outliers). The effect of altering these parameters is illustrated in one of the demos accompanying the Python implementation.
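To make the roles of sigma and rho concrete, here is a minimal NumPy sketch of a one-class least-squares model. It assumes Gaussian basis functions centred on the training points and a ridge penalty rho; the function names (`gaussian_design`, `fit_one_class`, `outlier_probability`) are hypothetical and this is not the lsanomaly code. The fitted inlier output is clipped to [0, 1], and whatever it leaves unexplained is treated as outlier probability.

```python
import numpy as np


def gaussian_design(X, centres, sigma):
    """Gaussian basis functions exp(-||x - c||^2 / (2 sigma^2)), one column per centre."""
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def fit_one_class(X_train, sigma, rho):
    """Ridge-regularised least-squares fit of the target value 1 at every inlier."""
    Phi = gaussian_design(X_train, X_train, sigma)
    A = Phi.T @ Phi + rho * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ np.ones(len(X_train)))


def outlier_probability(X_test, X_train, theta, sigma):
    """Clip the fitted inlier output to [0, 1]; the remainder goes to the outlier class."""
    inlier = np.clip(gaussian_design(X_test, X_train, sigma) @ theta, 0.0, 1.0)
    return 1.0 - inlier


X_train = np.array([[1.1], [1.3], [1.2], [1.05]])
X_test = np.array([[1.15], [3.6], [1.25]])
for sigma, rho in [(0.1, 0.1), (0.5, 0.1), (0.5, 10.0)]:
    theta = fit_one_class(X_train, sigma, rho)
    print(f"sigma={sigma}, rho={rho}:", outlier_probability(X_test, X_train, theta, sigma).round(3))
```

In this sketch a larger sigma widens the region treated as normal, while a larger rho shrinks the fitted output even at the training points, so more probability mass shifts to the outlier class.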
Where there are multiple inlier classes in training data, the method works as a robust classifier, i.e. it can assign to each test datapoint the probability of being in any of the inlier classes and the probability of being in an outlier class. An example provided with the code shows the method being used to classify handwritten digits 0 to 9 given only training examples of digits 0 to 8.
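A minimal sketch of how such a robust classifier can be built. It assumes one ridge-regularised least-squares model per inlier class (target 1 on that class's training points, 0 on the others) over shared Gaussian basis functions, with any probability mass the inlier outputs leave unexplained assigned to an extra outlier column. The function names and the default `sigma`/`rho` values are hypothetical, not taken from lsanomaly.

```python
import numpy as np


def gaussian_design(X, centres, sigma):
    """Gaussian basis functions, one column per centre."""
    sq = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def fit_multiclass(X, y, sigma=0.5, rho=0.1):
    """Fit one ridge least-squares model per inlier class on a shared design matrix."""
    classes = np.unique(y)
    Phi = gaussian_design(X, X, sigma)
    A = Phi.T @ Phi + rho * np.eye(Phi.shape[1])
    # Indicator targets: column c is 1 for points of class c and 0 elsewhere.
    targets = (y[:, None] == classes[None, :]).astype(float)
    theta = np.linalg.solve(A, Phi.T @ targets)  # shape (n_centres, n_classes)
    return classes, theta


def predict_proba_multiclass(X_test, X_train, theta, sigma=0.5):
    """Columns are the inlier classes in order, followed by the outlier class."""
    scores = np.maximum(gaussian_design(X_test, X_train, sigma) @ theta, 0.0)
    outlier = np.maximum(1.0 - scores.sum(axis=1, keepdims=True), 0.0)
    probs = np.hstack([scores, outlier])
    return probs / probs.sum(axis=1, keepdims=True)


# Two small 2-D clusters standing in for two inlier classes.
X = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [3.0, 3.0], [3.1, 2.9], [2.8, 3.2]])
y = np.array([0, 0, 0, 1, 1, 1])
classes, theta = fit_multiclass(X, y, sigma=0.5, rho=0.1)
X_new = np.array([[0.05, 0.05], [3.0, 3.0], [10.0, -10.0]])
print(predict_proba_multiclass(X_new, X, theta, sigma=0.5).round(3))
```

A test point far from every class gets near-zero inlier outputs, so nearly all of its probability falls into the outlier column, which is what lets a class unseen in training (such as the digit 9 in the example above) be flagged rather than forced into a known class.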
The method can also be applied to detecting anomalies in sequences, using an extension of the static method based on a Hidden Markov Model. An included example shows inference of abnormalities in an electrocardiogram time series (data from PhysioNet).
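The details of the sequential extension are not given here, so the following is only a generic sketch of one common construction. Per-time-step probabilities from a static model are used as scaled emission likelihoods in a two-state (normal/anomalous) HMM with an assumed "sticky" transition matrix, and forward-backward smoothing gives the posterior of the anomalous state. The function name and `stay=0.95` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def smooth_anomaly_posteriors(static_probs, stay=0.95):
    """Forward-backward smoothing over the states [normal, anomalous].

    static_probs: (T, 2) array of per-time-step probabilities from a static
    model, treated here as scaled emission likelihoods (an assumption).
    stay: hypothetical probability of remaining in the same state.
    """
    e = np.clip(np.asarray(static_probs, dtype=float), 1e-12, None)
    T, K = e.shape
    trans = np.full((K, K), (1.0 - stay) / (K - 1))
    np.fill_diagonal(trans, stay)

    # Forward pass, normalised at each step to avoid underflow.
    alpha = np.zeros((T, K))
    alpha[0] = e[0] / K  # uniform initial state distribution
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * e[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also normalised per step.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (e[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


normal, spike, burst = [0.9, 0.1], [0.4, 0.6], [0.2, 0.8]
seq = np.array([normal] * 10 + [spike] + [normal] * 10 + [burst] * 5 + [normal] * 10)
post = smooth_anomaly_posteriors(seq)
print(post[:, 1].round(2))  # smoothed probability of the anomalous state
```

In this toy sequence, the single time step that the static model scores as 0.6 anomalous is smoothed back towards normal, while the sustained run of high outlier probability stays flagged.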
The Python software here provides training and inference methods in a class that is compatible with the scikit-learn package. The class lsanomaly.LSAnomaly() can replace other methods, such as svm.OneClassSVM(), in any of the scikit-learn outlier detection examples.
```python
>>> import lsanomaly
>>> import numpy as np
>>> X_train = np.array([[1.1],[1.3],[1.2],[1.05]])
>>> X_test = np.array([[1.15],[3.6],[1.25]])
>>> anomalymodel = lsanomaly.LSAnomaly()
>>> anomalymodel.fit(X_train)
>>> anomalymodel.predict(X_test)
[0.0, 'anomaly', 0.0]
>>> anomalymodel.predict_proba(X_test)
array([[  1.00000000e+000,   0.00000000e+000],
       [  5.15255628e-103,   1.00000000e+000],
       [  1.00000000e+000,   0.00000000e+000]])
```
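The doctest output is consistent with `predict` returning the most probable column of `predict_proba`, labelled either with the training class (0.0 when no labels are given) or with `'anomaly'`. Purely to illustrate that convention, here is a hypothetical minimal class (`TinyLSAnomaly`, not part of lsanomaly) that follows scikit-learn's `fit`/`predict`/`predict_proba` pattern. It uses a simplified Gaussian-kernel, ridge-regularised least-squares model as an assumed stand-in for the real implementation.

```python
import numpy as np


class TinyLSAnomaly:
    """Hypothetical minimal detector following scikit-learn's fit/predict/predict_proba pattern."""

    def __init__(self, sigma=0.5, rho=0.1):
        self.sigma = sigma
        self.rho = rho

    def _design(self, X):
        # Gaussian basis functions centred on the stored training points.
        sq = ((X[:, None, :] - self.centres_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * self.sigma ** 2))

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Assumption: unlabelled data is one inlier class labelled 0.0, as in the doctest.
        y = np.zeros(len(X)) if y is None else np.asarray(y)
        labels = np.unique(y)
        self.centres_ = X
        self.classes_ = list(labels) + ['anomaly']
        Phi = self._design(X)
        targets = (y[:, None] == labels[None, :]).astype(float)
        A = Phi.T @ Phi + self.rho * np.eye(len(X))
        self.theta_ = np.linalg.solve(A, Phi.T @ targets)
        return self  # scikit-learn convention: fit returns the estimator

    def predict_proba(self, X):
        scores = np.maximum(self._design(np.asarray(X, dtype=float)) @ self.theta_, 0.0)
        outlier = np.maximum(1.0 - scores.sum(axis=1, keepdims=True), 0.0)
        probs = np.hstack([scores, outlier])
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Most probable column, mapped to its class label or 'anomaly'.
        return [self.classes_[i] for i in self.predict_proba(X).argmax(axis=1)]


X_train = np.array([[1.1], [1.3], [1.2], [1.05]])
X_test = np.array([[1.15], [3.6], [1.25]])
model = TinyLSAnomaly().fit(X_train)
print(model.predict(X_test))
print(model.predict_proba(X_test).round(3))
```

Because `fit` returns the estimator and the two prediction methods take only `X`, an object like this can be passed wherever scikit-learn-style code expects those methods. Note that `svm.OneClassSVM().predict` instead returns +1/-1, so downstream code comparing labels may need adjusting.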
Download Python source code
lsanomaly_v1.2.zip (core package and demos)
evaluate_lsanomaly.zip (evaluate performance on several standard datasets)
J.A. Quinn, M. Sugiyama. A least-squares approach to anomaly detection in static and sequential data. Pattern Recognition Letters 40:36-40, 2014.