# Anomaly Detection Notes

### Introduction

Wikipedia defines anomaly detection as follows:

It is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.

This definition captures what anomaly detection is: identifying the datapoints that differ markedly from most of the data.

### Importance

In today’s big data world, every organisation has datasets too large for humans to inspect manually and draw interesting conclusions from, so we need computers to do it for us. What we can derive from the data depends on the type of data. For monitoring data, we can detect software regressions caused by a new release, traffic increases due to DDoS attacks, attempts to exploit software vulnerabilities, and many other things we may not even anticipate yet.

### Leveraging basic statistics

Raw data usually needs some level of aggregation (for example, counts or averages per time bucket) before any anomaly detection algorithm is applied.
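As a rough illustration of what such aggregation might look like, the sketch below buckets raw event timestamps into per-minute counts. The function name `count_per_bucket`, the 60-second bucket size, and the sample timestamps are all assumptions made for this example, not something prescribed by these notes.

```python
from collections import Counter

def count_per_bucket(timestamps, bucket_seconds=60):
    """Count events per fixed-size time bucket, keyed by bucket start time."""
    counts = Counter(
        (int(ts) // bucket_seconds) * bucket_seconds for ts in timestamps
    )
    # Sort by bucket start so the result reads as a time series.
    return dict(sorted(counts.items()))

# Hypothetical request timestamps, in seconds since some start time.
request_times = [0, 5, 59, 60, 61, 125, 130, 170, 179]
per_minute = count_per_bucket(request_times)
print(per_minute)  # {0: 3, 60: 2, 120: 4}
```

The resulting per-bucket counts are the kind of series that the methods below would then operate on.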

The simplest approach is to mark datapoints that lie more than three standard deviations (three-sigma) from the mean as outliers. This method only applies to normally distributed data, so we first need to check whether the dataset follows a normal distribution (bell curve). The fundamental flaw of the three-sigma method is that it uses the mean of all the past data.
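A minimal sketch of the three-sigma rule, assuming the data is roughly normal and using the population standard deviation. The function name `three_sigma_outliers` and the sample latencies are made up for illustration.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values, k=3.0):
    """Return indices of values more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical latencies in milliseconds, with one obvious spike at the end.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
                101, 99, 100, 102, 98, 101, 99, 100, 103, 250]
outliers = three_sigma_outliers(latencies_ms)
print(outliers)  # [19]
```

Note that the spike itself pulls the mean and standard deviation upward, which is one reason a global mean over all past data can be a poor baseline.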

However, most data is not normally distributed; it varies seasonally. I think this is very common in production services for metrics like web traffic. This means we need to consider a “moving” mean for such cases. This is called a moving average: the average is calculated only over a sliding time window, such as the datapoints from the past hour. We also generally smooth the data over this sliding window before calculating the moving average.
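One way to combine the moving window with the three-sigma idea is to compare each new datapoint against the mean and standard deviation of only the preceding window. The sketch below assumes a window of 5 points and a threshold of 3 sigma; both values, the function name `rolling_outliers`, and the sample series are illustrative choices.

```python
from collections import deque
from statistics import mean, pstdev

def rolling_outliers(values, window=5, k=3.0):
    """Flag points deviating more than k sigma from the preceding window's mean."""
    recent = deque(maxlen=window)  # holds only the last `window` points
    flagged = []
    for i, v in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(v - mu) > k * sigma:
                flagged.append(i)
        recent.append(v)
    return flagged

# Hypothetical metric that drifts upward slowly, with one spike at index 9.
traffic = [10, 12, 11, 13, 12, 14, 13, 15, 14, 40, 15, 17, 16]
print(rolling_outliers(traffic))  # [9]
```

Because the baseline follows the recent data, the slow upward drift is not flagged, while the sudden spike is.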

A moving average can be either simple or weighted. In a simple moving average, each datapoint in the sliding window has the same weight, i.e. each datapoint contributes equally to the average. In a weighted moving average, recent data carries a higher weight in the calculation.
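The difference between the two can be sketched as follows. The weighted version here assumes linearly increasing weights (oldest point gets weight 1, newest gets weight n), which is only one of several possible weighting schemes.

```python
def simple_moving_average(window):
    """Every point in the window contributes equally."""
    return sum(window) / len(window)

def weighted_moving_average(window):
    """Linear weights: the oldest point gets weight 1, the newest gets weight n."""
    weights = range(1, len(window) + 1)
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)

# Hypothetical window whose most recent value jumped from 10 to 20.
window = [10, 10, 10, 10, 20]
sma = simple_moving_average(window)    # 60 / 5 = 12.0
wma = weighted_moving_average(window)  # 200 / 15, roughly 13.33
print(sma, wma)
```

The weighted average reacts more strongly to the recent jump because the newest point carries the largest weight.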

For smoothing, we have simple exponential smoothing, double exponential smoothing (Holt's linear method) and triple exponential smoothing (Holt-Winters).
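A minimal sketch of the first two variants, using the standard recurrences. The initialisation choices (first value as the starting level, first difference as the starting trend) and the parameter values are assumptions for illustration; triple exponential smoothing adds a seasonal component and is omitted here.

```python
def simple_exponential_smoothing(values, alpha):
    """s_t = alpha * x_t + (1 - alpha) * s_{t-1}, starting from s_0 = x_0."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def double_exponential_smoothing(values, alpha, beta):
    """Track a level and a trend; returns the level estimate at each step."""
    level = values[0]
    trend = values[1] - values[0]  # assumed initial trend: first difference
    levels = [level]
    for x in values[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        levels.append(level)
    return levels

ramp = [1, 2, 3, 4, 5]  # a perfectly linear series
ses = simple_exponential_smoothing(ramp, alpha=0.5)
des = double_exponential_smoothing(ramp, alpha=0.5, beta=0.5)
print(ses)  # lags behind the ramp
print(des)  # follows the ramp because the trend is modelled
```

On trending data the simple variant lags, while the double variant tracks the trend, which is why the choice of variant depends on whether the metric has trend or seasonality.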

For such seasonal data, we can also compare current datapoints with those from the same time one or two weeks earlier.
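A rough sketch of such a week-over-week comparison, assuming each week is a list of values aligned by time slot and that a relative deviation of more than 50% from the previous weeks' average counts as anomalous. The function name, the `tolerance` parameter, and the sample data are all hypothetical.

```python
def week_over_week_anomalies(current_week, previous_weeks, tolerance=0.5):
    """Flag slots whose value deviates from the same slot's average in
    previous weeks by more than `tolerance` (as a fraction of that average)."""
    flagged = []
    for i, value in enumerate(current_week):
        baseline = sum(week[i] for week in previous_weeks) / len(previous_weeks)
        if baseline > 0 and abs(value - baseline) / baseline > tolerance:
            flagged.append(i)
    return flagged

# Hypothetical request counts for the same five time slots in three weeks.
two_weeks_ago = [110, 130, 290, 300, 100]
last_week = [100, 120, 300, 280, 110]
this_week = [105, 125, 310, 900, 105]

anomalies = week_over_week_anomalies(this_week, [last_week, two_weeks_ago])
print(anomalies)  # [3]
```

The regular peak in slots 2 and 3 is not flagged because it also appears in earlier weeks; only the slot that departs from its own history is.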