Double MAD?

Adam Cassar

Co-Founder

This article explores the use of Double Median Absolute Deviation (Double MAD) for [anomaly detection](/learning/threat-detection/what-is-anomaly-detection/) in time series
data, particularly in skewed or non-symmetric distributions. Double MAD calculates two median absolute
deviations (one for data below the median and one for data above), which gives a more nuanced approach than traditional
MAD and allows accurate detection of anomalies even in skewed data distributions. We also delve into its application
in identifying slow abuse, like bots, by catching lower range anomalies. However, it's important to note Double MAD's
limitations such as not capturing seasonal data shape and trends over time. A comparison is also drawn with the Z-score
method, highlighting that the choice between the two depends on the nature of your data. The article provides insights
into the practical implementation of Double MAD and its potential to improve your data analysis toolkit.

Unveiling the Power of Double MAD in Anomaly Detection

As we tread deeper into the digital age, the importance of leveraging data for informed decision-making is becoming increasingly apparent. Anomaly detection in time-series data is one such vital application. By identifying patterns that deviate from the norm, businesses can proactively take measures to address potential issues or leverage unexpected opportunities.

One powerful technique for anomaly detection is the Median Absolute Deviation (MAD) and, more specifically, its extension, the Double MAD. This article will delve into the world of Double MAD, exploring its utility for anomaly detection in time series data and its application in identifying anomalous clients.

Understanding MAD and Double MAD

MAD is a robust measure of variability that is less sensitive to outliers than the standard deviation. It is the median of the absolute deviations from the data's median, so a handful of extreme values barely moves it, and it gives a more reliable picture of 'normal' behaviour in datasets that contain outliers.
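As a minimal sketch of that definition, the following Rust computes the median and then the MAD of a slice. The function names `median` and `mad` are illustrative choices, not taken from any particular library, and the code applies no scaling constant to the result.

```rust
/// Median of a slice. Returns None for empty input.
/// NaN values are not treated specially; they simply sort last.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even length: average the two middle values.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Median absolute deviation: median(|x_i - median(x)|).
fn mad(values: &[f64]) -> Option<f64> {
    let m = median(values)?;
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    median(&deviations)
}
```

For the sample `[1.0, 2.0, 3.0, 4.0, 100.0]` the MAD is 1.0, while the population standard deviation is about 39, which shows how little the single extreme value affects the MAD.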

Double MAD is an extension of MAD, where two MADs are calculated — one for the data below the median and another for the data above. This bifurcation of data offers an improved detection process for asymmetric data, which is common in real-world time series data.

Why Double MAD?

While MAD provides a robust way to understand the 'normal' range of a dataset, it assumes a symmetric distribution of data around the median, which may not always hold true. This is where Double MAD shines, offering an enhanced anomaly detection process for skewed or asymmetric datasets.

In time-series analysis, especially with 24-hour cycles like web traffic or server usage, patterns can exhibit seasonality and trend components. These patterns can often be asymmetric, making Double MAD a valuable tool for capturing the variability in different parts of the data.

Using Double MAD in Anomaly Detection

The Double MAD implementation uses Rust, a systems programming language known for its speed and memory safety. It calculates the lower and upper MAD values along with their respective thresholds, and anomalies are then detected by comparing each data point against those thresholds.

An anomaly is defined as a data point that deviates significantly from the expected range. If a data point falls below the lower MAD threshold or above the upper one, it can be flagged as an anomaly. This approach is especially effective when handling datasets with high variability or extreme values.
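The steps described above can be sketched in Rust roughly as follows. This is an illustration under stated assumptions rather than the production implementation: the `DoubleMad` struct, its field names, and the `fit` and `is_anomaly` methods are hypothetical, and points equal to the median are counted on both sides.

```rust
/// Double MAD summary of a dataset. All names here are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct DoubleMad {
    median: f64,
    lower_mad: f64,
    upper_mad: f64,
    lower_threshold: f64,
    upper_threshold: f64,
}

/// Median of a slice. Returns None for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

impl DoubleMad {
    /// Computes both MADs and their thresholds using multiplier `k`.
    /// Assumption: values equal to the median count towards both sides.
    fn fit(values: &[f64], k: f64) -> Option<Self> {
        let m = median(values)?;
        // Absolute deviations of the points at or below the median.
        let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|x| m - x).collect();
        // Absolute deviations of the points at or above the median.
        let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|x| x - m).collect();
        let lower_mad = median(&lower)?;
        let upper_mad = median(&upper)?;
        Some(DoubleMad {
            median: m,
            lower_mad,
            upper_mad,
            lower_threshold: m - k * lower_mad,
            upper_threshold: m + k * upper_mad,
        })
    }

    /// A point is anomalous if it falls outside either threshold.
    fn is_anomaly(&self, x: f64) -> bool {
        x < self.lower_threshold || x > self.upper_threshold
    }
}
```

With the hypothetical sample `[10, 11, 12, 13, 14, 16, 20, 26, 40]` and `k = 3.0`, the median is 14, the lower MAD is 2 and the upper MAD is 6, giving thresholds of 8 and 32. The value 40 is flagged while 26 is not. Note that if more than half of the points on one side equal the median, that side's MAD is zero and every other point on that side is flagged, so real data may need a guard for that case.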

Double MAD for Anomalous Client Detection

Beyond time-series data, Double MAD can also be instrumental in identifying anomalous behaviour among clients. By comparing each client's behaviour against the Double MAD of the time-series data, one can pinpoint clients that deviate from the norm.

For instance, in the context of web service usage, an anomalous client might be one that is sending an unusually high or low number of requests. By using Double MAD, you can effectively flag such outliers and take appropriate action, like investigating potential misuse or reaching out to understand and address any issues they may be facing.
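One way this client comparison might look in Rust is sketched below. It assumes each client is summarised by a single request count for the period of interest; the client identifiers, the `Flag` enum, and the function names are all hypothetical.

```rust
/// Why a client was flagged. Illustrative type, not from any library.
#[derive(Debug, PartialEq)]
enum Flag {
    TooLow,
    TooHigh,
}

/// Median of a slice. Returns None for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Returns (lower_threshold, upper_threshold) for multiplier `k`.
/// Assumption: values equal to the median count towards both sides.
fn double_mad_thresholds(values: &[f64], k: f64) -> Option<(f64, f64)> {
    let m = median(values)?;
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|x| m - x).collect();
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|x| x - m).collect();
    Some((m - k * median(&lower)?, m + k * median(&upper)?))
}

/// Flags clients whose request count falls outside the Double MAD
/// thresholds computed over all clients. Output keeps input order.
fn flag_clients<'a>(requests: &[(&'a str, f64)], k: f64) -> Vec<(&'a str, Flag)> {
    let counts: Vec<f64> = requests.iter().map(|&(_, count)| count).collect();
    let (low, high) = match double_mad_thresholds(&counts, k) {
        Some(bounds) => bounds,
        None => return Vec::new(),
    };
    requests
        .iter()
        .filter_map(|&(id, count)| {
            if count < low {
                Some((id, Flag::TooLow))
            } else if count > high {
                Some((id, Flag::TooHigh))
            } else {
                None
            }
        })
        .collect()
}
```

Returning the direction of the deviation alongside the client lets the caller treat unusually quiet and unusually busy clients differently, for example by investigating the former and rate limiting the latter.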

Detecting Lower-Range Anomalies: A Case of Slow Abuse

An interesting application of Double MAD is detecting lower-range anomalies, a pattern often associated with slow abuse such as low-rate bots or low-and-slow Distributed Denial of Service (DDoS) attacks. This kind of abuse is characterised by an unusually low frequency of activity that stays consistent over a prolonged period. Such consistent, low-level activity can fly under the radar of detection systems that only look for spikes, which is what makes it dangerous.

Because Double MAD computes a separate threshold for the lower side of the distribution, it can detect these lower-range anomalies and provide early warning of slow abuse. This ability to detect both high and low anomalies makes Double MAD a versatile tool for anomaly detection.
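A single low reading is rarely interesting on its own, so one possible refinement is to require that a client stays below the lower threshold for several consecutive intervals. The run-length requirement and the names below are assumptions made for illustration; the lower threshold is expected to come from a Double MAD calculation done elsewhere.

```rust
/// Length of the longest run of consecutive values strictly below
/// `lower_threshold`.
fn longest_low_run(series: &[f64], lower_threshold: f64) -> usize {
    let mut best = 0;
    let mut current = 0;
    for &x in series {
        if x < lower_threshold {
            current += 1;
            best = best.max(current);
        } else {
            // The run is broken by any value at or above the threshold.
            current = 0;
        }
    }
    best
}

/// True if the series stays below the threshold for at least `min_run`
/// consecutive intervals. `min_run` is a tuning parameter (assumption).
fn is_sustained_low(series: &[f64], lower_threshold: f64, min_run: usize) -> bool {
    min_run > 0 && longest_low_run(series, lower_threshold) >= min_run
}
```

Here `series` would hold one client's request counts per interval, so a client with counts `[90, 5, 6, 4, 100, 3]` and a lower threshold of 80 has a longest low run of three intervals.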

The Math Behind Double MAD

To illustrate the power of Double MAD, let's consider a dataset from a right-skewed distribution. Applying the conventional MAD approach might lead to false positives, where normal data points are marked as outliers. This is because MAD uses a symmetric interval around the median, which doesn't account for the skewed nature of our data.

With Double MAD, we instead calculate two MADs — one for the data below the median (MAD-lower) and another for the data above (MAD-upper). Outlier thresholds are then defined using these two MADs. The lower threshold is calculated as the median minus a multiplier (k) times MAD-lower. The upper threshold is the median plus k times MAD-upper.
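Written out, the definitions above look as follows. The symbols for the median and the two thresholds are notation introduced here for convenience, and including points equal to the median on both sides is an assumption, since the text only distinguishes below from above.

```latex
\begin{aligned}
\tilde{x} &= \operatorname{median}(x_1, \dots, x_n) \\
\mathrm{MAD}_{\text{lower}} &= \operatorname{median}\{\, |x_i - \tilde{x}| : x_i \le \tilde{x} \,\} \\
\mathrm{MAD}_{\text{upper}} &= \operatorname{median}\{\, |x_i - \tilde{x}| : x_i \ge \tilde{x} \,\} \\
T_{\text{lower}} &= \tilde{x} - k \cdot \mathrm{MAD}_{\text{lower}} \\
T_{\text{upper}} &= \tilde{x} + k \cdot \mathrm{MAD}_{\text{upper}}
\end{aligned}
```

A point is flagged when it is smaller than the lower threshold or larger than the upper threshold. A multiplier of k = 3 is a common starting point, though the right value depends on the data.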

This approach takes into account the asymmetric nature of our data, thereby providing more accurate anomaly detection. For example, in a right-skewed distribution the larger MAD-upper widens the upper threshold, so only the extreme right-tail values are flagged as outliers, while the smaller MAD-lower tightens the lower threshold, so unusually low values are still caught.
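To make the contrast concrete, here is a small Rust sketch that computes both sets of thresholds for the same data. The function names are hypothetical, and the sample used in the note that follows is invented purely for illustration.

```rust
/// Median of a slice. Returns None for empty input.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Conventional MAD: one symmetric interval, median +/- k * MAD.
fn mad_thresholds(values: &[f64], k: f64) -> Option<(f64, f64)> {
    let m = median(values)?;
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    let mad = median(&deviations)?;
    Some((m - k * mad, m + k * mad))
}

/// Double MAD: separate MADs for each side of the median.
/// Assumption: values equal to the median count towards both sides.
fn double_mad_thresholds(values: &[f64], k: f64) -> Option<(f64, f64)> {
    let m = median(values)?;
    let lower: Vec<f64> = values.iter().filter(|&&x| x <= m).map(|x| m - x).collect();
    let upper: Vec<f64> = values.iter().filter(|&&x| x >= m).map(|x| x - m).collect();
    Some((m - k * median(&lower)?, m + k * median(&upper)?))
}

/// Values that fall outside the given (lower, upper) thresholds.
fn flagged(values: &[f64], thresholds: (f64, f64)) -> Vec<f64> {
    values
        .iter()
        .copied()
        .filter(|&x| x < thresholds.0 || x > thresholds.1)
        .collect()
}
```

For the right-skewed sample `[10, 11, 12, 13, 14, 16, 20, 26, 40]` with `k = 3.0`, conventional MAD gives the interval 5 to 23 and flags both 26 and 40, whereas Double MAD gives 8 to 32 and flags only 40. The lower threshold also moves up from 5 to 8, so a value of 7 would be caught by Double MAD but missed by the symmetric interval.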

Wrapping Up

In an era of big data, being able to accurately detect anomalies in time series data is increasingly vital. The Double MAD approach provides a robust, nuanced method for achieving this, allowing businesses to better understand their data, spot potential issues early, and ultimately make more informed decisions.

Whether you're monitoring web traffic, server usage, or client behaviour, leveraging Double MAD can offer valuable insights and help ensure your operations continue to run smoothly. The ability to detect both high and low anomalies makes it especially powerful, providing protection against potential threats like slow abuse.

Understanding and implementing Double MAD can be a game changer in your data analysis toolkit, providing a more holistic view of your data and enabling you to stay one step ahead of potential anomalies.




© PEAKHOUR.IO PTY LTD 2025   ABN 76 619 930 826    All rights reserved.