Peakhour.IO - Residential Proxies

The Real Cost of Breached Credentials

2026-06-19T00:00:00+10:00

The cost of breached credentials is usually counted in the wrong place.

When an organisation suffers a data breach, the obvious costs are incident response, legal work, notification, customer support, remediation, and regulatory attention. Those costs matter. IBM's 2025 Cost of a Data Breach Report puts the global average breach cost at about USD 4.4 million. IBM's data breach explainer also says stolen or compromised credentials were one of the top five initial attack vectors in the 2025 report, accounting for 10% of breaches and taking up to 186 days to identify.

But that is only the first bill.

Once usernames and passwords leave the original system, they do not stay attached to the original incident. They are copied, sorted, bundled, tested, resold, and mixed with other personal data. Another company's breach becomes your login problem. A password reused somewhere else becomes your fraud queue, your support call, your chargeback, your locked account, your angry customer, and your next security review.

That is the real cost of breached credentials: not just the breach, but the long tail of account abuse that follows.

The Roundup: Breaches Are Feeding Account Abuse

The numbers are not subtle.

The Identity Theft Resource Center's 2025 Annual Data Breach Report tracked 3,322 data compromises in 2025, a record high and a 79% increase over five years. The same report found that 70% of breach notices did not include attack information, making it harder for consumers and downstream businesses to understand what risk they now carry.

The ITRC also introduced a category it calls Previously Compromised Data: old stolen data that is repackaged and recirculated. In the full report, the ITRC says there were four major PCD releases in 2025, including two incidents involving roughly 16 billion records with no known notices. Its warning is the important part: while this may not be "new" stolen data, aggregation makes it highly effective for credential stuffing and account takeover attacks.

That matches the operational pattern security teams see on login endpoints. OWASP describes credential stuffing as automated testing of stolen username and password pairs against login forms. The reason it works is boring and persistent: people reuse passwords. Attackers do not need to breach your site if a customer has already reused a working credential somewhere else.

For Australian organisations, the local signals are just as relevant. The OAIC received 532 Notifiable Data Breach notifications between January and June 2025, with malicious or criminal attacks remaining the largest source of notifications. ASD's Annual Cyber Threat Report 2024-25 notes that its credential exposure notification process proactively sent 9,587 credential exposure events to about 220 organisations between 19 November 2024 and 30 June 2025.

None of that means every fraud loss starts with a reused password. It does mean credential exposure is part of the operating environment. Attackers have supply, tooling, proxy infrastructure, and plenty of places to turn account access into money.

The FBI's 2025 IC3 report gives useful context for that monetisation path. Cyber-enabled fraud accounted for 452,868 complaints and USD 17.697 billion in reported losses. Those losses include many crime types, not just credential stuffing, but the transaction paths are familiar to anyone dealing with account abuse: wire and ACH transfers, cards, peer-to-peer transfers, prepaid and gift cards, and cryptocurrency.

Where the Cost Actually Lands

A breached credential is cheap for the attacker and expensive for everyone else.

The first cost is detection. A login using the right username and password does not automatically look malicious. If the attacker spreads attempts across residential proxy infrastructure, uses one attempt per account, or targets mobile API endpoints directly, simple IP-based rate limits may not see the pattern. Peakhour has written about this in "The Australian epidemic of Account Takeover attacks" and in "Credential Stuffing Does Not Stop at the Login Form".

The second cost is fraud. Once a credential works, the attacker looks for value: stored cards, gift cards, loyalty points, refunds, store credit, subscription changes, delivery addresses, and saved payment flows. This is why account takeover is not just an authentication problem. The expensive moment may be checkout, not login.

The third cost is support. Customers do not usually know whether the original password leak happened somewhere else. They know their account was used, their card was charged, their loyalty balance disappeared, or their email address changed. The business still has to handle the support ticket, freeze the account, unwind the transaction, review the evidence, and explain what happened.

The fourth cost is trust. We have covered this before in "The Cost of Credential Stuffing": the reputational damage is practical. Customers see refunds, account locks, suspicious messages, and public complaints. Even if the business was not the source of the original breach, it becomes the place where the harm is felt.

The fifth cost is friction. If the only response is to challenge everyone, the business pays through abandonment and customer frustration. If the response is too soft, the business pays through fraud. The work is to apply friction where the evidence justifies it.

You Do Not Need Surveillance to Secure Accounts

There is a bad version of account protection that tries to identify people everywhere they go. That is not necessary, and it is not the right model for this problem.

Credential abuse defence should be scoped to the account security decision in front of you. Is this login using a known exposed credential pair? Is the session coming from suspicious infrastructure? Is it a first-seen device for the account? Is it trying to change email, reset the password, add a payout method, redeem stored value, or check out with saved payment details? Did the same client pattern just test many accounts?

Those questions can be answered with security-specific signals, not advertising-style tracking. Hash the credential check. Treat fingerprints as evidence, not identity. Keep the evidence tied to the protected account and request path. Use network, device, route, behaviour, and credential-risk context to decide whether to allow, step up, throttle, block, or review. Do not build a cross-site identity graph when the job is to stop account abuse on your own service.

That distinction matters. Users should not have to trade privacy for basic account security. Businesses also do not need to choose between doing nothing and adding blanket friction. Contextual security is useful because it lets the response match the risk.
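
As a minimal sketch of "hash the credential check", the example below compares a keyed digest of the submitted pair against a set of digests built from known-exposed pairs, so the check never needs a plaintext list. The names `PEPPER`, `credential_digest`, `build_exposed_set`, and `is_exposed` are hypothetical, and the in-memory set stands in for whatever breach corpus a service actually uses.

```python
import hashlib
import hmac

# Hypothetical service-side secret; a real deployment would load it from a secret store.
PEPPER = b"example-pepper-not-for-production"


def credential_digest(username: str, password: str, key: bytes = PEPPER) -> str:
    """Return a keyed SHA-256 digest of a normalised username/password pair."""
    # Usernames are trimmed and lower-cased; passwords are left exactly as typed.
    material = username.strip().lower().encode("utf-8") + b"\x00" + password.encode("utf-8")
    return hmac.new(key, material, hashlib.sha256).hexdigest()


def build_exposed_set(pairs):
    """Digest known-exposed pairs at ingestion so plaintext is not retained."""
    return {credential_digest(user, pw) for user, pw in pairs}


def is_exposed(username: str, password: str, exposed_digests) -> bool:
    """Check a login attempt against the digest set."""
    return credential_digest(username, password) in exposed_digests


exposed = build_exposed_set([("alice@example.com", "hunter2")])
print(is_exposed("Alice@Example.com ", "hunter2", exposed))        # True
print(is_exposed("alice@example.com", "other-password", exposed))  # False
```

The result is a risk signal tied to one login decision on one service. It does not identify the person or follow them anywhere else.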

What Teams Should Measure

If breached credentials are a business cost, measure them like one.

Useful measures include:

How many login attempts match known breached credential pairs.
How many breached-credential attempts result in a successful login.
Which routes see the risk: login, password reset, email change, stored-card checkout, gift card redemption, account recovery, mobile API, partner API, or admin access.
How often high-risk sessions move from login into sensitive account actions.
Which signals appear together: breached credential, residential proxy, first-seen device, unusual geography, repeated failure, rapid checkout, or recovery-flow pressure.
How many support tickets, refunds, chargebacks, account locks, and fraud reviews are linked to account takeover.
How many controls create customer friction, and whether that friction is landing on risky sessions or ordinary customers.

This does not need to be perfect on day one. The important step is to stop treating credential stuffing as a vague security category and start treating it as an observable account-risk workflow.
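
A first pass at a few of these measures can be small. The sketch below assumes login events are already recorded with a handful of fields. The `LoginEvent` record, its field names, and the `summarise` helper are hypothetical and only illustrate the shape of the calculation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LoginEvent:
    route: str                               # e.g. "login", "mobile_api"
    breached_match: bool                     # credential pair matched known breached data
    success: bool                            # the login succeeded
    sensitive_action_followed: bool = False  # session went on to a sensitive action


def summarise(events):
    """Summarise breached-credential activity from a list of login events."""
    breached = [e for e in events if e.breached_match]
    breached_ok = [e for e in breached if e.success]
    return {
        "breached_attempts": len(breached),
        "breached_successes": len(breached_ok),
        "breached_success_rate": len(breached_ok) / len(breached) if breached else 0.0,
        "breached_by_route": dict(Counter(e.route for e in breached)),
        "breached_sessions_reaching_sensitive_action": sum(
            1 for e in breached_ok if e.sensitive_action_followed
        ),
    }


events = [
    LoginEvent("login", True, True, True),
    LoginEvent("login", True, False),
    LoginEvent("mobile_api", True, False),
    LoginEvent("login", False, True),
]
summary = summarise(events)
print(summary["breached_attempts"], summary["breached_successes"])  # 3 1
```

Support tickets, refunds, and chargebacks would need to be joined in from other systems, which is outside this sketch.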

The Control Pattern

The control pattern is layered.

Start with breached credential scanning so reused or exposed credentials are visible at login. Feed that signal into account takeover prevention rather than treating it as a standalone report. Add bot management and advanced rate limiting so automation and distributed testing are harder to run at scale. Use residential proxy detection as a risk signal, especially where attackers are trying to make automated traffic look like normal customer traffic.

Then carry the risk forward after login.

A low-risk page view and a saved-card checkout should not inherit the same level of trust just because the password worked. A session that begins with a breached credential match, comes through suspicious infrastructure, and immediately changes the email address or redeems stored value deserves a different response from a known customer browsing order history.

The response can be graduated:

Log low-risk activity for visibility.
Tighten rate limits on suspicious automation.
Require step-up verification before sensitive account changes.
Hold or review risky transactions.
Notify the customer when high-risk account changes are attempted.
Block sessions when the evidence is strong enough.

That is how breached credential data becomes useful. It is not a panic button. It is a signal that helps decide when trust should be earned again.
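
The graduated response above can be sketched as a small decision function. The signal names, weights, thresholds, and route names below are all illustrative assumptions rather than recommended values. The point is only that the same session evidence can lead to different actions on different routes.

```python
from dataclasses import dataclass

# Hypothetical route names; a real service would list its own sensitive actions.
SENSITIVE_ROUTES = {"email_change", "password_reset", "saved_card_checkout", "stored_value_redeem"}


@dataclass(frozen=True)
class SessionRisk:
    breached_credential: bool = False
    suspicious_infrastructure: bool = False
    first_seen_device: bool = False
    automation_suspected: bool = False


def risk_score(risk: SessionRisk) -> int:
    """Add up illustrative weights; the numbers are assumptions, not tuned values."""
    return (
        3 * risk.breached_credential
        + 2 * risk.suspicious_infrastructure
        + 1 * risk.first_seen_device
        + 2 * risk.automation_suspected
    )


def decide(risk: SessionRisk, route: str) -> str:
    """Map session evidence and route sensitivity to one graduated action."""
    score = risk_score(risk)
    sensitive = route in SENSITIVE_ROUTES
    if score >= 7:
        return "block"
    if sensitive and score >= 5:
        return "hold_for_review"
    if sensitive and score >= 2:
        return "step_up"
    if risk.automation_suspected:
        return "tighten_rate_limit"
    return "log"


session = SessionRisk(breached_credential=True, suspicious_infrastructure=True)
print(decide(session, "order_history"))  # log
print(decide(session, "email_change"))   # hold_for_review
```

Customer notification is left out of the sketch. In practice it would sit alongside the step-up and review actions on sensitive routes.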

The Practical Takeaway

Breached credentials are not only a breach-response issue. They are an account protection issue, a fraud issue, a support issue, and a customer trust issue.

The original breach may have happened somewhere else. The cost can still land on your login form, your checkout, your API, and your support team.

The goal is not to make every login difficult. The goal is to make stolen credentials harder to turn into account control, money movement, stored-value abuse, or customer harm.

That starts by making credential risk visible, connecting it to session and route context, and applying proportionate controls where the cost would otherwise show up.

How Residential Proxies Changed API and Account Abuse

2026-06-19T00:00:00+10:00

Residential proxies have changed the shape of API and account abuse. The old picture was easier to reason about: too many failed logins from one IP, a known hosting provider range, an obvious bot user agent, or a burst that crossed a threshold quickly enough to trip a rule.

That still happens, but it is not the harder problem.

The harder problem is the attempt that arrives through ordinary consumer networks, spreads itself across many addresses, and behaves just slowly enough to avoid looking like an incident. One login attempt here. A password reset probe there. A token refresh pattern that is unusual only when it is seen beside the route, the client, the ASN, the credential history, and the account event.

That is why residential proxy detection should be treated as part of the account and API decision path, not as a standalone allow/block list.

The Account Workflow Is Now a Distributed Target

Attackers do not need to break the whole application at once. They can work through the account surface in pieces:

Login attempts against known usernames.
Password reset initiation and verification.
New account registration.
Token issue and refresh routes.
Payment, address, profile, and email changes.
Loyalty, wallet, checkout, or stored-value workflows.
API calls that reveal whether an account or credential is valid.

Each route may look acceptable in isolation. The risk appears when the pattern is joined together.

A residential proxy network helps the attacker keep that pattern quiet. Requests rotate through many residential-looking exits. IP-based rate limits see different sources. A reputation feed may not have labelled a fresh or private proxy network yet. Geo checks can look plausible enough. The traffic does not necessarily arrive as a clean burst.

This is where static thinking breaks down. If the only question is "is this IP bad?", the answer will often arrive too late or be too blunt to use safely.

Fresh and Private Proxy Networks Create a Timing Problem

Many teams think about proxy detection as a database problem: look up the IP, see whether it is a proxy, then block it. That works for some traffic, especially known data centre proxies and commodity infrastructure.

Residential proxy abuse is less tidy. Fresh networks can appear before public datasets have a confident label. Private networks may not show up in broad feeds at all. Some exit points are shared with legitimate users. Some sit behind carrier-grade NAT or normal household connections. Blocking the address outright can create customer pain, while allowing it without context leaves the account flow exposed.

This is the practical reason Peakhour talks about residential proxy use as a signal. The signal matters, but it has to sit beside IP intelligence, connection characteristics, client history, request behaviour, account state, and route sensitivity.

A residential proxy on a marketing page may only need logging. The same proxy signal on a login route with recent failures may justify a challenge. On a password reset or high-value account change, it may justify step-up authentication, throttling, or blocking depending on the rest of the evidence.

The control should match the risk of the action.

Low-and-Slow Behaviour Is Still Automation

Low-and-slow abuse is uncomfortable because it avoids the easy operational story. There is no dramatic spike. There may be no single IP worth banning. The application may not be overloaded. Support may only see a few confused users, a few locked accounts, or a gradual rise in reset attempts.

For API and account workflows, this is still automation. It just looks less like a flood and more like a background process.

Useful signals include:

Repeated failed authentication across a shared fingerprint or client pattern.
Many accounts touched by similar request timing.
Token or reset routes used out of sequence.
Browser characteristics that do not match the claimed client.
Residential proxy use on sensitive account routes.
Fresh IP or ASN patterns appearing around account events.
Similar request shapes distributed across unrelated accounts.

None of these signals has to prove abuse by itself. The point is to combine them early enough that the application does not have to make the decision alone.
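
One way to combine two of these signals is to group authentication events by client fingerprint and count how many distinct accounts and addresses each fingerprint touches. The sketch below assumes events arrive as `(fingerprint, account_id, ip)` tuples. That event shape, the function name, and the `min_accounts` threshold are hypothetical.

```python
from collections import defaultdict


def fingerprints_touching_many_accounts(events, min_accounts=3):
    """Return fingerprints seen against at least `min_accounts` distinct accounts."""
    accounts = defaultdict(set)
    ips = defaultdict(set)
    for fingerprint, account_id, ip in events:
        accounts[fingerprint].add(account_id)
        ips[fingerprint].add(ip)
    return {
        fp: {"accounts": len(accts), "ips": len(ips[fp])}
        for fp, accts in accounts.items()
        if len(accts) >= min_accounts
    }


events = [
    ("fp-a", "u1", "203.0.113.1"),
    ("fp-a", "u2", "203.0.113.2"),
    ("fp-a", "u3", "203.0.113.3"),
    ("fp-b", "u9", "198.51.100.5"),
    ("fp-b", "u9", "198.51.100.5"),
]
print(fingerprints_touching_many_accounts(events))
# {'fp-a': {'accounts': 3, 'ips': 3}}
```

A match here is evidence to weigh with the other signals, not a verdict. A popular browser build shared by many real customers would also touch many accounts.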

Peakhour's view is that proxy detection belongs in the same operating model as bot management, rate limiting, account risk scoring, and event evidence. The useful question is not "can we block every residential proxy?" It is "what should this route do when proxy use appears with this account, this client, this credential pattern, and this recent behaviour?"

API Routes Need the Same Treatment as Browser Flows

A common gap is protecting the visible login page while leaving API routes with weaker controls. Browser-side checks can help on web flows, but many account actions now happen through mobile apps, single-page applications, partner integrations, and backend APIs.

Those routes still need context. They need request-level validation, route-aware thresholds, proxy and IP signals, token checks, and evidence that can be reviewed later. A login API, a reset API, and a profile-change API should not all receive the same action just because the source address has the same reputation.

This is also why rate limiting has to move beyond source IP. A rule can key on a token, header, fingerprint, account identifier, route, response code, or a combination of signals. That makes it possible to slow failed login behaviour without punishing every legitimate user behind the same network.
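
A minimal sketch of a rate limiter keyed on something other than source IP follows. It is a plain in-memory sliding window, and the key is whatever tuple the caller chooses, for example a fingerprint plus a route. The class name and the example key are assumptions, and a production limiter would normally use shared storage rather than process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, key, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected events are not recorded
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
key = ("fp-abc", "/api/login")  # hypothetical fingerprint and route
print([limiter.allow(key, now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow(("fp-xyz", "/api/login"), now=3))     # True
```

Because the key is a tuple, a client with a different fingerprint behind the same address is counted separately.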

The background reading on proxy detection challenges and quantifying residential proxy risk covers the broader detection problem. For API and account teams, the immediate step is more operational: find the routes where a residential proxy signal should change the action.

The Right Outcome Is Controlled Friction

Residential proxy detection is not a magic verdict. It is a way to make the account decision more honest.

Some traffic should pass. Some should be logged. Some should be rate limited. Some should be challenged. Some should be blocked. The difference should come from route sensitivity, request context, and observed behaviour, not from a single IP label.

A practical policy might look like this:

Monitor proxy use across all account and API routes.
Apply tighter thresholds on login, reset, token, and account-change routes.
Combine proxy use with credential, client, rate, and behaviour signals.
Preserve decision records so security and support can explain what happened.
Move from monitor to enforce only after reviewing false positives and customer impact.

That model gives teams a way to respond to residential proxy abuse without turning every shared residential network into a casualty.
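
The monitor-then-enforce idea can be sketched as a per-route policy table. In this hypothetical example, `RoutePolicy`, the route names, the action names, and the `corroborating_signals` count are all assumptions. Every decision is recorded, but the intended action is only applied on routes that have been moved to enforce mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePolicy:
    proxy_action: str  # action when proxy use appears with other evidence
    mode: str          # "monitor" records only; "enforce" applies the action


POLICIES = {
    "marketing": RoutePolicy("log", "monitor"),
    "login": RoutePolicy("challenge", "enforce"),
    "password_reset": RoutePolicy("step_up", "monitor"),
}
DEFAULT_POLICY = RoutePolicy("log", "monitor")


def evaluate(route, proxy_detected, corroborating_signals, decision_log):
    """Return the action applied to the request and append a decision record."""
    policy = POLICIES.get(route, DEFAULT_POLICY)
    if not proxy_detected:
        intended = "allow"
    elif corroborating_signals == 0:
        intended = "log"  # a proxy signal alone is only recorded
    else:
        intended = policy.proxy_action
    applied = intended if policy.mode == "enforce" else "allow"
    decision_log.append(
        {"route": route, "intended": intended, "applied": applied, "mode": policy.mode}
    )
    return applied


log = []
print(evaluate("login", True, 2, log))           # challenge
print(evaluate("password_reset", True, 1, log))  # allow
print(log[-1]["intended"])                       # step_up
```

Comparing `intended` with `applied` in the log is what makes a false-positive review possible before a route is switched to enforce.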

For a grounding definition, see "What is Residential Proxy Detection?" For the product control, see "Residential Proxy Detection".

The important shift is simple: residential proxies are not just a network category. In account and API protection, they are context for deciding how much trust a request deserves.

Anatomy of a Credential Stuffing Attack

2025-09-01T00:00:00+10:00

In early 2024, major Australian retailer The Iconic was hit by a widespread account takeover attack. Fraudsters used stolen credentials to log into customer accounts, place orders with stored credit cards, and ship goods to different locations. The incident caused significant reputational damage and financial loss, forcing the company to issue refunds and publicly address the account compromises.

This attack wasn't the result of a direct hack on The Iconic's systems. It was a classic case of credential stuffing: an automated attack that works because people reuse passwords across services. This article breaks down how credential stuffing works, the attacker's toolkit, the business impact, and the controls that make it harder to run at scale.

What is Credential Stuffing?

Credential stuffing is an automated attack where malicious actors use lists of stolen usernames and passwords—often obtained from third-party data breaches—to gain unauthorised access to user accounts on other websites. The attack works because many users recycle the same password across multiple online services. If a password for a user's social media account is leaked, attackers will "stuff" that same email and password combination into the login forms of e-commerce sites, banking portals, and other high-value targets.

Because attackers submit valid credentials, even though they are stolen, these login attempts can be difficult to distinguish from genuine user activity. That makes credential stuffing harder for traditional security controls to spot.

The Attacker's Toolkit

Modern credential stuffing is not a manual process. Attackers use a mature set of tools and resources to automate and scale their campaigns:

Automation Software: Tools like OpenBullet are central to these attacks. OpenBullet is a powerful, open-source web testing suite that allows even non-programmers to create complex attack scripts. Attackers can find or create "configs" that tell the software exactly how to interact with a target website's login form.
Breached Credential Lists: Dark web markets carry massive databases of usernames and passwords harvested from data breaches. These "combo lists" are the raw material for credential stuffing attacks and can be purchased for very little cost.
Proxy Networks: To avoid being blocked, attackers distribute their login attempts across thousands or even millions of IP addresses. They often use residential proxy networks, which route traffic through the internet connections of real home users. This can make malicious traffic appear to come from legitimate customers, weakening IP-based blocking and rate limiting.

The Business Impact

The consequences of a successful credential stuffing attack extend beyond the login event:

Direct Financial Loss: As seen with The Iconic, attackers can make fraudulent purchases, drain loyalty points, or transfer funds, leading to direct financial losses and the cost of refunding customers.
Damage to Brand Reputation: Publicly reported breaches erode customer trust. Users who have been defrauded may share their negative experiences on social media, leading to lasting reputational harm.
Loss of Customer Trust: When customers believe their accounts are not secure, they may abandon the platform altogether, leading to customer churn and a decline in lifetime value.
Operational Costs: Responding to an attack involves significant operational overhead, including customer support time, fraud investigation, and new security measures.

Building a Multi-Layered Defence

Stopping automated attacks requires a defence strategy that goes beyond simple password policies. A modern, multi-layered approach should include:

Advanced Bot Protection: The first step is to distinguish bots from humans. Modern bot management solutions use techniques like network and browser fingerprinting and behavioural analysis to detect automated login attempts, even when they mimic human behaviour.
Check Credentials Against Breach Databases: Proactively check usernames and passwords used in login attempts against comprehensive databases of known breached credentials. If a credential pair is known to be compromised, you can flag the login for additional verification or alert the user to change their password.
Advanced Rate Limiting: Traditional IP-based rate limiting struggles against distributed attacks. Advanced rate limiting groups requests by more stable identifiers, such as a TLS fingerprint, which can remain consistent even as an attacker rotates through thousands of IP addresses. This helps track and block a single malicious actor launching a distributed attack.
Enforce Multi-Factor Authentication (MFA): MFA is not a silver bullet, but it provides a critical layer of security by requiring a second form of verification. Websites should strongly encourage or enforce MFA, especially for sensitive actions like changing account details or making purchases.

By combining these controls, organisations can make credential stuffing harder to scale, protect user accounts, and reduce the business risk when attackers test stolen credentials.

Beyond the IP Address

2025-09-01T00:00:00+10:00

For years, rate limiting has been a standard control for protecting websites and APIs from abuse. The basic model is simple: limit the number of requests a single "user" can make in a given period. If a user exceeds the limit (e.g., 10 login attempts in a minute), they are temporarily blocked.

The hard part has always been identifying that "user". Traditionally, the answer was the IP address. The assumption was that one IP address equalled one user. In the early days of the internet, this was a reasonable approximation. Today, that assumption no longer holds, and it leaves systems exposed to modern attacks.

The IP address is no longer a reliable identifier for a single user or device. There are three common reasons:

Proxy Networks: Attackers don't use a single IP address. They use large residential proxy networks to rotate requests through thousands or even millions of different IP addresses, making each request look like it comes from a new user.
Shared IPs (CGNAT): At the same time, a single IP address can represent thousands of legitimate users. Mobile carriers use Carrier-Grade NAT (CGNAT) to make many mobile devices share the same public IP. Similarly, an entire office building or university campus might appear to the internet as a single IP.
Distributed Attacks: Modern automated attacks, like Layer 7 DDoS or credential stuffing, are inherently distributed. Attackers use botnets or proxy networks to spread their attack across a large number of IPs, so no single IP ever exceeds a traditional rate limit.

Blocking a shared IP because of one bad actor can cause collateral damage, denying access to thousands of legitimate users. On the other side, failing to see that thousands of IPs are part of a single coordinated attack means the attack succeeds. Traditional IP-based rate limiting is no longer enough.
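
Both failure modes show up in a small synthetic example. The sketch below uses made-up traffic: 50 requests from one tool spread across 50 addresses, plus 20 ordinary users sharing one CGNAT address. The field names, the `over_limit` helper, and the limit of 10 are assumptions for illustration.

```python
from collections import Counter


def over_limit(requests, key_field, limit):
    """Return the set of key values that appear more than `limit` times."""
    counts = Counter(r[key_field] for r in requests)
    return {key for key, n in counts.items() if n > limit}


# One automation tool rotating through 50 addresses, one request each.
attack = [{"ip": f"203.0.113.{i}", "fingerprint": "fp-tool"} for i in range(50)]
# Twenty legitimate users behind a single shared (CGNAT) address.
shared = [{"ip": "198.51.100.7", "fingerprint": f"fp-user-{i}"} for i in range(20)]
traffic = attack + shared

print(over_limit(traffic, "ip", limit=10))           # {'198.51.100.7'}
print(over_limit(traffic, "fingerprint", limit=10))  # {'fp-tool'}
```

Counting by IP flags only the shared address and misses the attack entirely. Counting by fingerprint flags the tool and leaves the shared address alone.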

The New Way: Advanced Rate Limiting

Advanced Rate Limiting addresses this by moving beyond the IP address. Instead of grouping requests by a single, unreliable identifier, it lets you count requests using more stable and meaningful characteristics of the connection or the software making it.

This approach groups requests using identifiers like:

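TLS and HTTP/2 Fingerprints: Client software (such as a browser or a script) presents a characteristic "fingerprint" based on how it initiates a secure connection (TLS) or communicates over HTTP/2. The fingerprint reflects the client's TLS and HTTP stack rather than the network it connects from, so it can remain consistent even as an attacker rotates through thousands of IP addresses. Many clients running the same software share a fingerprint, so it identifies a class of client rather than an individual. By rate limiting based on the TLS fingerprint, you can track and block the underlying automation tool itself, not just the IPs it uses.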
Device Characteristics: A fingerprint can be constructed from a range of attributes, including the device's operating system, browser version, and more. This allows for the detection of repeated requests coming from the same class of device.
A Combination of Headers: You can group requests by one or more request headers. For authenticated APIs, grouping by the Authorization header or an API key enforces fair usage and prevents abuse by a single authenticated client.

Practical Use Cases

The value of advanced rate limiting is clearest when it is applied to real-world threats:

Mitigating Distributed Credential Stuffing: An attacker using a tool like OpenBullet launches a credential stuffing attack against your login page, rotating through thousands of residential proxy IPs. Per-IP rate limiting is ineffective here because no single address sends enough requests to cross a threshold. However, unless the attacker deliberately varies it, the automation tool tends to present a consistent TLS fingerprint. By setting a rule to limit failed login attempts per TLS fingerprint, you can detect and then throttle or block the distributed attack as a whole, regardless of how many IPs are involved.
Protecting APIs from Abuse: A partner is abusing their API key, sending far too many requests and degrading service for other users. By rate limiting based on the Authorization header, you can enforce usage limits on a per-client basis, keeping access fair without affecting other users.
Stopping Content Scrapers: A scraper is hammering your e-commerce site to steal pricing data. They are using a botnet to distribute the requests across hundreds of IPs. However, the scraping script has a unique combination of a user-agent and a TLS fingerprint. Advanced rate limiting can count requests based on this combined signature and block the scraper, protecting your intellectual property.

When attackers are distributed, your defences need to see the single actor behind the many IPs. Advanced rate limiting provides that visibility and should be part of a modern application security strategy.

The Invisibility Cloak

2025-09-01T00:00:00+10:00

Every time you connect to a website, you leave behind a "digital fingerprint." This is not a physical fingerprint, but a set of signals from your device and browser. Security tools analyse this fingerprint—which includes your IP address, browser type, operating system, supported fonts, and even subtle characteristics of your network connection (TLS fingerprinting)—to distinguish legitimate users from malicious bots.

For years, this was a reliable way to spot automated threats. Bots often had clumsy, inconsistent fingerprints that made them easier to identify. Today, attackers can combine tools that mimic real users closely enough to weaken many traditional defences. The two most important components of this modern "invisibility cloak" are residential proxies and anti-detect browsers.

What Are Residential Proxies?

A residential proxy is an intermediary server that uses an IP address assigned by an Internet Service Provider (ISP) to a real home internet connection. When a bot routes its traffic through a residential proxy, its requests appear to originate from a genuine home user, not a data centre.

These proxy networks are large, often containing millions of IP addresses sourced from around the globe. How are these IPs obtained? Often through questionable means:

Malware and Botnets: Unsuspecting users' devices are infected with malware that turns them into proxy endpoints.
SDKs in Free Apps: Some free applications (often VPNs or mobile apps) include code that enrols the user's device into a proxy network in exchange for using the app, often without the user's full knowledge or consent.

By rotating through this large pool of legitimate-looking IPs, attackers can launch large-scale attacks that are difficult to separate from normal traffic. To a website's security system, a distributed attack from a residential proxy network looks like thousands of individual customers from different locations.

What Are Anti-Detect Browsers?

While residential proxies mask the attacker's network location, anti-detect browsers are designed to spoof the rest of the digital fingerprint. These specialised browsers allow an attacker to create and manage thousands of unique browser profiles, each with a customised and consistent fingerprint.

An anti-detect browser can control and randomise most of the software-level details a website uses for identification, including:

Browser type and version (e.g., Chrome, Firefox, Safari)
Operating system (Windows, macOS, iOS, Android)
Screen resolution, fonts, and plugins
Time zone and language settings
Subtle browser characteristics like Canvas and WebGL rendering

With a few clicks, an attacker can make a single machine in one country appear as thousands of unique users on different devices and operating systems from all over the world.

The Combined Threat: A Perfect Storm for Attacks

When attackers combine residential proxies with anti-detect browsers, they cover both the network and browser layers that many controls rely on. The residential proxy provides a legitimate IP address, and the anti-detect browser provides a consistent, human-looking browser fingerprint.

This combination makes attacks like large-scale credential stuffing, content scraping, and inventory scalping much harder to distinguish from legitimate user traffic. Each malicious request appears to be from a unique person on a standard device, using a normal home internet connection.

Why Traditional Defences Fail and What to Do About It

This level of sophistication weakens traditional security measures:

IP Blocklists and Reputation Services: These struggle when attackers are using a constantly rotating pool of millions of legitimate residential IP addresses. Our own research shows that even the best IP intelligence services fail to detect the vast majority of residential proxy traffic.
Basic Browser Fingerprinting: Anti-detect browsers are specifically designed to defeat these checks by providing a consistent and realistic fingerprint.

To combat this combined threat, organisations need a modern approach to bot detection that looks beyond the surface:

Advanced Network Fingerprinting: Instead of just looking at the IP address, modern solutions analyse the underlying characteristics of the network connection itself (like the TLS/JA3 fingerprint). These signatures can often identify the underlying automation tool or proxy network, even when the IP address appears legitimate.
Behavioural Analysis: Advanced systems model normal user behaviour—such as mouse movements, typing speed, and page navigation—to identify the subtle, non-human patterns of automation that even sophisticated bots can't perfectly mimic.
Hardware and Rendering Fingerprinting: While anti-detect browsers can spoof software-level details, faking the underlying hardware is far more difficult. Advanced techniques, such as those used in Google's Picasso, analyse how a device renders graphics (e.g., Canvas and WebGL), processes audio, and performs CPU-intensive tasks. This creates a hardware fingerprint based on the unique characteristics of the GPU, audio stack, and CPU clock speed. This fingerprint can reveal inconsistencies between the claimed browser profile and the actual hardware being used. When combined with network fingerprinting and residential proxy detection, this becomes a strong signal for identifying a single machine attempting to impersonate many different users.
Dedicated Residential Proxy Detection: Specialised techniques are required to identify traffic coming from residential proxy networks. This is a critical signal, as very few legitimate users have a reason to route their traffic this way.
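As a rough illustration of the network fingerprinting idea above, the sketch below builds a JA3-style fingerprint from the fields of a TLS ClientHello. It follows the publicly documented JA3 layout (version, ciphers, extensions, curves, and point formats, joined and hashed with MD5) and skips GREASE values. The function names are illustrative, and parsing the ClientHello off the wire is assumed to happen elsewhere.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders (RFC 8701) look like 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join ClientHello fields in JA3 order: commas between fields, dashes within."""
    fields = [
        [version],
        [c for c in ciphers if not is_grease(c)],
        [e for e in extensions if not is_grease(e)],
        [g for g in curves if not is_grease(g)],
        list(point_formats),
    ]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string; the digest is the fingerprint that gets compared."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Two requests arriving from different residential IP addresses but produced by the same automation tool will often share this hash, which is why it can be a more stable identifier than the IP address. It is one signal among several, not a verdict on its own.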

Attackers using residential proxies and anti-detect browsers are harder to identify, but they still leave signals. Network characteristics, hardware fingerprints, and the behavioural tells of automation give security teams a better chance of separating the bot from the user it is trying to resemble.

Key Considerations for Effective Bot Management

2025-09-01T00:00:00+10:00

Introduction

Bots account for a large share of web traffic. Recent studies estimate that nearly 50% of all internet traffic is generated by automated programs. Some bots are necessary for the web to function, such as search engine crawlers, but a significant portion are malicious. These "bad bots" are used for content scraping, credential stuffing, spam, and DDoS attacks.

As bot operators become more sophisticated, bot management needs to cover detection, classification, and response. This article outlines the main considerations for security teams protecting intellectual property, online revenue, and user accounts.

The Goal: Accurate Bot Detection and Classification

The first step in effective bot management is separating legitimate users from automated threats. Identification is not enough on its own. Security teams also need accurate classification across good, bad, and "grey" bots.

Good Bots: Support normal internet operations, such as search engine crawlers (Googlebot, Bingbot) and performance monitoring bots.
Bad Bots: Carry out malicious activity such as content scraping, account takeover, and spamming.
Grey Bots: Serve a legitimate purpose but can cause problems when they crawl too aggressively, such as SEO and marketing bots (Ahrefs, SEMrush).

Effective detection usually needs more than basic signatures. A layered approach commonly includes:

Basic Protection: Targets simple bots using user agent checks and IP reputation databases.
Intermediate Protection: Uses JavaScript-based challenges and basic network fingerprinting, such as JA3/JA4, to detect less sophisticated bots.
Advanced Protection: Combines comprehensive network fingerprinting, behavioural analysis, and machine learning to identify sophisticated bots that mimic human behaviour, use residential proxies, or rely on anti-detect browsers.

Machine learning models help in this context because they can learn from changing bot strategies and inspect incoming traffic for subtle signs of automation.

The Method: Continuously Adaptive Detection and Response

Bot behaviour changes quickly. Threat actors modify tooling, traffic patterns, and infrastructure to avoid detection, so static defence rules degrade over time. Organisations need detection and response that can adapt as the attack changes.

That means correlating metadata with behavioural factors in real time, then applying the right response for the risk. When a bot attempts account takeover or data scraping, an adaptive response can act immediately to reduce the impact.

Effective adaptive responses include:

Advanced Rate Limiting: Goes beyond simple IP-based limits by grouping requests with more stable identifiers, such as TLS or HTTP/2 fingerprints or device characteristics. This helps stop distributed attacks from tools like OpenBullet that rotate through thousands of IP addresses.
Web Application Firewalls (WAF): Provide an important first line of defence by filtering harmful Layer 7 traffic based on predefined rules.
Tarpitting: Slows malicious connections to increase cost and resource consumption for attackers.
Challenges: Traditional visible CAPTCHAs can harm user experience and are often solvable by modern bots. Invisible challenges can verify a legitimate browser environment with less friction.
Alternate Content Serving: Misleads scraping bots by serving alternate or cached content with incorrect information (e.g., higher prices), making their scraped data useless.
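To make the advanced rate limiting idea concrete, here is a minimal sketch of a sliding-window limiter keyed by a connection fingerprint rather than an IP address. The class name, the fingerprint string, and the limits are assumptions for illustration. A production limiter would also need shared state across servers and eviction of idle keys.

```python
from collections import defaultdict, deque


class FingerprintRateLimiter:
    """Sliding-window limiter keyed by a stable fingerprint, not by IP."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # fingerprint -> timestamps of allowed requests

    def allow(self, fingerprint: str, now: float) -> bool:
        hits = self._hits[fingerprint]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical usage: every request comes from a different IP address,
# but they all share one TLS fingerprint, so the budget is shared too.
limiter = FingerprintRateLimiter(max_requests=5, window_seconds=60)
decisions = [limiter.allow("tls-fp-abc123", now=float(t)) for t in range(8)]
```

Because the source IP never enters the decision, rotating through a proxy pool does not reset the counter.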

The same response process should also feed learning loops, building a repository of bot attack patterns that can train machine learning models and improve accuracy over time.

The Expected Outcomes: A Resilient Security Posture

An adaptive bot management strategy should support several practical outcomes:

Risk Mitigation: Reduce potential financial losses, service disruption, and data breaches associated with malicious bot activity such as credential stuffing, ad fraud, and inventory hoarding.
Improved User Experience: Keep disruption low for genuine users by using invisible challenges and behavioural analysis instead of frustrating CAPTCHAs, which can reduce conversions by up to 40%.
Intellectual Property Protection: Protect valuable content, pricing data, and other intellectual property from unauthorised scraping.
Online Revenue Security: Protect online revenue streams by preventing fraud, inventory scalping, and other malicious activity that targets e-commerce platforms.
Regulatory Compliance: Help organisations meet data protection and privacy regulations with a proactive bot management approach.

Conclusion: Fortifying Against Sophisticated Bots

Modern bot defence depends on accurate detection, precise classification, and adaptive response. Machine learning, comprehensive network fingerprinting, and behavioural analysis all contribute, but they work best as part of a layered control set.

With that approach, security teams can better protect intellectual property, online revenue, and user accounts from sophisticated bot activity.

The Bot Spectrum

2025-09-01T00:00:00+10:00

The word "bot" is often used as shorthand for unwanted automation: scripts trying to break into accounts, scrape content, or overwhelm websites. A large share of internet traffic does come from bad bots, but automated traffic is not automatically harmful. Some bots are part of how the web is discovered, monitored, and kept usable.

Effective bot management is not about blocking every automated request. It depends on accurate classification: separating good bots from bad bots, and recognising the "grey" bots that sit between them. That classification lets you apply controls that reduce risk without cutting off traffic that helps your site operate.

Good Bots: The Essential Workers of the Web

Good bots are automated programs that perform useful or necessary tasks. They are usually clear about who they are and respect the rules you set in your robots.txt file. Blocking them can damage search visibility, monitoring, or other business workflows.

Examples of Good Bots:

Search Engine Crawlers: Bots like Googlebot and Bingbot are the best-known good bots. They crawl and index your website's content, which is how your pages appear in search engine results. Blocking a crawler removes your site from that search engine's results.
Performance Monitoring Bots: These bots are used by services to check your website's uptime and performance from different locations around the world, and to alert you if your site goes down.
Copyright Bots: These bots scan the web for plagiarised content, helping to protect your intellectual property.

Management Strategy: Good bots should be identified and allowed to access your site freely. Because a user agent string is trivial to fake, verification techniques such as a reverse DNS lookup, confirmed by a forward lookup of the returned hostname, can be used to check that a bot claiming to be Googlebot is actually coming from Google.
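A minimal sketch of that reverse DNS check follows. It resolves the IP to a hostname, checks the hostname falls under a Google crawler domain, then resolves the hostname forward and confirms it maps back to the same IP. The function name and the injectable lookup parameters are assumptions added so the logic can be exercised without network access. The default forward lookup shown here handles IPv4 only.

```python
import socket

# Domains Google documents for Googlebot hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if reverse and forward DNS agree on a Google hostname."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Forward-confirm: anyone can publish a PTR record naming googlebot.com,
        # but only Google controls what that hostname resolves to.
        return ip in forward_lookup(hostname)
    except OSError:
        # Lookup failures are treated as "not verified".
        return False
```

The forward confirmation is the important step, because a PTR record on its own is controlled by whoever owns the IP range.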

Bad Bots: The Malicious Actors

Bad bots are designed for malicious activity. They are a major reason bot management exists as a security function. These bots are deceptive, often hiding their identity and purpose, and they can be responsible for a wide range of costly and damaging activity.

Examples of Bad Bots:

Credential Stuffers: These bots use stolen usernames and passwords to carry out account takeover attacks.
Content and Price Scrapers: These bots steal your valuable content, product listings, and pricing data, often for use by competitors.
Spam Bots: These bots flood comment sections, forums, and contact forms with unwanted ads or malicious links.
Distributed Denial of Service (DDoS) Bots: These bots are part of a botnet used to overwhelm a website with traffic, causing it to slow down or crash.
Inventory Hoarding Bots: Common in e-commerce, these bots automatically add limited-edition products to shopping carts to prevent legitimate customers from buying them, often for resale at a higher price (scalping).

Management Strategy: Bad bots need to be accurately identified and blocked as quickly as possible, ideally at the network edge before they consume your server resources.

Grey Bots: The Nuanced Category

Grey bots are not inherently malicious, but their behaviour can still cause problems. They often serve a legitimate purpose, but become an issue when they crawl too aggressively, consume excessive bandwidth or server resources, and slow the site down for real users.

Examples of Grey Bots:

Aggressive SEO Tools: Bots from marketing tools like Ahrefs, SEMrush, and Majestic crawl websites to gather data for backlink analysis and competitive research. They can be useful, but their crawling can also be heavy.
Partner and Aggregator Bots: These could be bots from partner companies or price comparison websites that need to access your data. The activity may be legitimate, but it still needs to be managed.
Feed Fetchers: Bots that collect data for news aggregators or other applications fall into this category.

Management Strategy: Grey bots require more than a simple allow or block rule. The best strategy is often to rate-limit or tarpit them.

Rate-Limiting: This allows the bot to continue accessing your site, but slows it to a manageable level so it does not overwhelm your servers.
Tarpitting: This intentionally slows the connection for a specific bot, increasing the cost and time required to crawl your site and discouraging overly aggressive behaviour.
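As a sketch of how tarpitting differs from an outright block, the function below computes how long to hold a response for a crawler that has exceeded its request budget. The delay doubles with each request over budget and is capped. The function name, parameters, and default values are assumptions for illustration, not recommended settings.

```python
def tarpit_delay(requests_in_window: int, budget: int,
                 base_delay: float = 0.5, max_delay: float = 30.0) -> float:
    """Seconds to hold a response before sending it.

    Requests within budget are served immediately. Each request over
    budget doubles the delay, up to max_delay.
    """
    overage = requests_in_window - budget
    if overage <= 0:
        return 0.0
    exponent = min(overage - 1, 32)  # clamp so the power stays a sane size
    return min(max_delay, base_delay * (2 ** exponent))
```

A crawler that stays within its budget never notices the control, while one that ignores it finds each additional request more expensive in time.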

By classifying incoming bot traffic and applying the right control for each category, organisations can block threats, manage resource consumption, and allow the useful automation the modern web depends on.

A Complete Guide to SMS Pumping Fraud

2025-03-13T14:00:00+11:00

The Growth of SMS Fraud

SMS pumping fraud is a costly online abuse pattern, with global losses reaching an estimated $6.7 billion in 2021 alone. It targets companies that rely on SMS for verification or customer communications, leaving them to pay for traffic they did not request.

The scheme relies on malicious actors and dishonest telecom operators working together to generate and monetise large volumes of fraudulent text messages. For businesses caught in these schemes, the financial impact can be severe. Twitter (now X) reportedly lost $60 million to this type of fraud.

This guide explains how SMS pumping works, which businesses face the highest risk, and the controls your organisation can use to reduce exposure.

Understanding SMS Pumping Fraud

SMS pumping (also called SMS toll fraud, SMS spamming, or Artificially Inflated Traffic) involves manipulating mobile networks to inflate charges for text messages. The term "pumping" describes fraudsters forcing high SMS volume through a target's systems.

This fraud exploits how SMS messages travel and get billed across phone networks. Attackers target companies that use SMS codes to verify users. Each time a business sends a verification code, it pays a fee. Fraudsters trigger these systems to send thousands of messages to numbers they control.

These attacks create direct costs for businesses and revenue for the attackers. The fraud works through coordination between criminals and corrupt telecom operators, who charge premium rates for message delivery and share the proceeds.

The fraud has changed as more businesses have adopted SMS verification. Attackers keep developing new methods, and the phone industry has not removed the risk. Many companies still carry the financial exposure.

How SMS Pumping Works

SMS pumping attacks usually exploit message systems through these steps:

Finding Targets: Attackers look for websites or apps that send SMS codes for account verification or password resets.
Creating Fake Requests: Fraudsters use automation to send thousands of code requests to phone numbers they own or control.
Hiding Their Tracks: Attackers change their IP addresses and device information so requests appear to come from real users.
Sharing Profits: Fraudsters work with dishonest phone companies that charge high fees when messages pass through their networks. These companies then share the money with the attackers.
Using Complex Routes: Messages travel through many networks before reaching their destination, making the source of the fraud harder to trace.
Targeting Expensive Routes: Attackers focus on international numbers where sending messages costs more or where rules are weaker.

These attacks look legitimate because each message contains a real code sent to what appears to be a normal phone number. Companies like Twilio or Bird must pay fees to deliver these messages. Most businesses only find out about the fraud when a large bill arrives from their SMS service.

SMS pumping differs from basic spam because the profit-sharing between attackers and phone companies creates a direct cost for the target business.

Businesses at Risk

SMS pumping is most likely to affect these types of businesses:

Financial Institutions

Banks, investment platforms, and cryptocurrency exchanges use SMS codes to protect accounts. These firms send thousands of codes each day, which makes it hard to spot fake requests mixed with real ones.

E-commerce Platforms

Online shops use SMS messages when users create accounts, reset passwords, or make purchases. These businesses often run on small profit margins, so extra SMS costs can hurt their earnings. High volumes of new users make it easier for attackers to hide their activity.

Social Media Companies

Social networks use text messages to check user identity and stop fake accounts. These companies send millions of codes each day to users around the world. Twitter reportedly lost $60 million to this type of fraud, showing the scale these bills can reach.

Software-as-a-Service (SaaS) Providers

These companies often offer free trials that require SMS verification. They plan for a set cost to acquire each new user, but fraud can push these costs much higher than expected.

Telecommunications Companies

Phone companies face two problems: their own systems can be attacked, and parts of their network might help fraudsters. They need strong monitoring tools to find unusual patterns in message traffic.

Small Businesses and Startups

While smaller firms send fewer messages, they often lack security teams and fraud detection tools. This makes them easier targets. The cost of an attack can put these businesses at risk of closing down because they have less money in reserve.

Advanced Attack Methods

Attackers now combine SMS pumping with other techniques to avoid detection.

Credential Stuffing

Fraudsters use passwords stolen in data breaches to break into accounts. Once inside, they change phone numbers to ones they control and trigger verification messages. This makes fraud appear to come from real users.

Peakhour's breach database detection identifies when stolen credentials are used to access accounts. The system flags these attempts before phone numbers can be changed, stopping the attack chain.

Residential Proxy Networks

Unlike data centre proxies that security systems can often spot, residential proxies hide attack traffic behind home internet connections. This makes fraud look like it comes from regular users in different locations.

Peakhour specialises in residential proxy detection. Its technology identifies these masked connections and blocks them before verification requests can pass through. The system maps known proxy networks and detects signs of traffic passing through residential IPs.

When combined with device fingerprinting, these protections create a stronger defence. Fingerprinting tracks device characteristics that remain consistent even when attackers change IP addresses or accounts. Peakhour's fingerprinting technology works without cookies, making it effective against attackers who clear browser data.

These methods focus on the techniques fraudsters use to hide their identity. With Peakhour's protection, businesses can detect and block these attacks before they trigger costly SMS verification messages.

Historical Incidents

Reported SMS pumping incidents show how quickly costs can build:

Twitter's $60 Million Loss

In January 2023, Twitter owner Elon Musk said the platform lost more than $60 million to SMS pumping fraud. He named over 390 phone companies that took part in the scheme. While Twitter later questioned some claims, the case brought public attention to this type of fraud.

Industry-Wide Financial Impact

The Communications Fraud Control Association reports that SMS pumping caused global losses of $6.7 billion in 2021. Many companies do not share their fraud losses with the public.

Costs to Individual Businesses

Companies hit by these attacks pay between tens of thousands and millions of dollars each month in fake charges. These costs grow fast because each fake message costs much more than normal text rates.

Verification Policy Changes

Because of these threats, many large platforms have moved away from SMS codes. Twitter removed SMS verification for most users in March 2023, citing fraud as the reason.

Operational Disruptions

Beyond the cost of messages, businesses can face service problems during attacks. Real users may not get their codes on time. This can cause users to abandon transactions, contact support more often, and lose confidence in the company.

Rules and Enforcement

Rules to stop these attacks differ around the world. Some telecoms authorities have strict rules and fines for networks that allow fraud, but enforcement remains hard. Fraudsters use complex message routes that cross many countries to avoid getting caught.

Understanding the Stakeholders

SMS pumping involves these key groups:

Businesses

Companies use SMS to check user identity and send updates. They hire SMS gateway providers to handle their messages. When fraud happens, these businesses pay for the fake messages. Most find out about the attack only when they receive an unexpected bill.

SMS Gateway Providers

Companies like Twilio and MessageBird connect businesses to phone carriers. They give businesses tools to send text messages without working with phone networks directly. When fraud passes through their systems, these providers may try to stop it, but still charge businesses for the messages sent.

Mobile Network Operators (MNOs)

These companies run the networks that deliver messages to phones. Most work honestly, but SMS pumping schemes often include corrupt operators who charge extra fees for messages to numbers they control. These operators then split the money with the attackers who started the fraud.

Content Aggregators

These middlemen combine message traffic and work with many carriers to find the best routes. Most run honest operations, but their position in the message chain creates routing and oversight gaps that attackers can use.

Regulatory Bodies

Groups like the GSM Association create rules and standards for the industry. These rules are hard to enforce because phone networks cross many countries with different laws.

Financial Flow

The payment flow starts when businesses pay gateway providers to send messages. The gateway providers then pay fees to network operators based on where messages go. In fraud schemes, inflated fees go to corrupt operators who share the money with attackers. This creates a system where sending more fake messages makes more money for criminals while costing honest businesses more.

Effective Protection Strategies

Protecting your organisation usually requires several controls:

Basic Protections

Rate Limits: Restrict how many verification attempts a user can make in a set time period.
Traffic Pattern Checks: Track normal SMS message patterns and watch for changes that might indicate attacks.
Provider Protection: Services like Prelude's SMS Pumping Protection find and block messages to fake numbers.
Other Ways to Verify Users: Use app-based verification or push alerts instead of SMS codes.
Control by Country: Limit SMS verification to countries where you do business and add more checks for countries with higher fraud risk.
Work with Trusted Partners: Choose SMS service providers that focus on security and can help stop fraud quickly.
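Two of the basic protections above, country control and traffic pattern checks, can be sketched together. The example takes one hour of requested phone numbers, rejects those outside an allow-list of dialling prefixes, and flags any allowed prefix whose volume exceeds a multiple of its normal baseline. The prefixes, baseline figures, and spike factor are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical policy: dialling prefixes the business serves, with the
# number of verification SMS normally sent to each per hour.
HOURLY_BASELINE: Dict[str, int] = {"+61": 400, "+64": 60}
SPIKE_FACTOR = 3.0  # flag a prefix when volume exceeds 3x its baseline


def prefix_of(number: str) -> Optional[str]:
    """Return the longest allowed prefix matching the number, if any."""
    for prefix in sorted(HOURLY_BASELINE, key=len, reverse=True):
        if number.startswith(prefix):
            return prefix
    return None


def review_hour(numbers: Iterable[str]) -> Dict[str, List[str]]:
    """Split one hour of requested numbers into blocked numbers and spiking prefixes."""
    blocked: List[str] = []
    volume: Counter = Counter()
    for number in numbers:
        prefix = prefix_of(number)
        if prefix is None:
            blocked.append(number)  # outside the countries served
        else:
            volume[prefix] += 1
    spiking = [p for p, count in volume.items()
               if count > SPIKE_FACTOR * HOURLY_BASELINE[p]]
    return {"blocked": blocked, "spiking": sorted(spiking)}
```

In practice the baseline would come from historical traffic rather than a constant, and a spike would trigger extra checks rather than a hard stop.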

Advanced Protection Methods

Residential Proxy Detection: Find and block users who hide their true location behind home networks used as proxies.
Device Fingerprinting: Collect device signals to track users across sessions and spot when many verification requests come from the same device.
User Behaviour Tracking: Learn how real users act on your site and flag unusual actions that might be bots.
Machine Learning Systems: Use systems that learn from data to find hidden fraud patterns and adapt to new attack types.
Phone Number Checks: Use lists of known bad numbers to decide which phone numbers need more verification steps.
Verify in Multiple Ways: Ask users to prove who they are in different ways, such as email plus SMS, to make attacks harder.
Work with Other Companies: Share information about new attack methods and bad phone numbers with other businesses.
Watch Transactions as They Happen: Use systems that can pause message sending when they spot unusual patterns and learn from both legitimate and abusive traffic.
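The device fingerprinting item above can be illustrated with a small sketch that counts how many distinct phone numbers each device has requested codes for. A genuine user rarely verifies more than a handful of numbers, while a pumping script cycles through many. The function name, the input shape, and the threshold are assumptions; how the fingerprint itself is derived is out of scope here.

```python
from collections import defaultdict


def devices_to_review(requests, max_numbers_per_device=3):
    """Return {fingerprint: distinct_number_count} for devices over the threshold.

    `requests` is an iterable of (device_fingerprint, phone_number) pairs.
    """
    numbers_by_device = defaultdict(set)
    for fingerprint, phone_number in requests:
        numbers_by_device[fingerprint].add(phone_number)
    return {
        fingerprint: len(numbers)
        for fingerprint, numbers in numbers_by_device.items()
        if len(numbers) > max_numbers_per_device
    }
```

Counting distinct numbers rather than total requests means a real user who retries the same number several times is not flagged.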

Fighting SMS Pumping Fraud

SMS pumping fraud cost businesses an estimated $6.7 billion worldwide in 2021. Even Twitter reportedly lost $60 million to these attacks, showing that scale alone does not remove the risk.

SMS pumping works through a network of fraudsters, network operators, and service providers who exploit the payment system for text messages. Fraudsters target authentication systems to generate large volumes of SMS, then collect revenue shares from the process.

Peakhour and Prelude offer combined protection against these threats. Peakhour provides device fingerprinting to identify suspicious devices attempting verification. Its residential proxy detection stops fraudsters who hide behind legitimate IP addresses. These tools block attackers before they access verification systems.

Prelude complements this protection with its multi-routing SMS verification platform. Its system uses real-time fraud detection across five messaging channels in 230 countries. When Prelude detects a potential attack, it automatically redirects traffic through secure routes.

Businesses need to understand the SMS delivery chain to protect themselves. Gateway providers, network operators, and content aggregators each introduce possible points of exploitation.

Prevention requires multiple security layers:

Rate limiting to restrict message volume
Device fingerprinting to track suspicious patterns
Residential proxy detection to unmask hidden attackers
Behavioural analytics to spot unusual activity
Machine learning to adapt to new attack methods
Continuous learning based on real user interactions

The continuous learning systems from both Peakhour and Prelude build protection that improves with each user interaction. Their platforms analyse legitimate traffic patterns to differentiate them from attacks, helping protection adapt over time.

While SMS verification remains common, Peakhour and Prelude help businesses implement more secure authentication methods. Together, they provide protection that adapts to evolving threats and reduces the cost of fraudulent verification traffic.

See how Peakhour's Application Security Platform helps protect against SMS pumping and other automated threats. Contact our team to secure your applications.

Why We Can't Trust IP Addresses

2025-03-11T14:00:00+11:00

Blocking bad traffic by checking an IP address used to be a reasonable starting point. It is not enough anymore. The rise of residential proxies, especially mobile proxies like those from Proxidize, has weakened one of the simpler assumptions in web security: that an IP address tells you much about who is behind a request.

Why is this a problem now?

Residential proxies route traffic through real household IP addresses, so requests look as if they come from normal homes rather than data centres. Companies like Proxidize have made mobile proxy setups accessible using Android phones or USB modems.

In my presentations at AISA and other security conferences, I've described these proxies as systems that "masquerade internet usage as originating from residential and office networks," because they sit outside the assumptions used by many security controls.

What has changed recently is access. Proxidize offers kits that let anyone set up a proxy farm - from 5-modem kits at $499 to 80-modem setups for around $6,000. They have turned proxy farming into a plug-and-play system where you can be up and running "in less than 60 seconds."

The scale is large. According to Proxidize's own marketing, its users collectively scrape more than 80 billion records every day.

The model is also being sold as a "passive income opportunity," where people can earn money by setting up proxy farms and selling access to others. In a recent webinar, Proxidize announced plans for a "Proxidize Grid" marketplace where users can sell their proxies with "a single click through an automated Marketplace."

The BYOD mobile proxy revolution

Companies like iProxy.online have taken this further with a Bring Your Own Device (BYOD) approach. Rather than requiring specialised hardware, they let customers turn any Android device into a mobile proxy.

As Sabir, the cofounder of iProxy.online, explained in a recent interview, "You can install iProxy app here and in the dashboard you have proxy access like Socks5, HTTP accesses, and traffic goes through your device."

This means anyone with an old Android phone and a SIM card can create their own mobile proxy, lowering the barrier to entry. For around $59 per month (based on Proxidize's pricing), users get access to what Sabir calls "precious" mobile IP addresses.

Why are mobile IPs so valuable? As Sabir explains: "If you have Barcelona, we are here in Barcelona and you have like 2 million people living there and you have like several thousands of IP addresses from your mobile providers. And one IP address is shared by many. By thousands of people... And if you have mobile IP address, this cannot be blocked by Facebook or Instagram or any other services because in this case, like innocent people, like thousands of them will be blocked."

Mobile carriers rely on carrier-grade NAT (CGNAT), which places thousands of subscribers behind a single public IP address. Blocking one mobile IP therefore risks blocking every legitimate user who shares it.

What this enables attackers to do

With residential proxies, attackers can:

Hide behind legitimate IP addresses that security systems trust
Bypass geo-restrictions to attack from what appears to be a local source
Distribute attacks across thousands of residential IPs to avoid detection
Make malicious traffic look like it comes from normal users

In my work at Peakhour.IO, we've seen a rise in attacks originating from these residential proxies. The Chinese state-sponsored group Camaro Dragon showed the potential of the model when they developed custom firmware for TP-Link routers, turning them into residential proxies for their operations. This method let them bypass traditional defences like GeoIP blocking because the traffic appeared to come from normal homes.

The broader trend is commoditisation. You no longer need to be a nation-state actor to use residential proxies. Anyone with a few hundred dollars can set up a residential proxy farm or use services like iProxy.online to route their traffic through mobile networks.

How it enables data exfiltration

Data exfiltration is harder to detect when residential proxies are involved. State-sponsored actors like Volt Typhoon have used compromised network devices to "proxy all network traffic to targets through compromised SOHO network edge devices."

This means stolen data travels through home routers or office equipment before reaching the attacker, making it harder to trace. Since the traffic appears to come from thousands of different legitimate sources, traditional data loss prevention tools struggle to identify and block the exfiltration.

I've worked with organisations that have suffered breaches where data was exfiltrated through residential proxies. In these cases, the traffic blended in with normal home user traffic, making it difficult to detect. These weren't sophisticated nation-state attacks - they were conducted by ordinary cybercriminals using commercially available residential proxy services.

How it enables credential stuffing and other attacks

Credential stuffing attacks have hit Australian businesses hard, with companies like The Iconic, Guzman y Gomez, Dan Murphy's, and others falling victim. Residential proxies help these attacks work because attackers can distribute their login attempts across thousands of residential IP addresses.

When an attack comes through residential proxies, each login attempt appears to come from a different legitimate user. IP-based rate limiting fails because no single IP shows suspicious volume. Even when security teams try to block suspicious regions, proxies let attackers appear to be local customers.
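A small simulation shows why the per-IP view misses this. The data below is synthetic: 5,000 login attempts spread across 2,500 addresses, two attempts each, with almost all of them failing. The function names and the per-IP limit are assumptions for illustration.

```python
from collections import Counter


def flagged_ips(attempts, per_ip_limit=10):
    """IPs whose attempt count exceeds a classic per-IP threshold."""
    counts = Counter(ip for ip, _ in attempts)
    return {ip for ip, n in counts.items() if n > per_ip_limit}


def failure_ratio(attempts):
    """Share of login attempts that failed, across all sources."""
    attempts = list(attempts)
    if not attempts:
        return 0.0
    failures = sum(1 for _, succeeded in attempts if not succeeded)
    return failures / len(attempts)


# Synthetic attack: each (ip, succeeded) pair is one login attempt.
# Every address makes only two attempts, and 1 in 100 attempts succeeds.
attack = [(f"ip-{i // 2}", i % 100 == 0) for i in range(5000)]

per_ip_view = flagged_ips(attack)      # empty: no address looks noisy
aggregate_view = failure_ratio(attack)  # 0.99: the endpoint is clearly under attack
```

The per-IP counter reports nothing unusual, while the aggregate failure ratio on the login endpoint makes the attack obvious. Aggregate signals like this indicate that an attack is happening, but they do not say which requests to block, which is where per-connection signals come in.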

According to our research at Peakhour.IO, traditional IP intelligence services are failing to detect these proxies. In our tests, well-known providers such as MaxMind detected 0% of residential proxies, while even the best performer, IPQualityScore, identified only 24%.

The traffic share can be significant. We've seen cases where up to 40% of traffic to Australian e-commerce sites consists of bots using residential proxies for credential stuffing, price scraping, and inventory checking. This puts customer accounts at risk, distorts analytics, and wastes marketing budgets on fake traffic.

The TCP/IP fingerprinting challenge

One aspect of mobile proxies that makes them even more effective is the ability to match TCP/IP fingerprints with the purported device. As Sabir from iProxy.online explains:

"In some cases, your fingerprint, TCP fingerprint should match to your user agent. For example, if you like pretending to be a Mac user or iOS user or Windows user, your TCP fingerprint should be matched with your browser fingerprint."

This means detection mechanisms that look for mismatches between TCP/IP fingerprints and browser types can also be bypassed.

Anybody can now set them up

The barrier to entry for setting up residential proxies has fallen sharply. Companies like Proxidize market their products as simple to use, with statements like "Start using Proxidize in less than 60 seconds."

There are YouTube videos showing how to earn "passive income" by setting up proxy farms. One video explains how hosts can earn "$200 a month minimum" by hosting Proxidize hardware in their homes.

With iProxy.online, it's even simpler—just install an app on an Android phone, and you have a mobile proxy. As Sabir explains, "Actually your expenses are like you pay like for the SIM card, you pay a small subscription fee to the service and you just... That's it. It requires like one minute of work just to download an app."

This accessibility means residential proxy use is no longer limited to nation-states and sophisticated cybercriminal organisations. It is now within reach of anyone with basic technical skills.

The solution: per-connection detection

The rise of residential proxies means IP reputation databases are not enough on their own. As I've been explaining in my talks, "Residential proxies pose a significant challenge to traditional defense mechanisms... making malicious traffic appear legitimate."

The practical answer is per-connection detection that looks at network behaviour patterns rather than just IP addresses. At Peakhour.IO, we stack detections across layers to identify and mitigate proxy traffic.

A useful technique is analysing protocol behaviour. When traffic passes through a residential proxy, there are often detectable differences between the network signature, which comes from the proxy device, and the application behaviour, which comes from the client software sending traffic through it.

These techniques can identify proxy connections even when they come from legitimate residential IP addresses, giving defenders a way to respond without blocking whole residential or mobile networks.

A call to action for businesses

If you're a business, especially in e-commerce, financial services, or any industry that relies on user accounts, residential proxy traffic needs to be part of your security model.

Traditional security approaches based on IP reputation, geolocation, and rate limiting are no longer sufficient. You need to implement per-connection detection that can identify residential proxy usage regardless of the source IP address.

At Peakhour.IO, we've seen organisations fall victim to attacks that could have been prevented with the right detection mechanisms. Waiting until credential stuffing or data exfiltration becomes visible is the expensive way to learn this lesson.

IP addresses alone can no longer tell us who to trust. We need to look deeper at each connection to protect systems and data now that proxy networks are easy to rent or build.

Did Residential Proxies enable a $600 Billion loss?

2025-01-31T00:00:00+11:00

The DeepSeek story puts residential proxy networks under scrutiny as a possible factor in AI's latest market disruption. In January 2025, the Chinese startup's emergence erased $600 billion from Nvidia's market value by demonstrating AI capabilities that match industry leaders at a fraction of the cost.

The path to this capability raises a practical security question for AI platforms. Leading platforms protect their APIs with multiple security layers - rate limiting to prevent mass data extraction, bot detection to block automated requests, and geoblocking to restrict access from certain regions. These measures are meant to prevent the systematic collection of training data.

Residential proxy networks create a route around those protections. These networks route traffic through household IP addresses, so requests appear to originate from homes in permitted regions. A request from a restricted location could look like legitimate traffic from Sydney, Melbourne, or Perth.

The circumstances suggest this approach is plausible. By distributing requests across millions of residential IPs worldwide, each IP could maintain human-like patterns while staying below rate limits. The aggregate data could form a substantial training set without triggering security alerts.

Meta's lawsuit against Bright Data strengthens this possibility. The case exposed how proxy providers monetise residential IPs, often without homeowners' knowledge. That model creates a global network capable of bypassing traditional security measures - exactly the type of infrastructure needed for large-scale data collection.

If residential proxies played a part, the stakes are hard to overstate. The $600 billion wiped from Nvidia's market capitalisation shows how much value can hinge on data that rate limits, bot detection, and geoblocking are meant to protect.

For AI platforms, the question is operational. How can platforms distinguish between legitimate users and well-crafted requests through residential proxies? When geographical restrictions lose meaning, what security measures remain effective? Traditional IP-intelligence proxy detection, which relies on historical usage, is no longer effective; per-connection proxy detection is essential.

DeepSeek's emergence suggests AI security teams need to revisit their assumptions. The potential use of residential proxy networks to dissolve digital borders challenges current approaches to platform protection.

How Bots Are Corrupting Your A/B Testing Data

2025-01-20T00:00:00+11:00

Bot traffic contaminates A/B testing results and can undermine marketing strategy. Your testing programme is exposed when residential proxy networks generate fake interactions (e.g. click fraud) that appear to come from your target market.

These residential proxies hide behind real household internet connections in the targeted geographic areas. When a bot network routes traffic through Sydney IP addresses to masquerade as real Australians, your analytics counts that traffic as legitimate local engagement. Because the traffic matches your demographic and geographic targeting profile, traditional detection methods become less effective.

This contamination undermines the accuracy of your test results. Your A/B tests should show clear winners, but the results end up reflecting bot behaviour rather than real user preferences. Marketing teams then optimise campaigns from false signals, wasting budget and time on the wrong opportunities. The data starts driving decisions that harm conversion rates and revenue.
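A small worked example shows how this can flip a test result. All numbers below are made up for illustration, and the sketch assumes bots never convert and land unevenly on the two variants.

```python
def observed_conversion_rate(real_visitors: int, real_conversions: int,
                             bot_visitors: int) -> float:
    """Conversion rate as analytics would report it when bots count as visitors.

    Assumes bots never convert, which is a simplification.
    """
    total = real_visitors + bot_visitors
    if total == 0:
        raise ValueError("no visitors recorded")
    return real_conversions / total


# Hypothetical test: variant B genuinely converts better than variant A.
true_a = observed_conversion_rate(1000, 50, bot_visitors=0)    # 5.0%
true_b = observed_conversion_rate(1000, 60, bot_visitors=0)    # 6.0%

# The same test with bot visits landing unevenly on the two variants.
seen_a = observed_conversion_rate(1000, 50, bot_visitors=100)  # about 4.5%
seen_b = observed_conversion_rate(1000, 60, bot_visitors=700)  # about 3.5%

true_winner = "B" if true_b > true_a else "A"
reported_winner = "B" if seen_b > seen_a else "A"
```

With these inputs the real winner is B, but the reported winner is A, because the extra bot visits dilute B's conversion rate more than A's.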

The scale of the problem continues to grow. Residential proxy services now offer millions of local IPs in every market. They rotate these IPs automatically and match real browser characteristics. Without specialised detection such as Peakhour A/B Testing Protection, this traffic can appear identical to genuine users in your analytics.

Each day without detection compounds the damage. Tests generate misleading data that guides strategic decisions. Marketing teams spend hours analysing invalid results and implementing changes that reduce performance. Budget allocated to testing delivers diluted ROI as optimisations based on bot data decrease conversion rates.

Traditional bot protection fails against this distributed threat. IP-based detection cannot identify residential proxy traffic that matches your target geography. Rate limiting proves ineffective against attacks spread across thousands of residential IPs. These bots evade basic JavaScript challenges through sophisticated browser emulation.

Peakhour's A/B Testing Protection uses network fingerprinting to detect residential proxy traffic. Our system analyses subtle patterns in how these proxies connect and behave, identifying bot networks that other solutions miss. We maintain a real-time database of residential proxy services to block new threats as they emerge.

Our customers have discovered that 40% of their test traffic came from bots. After implementing protection, they achieved:

Valid test results reflecting real user preferences
Increased conversion rates from accurate optimisation
Reduced waste of marketing team time and resources
Protection of testing budget from invalid traffic
Confidence in strategic decisions based on clean data

The rise of residential proxies has amplified bot threats to A/B testing. Traffic that appears to come from local users may mask sophisticated bot networks. Protecting your testing programme requires detection that goes beyond IP addresses and basic challenges. Contact us to learn how Peakhour can help secure your A/B testing data and keep optimisation decisions grounded in real users.

How MTU Fingerprinting Identifies VPNs and Mobile Users

2025-01-15T14:00:00+11:00

For traffic analysis, it helps to know how a user reached the service. Are they on a home network, a mobile connection, or a VPN? Deep packet inspection is invasive, but TCP handshake metadata can still carry useful context about the Maximum Transmission Unit (MTU) a connection appears to be using. By analysing those inferred MTU values, we can build "fingerprints" that point to the underlying network technology carrying the connection.

This article looks at how common technologies affect MTU values and shows how a SQL query can turn that data into useful network labels.

What is MTU and Why Does it Change?

The Maximum Transmission Unit (MTU) is the largest packet that a network interface can transmit in a single frame. On standard Ethernet networks, this value is typically 1500 bytes. Larger payloads have to be split into chunks that fit that limit.

Encapsulation and Tunnelling

The value starts to shift when tunnelling protocols are involved, including those used by VPNs and mobile networks. These protocols wrap the original data packet inside another packet, a process called encapsulation. The outer packet has its own headers for routing and management.

This encapsulation "steals" space from the original 1500 bytes available on the physical network. If a tunnelling protocol adds 60 bytes of headers, for example, the maximum size for the original data packet is now 1440 bytes (1500 - 60).

The Problem with Fragmentation

What happens if a device tries to send a 1500-byte packet through this 1440-byte tunnel? The packet has to be broken into smaller pieces, a process called fragmentation. It works, but it is inefficient. Fragmentation consumes CPU resources on the router performing it, adds header overhead to each fragment, and requires the receiving device to reassemble the pieces. The result is lower speed and higher latency.

To avoid that penalty, operating systems and network devices reduce the MTU of the connection to account for the tunnel's overhead. The amount of the reduction follows from the tunnelling protocol in use. That predictable drop is the basis for MTU fingerprinting.
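To make the fragmentation cost concrete, the sketch below splits one IPv4 packet the way a router would when the path MTU is smaller than the packet. It assumes a 20-byte IPv4 header with no options; the function name is illustrative.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options


def ipv4_fragments(packet_size: int, path_mtu: int) -> List[int]:
    """Return the on-wire size of each IPv4 fragment for a single packet."""
    if packet_size <= path_mtu:
        return [packet_size]  # fits as-is, no fragmentation needed
    payload = packet_size - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (path_mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("path MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + IPV4_HEADER)  # each fragment repeats the IP header
        payload -= chunk
    return sizes


fragments = ipv4_fragments(1500, 1440)
```

For the 1500-byte packet and 1440-byte tunnel from the example, this yields two fragments of 1436 and 84 bytes: 1520 bytes on the wire instead of 1500, plus the reassembly work at the receiver.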

A Guide to Common MTU Values

Different technologies add different overheads, which produces distinct MTU values.

WireGuard

WireGuard is a modern VPN known for its efficiency, but it still adds overhead.

IPv4 Overhead: 60 bytes (20-byte IPv4 header + 8-byte UDP header + 32-byte WireGuard header).
IPv6 Overhead: 80 bytes (40-byte IPv6 header + 8-byte UDP header + 32-byte WireGuard header).

On a standard 1500-byte network, that produces predictable MTU values:

1500 - 60 = 1440 bytes (WireGuard over IPv4)
1500 - 80 = 1420 bytes (WireGuard over IPv6)

There is a special case with ISPs that use DS-Lite (Dual-Stack Lite) to carry IPv4 traffic over an IPv6 network. This adds another 40-byte IPv6 header, reducing the MTU further.

1420 - 40 = 1380 bytes (WireGuard over DS-Lite)
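The arithmetic above can be written as a short calculation. The header sizes are the ones listed in this section; the helper name is an illustrative choice.

```python
# Header sizes in bytes, as listed above.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
WIREGUARD_HEADER = 32
ETHERNET_MTU = 1500


def tunnel_mtu(link_mtu: int, *overheads: int) -> int:
    """MTU left for the inner packet after subtracting each layer of overhead."""
    return link_mtu - sum(overheads)


wg_over_ipv4 = tunnel_mtu(ETHERNET_MTU, IPV4_HEADER, UDP_HEADER, WIREGUARD_HEADER)
wg_over_ipv6 = tunnel_mtu(ETHERNET_MTU, IPV6_HEADER, UDP_HEADER, WIREGUARD_HEADER)
# DS-Lite case as described above: a further 40-byte IPv6 header on top of the 1420 value.
wg_over_ds_lite = tunnel_mtu(wg_over_ipv6, IPV6_HEADER)
```

The three results are 1440, 1420, and 1380, matching the values in the text.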

OpenVPN

OpenVPN is another common VPN solution, but its fingerprint is less tidy. Instead of setting a static interface MTU, OpenVPN often uses a feature called mssfix. This dynamically adjusts the Maximum Segment Size (MSS) value within the TCP headers of encapsulated packets to prevent fragmentation.

The MSS is the MTU minus the IP and TCP header sizes (typically 40 bytes for IPv4). The exact MSS value, and therefore the effective MTU, depends on OpenVPN's configuration, including the transport protocol (UDP or TCP), cipher, MAC algorithm, and compression. As noted by security researcher ValdikSS, these unique MSS values can be used to fingerprint a connection with high precision. For example, a common configuration might result in an MSS of 1369, which corresponds to an effective MTU of 1409 (1369 + 40).

For general analysis, connections with an MTU around 1400 or 1380 bytes often indicate OpenVPN or other VPN usage, especially when seen with other factors.
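Converting an observed MSS back to an effective MTU is a one-line calculation. The sketch below assumes fixed-size IP and TCP headers (no IP options) and uses an illustrative function name.

```python
TCP_HEADER = 20   # bytes, without options
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header


def mtu_from_mss(mss: int, ipv6: bool = False) -> int:
    """Effective MTU implied by the MSS advertised in a TCP SYN."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mss + ip_header + TCP_HEADER


openvpn_example = mtu_from_mss(1369)   # the MSS from the example above
plain_ethernet = mtu_from_mss(1460)    # the usual MSS on an untunnelled IPv4 path
```

The first call gives 1409, as in the example above, and the second gives the standard 1500.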

Mobile Networks (LTE & 5G)

Mobile networks also modify MTU values. When your phone connects to the internet, its data is tunnelled through the carrier's network using the GPRS Tunnelling Protocol (GTP). This encapsulation adds its own layer of headers.

As detailed by Nick vs Networking, the typical overhead for GTP traffic over an Ethernet transport network is 50 bytes:

14 bytes for the Ethernet header
20 bytes for the outer IPv4 header
8 bytes for the UDP header
8 bytes for the GTP header

For a mobile carrier using a standard 1500-byte MTU on its transport network, the maximum MTU available to the user's device is 1450 bytes (1500 - 50).

Mobile devices don't guess this value; they are explicitly told what MTU to use by the network during the connection setup process (via Protocol Configuration Options). Mobile operators have two choices to avoid fragmentation:

Increase Transport MTU: Enable jumbo frames (for example, 1600 bytes or more) on their internal network to accommodate the 50-byte overhead and still provide a full 1500-byte MTU to the user.
Lower Advertised MTU: Advertise a lower MTU to the user's device. This is why values such as 1450 are common. Some operators may configure a more conservative MTU, such as 1300 bytes, to maintain stability across all parts of their network.
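Both operator choices follow from the same 50-byte figure. The sketch below uses the overhead breakdown listed above; the function names are illustrative.

```python
# GTP overhead over an Ethernet transport network, in bytes, as listed above.
GTP_OVERHEAD = {
    "ethernet": 14,
    "outer_ipv4": 20,
    "udp": 8,
    "gtp": 8,
}
TOTAL_OVERHEAD = sum(GTP_OVERHEAD.values())


def advertised_mtu(transport_mtu: int) -> int:
    """Largest MTU the operator can advertise to the device without fragmentation."""
    return transport_mtu - TOTAL_OVERHEAD


def transport_mtu_needed(user_mtu: int) -> int:
    """Smallest transport MTU that still gives the device the requested MTU."""
    return user_mtu + TOTAL_OVERHEAD


lowered = advertised_mtu(1500)         # option 2: advertise a lower MTU
raised = transport_mtu_needed(1500)    # option 1: enlarge the transport MTU
```

A standard 1500-byte transport leaves 1450 bytes for the device, while giving the device a full 1500 bytes needs at least 1550 bytes on the transport network, so a 1600-byte setting leaves some headroom.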

Other Common Values

Standard Ethernet: The baseline is 1500 bytes.
PPPoE: Common for DSL connections, adds 8 bytes of overhead, resulting in an MTU of 1492 bytes.
IPv6 Minimum: The IPv6 specification mandates a minimum MTU of 1280 bytes, so this value is also a significant marker.

Analysis with SQL

With this context, we can analyse network logs to classify user connections. The following SQL query buckets and attributes MTU values from a large dataset, turning raw numbers into meaningful labels.

The query works in several stages:

Extract Data: It parses the MTU from a fingerprint string in the logs.
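Bucket MTUs: It uses a CASE statement to group MTUs. The known values 1500, 1440, 1420, and 1380 are matched explicitly. Jumbo frames (>1500) are grouped into 100-byte buckets, and everything else is grouped into 20-byte buckets. Because 1440, 1420, and 1380 sit on 20-byte boundaries, their buckets also collect nearby values; an MTU of 1450, for example, lands in the 1440 bucket.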
Attribute Buckets: In the final SELECT, another CASE statement translates those numeric buckets into human-readable descriptions based on the fingerprints we've identified.

The Query

-- Bucketing logic and attribution informed by research from:
-- https://ripx80.de/posts/06-wg-mtu/ (WireGuard)
-- https://medium.com/@ValdikSS/detecting-vpn-and-its-configuration-and-proxy-users-on-the-server-side-1bcc59742413 (OpenVPN)
-- https://nickvsnetworking.com/mtu-in-lte-5g-transmission-networks-part-1/ (Mobile Networks)
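WITH base_data AS (
    -- Parse the SYN fingerprint string. The first comma-separated field is split on ':'
    -- (4th element = MTU, 5th = window size); the second field starts with the window scale.
    SELECT
        toInt32OrNull(splitByChar(':', splitByChar(',', synner_fingerprint)[1])[4]) AS mtu,
        toInt32OrNull(splitByChar(':', splitByChar(',', synner_fingerprint)[1])[5]) AS wsize,
        toInt32OrNull(splitByChar(':', splitByChar(',', synner_fingerprint)[2])[1]) AS scale,
        -- Flag connections whose TLS handshake RTT exceeds the TCP minimum RTT by 65 ms or more.
        (tls.handshake_rtt_us - tcp.min_rtt_us) >= 65000 AS is_high_latency
    FROM logs.client_logs
    WHERE time >= '2025-07-01' AND shielded = 0
),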
main_aggs AS (
    SELECT
        CASE
            WHEN mtu = 1500 THEN 1500
            WHEN mtu = 1440 THEN 1440
            WHEN mtu = 1420 THEN 1420
            WHEN mtu = 1380 THEN 1380
            WHEN mtu > 1500 THEN 1501 + intDiv(mtu - 1501, 100) * 100
            ELSE intDiv(mtu, 20) * 20
        END AS mtu_bucket,
        countIf(is_high_latency) AS high_latency_count,
        countIf(not is_high_latency) AS normal_latency_count,
        round(avg(wsize * pow(2, scale))) AS avg_real_wsize
    FROM base_data
    WHERE mtu IS NOT NULL AND wsize IS NOT NULL AND scale IS NOT NULL
    GROUP BY mtu_bucket
),
top_wsizes AS (
    SELECT
        mtu_bucket,
        groupArray((wsize, cnt)) AS top_wsizes
    FROM
    (
        SELECT
            CASE
                WHEN mtu = 1500 THEN 1500
                WHEN mtu = 1440 THEN 1440
                WHEN mtu = 1420 THEN 1420
                WHEN mtu = 1380 THEN 1380
                WHEN mtu > 1500 THEN 1501 + intDiv(mtu - 1501, 100) * 100
                ELSE intDiv(mtu, 20) * 20
            END AS mtu_bucket,
            wsize,
            count() AS cnt,
            row_number() OVER (PARTITION BY mtu_bucket ORDER BY cnt DESC) AS rn
        FROM base_data
        WHERE mtu IS NOT NULL AND wsize IS NOT NULL AND scale IS NOT NULL
        GROUP BY mtu_bucket, wsize
    )
    WHERE rn <= 5
    GROUP BY mtu_bucket
),
top_scales AS (
    SELECT
        mtu_bucket,
        groupArray((scale, cnt)) AS top_scales
    FROM
    (
        SELECT
            CASE
                WHEN mtu = 1500 THEN 1500
                WHEN mtu = 1440 THEN 1440
                WHEN mtu = 1420 THEN 1420
                WHEN mtu = 1380 THEN 1380
                WHEN mtu > 1500 THEN 1501 + intDiv(mtu - 1501, 100) * 100
                ELSE intDiv(mtu, 20) * 20
            END AS mtu_bucket,
            scale,
            count() AS cnt,
            row_number() OVER (PARTITION BY mtu_bucket ORDER BY cnt DESC) AS rn
        FROM base_data
        WHERE mtu IS NOT NULL AND wsize IS NOT NULL AND scale IS NOT NULL
        GROUP BY mtu_bucket, scale
    )
    WHERE rn <= 5
    GROUP BY mtu_bucket
)
SELECT
    CASE
        -- Only 1500 is an exact-value bucket; 1440, 1420 and 1380 also collect
        -- the rest of their 20-byte range (for example, 1450 falls in 1440).
        WHEN mtu_bucket = 1500 THEN '1500'
        WHEN mtu_bucket > 1500 THEN concat(toString(mtu_bucket), '-', toString(mtu_bucket + 99))
        ELSE concat(toString(mtu_bucket), '-', toString(mtu_bucket + 19))
    END AS mtu_range,
    CASE
        WHEN mtu_bucket = 1500 THEN 'Standard Ethernet'
        WHEN mtu_bucket = 1480 THEN 'Likely PPPoE (e.g., 1492)'
        WHEN mtu_bucket = 1460 THEN 'Likely DS-Lite/GRE Tunnel'
        WHEN mtu_bucket = 1440 THEN 'Likely Mobile LTE/5G (e.g., 1450) / WireGuard over IPv4'
        WHEN mtu_bucket = 1420 THEN 'WireGuard over IPv6'
        WHEN mtu_bucket = 1400 THEN 'Likely OpenVPN / Mobile'
        WHEN mtu_bucket = 1380 THEN 'Likely OpenVPN / WireGuard over DS-Lite / Mobile'
        WHEN mtu_bucket = 1300 THEN 'Likely Mobile LTE/5G configured'
        WHEN mtu_bucket = 1280 THEN 'IPv6 Minimum'
        WHEN mtu_bucket > 1500 THEN 'Jumbo Frame'
        ELSE 'Other'
    END AS mtu_attribution,
    high_latency_count,
    normal_latency_count,
    round(high_latency_count / (high_latency_count + normal_latency_count), 2) AS high_latency_ratio,
    top_wsizes,
    top_scales,
    avg_real_wsize
FROM main_aggs
LEFT JOIN top_wsizes USING (mtu_bucket)
LEFT JOIN top_scales USING (mtu_bucket)
WHERE (high_latency_count + normal_latency_count) > 10000
ORDER BY mtu_bucket
LIMIT 50 FORMAT Vertical
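For readers who want to try the bucketing logic outside the database, the sketch below mirrors the query's CASE expressions and its latency flag in Python. It is a simplified translation under the same thresholds, not a replacement for the query; the function names are illustrative.

```python
def mtu_bucket(mtu: int) -> int:
    """Mirror of the bucketing CASE expression in the query."""
    if mtu in (1500, 1440, 1420, 1380):
        return mtu
    if mtu > 1500:
        return 1501 + (mtu - 1501) // 100 * 100  # 100-byte jumbo buckets
    return mtu // 20 * 20                         # 20-byte buckets


# Labels copied from the attribution CASE expression in the query.
ATTRIBUTION = {
    1500: "Standard Ethernet",
    1480: "Likely PPPoE (e.g., 1492)",
    1460: "Likely DS-Lite/GRE Tunnel",
    1440: "Likely Mobile LTE/5G (e.g., 1450) / WireGuard over IPv4",
    1420: "WireGuard over IPv6",
    1400: "Likely OpenVPN / Mobile",
    1380: "Likely OpenVPN / WireGuard over DS-Lite / Mobile",
    1300: "Likely Mobile LTE/5G configured",
    1280: "IPv6 Minimum",
}


def attribute(bucket: int) -> str:
    """Translate a numeric bucket into the query's human-readable label."""
    if bucket > 1500:
        return "Jumbo Frame"
    return ATTRIBUTION.get(bucket, "Other")


def is_high_latency(tls_handshake_rtt_us: int, tcp_min_rtt_us: int,
                    threshold_us: int = 65000) -> bool:
    """Same comparison as the query's is_high_latency column."""
    return tls_handshake_rtt_us - tcp_min_rtt_us >= threshold_us
```

Note that an MTU of 1450 falls into the 1440 bucket, which is why that label covers both mobile networks and WireGuard over IPv4.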

Why Jumbo Frames Matter

Jumbo frames (MTU values greater than 1500 bytes) are a useful edge case in MTU fingerprinting. These frames, typically 9000 to 9216 bytes, are used mainly in high-performance computing environments, data centres, and enterprise networks where throughput optimisation is important.

When we detect jumbo frame MTUs in our analysis, they often indicate:

Enterprise Users: Corporate networks frequently enable jumbo frames for internal communications
Data Centre Traffic: Cloud services and CDNs often use jumbo frames between their infrastructure
High-Performance Applications: Video streaming, large file transfers, and backup operations can benefit from larger frame sizes
Network Misconfiguration: Jumbo frames sometimes appear because of network equipment misconfiguration

The presence of jumbo frames can help distinguish consumer and enterprise traffic, adding useful context for traffic classification and security analysis.

Practical Use Cases and Applications

MTU fingerprinting is useful across several security and operational domains:

Security Applications

VPN Detection for Compliance: Organisations can identify employees bypassing corporate network policies with personal VPNs, supporting compliance with data governance requirements.

Bot Traffic Classification: Automated traffic from residential proxy networks often shows consistent MTU patterns that differ from genuine residential users, improving bot detection.

Threat Intelligence Enhancement: Correlating MTU patterns with other indicators helps build broader threat profiles and improves attack attribution.

Network Operations

Performance Optimisation: Understanding the MTU distribution of your user base helps optimise content delivery and reduce fragmentation-related performance issues.

Infrastructure Planning: MTU analysis reveals the underlying network technologies your users employ, informing CDN placement and capacity planning decisions.

Quality of Service: Different MTU patterns correlate with connection quality, enabling proactive support for users on constrained networks.

Business Intelligence

Market Analysis: Geographic and demographic patterns in MTU distribution reveal technology adoption trends and market characteristics.

User Experience Optimisation: Identifying users on mobile or constrained networks enables adaptive content delivery and interface optimisation.

Dynamic Analysis vs Static IP Databases

MTU fingerprinting is a dynamic signal, which makes it useful alongside static IP reputation databases. It has several practical advantages:

Real-Time Adaptation

Static IP databases go stale. A residential IP address might be flagged as malicious based on historical activity, but MTU fingerprinting analyses the current network configuration. This dynamic approach captures the infrastructure being used at the moment of connection, providing more accurate and timely intelligence.

Circumvention Resistance

Attackers can rotate IP addresses or use clean residential proxies to bypass static blacklists. It is harder to manipulate the network characteristics that influence MTU values, because MTU is determined by the underlying network infrastructure.

Granular Classification

Where IP databases provide binary classifications (malicious/benign), MTU fingerprinting offers more detail on the specific technologies and configurations in use. This granularity enables more sophisticated risk assessment and response strategies.

Reduced False Positives

Static databases often flag legitimate users sharing IP addresses with malicious actors, which is common with residential ISPs and mobile carriers. MTU fingerprinting focuses on network behaviour rather than IP reputation, reducing false positive rates while maintaining security effectiveness.

Infrastructure Transparency

MTU analysis reveals the network path and technologies involved in a connection, providing transparency that static IP databases cannot match. This visibility enables more informed security decisions and a better understanding of threat actor capabilities.

Conclusion

MTU fingerprinting turns network metadata into useful context about the infrastructure behind a connection. Unlike static databases that rely on historical reputation, this dynamic analysis technique provides real-time insight into network technologies, user behaviours, and potential security threats.

By understanding MTU patterns, security teams can identify VPN usage, classify mobile traffic, detect residential proxy abuse, and optimise network performance. Its resistance to circumvention and low false-positive rates make it a useful addition to modern security architectures.

As network technologies continue to evolve, MTU fingerprinting provides a stable way to understand and classify traffic based on fundamental network characteristics rather than short-lived indicators. That makes it a practical signal for network security and operations.

The Hidden Cost of Click Fraud

2025-01-14T13:00:00+11:00

Marketing organisations are losing money to automated clicks and fake impressions. These attacks drain advertising budgets and corrupt the data CMOs rely on for strategic decisions. The lost money cannot be recovered, but understanding the scale and mechanics of click fraud helps marketing teams protect future investment and optimise campaigns.

The Scale of Click Fraud

Click fraud now consumes 40% of digital advertising budgets through fake clicks and impressions that never reach real customers. It affects every digital marketing channel, from pay-per-click and display advertising to social media campaigns, retargeting, and video advertising. The damage goes beyond direct financial loss, because it also corrupts the metrics teams use for decision-making.

Our research on bot traffic shows the percentage of fraudulent clicks continues to rise each quarter. Marketing teams that ignore this threat base their strategies on flawed data, which leads to misallocated resources and weaker campaign performance.

How Bots Generate Fake Clicks

Automated bots generate clicks and impressions at scale across digital advertising platforms. These programs target competitor advertisements to drain marketing budgets through fake clicks. They create artificial impressions that inflate metrics and send false engagement signals. Bots also manipulate bidding algorithms and skew attribution data, leading to misallocated advertising resources.

Modern bots use more advanced techniques to evade standard security controls. They mimic human behaviour patterns and rotate through different IP addresses to avoid detection and blocking.

The Residential Proxy Challenge

Residential proxies create a significant obstacle for click fraud detection systems. These proxy services route bot traffic through IP addresses assigned to real consumers' homes and devices, making fraudulent traffic look legitimate to traditional anti-bot tools.

Residential proxy networks build their IP pools through multiple channels. They partner with consumer VPN services, distribute browser extensions, embed code in mobile applications, and in some cases exploit compromised devices. This mix gives proxy operators access to millions of residential IP addresses.

Traditional IP reputation services fail to identify this proxy traffic. Our research demonstrates these services miss up to 96% of residential proxy traffic, leaving advertising campaigns exposed to fraud through these channels.

Impact on Marketing Strategy

Click fraud undermines three core areas of marketing decision-making. First, it distorts campaign performance metrics through false click-through rates and inflated impression counts. The fraud creates skewed conversion data and engagement metrics that mask true campaign performance.

In budget allocation, click fraud wastes marketing spend on non-existent users while reducing campaign ROI. Artificially inflated acquisition costs lead marketing teams to misallocate resources across channels and campaigns.
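A quick calculation shows how fraudulent clicks inflate acquisition cost. The campaign figures below are hypothetical, and the sketch assumes fraudulent clicks never convert.

```python
def campaign_summary(spend: float, cost_per_click: float,
                     fraud_share: float, real_conversion_rate: float) -> dict:
    """Estimate wasted spend and cost per acquisition (CPA) for one campaign.

    fraud_share is the fraction of paid clicks that are fraudulent (0 to 1).
    Assumes fraudulent clicks never convert.
    """
    if not 0 <= fraud_share < 1:
        raise ValueError("fraud_share must be in [0, 1)")
    clicks = spend / cost_per_click
    real_clicks = clicks * (1 - fraud_share)
    conversions = real_clicks * real_conversion_rate
    return {
        "wasted_spend": spend * fraud_share,
        "conversions": conversions,
        "cpa": spend / conversions,
    }


# Hypothetical campaign: $10,000 spend, $2 per click, 2% of real visitors convert.
clean = campaign_summary(10_000, 2.0, fraud_share=0.0, real_conversion_rate=0.02)
with_fraud = campaign_summary(10_000, 2.0, fraud_share=0.4, real_conversion_rate=0.02)
```

With no fraud, the campaign yields 100 conversions at a CPA of $100. With 40% fraudulent clicks, $4,000 of spend is wasted, conversions fall to 60, and the CPA rises to about $167 even though nothing about the real audience has changed.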

Strategic planning suffers when data is contaminated across multiple dimensions. A/B testing results become invalid when bots generate fake interactions. Geographic and demographic data lose accuracy due to proxy traffic. Competitive intelligence becomes unreliable as bot activity masks true market dynamics.

Marketing teams that base decisions on corrupted data take on significant risk. Their optimisation efforts target bot behaviour instead of real users. Campaign budgets flow to channels dominated by fraud. Strategic initiatives fail because decisions are based on artificial metrics rather than genuine customer behaviour.

Protecting Your Marketing Investment

Lost money from click fraud cannot be recovered, so marketing teams need protection measures for future investment. Detection forms the first line of defence through continuous monitoring of traffic patterns and IP reputation analysis. Teams track user behaviour to identify suspicious patterns that indicate fraud.

Prevention requires a multi-layered security approach. Marketing teams need systems that block known bot networks and detect residential proxies attempting to generate fake clicks. These controls validate real user traffic and filter out fraudulent clicks before they affect campaigns.

Campaign optimisation becomes more useful once fraud protection is in place. Teams can adjust targeting parameters based on genuine user data and reallocate budgets to channels with verified traffic. This supports updates to bidding strategies and refinement of audience segments based on real engagement.

Our Ad Fraud Protection solution protects marketing investment by blocking bot traffic, detecting residential proxies, and validating real users. This helps ensure ad spend reaches genuine customers rather than fraudulent clicks.

Making Informed Decisions

Understanding click fraud changes how marketing teams analyse data and plan campaigns. Data analysis starts with identifying corrupted metrics in campaign reports. Teams must filter bot traffic from their analytics to measure real user engagement. This enables tracking of true campaign performance based on human interactions.

Budget planning improves once teams understand the scale of click fraud. Marketing teams can allocate resources to channels with verified human traffic. This focus on real users optimises campaign spend and improves return on investment across marketing initiatives.

Strategy development depends on clean, accurate data. Teams make decisions based on genuine user behaviour rather than bot interactions. Campaign planning targets real audience segments with messages that resonate. Performance measurement reflects actual results rather than artificial engagement.

Taking Action

Marketing teams need protection measures across three key areas to secure their investments. First, bot protection forms the foundation through deployment of bot management systems. These systems block automated traffic while validating real users and monitoring for suspicious patterns.

The second protection layer focuses on residential proxy detection. Teams implement proxy detection to identify and block proxy networks. This helps ensure traffic comes from real users rather than bots relayed through residential IP addresses.

The third component centres on protecting ad spend through traffic monitoring. Teams implement systems to block fraudulent clicks and validate impressions. This enables tracking of real engagement from genuine users.

Our Traffic Control solution combines these protection measures to help marketing teams secure their investments and base decisions on real user data.

Conclusion

Click fraud threatens marketing budgets and corrupts campaign data. Lost money cannot be recovered, but understanding and preventing fraud helps marketing teams protect future investment and make better decisions.

Residential Proxies - The Growing Threat to Ad Campaigns

2024-12-30T00:00:00+11:00

Digital advertising fraud costs organisations $42 billion annually through fake clicks and fake impressions. The growth of residential proxy networks has changed how this fraud reaches campaigns: bot traffic can now hide behind legitimate residential IP addresses, putting it outside the reach of many traditional checks.

Hiding in the crowd

Residential proxies make bad traffic harder to separate from real visitors. Unlike data centre IPs that traditional tools can often detect, residential proxies hide behind real households' internet connections. This means the traffic appears to come from genuine users in your target market. When a residential proxy network operates from Sydney suburbs to attack an Australian campaign, existing protection systems can be fooled into treating it as authentic local traffic.

The impact extends beyond direct financial losses. Your analytics may show engagement from what appears to be your target demographic, while the activity is bot traffic masquerading as potential customers. This contaminated data can push marketing strategy in the wrong direction and waste retargeting spend. Competitors can also use fake clicks to drain your budget while gathering intelligence on your campaigns.

Bad data then compounds the spend problem. Once bots are counted as engaged prospects, reporting and optimisation start from the wrong signal. The result is not only wasted media spend, but poorer decisions built on traffic that should never have been treated as customer intent.

A growing threat

The residential proxy industry continues to expand. Services now offer millions of residential IPs with precise geographic targeting capabilities. They rotate IPs automatically and match real browser fingerprints. Without specialised detection methods, the traffic can become indistinguishable from genuine users.

This is a budget problem, not just a technical one. Each day without protection means 30-40% of your ad budget feeds bot networks instead of reaching customers. The corrupted analytics drive decisions that compound these losses. As residential proxy services grow more sophisticated, basic controls fall further behind.

Traditional IP reputation and rate limiting fail against this distributed threat because the IP addresses are not obviously suspicious. Protection requires advanced network fingerprinting that looks beyond IP addresses. Peakhour's Ad Fraud Protection analyses subtle patterns in how residential proxies connect and behave, and detects the signs of proxy traffic that other solutions miss.

Knowledge is power

Peakhour integrates this protection with your existing ad platforms to stop fraud before it affects your campaigns. Our customers have reduced wasted ad spend by 35% while improving campaign performance through cleaner analytics. The system adapts as threat techniques change, so detection keeps pace with new residential proxy methods.

Residential proxies have changed ad fraud because traffic that appears local and legitimate may mask sophisticated bot networks. Protecting your campaigns requires detection that goes beyond IP addresses and treats residential proxy behaviour as its own signal. Contact us to learn how we can help secure your ad spend against residential proxy networks.

Your Anti-Fraud Residential Proxy Detection Sucks

2024-10-04T13:00:00+10:00

Online fraud is big business: account takeovers, chargebacks, scams, even romance scams. It costs businesses billions of dollars every year.

A common way websites fight it is to use an anti-fraud service to calculate the risk of a transaction. Most teams get this intelligence from a third-party service, either through an API or a plugin.

For online stores, ecommerce fraud prevention has to protect checkout and account flows without punishing real customers.

One of the major signals these services use is IP reputation. IP reputation tries to answer questions like:

Is the order coming from a datacentre?
Is it coming from a country other than your target audience?
Is the IP address a known VPN?
Is it a known TOR exit node?
Have lots of fraudulent orders come from this IP address in the past?

Until recently, these services gave teams a useful way to calculate fraud risk from an IP address.

Not anymore.

Fraud traffic has shifted in recent years, away from VPNs and TOR and toward residential proxies. These same anti-fraud services claim they can detect residential proxies, but what if the services many businesses rely on are falling well short?

The results are bad enough that they deserve a blunt look.

The Shocking Truth: Our Results

We took 25 IP addresses that had just been used as residential proxies in an attack on one of our clients, and within 5 minutes of detection ran them through some of the most popular IP intelligence services. The results are not going into anyone's marketing deck.

Here's a summary of our findings:

Service	Detected Proxies	Accuracy
Maxmind	0/25	0%
IP Quality Score	6/25	24%
Seon	1/25	4%
ProxyCheck.io	0/25	0%
ip2proxy	1/25	4%

The best performer in our test, IP Quality Score, detected only 24% of the proxies. The others ranged from 0% to 4%.
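
The accuracy column is the number of detected proxies divided by the 25 IPs tested. As a minimal sketch, the Python below reproduces that column from the counts in the table; the `results` dictionary and the `detection_rate` helper are illustrative names, not part of any vendor's tooling.

```python
# Detected counts copied from the table above; 25 IPs were tested per service.
TOTAL_IPS = 25
results = {
    "Maxmind": 0,
    "IP Quality Score": 6,
    "Seon": 1,
    "ProxyCheck.io": 0,
    "ip2proxy": 1,
}

def detection_rate(detected, total=TOTAL_IPS):
    """Return the share of known proxy IPs a service flagged, as a percentage."""
    return 100.0 * detected / total

rates = {name: detection_rate(count) for name, count in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f}% detected, {100 - rate:.0f}% missed")
```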

Why Your Residential Proxy Detection Service is Failing You

So why are these services performing so poorly? To understand it, we need to look at how proxy usage and detection have changed.

The Good Old Days of Proxy Detection

In the recent past, detecting proxies was much easier. Fraudsters primarily used:

TOR networks
VPN services
Data center proxies

These were relatively static targets, tied to a single stationary IP address or a known IP range, so adding them to IP block lists was straightforward.

The Rise of Residential Proxies: A New Breed of Threat

Now we need to talk about residential proxies, the new go-to tool of fraudsters and scammers. These are not just a new label for old proxies. They behave differently.

What Are Residential Proxies?

Residential proxies come from IP addresses assigned to real residential services by Internet Service Providers (ISPs). These can be:

Home computers
Mobile phones
Tablets
IoT devices

Unlike data center proxies, which use IP addresses from hosting companies, residential proxies use IPs that look just like any other home or mobile user. Over the last 2-3 years they have become the tool of choice for avoiding security controls on websites, and they are causing all sorts of headaches for website owners.

How Are Residential Proxy Networks Formed?

This is where the problem starts:

Compromised Devices: Malware can turn innocent devices into proxy endpoints without the owner's knowledge.
Incentivised Programs: Some companies offer users benefits (like free VPN services) in exchange for using their device as a proxy endpoint. Hola VPN and Bright Data are prominent examples.
App SDKs: Proxy providers often pay app developers to include a proxy toolkit in their apps. The user is usually unaware that their device's internet connection is now being resold.

So your personal device, be it a computer or phone, could have its internet connection used to carry out a crime without you knowing. The police could come knocking on YOUR door one day.

Why Are They So Dynamic?

Since the proxy is formed by reusing the internet connection of a device, it is inherently much more dynamic than a proxy formed on a server.

Device Mobility: A mobile phone can connect from home Wi-Fi, then a coffee shop, then a cellular network – all in one day.
ISP IP Rotation: Many ISPs dynamically assign IP addresses, changing them periodically.

Depending on the type of fraud being carried out, the attacker might also rotate the device being used, so the traffic pops out of a different location each time. And because these proxies usually run inside an app on a computer or phone, a given exit point on the proxy network may only be available while that app is open.

This dynamic nature is what makes residential proxies so hard to detect using traditional methods.

Shared IPs: The Needle in the Haystack Problem

Residential proxy IPs are not just dynamic. They are typically shared. This means that a single IP address could be used by both legitimate users and proxy traffic:

ISP IP Pools: Internet Service Providers often use large pools of IPs that are dynamically assigned to users. This means that an IP used by a proxy one minute could be assigned to your grandmother's iPad the next.
Carrier-Grade NAT (CGN): Mobile carriers frequently use CGN, which can make hundreds or thousands of users appear to come from the same IP address.
Compromised Routers: A single compromised home router could serve both the legitimate traffic of the homeowner and proxy traffic from the attacker.

If you simply blocked any IP that shows proxy behavior, you would end up blocking legitimate users too.

Why Traditional Methods Are Failing (Revisited)

Now that we understand residential proxies better, let's revisit why old-school detection methods are not enough.

1. Port Scanning

Traditional proxy detection often relies on scanning for open proxy ports. Here's a simple port scanner:

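import socket

def port_scan(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds."""
    # The context manager closes the socket even if connect_ex raises
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # Avoid hanging on filtered ports
        return sock.connect_ex((ip, port)) == 0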

# Example usage
ip = "123.45.67.89"
proxy_ports = [80, 8080, 3128]  # Common proxy ports

for port in proxy_ports:
    if port_scan(ip, port):
        print(f"Port {port} is open - potential proxy detected")

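Why it fails: Residential proxies don't typically have these ports open. The device usually makes an outbound connection to the proxy provider rather than listening for inbound connections, so a scan finds nothing, and the proxied requests themselves arrive on standard web ports like any other visitor's traffic.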

2. Honeypots

Honeypots try to lure and identify proxy traffic.

Why it fails: Sophisticated residential proxy networks can identify and avoid known honeypots. Plus, since they're using real residential IPs, even if they do hit a honeypot, the IP itself isn't a reliable indicator of proxy usage.

3. Client-Side Detection

Detection services may also try to detect proxies by executing JavaScript in the browser and checking the results for inconsistencies. These are the common techniques.

3.1 WebRTC Leak

WebRTC can sometimes reveal a user's true IP address:

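function detectRealIP(callback) {
    var RTCPeerConnection = window.RTCPeerConnection || window.mozRTCPeerConnection || window.webkitRTCPeerConnection;
    if (!RTCPeerConnection) return;  // WebRTC unavailable or disabled
    var pc = new RTCPeerConnection({iceServers: []}), noop = function(){};
    pc.createDataChannel("");
    // Register the handler before gathering starts so no candidate is missed
    pc.onicecandidate = function(ice) {
        if (!ice || !ice.candidate || !ice.candidate.candidate) return;
        var match = /([0-9]{1,3}(\.[0-9]{1,3}){3}|[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7})/.exec(ice.candidate.candidate);
        if (!match) return;  // e.g. mDNS-obfuscated ".local" candidates carry no IP
        pc.onicecandidate = noop;
        callback(match[1]);
    };
    // Promise-based form; the callback form of createOffer is deprecated
    pc.createOffer().then(function(offer) { return pc.setLocalDescription(offer); }).catch(noop);
}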

detectRealIP(function(ip) {
    console.log("Your real IP address is: " + ip);
});

3.2 Geolocation Inconsistencies

Comparing IP-based geolocation with browser-reported location.

navigator.geolocation.getCurrentPosition((position) => {
  const browserLat = position.coords.latitude;
  const browserLong = position.coords.longitude;
  // Compare with IP-based geolocation from server
});
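
The server-side comparison mentioned in the comment above could be sketched as follows. This is a minimal Python example, assuming the browser coordinates have been posted to the server and that an IP geolocation lookup has already produced a latitude and longitude. The `max_km` threshold and the function names are assumptions for illustration only.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # Mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def location_mismatch(browser_pos, ip_pos, max_km=500.0):
    """Flag a session when browser and IP locations are further apart than max_km.

    max_km is an assumed threshold; IP geolocation is often only city-level accurate.
    """
    return haversine_km(*browser_pos, *ip_pos) > max_km

sydney = (-33.8688, 151.2093)
melbourne = (-37.8136, 144.9631)
print(round(haversine_km(*sydney, *melbourne)))  # roughly 700 km apart
print(location_mismatch(sydney, sydney))         # False: same location
```

A residential proxy exit in the same city as the claimed location passes this check, which is part of why the technique is weak against this kind of traffic.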

3.3 DNS Leaks

Check whether DNS requests are routed through the proxy or are leaking:

const image = new Image();
const uniqueDomain = `test-${Date.now()}.example.com`;
image.src = `http://${uniqueDomain}/pixel.gif`;
// Monitor DNS requests server-side to detect leaks

3.4 Browser Fingerprinting

Check for inconsistencies between browser attributes, such as the timezone, and the geolocation of the IP address.

const fingerprint = {
    userAgent: navigator.userAgent,
    screenResolution: `${screen.width}x${screen.height}`,
    colorDepth: screen.colorDepth,
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    plugins: Array.from(navigator.plugins).map(p => p.name),
    // ... other characteristics
};
// Analyze fingerprint for proxy indicators
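
One way the fingerprint analysis could look on the server is a plausibility check of the reported timezone against the country of the IP address. The Python below is a sketch under stated assumptions: the `EXPECTED_TIMEZONES` table is a hypothetical, hand-written lookup, and the fingerprint is assumed to arrive as a dictionary with a `timezone` key.

```python
# Hypothetical lookup of plausible IANA timezones per IP-geolocated country.
# A real deployment would derive this from a geolocation database.
EXPECTED_TIMEZONES = {
    "AU": {"Australia/Sydney", "Australia/Melbourne", "Australia/Brisbane",
           "Australia/Adelaide", "Australia/Perth", "Australia/Hobart",
           "Australia/Darwin"},
    "NZ": {"Pacific/Auckland"},
}

def timezone_mismatch(fingerprint, ip_country):
    """Return True when the browser timezone is implausible for the IP's country.

    Returns False when the country is not in the table, because an unknown
    country gives no evidence either way.
    """
    expected = EXPECTED_TIMEZONES.get(ip_country)
    if expected is None:
        return False
    return fingerprint.get("timezone") not in expected

print(timezone_mismatch({"timezone": "Europe/Moscow"}, "AU"))     # True
print(timezone_mismatch({"timezone": "Australia/Sydney"}, "AU"))  # False
```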

Why these techniques fail

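Proxy services can work around all of these methods. Many browsers now allow users to disable WebRTC or use extensions that prevent this leak. Some residential proxy services are sophisticated enough to handle WebRTC requests without leaking the real IP. An attacker can also choose an exit node in the target region and set the browser timezone and locale to match, which leaves the geolocation and fingerprint checks with nothing to flag.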

Finally, relying on client-side detection means:

Your detection can be reverse engineered and bypassed.
You've already served the content the attacker wants.
It requires JavaScript execution, which is not always available, for instance on an API.

4. Threat Intelligence

Threat intelligence involves maintaining databases of known proxy IP addresses:

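import requests

def check_ip_threat_intel(ip):
    api_key = "your_api_key_here"
    # Placeholder endpoint; substitute your provider's real API
    url = f"https://api.threatintelligence.com/v1/ip/{ip}"
    try:
        response = requests.get(url, params={"key": api_key}, timeout=5)
    except requests.RequestException:
        return False  # Treat network errors as "unknown", not "proxy"
    if response.status_code == 200:
        data = response.json()
        return data.get('is_proxy', False)
    return False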

# Example usage
ip = "123.45.67.89"
if check_ip_threat_intel(ip):
    print(f"{ip} is a known proxy according to threat intelligence")

Why it fails: As our results show, threat intelligence databases are struggling to keep up with the dynamic nature of residential proxies. By the time an IP is identified and added to a database, it may no longer be in use as a proxy.

Why IP-Based Blocking Is No Longer Enough

Given the shared nature of IPs in the age of residential proxies, simply identifying and blocking "bad" IPs is too blunt. Here's why:

False Positives: Blocking an IP used by a proxy might also block legitimate users sharing that IP.
Ineffectiveness: Proxies can quickly switch to new IPs, so IP-based blocking turns into a chase.
Collateral Damage: You might end up blocking entire ISPs or mobile carriers, cutting off large swaths of legitimate users.

The Need for Connection-Level Detection

Instead of focusing only on IPs, we need to look at the connections themselves. Here's what this means:

Deep packet inspection: Analyses traffic patterns and characteristics beyond surface-level indicators.
Protocol behaviour analysis: Identifies subtle anomalies in how network protocols are implemented across the proxy chain.
TLS/TCP fingerprinting: Examines characteristics of TLS handshakes and TCP stack behaviour to detect proxy usage.
Timing analysis: Measures minute differences in network latency that can indicate the presence of a proxy.
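
To make the TLS fingerprinting item more concrete, here is a minimal sketch of a JA3-style fingerprint, a publicly documented way of summarising a TLS ClientHello as an MD5 hash of its version, cipher suites, extensions, curves, and point formats. It illustrates the general idea of comparing a connection's TLS fingerprint with the client it claims to be, not the method any particular product uses. The field values and the `known_hashes` table are made up, and the ClientHello is assumed to be already parsed (with GREASE values removed).

```python
import hashlib

def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string from ClientHello fields and return (string, md5 hex)."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()

def ua_tls_mismatch(claimed_browser, ja3_hash, known_hashes):
    """True when the TLS fingerprint is not one recorded for the claimed browser."""
    return ja3_hash not in known_hashes.get(claimed_browser, set())

# Illustrative values only, not taken from a real browser.
ja3_string, ja3_hash = ja3_fingerprint(771, [4865, 4866], [0, 23], [29], [0])
known_hashes = {"Chrome": {ja3_hash}}  # hypothetical table of observed hashes

print(ja3_string)
print(ua_tls_mismatch("Chrome", ja3_hash, known_hashes))   # False: fingerprint matches
print(ua_tls_mismatch("Firefox", ja3_hash, known_hashes))  # True: nothing recorded
```

A fingerprint like this is a property of the connection rather than the IP address, so it still works when legitimate and proxied traffic share an address.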

Final Thoughts

Proxy usage has evolved, and detection methods need to keep up. Simple IP-based blocking and static lists of "bad" addresses are no longer enough. Residential proxy detection needs real-time analysis of each connection.

Peakhour's residential proxy detection service uses algorithms and machine learning to analyse connections on the fly. We don't just look at where a connection is coming from, but how it behaves, allowing us to spot proxy usage even when it's hiding behind seemingly innocent IP addresses.

Lists of suspect IPs still have a place, but they cannot be the whole answer. Modern proxy detection has to understand the behaviour of network connections.

If you're still treating IP reputation as the main answer, you're already behind. It's time to stop blocking IPs and start understanding connections.

Want a demo of our residential proxy detection? Contact us for a live demo of our service.

The Australian epidemic of Account Takeover attacks

2024-07-29T10:00:00+10:00

In recent months, credential stuffing attacks have hit a number of Australian businesses, leading to compromised accounts, fraudulent purchases, and customer complaints. The pattern is a reminder that account protection cannot stop at password policy or MFA alone.

A Case Study in Credential Stuffing

Security researcher Jacob Larsen has documented a credential stuffing operation targeting Australian businesses. Larsen's research, detailed in his blog post, describes the activity of a threat actor known as "Crabby," who has sold compromised Australian accounts since July 2023.

Larsen's findings show:

The operation began with a threat actor called "Based" selling compromised accounts via Discord and dedicated websites.
In November 2023, the operation was acquired by "Juicy," a notorious account vendor, and rebranded as "Crabby."
As of May 2024, over 19,000 compromised accounts from various Australian brands were offered for sale.
Low-level fraudsters purchasing these accounts have used them to make unauthorised purchases, often targeting high-value items for resale.

The Crabby operation shows how credential stuffing has moved beyond isolated login attempts. It now includes account marketplaces, low-level fraud buyers, and the challenges businesses face once compromised accounts are monetised.

The Difficulty of Defense

Credential stuffing defence is harder when attacks are spread across residential proxies and kept to single attempts per account.

Residential Proxies: The Invisible Threat

Residential proxies weaken traditional IP-based controls. These proxies use IP addresses assigned to real residential internet connections, so malicious traffic can look like normal customer traffic. That helps attackers bypass simple rate limiting and geolocation checks.

That distribution makes login traffic harder to classify. Signals such as a high volume of attempts from one IP address become less useful when attackers can spread requests across a pool of residential IPs.

Single-Hit Attacks: Precision Strikes

Single-hit attacks are another way attackers avoid noisy patterns. In this approach, each stolen credential is used only once per target site, reducing the chance of detection by traditional rate-limiting or anomaly detection systems.

By limiting each credential to one attempt, attackers avoid controls tuned to repeated login failures. A business can have rate limiting in place and still miss credential stuffing that never crosses those thresholds.
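
The gap can be shown with a small simulation. The Python sketch below compares a classic per-IP failure threshold with an aggregate failure-rate check over the same window of login attempts. The thresholds, the baseline failure rate, and the simulated traffic are all assumptions chosen for illustration.

```python
from collections import Counter

def per_ip_alerts(attempts, max_failures_per_ip=5):
    """Return IPs whose failed logins exceed a classic per-IP threshold."""
    failures = Counter(a["ip"] for a in attempts if not a["success"])
    return {ip for ip, count in failures.items() if count > max_failures_per_ip}

def aggregate_alert(attempts, baseline_failure_rate=0.05, multiplier=3.0):
    """Return True when the overall failure rate is well above an assumed baseline."""
    if not attempts:
        return False
    failed = sum(1 for a in attempts if not a["success"])
    return failed / len(attempts) > baseline_failure_rate * multiplier

# Simulated window: 1,000 single-hit attempts, each from a different IP and
# account, with a 2% hit rate chosen only for illustration.
attack = [
    {"ip": f"ip-{i}", "account": f"user{i}", "success": i % 50 == 0}
    for i in range(1000)
]

print(per_ip_alerts(attack))    # set(): no single IP crosses the threshold
print(aggregate_alert(attack))  # True: the window as a whole looks abnormal
```

The per-IP rule sees a thousand unremarkable visitors. Only a signal that is not keyed on the IP address notices that almost every login in the window failed.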

The Mobile API Conundrum

As mobile applications become a primary user interface, credential stuffing also moves into mobile API traffic. Traditional bot protection often relies on JavaScript challenges or browser fingerprinting, which does not apply cleanly to attacks against mobile APIs.

Mobile applications typically communicate with backend services via APIs, bypassing the browser environment where many bot detection techniques run. This creates several challenges:

Lack of JavaScript Execution: Mobile API clients don't execute JavaScript, so browser-based bot detection techniques cannot run.
Limited Fingerprinting Capabilities: Standardised mobile API requests make it difficult to distinguish between legitimate user activity and automated attacks based on request characteristics.
Increased Attack Surface: More mobile apps means more potential entry points for attackers, making comprehensive protection more complex.
Authentication Simplification: To improve user experience, mobile apps often use simplified authentication flows, which can create weaker controls against automation.

This gap needs API-centred controls that can assess mobile login behaviour without relying on browser-only signals.

Framing Credential Stuffing as a Business Risk

Credential stuffing should be treated as a business risk, not just an authentication issue. The impact can include refunds, chargebacks, customer support load, reputational damage, and regulatory disclosure work.

Risk Quantification and Disclosure

Risk quantification gives security teams a way to explain credential stuffing in business terms. By applying frameworks like FAIR (Factor Analysis of Information Risk), businesses can:

Quantify the potential financial impact of credential stuffing attacks.
Prioritise security investments based on risk reduction potential.
Communicate the importance of cybersecurity measures to non-technical stakeholders.
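
As a rough illustration of the first item, the sketch below estimates annual loss in the spirit of FAIR, as loss event frequency multiplied by loss magnitude, using a simple Monte Carlo simulation over three-point estimates. It is not an implementation of the FAIR standard, and every number in it is hypothetical.

```python
import random
import statistics

def simulate_annual_loss(freq_range, magnitude_range, runs=10_000, seed=1):
    """Monte Carlo estimate of annual loss as frequency x magnitude.

    freq_range and magnitude_range are (low, most_likely, high) estimates.
    """
    rng = random.Random(seed)  # Seeded so the sketch is repeatable
    f_low, f_mode, f_high = freq_range
    m_low, m_mode, m_high = magnitude_range
    losses = []
    for _ in range(runs):
        events = rng.triangular(f_low, f_high, f_mode)       # takeovers per year
        magnitude = rng.triangular(m_low, m_high, m_mode)    # loss per takeover
        losses.append(events * magnitude)
    losses.sort()
    return {"mean": statistics.fmean(losses), "p90": losses[int(0.9 * runs)]}

# Hypothetical estimates: 50-600 account takeovers a year, $100-$2,000 each.
estimate = simulate_annual_loss((50, 200, 600), (100, 400, 2000))
print(f"Mean annual loss: ${estimate['mean']:,.0f}")
print(f"90th percentile:  ${estimate['p90']:,.0f}")
```

Reporting a range rather than a single figure tends to be easier to defend with non-technical stakeholders, because the inputs are visibly estimates.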

CPS 234 in Australia adds a disclosure dimension for regulated entities. Businesses need to protect against credential stuffing and be able to explain their exposure, controls, and mitigation strategy.

The State of Credential Stuffing Defense in Australia

Our recent survey of Australian businesses shows uneven adoption of credential stuffing defences:

While 77% of respondents use Multi-Factor Authentication (MFA), only 40% have implemented bot protection measures.
15% of companies chose not to respond to questions about their security measures, suggesting potential gaps in protection.
Just 29% of businesses check credentials against known breaches, leaving a large window of opportunity for attackers using stolen credentials.
Only 15% of organisations use residential proxy detection, a critical component in identifying and mitigating modern credential stuffing attacks.

These results suggest a gap between how credential stuffing is run now and the controls many Australian businesses have in place.
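
Checking credentials against known breaches does not require sending passwords to a third party. The sketch below shows the local half of a k-anonymity range lookup, the approach used by services such as Have I Been Pwned's Pwned Passwords: only the first five characters of the SHA-1 hash are sent, and the match is done locally against the returned suffixes. The response body here is fabricated for illustration, and the network call is deliberately left out.

```python
import hashlib

def sha1_prefix_suffix(password):
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and the remainder."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def breach_count(suffix, range_body):
    """Look up a hash suffix in a 'SUFFIX:COUNT' per-line response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0

prefix, suffix = sha1_prefix_suffix("password")

# Fabricated response for the prefix above; a real service returns many lines.
fake_body = "\n".join([
    "0018A45C4D1DEF81644B54AB7F969B88D65:1",
    f"{suffix}:42",
])

print(prefix)                          # only this part would leave your network
print(breach_count(suffix, fake_body)) # 42 in this made-up response
```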

Recommendations for Enhanced Protection

Based on our analysis and survey results, businesses should review the following controls:

Implement Advanced Bot Protection: Deploy controls that detect and mitigate bot attacks, including attacks using residential proxies.
Enhance Mobile API Security: Use mobile API controls that focus on anomaly detection and behavioural analysis rather than browser-based techniques.
Adopt Risk-Based Authentication: Implement dynamic authentication mechanisms that adjust based on the assessed risk of each session or transaction.
Utilise Breached Credential Databases: Check user credentials against known breach databases and enforce password changes for compromised accounts.
Implement Residential Proxy Detection: Use technology that identifies and mitigates traffic from residential proxy networks. This is a key control for modern credential stuffing attacks.
Apply Advanced Rate Limiting: Utilise device fingerprinting and other identifiers beyond IP addresses to implement more effective rate limiting, particularly for single-hit attacks.
Employ Contextual Security: Use signals such as user behaviour patterns, device characteristics, and historical usage to identify anomalies that may indicate credential stuffing attempts.
Quantify and Communicate Risk: Use frameworks like FAIR to quantify the potential impact of credential stuffing attacks and communicate this risk to stakeholders.
Implement Continuous Monitoring: Deploy real-time monitoring that detects patterns indicative of credential stuffing attacks, and update defences as attack methods change.

These controls address the specific problems created by residential proxies, single-hit attempts, mobile API traffic, and weak credential hygiene. They also reflect the limits of IP-only rate limiting and browser-only bot detection.
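
As one example of rate limiting on identifiers beyond the IP address, the sketch below is a sliding-window limiter keyed on an arbitrary string, such as a device fingerprint. The class name, the key, and the limits are hypothetical. Attackers can rotate fingerprints as well, so this is one layer rather than a complete control.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)

    def allow(self, key, now):
        """Record an event at time `now` and return whether it is within the limit."""
        timestamps = self.events[key]
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()  # Drop events that fell out of the window
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True

limiter = SlidingWindowLimiter(limit=3, window=60)
# Same (hypothetical) device fingerprint arriving from five different IPs.
decisions = [limiter.allow("device-abc", now=t) for t in range(5)]
print(decisions)  # [True, True, True, False, False]
```

Because the key is the device rather than the address, rotating through residential IPs does not reset the counter.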

Credential stuffing defence works best as a layered programme: bot detection, residential proxy detection, breached credential checks, mobile API coverage, and risk reporting. The practical goal is to stop account takeover attempts earlier, reduce fraud exposure, and give security teams evidence they can act on.

The Challenge of Proxy Detection

2024-07-19T10:00:00+10:00

Our recent survey found that only 15% of Australian organisations use residential proxy detection. That leaves many teams relying on controls that were not built for current proxy traffic, especially where CGNAT and NAT make IP-level decisions unreliable.

The Shortcomings of Traditional Methods

Legacy bot protection providers often combine IP reputation, network characteristics, header analysis, and JavaScript-based checks to identify proxy usage. These methods struggle against well-run residential proxies:

IP and ASN categorisation: Ages quickly as new proxy networks emerge.
Network-level checks: Well-configured proxies can work around them.
Header analysis: Proxies can alter HTTP headers to mimic legitimate traffic.
JavaScript-based detection: Struggles against headless browsers and leaves API endpoints vulnerable.

The CGNAT and NAT Challenge

A practical limit of traditional methods is their inability to distinguish legitimate traffic from proxy traffic when both originate from the same IP address. Carrier-Grade NAT (CGNAT) and Network Address Translation (NAT) make this common:

CGNAT: Used by ISPs to conserve IPv4 addresses, resulting in multiple users sharing a single public IP.
NAT: Commonly used in home and business networks, allowing multiple devices to use one public IP address.

As a result, legitimate users and residential proxy traffic can appear to come from the same IP address. IP reputation and geolocation alone cannot separate these traffic types.

This creates a difficult tradeoff:

Blocking suspicious IPs risks denying service to legitimate users.
Allowing all traffic from these IPs opens the door to potential abuse via residential proxies.

Traditional methods cannot reliably pull apart these different types of traffic, so teams either block too much legitimate traffic or allow too much proxy traffic through.

The Need for Sophisticated Network Fingerprinting

To detect and mitigate residential proxy threats while allowing legitimate traffic from shared IPs, detection needs to move beyond IP identity. Network fingerprinting addresses the limits of traditional methods:

Deep packet inspection: Analyses traffic patterns and characteristics beyond basic IP or header indicators.
Protocol behaviour analysis: Identifies subtle anomalies in how network protocols are implemented across the proxy chain.
TLS fingerprinting: Examines unique characteristics of TLS handshakes to detect proxy usage.
Timing analysis: Measures small differences in network latency that can indicate the presence of a proxy.

Used together, these techniques can detect proxy usage on a per-connection basis for both web traffic and API calls, even when traffic originates from shared IP addresses. This approach provides several advantages:

Improved accuracy: Significantly reduces false positives and negatives compared to traditional methods, including in CGNAT and NAT scenarios.
API protection: Secures API endpoints, which are often overlooked by JavaScript-based solutions.
Real-time detection: Allows for immediate action against detected proxy usage without impacting legitimate users.
Adaptability: Can be updated to detect new proxy technologies as they emerge, regardless of IP sharing.
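
The timing analysis item can be illustrated with a simple comparison. When a connection is relayed, the TCP handshake typically completes with the nearby exit device while the TLS handshake has to travel on to the real client, so the TLS round trip is noticeably slower than the TCP one. The Python below is a sketch of that idea only: the measurements are assumed to come from the edge server, and the thresholds are invented for illustration.

```python
def proxy_latency_gap(tcp_rtt_ms, tls_rtt_ms, min_gap_ms=50.0, min_ratio=2.0):
    """Flag a connection when the TLS round trip is far slower than the TCP one."""
    if tcp_rtt_ms <= 0:
        return False  # No usable measurement
    gap = tls_rtt_ms - tcp_rtt_ms
    return gap >= min_gap_ms and tls_rtt_ms / tcp_rtt_ms >= min_ratio

# Two connections from the same shared IP address (hypothetical measurements).
connections = [
    {"label": "direct user", "tcp_rtt_ms": 20.0, "tls_rtt_ms": 24.0},
    {"label": "relayed client", "tcp_rtt_ms": 20.0, "tls_rtt_ms": 180.0},
]

for conn in connections:
    flagged = proxy_latency_gap(conn["tcp_rtt_ms"], conn["tls_rtt_ms"])
    print(f"{conn['label']}: {'suspicious' if flagged else 'ok'}")
```

Both connections share an IP address, yet they get different decisions, which is the property that matters in CGNAT and NAT scenarios.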

Implementing Effective Proxy Detection

To implement proxy detection that accounts for modern network complexity, organisations should consider the following:

Deploy solutions that use network fingerprinting techniques capable of distinguishing between different types of traffic from the same IP.
Ensure protection covers both web applications and API endpoints, as both are vulnerable to proxy-based attacks.
Implement real-time mitigation capabilities to respond swiftly to detected threats without impacting legitimate users.
Regularly update and tune detection algorithms to keep pace with evolving proxy technologies and network architectures.

Together, these practices improve an organisation's ability to detect and mitigate residential proxy threats across credential stuffing, account takeover, and related activity, while keeping access available for legitimate users.

Learn more about our proxy detection solution, which uses network fingerprinting to address the challenges posed by CGNAT and NAT.

As proxy technologies and network architectures change, detection and mitigation need to change with them. Network fingerprinting gives organisations a more reliable way to identify residential proxy abuse without treating every shared IP as suspicious.

Quantifying The Residential Proxy Threat

2024-07-18T10:00:00+10:00

Our 2024 survey found that only 15% of Australian businesses use residential proxy detection. That leaves a measurable blind spot in many security programmes: traffic routed through real consumer connections is harder to separate from legitimate users. This article looks at why residential proxy detection is difficult and how to quantify the risk before choosing controls.

Understanding the Residential Proxy Threat Landscape

Residential proxies use IP addresses assigned to residential internet connections, so malicious traffic can look legitimate. This weakens controls built around IP reputation, GeoIP, and simple request thresholds, and creates a specific detection problem for security teams.

The effectiveness of residential proxies stems from their ability to:

Use legitimate IP addresses, often from unsuspecting users
Bypass IP-based rate limiting and traditional bot detection methods
Evade geolocation restrictions, making GeoIP filtering less reliable
Support large-scale attacks without triggering typical alarm thresholds
Mimic legitimate user behaviour, which makes detection more difficult

These capabilities make residential proxies useful infrastructure for credential stuffing, data scraping, and attempts to bypass fraud detection systems. Because the traffic is distributed across many residential connections, attacks can stay below the thresholds that conventional controls rely on.

Limitations of Conventional Security Approaches

Conventional controls have clear gaps when they are applied to residential proxy traffic:

IP-based detection misses constantly changing, legitimate-appearing IP addresses.
GeoIP filtering becomes less useful against globally distributed residential IPs.
User agent analysis struggles because proxies can mimic legitimate browsers.
Standard rate limiting falters when attacks appear to originate from many unique IPs.
Behavioural analysis based on known bot patterns may miss more careful proxy-based attacks.

These limitations point to a practical requirement: security teams need controls that assess context, not just static request attributes. Residential proxies make simple rule-based decisions less reliable, especially when attacks are distributed and deliberately low-noise.

Quantifying the Risk

To make a sensible decision about residential proxy controls, organisations need to quantify the risk. This involves:

Assessing the potential financial impact of successful attacks via residential proxies
Evaluating the likelihood of such attacks based on industry trends and organisational attractiveness to attackers
Determining the effectiveness of current security measures against this specific threat
Calculating the return on investment for implementing advanced detection and mitigation strategies

Risk quantification gives businesses a clearer basis for investing in residential proxy detection. It aligns security spending with actual threat levels and potential impacts, rather than broad concern or industry pressure alone.
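
The return on investment step can be written down as simple arithmetic: the loss a control is expected to prevent, minus its cost, divided by its cost. The sketch below uses hypothetical figures. The expected annual loss and the reduction factor would come from the earlier assessment steps, not from this code.

```python
def return_on_security_investment(annual_loss, reduction, annual_cost):
    """Return (avoided_loss, roi) for a control.

    annual_loss: expected yearly loss without the control
    reduction:   fraction of that loss the control is expected to prevent (0..1)
    annual_cost: yearly cost of the control
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    avoided = annual_loss * reduction
    return avoided, (avoided - annual_cost) / annual_cost

# Hypothetical inputs: $250,000 expected loss, control halves it, costs $50,000.
avoided, roi = return_on_security_investment(
    annual_loss=250_000, reduction=0.5, annual_cost=50_000
)
print(f"Avoided loss: ${avoided:,.0f}, ROI: {roi:.0%}")
```

A negative result is also useful: it shows that, on current estimates, the control costs more than the loss it prevents.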

Reframing Security

The challenge of residential proxy detection is less about one new control and more about how signals are combined. A useful approach includes:

Contextual Analysis: Analyse the full context of each request, not just its origin. This includes examining patterns of behaviour across multiple sessions and users.
Continuous Monitoring and Adaptation: Use real-time monitoring systems that can detect subtle patterns indicative of proxy use. These systems should continuously adapt to new attack vectors.
Risk-Based Authentication: Use dynamic authentication mechanisms that adjust based on the assessed risk of each session or transaction.
Holistic Data Analysis: Correlate data from multiple sources - including login attempts, transaction patterns, and user behaviour - to identify anomalies that may indicate proxy use.
Proactive Threat Hunting: Actively search for indicators of residential proxy use within your network and user base, rather than waiting for attacks to trigger alerts.

This approach moves beyond simple allow/block decisions and gives teams a better view of user and network behaviour.

Implementing Advanced Detection Strategies

Residential proxy threats need detection that looks beyond the source IP:

Machine Learning-Based Behavioural Analysis: Use AI and machine learning to identify patterns consistent with proxy use, even when individual actions appear legitimate.
Device Fingerprinting Beyond IP: Use advanced fingerprinting techniques that identify individual devices based on a combination of factors, making it harder for proxies to mimic legitimate users.
Network Traffic Analysis: Analyse network behaviour at a granular level to identify patterns consistent with proxy network traffic.
Adaptive Challenge Mechanisms: Deploy targeted challenges based on risk assessment, without disrupting legitimate user experiences.
Cross-Organisational Data Sharing: Participate in threat intelligence sharing networks to gain broader insights into residential proxy activities and emerging attack patterns.

When used as part of the broader security stack, these strategies improve defence against residential proxy threats.

Elevating Security Through Risk Quantification

Residential proxies are not only a technical detection problem. They change the risk model for web applications because attacker traffic can borrow the appearance of ordinary residential users. By adopting a risk quantification approach and implementing advanced detection strategies, organisations can:

Align security investments with actual threat levels
Improve detection of sophisticated, proxy-based attacks
Strengthen overall security posture against evolving threats
Make data-driven decisions about security priorities and resource allocation

Organisations that handle this well will be able to quantify their risk, adapt their security strategies, and implement intelligent detection mechanisms. The goal is practical: identify, analyse, and mitigate sophisticated threats before they cause material damage.

Effective protection starts with understanding the risk well enough to measure it.

The Cost of Credential Stuffing

2024-07-17T00:00:00+10:00

In recent months, Australian businesses have faced a wave of credential stuffing attacks. These attacks do not require the affected website itself to be breached. They target customer accounts, leading to fraudulent transactions. The damage is practical as well as reputational: disputed purchases, refunds, locked accounts, and customers asking how someone else was able to use their account.

What is Credential Stuffing?

Credential stuffing occurs when attackers use login details obtained from a data breach to access accounts on other sites. Criminals test millions of credentials against a target website to identify working combinations. This attack affects users who reuse passwords across multiple services [1].

The Scale of the Problem

Tens of thousands of Australian online accounts are reported to have been accessed since late November 2023 [2]. The attacks affected major retailers and service providers, including:

The Iconic
Guzman y Gomez
Dan Murphy's
Event Cinemas
Stan

The Impact

While reusing passwords between sites has long been considered poor security practice, users still do it. Blaming the customer, as 23andMe did in its response to an attack, is not a serious account protection strategy. Over 70% of Americans believe that websites have a responsibility to prevent account takeovers via credential stuffing attacks. Failing to do so can hurt a business in several ways.

Financial Impact

The cost can fall on either the affected business or the affected customer. Fraudsters made significant purchases using compromised accounts. One scammer claimed to have spent over $800 on high-end alcohol at Dan Murphy's [2]. Others bought iPhones and clothing. Either the customer is left out of pocket, or the business is when the customer disputes the purchase with a chargeback.

Reputation Damage

The attacks leave businesses dealing with customer complaints, refunds, and visible questions about account security. The Iconic pledged to refund affected customers [1]. Dan Murphy's confirmed that a "small number of user accounts were subject to fraudulent transactions" [3].

Customer Trust

These incidents erode customer trust. Users expect businesses to make account abuse difficult, even when the original password leak happened somewhere else. When accounts are taken over, customers question the security practices of the affected companies.

Business Response

Companies responded by:

Locking compromised accounts
Issuing refunds
Encouraging customers to change passwords
Implementing stronger security measures

Dan Murphy's advised customers to "practise good password hygiene, using a strong password and changing it periodically" [3].

Prevention Strategies

To protect against credential stuffing, businesses should:

Implement multi-factor authentication
Educate customers about password security
Monitor login behaviour on their website
Implement and regularly update security measures, including bot management and advanced rate limiting

Credential stuffing is not just a password reuse problem. It is an account protection problem, and businesses that sell online need controls that make stolen credentials harder to turn into purchases.

Sources:

[1] ABC News: "The Iconic was hit by criminals taking money by 'credential stuffing'. How can you stay safe?"
[2] Cyber Daily: "Guzman y Gomez, Dan Murphy's customers affected in credential stuffing campaign"
[3] The Sydney Morning Herald: "Thousands of Australians hacked in 'credential stuffing' credit card scam"

2024 Survey Insights

2024-07-16T10:00:00+10:00

Our recent survey of Australian CISOs and CTOs looked at account protection controls, planned security measures, and how teams are responding to credential stuffing and residential proxies. Key findings:

Multi-Factor Authentication (MFA) Adoption: 76.23% of Australian businesses use MFA, showing broad adoption of a baseline account security control.
Bot Protection: Currently implemented by 39.34% of organisations, with an additional 34.65% planning to adopt it.
Bot Management Solutions: Cloudflare is the most common bot management provider in the survey, used by 48.24% of respondents.
Residential Proxy (Resip) Detection: Only 13.11% of organisations currently use this technology, although many plan to implement it to address residential proxy traffic.
Credential Stuffing Concerns: Businesses are planning measures to reduce credential stuffing risk, including bot protection, MFA, and checking credentials against known breaches.
Mobile Security Gap: Low adoption of Web Application and API Protection (WAAP) suggests gaps in mobile application security.
Executive vs. Engineer Priorities: The survey showed different cybersecurity priorities between executives and engineers.

These findings point to the need for account protection strategies that go beyond MFA and address automated traffic, breached credentials, and residential proxies.

2024 Survey Insights

2024-07-16T10:00:00+10:00

Recent customer account takeovers have put account protection back on the agenda for Australian businesses. Our 2024 survey of Australian CISOs and CTOs shows how respondents are using MFA, bot protection, WAAP and residential proxy detection to manage credential stuffing and account takeover risk.

Account Protection: Current State and Future Plans

Our survey found 76.23% of Australian businesses use Multi-Factor Authentication (MFA). MFA is widely adopted, but it is not a complete account protection strategy on its own.

39.34% of organisations currently use bot protection. That matters because credential stuffing is automated by design. Another 34.65% of businesses plan to implement bot protection in the future.

The pattern is clear: many organisations are treating MFA as a baseline and looking at additional controls around it.

Current Bot Management Solutions

The survey also asked which bot management solutions Australian businesses currently use. Cloudflare was the clear leader, with nearly half of respondents using its services.

The breakdown of bot management solutions is as follows:

Cloudflare: 48.24%
AWS WAF Bot Ruleset: 10.59%
Other solutions make up the remaining percentage

This distribution is concentrated around Cloudflare. Outside that, the remaining respondents are spread across other solutions rather than one clear alternative.

Tooling matters here. Residential proxy traffic weakens IP reputation and simple rate limits, so detection capability, request grouping and response controls matter as much as vendor name. If residential proxies continue to feature in credential stuffing tooling, this mix may shift as teams look for more advanced protection measures.

The Rising Threat of Residential Proxies

A key finding from our survey is the low adoption rate of residential proxy (resip) detection, with only 13.11% of organisations currently using this technology. Planned adoption suggests teams are starting to account for the risk, but current coverage is still low.

Resips are difficult for account security teams because malicious traffic can look like normal ISP traffic. They enable attackers to:

Bypass traditional IP-based rate limiting
Evade geolocation-based restrictions
Conduct large-scale credential stuffing attacks
Scrape sensitive data undetected

The planned adoption of resip detection points to a shift in security strategies, away from simple IP-based controls and towards more specific network signals.

Credential Stuffing: A Persistent and Growing Concern

Credential stuffing attacks continue to be a major concern for businesses. These attacks exploit password reuse across multiple sites, allowing attackers to gain unauthorised access to user accounts.

Respondents said they plan to implement several measures to reduce credential stuffing risk:

34.65% plan to implement bot protection
32.67% intend to add multi-factor authentication
31.68% aim to check credentials against known breaches

These plans point to layered account protection rather than reliance on one control.
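
One way to check credentials against known breaches without sending a full password hash to a third party is a k-anonymity range lookup, the model used by the Have I Been Pwned Pwned Passwords range API. The sketch below is illustrative and is not taken from the survey. It hashes the password with SHA-1, splits the digest into a 5-character prefix and a 35-character suffix, and searches a range response for the suffix. The `fetch_range` callable and the `fake_fetch_range` data are assumptions that stand in for the HTTP request so the example runs offline.

```python
import hashlib
from typing import Callable


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password: str, fetch_range: Callable[[str], str]) -> int:
    """Return how many times the password appears in the breach corpus.

    fetch_range(prefix) is assumed to return lines shaped like "SUFFIX:COUNT".
    Only the prefix leaves the caller; the suffix is compared locally.
    """
    prefix, suffix = split_sha1(password)
    for line in fetch_range(prefix).splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fake_fetch_range(prefix: str) -> str:
    """Offline stand-in for the range endpoint, using made-up counts."""
    known_prefix, known_suffix = split_sha1("password123")
    if prefix == known_prefix:
        return f"{'0' * 35}:2\n{known_suffix}:4021\n"
    return ""


print(breach_count("password123", fake_fetch_range))
print(breach_count("correct horse battery staple", fake_fetch_range))
```

A non-zero count is a risk signal rather than proof of compromise, so it fits naturally beside bot protection and MFA instead of replacing them.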

Mobile Applications: An Emerging Attack Surface

While mobile applications were not directly addressed in our survey, the data suggests a possible gap in mobile security strategies. The low adoption rate of Web Application and API Protection (WAAP) - implemented by only 27.87% of respondents - indicates many businesses may be underprepared to protect their mobile assets.

As mobile apps become primary interfaces for critical operations, this gap leaves businesses exposed to attacks that use the same automation and resip infrastructure seen on web login flows.

Balancing Security and User Experience

The operational problem is familiar: increase assurance without making login unusable. Key considerations for enhancing account protection while preserving usability include:

Expanding beyond MFA
Implementing bot protection
Adopting WAAP solutions
Monitoring credential leaks
Focusing on API security
Implementing residential proxy detection

Executive vs Engineer Perspectives

Our survey found differences in cybersecurity priorities between executives and engineers:

Figure 3: Comparison of cybersecurity priorities between executives and engineers

The gap matters because budget, architecture, and incident response are often owned by different teams. Account protection plans need to cover both executive risk concerns and engineering realities, including the threat from residential proxies.

Final Thoughts

Our 2024 survey results point to a simple position: MFA is widely used, but it is not the whole account protection strategy. Bot protection, breached credential checks, WAAP and residential proxy detection are still unevenly adopted. That matters because credential stuffing does not depend on one weakness; it combines reused credentials, automation, proxy networks and weak response controls.

Australian businesses do not need every control at once, but they need a layered plan that reflects how account takeover attacks are run now. For teams reviewing their controls, resip detection and mobile/API coverage are worth checking explicitly because both are easy to miss if the programme is still centred on MFA and IP reputation.

Application Security Beyond MFA

2024-07-15T10:00:00+10:00

Multi-factor authentication (MFA) remains a useful defence against account takeovers, but it is not a complete control. Attackers increasingly work around MFA with social engineering, automation, and infrastructure that makes malicious traffic look ordinary.

MFA answers one narrow question: can the user present a second factor at this point in the flow? That is valuable. It does not prove the password was safe, the session will remain safe, the device is trusted, or the person entering the code has not been manipulated. Account protection needs to cover the request path before MFA, around MFA, and after MFA.

OTP Bots Target the Human, Not the Cryptography

A Kaspersky article describes the rise of OTP bots: tools that call or message users and convince them to hand over one-time passwords. The attacker does not need to break the MFA system. They need the victim to read out a fresh code at the same moment the attacker is logging in.

The usual flow is simple. The attacker obtains a working username and password from a breach, phishing kit, or credential stuffing result. They attempt a login, which triggers an OTP. The victim receives a call or message claiming to be from the bank, retailer, courier, or support team. The story is urgent enough to make the code feel like part of protecting the account, not compromising it.

AI phone assistants such as Lucy are built for legitimate business use, but similar conversational technology lowers the effort required to run more convincing criminal call flows. The security issue is not that AI magically defeats MFA. It is that a fluent, responsive call can make social engineering less scripted and harder for a user to dismiss.

This is why "we have MFA" should not end the account protection conversation. MFA can stop many stolen-password logins, but it cannot reliably stop a user from being tricked in real time.

Residential Proxies Weaken the Surrounding Checks

Attackers also work to make the login itself look unremarkable. Residential proxies route traffic through IP addresses assigned to ordinary home or mobile internet connections. That lets malicious traffic borrow the appearance of normal customer traffic.

Traditional controls often lean too heavily on IP address, geolocation, and request volume. Residential proxy networks weaken all three. An attacker can rotate through many IPs, keep each source below a simple rate limit, and choose an exit location that roughly matches the victim's country or city. If the login looks local enough, the MFA challenge may be the only control left.

That is a poor place to put all the risk. A login with a correct password, a plausible IP address, and a successful OTP can still be an account takeover. The system needs to keep evaluating the request: device and browser signals, network fingerprint, known breached credentials, velocity across accounts, and behaviour after login.

Automation Happens Before and After MFA

MFA is usually visible at the point of login, but account takeover campaigns are broader than one prompt. Bots test credential pairs across login forms and APIs. Tools such as OpenBullet and similar automation frameworks can replay login flows at scale. Breached credential lists give attackers a cheap starting point because password reuse remains common.

Once an attacker gets through, the next actions matter. They may change the email address, add a device, disable notifications, alter delivery details, use stored payment methods, transfer value, or test what the account can access. If monitoring treats a successful MFA as the end of risk, those actions can happen inside a trusted session.

The defence needs to be layered around the actual attack path:

Check credential risk before and during login, especially known breached username and password pairs.
Use bot and browser signals to detect automation even when traffic is distributed.
Rate limit on better keys than IP alone, such as TLS or HTTP/2 fingerprints, headers, routes, ASNs, countries, and account behaviour.
Treat residential proxy evidence as a risk input, not just an allow-or-block label.
Monitor session and account changes after MFA, then challenge, hold, revoke, or review when behaviour changes.

This does not mean every login needs more friction. It means the system should have more choices than "ask for MFA" or "allow". A low-risk login from a known device can keep moving. A login using breached credentials through proxy infrastructure can be slowed, challenged, or blocked before the user receives a confusing call. A successful login followed by high-risk account changes can trigger fresh verification or session invalidation.
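
The idea of having more choices than "ask for MFA" or "allow" can be sketched as a small scoring function. This is a minimal illustration, not a description of any product's logic: the signal names, weights, and thresholds are all assumptions chosen for readability, and real values would need tuning against observed traffic.

```python
from dataclasses import dataclass


@dataclass
class LoginSignals:
    known_device: bool
    breached_credentials: bool
    residential_proxy: bool
    automation_suspected: bool
    mfa_passed: bool


def score(signals: LoginSignals) -> int:
    """Sum illustrative weights for each risk signal (never below zero)."""
    total = 0
    if signals.breached_credentials:
        total += 40
    if signals.residential_proxy:
        total += 25
    if signals.automation_suspected:
        total += 35
    if signals.known_device:
        total -= 20
    return max(total, 0)


def decide(signals: LoginSignals) -> str:
    """Map the score to an action. A passed MFA does not override a high score."""
    risk = score(signals)
    if risk >= 75:
        return "block"
    if risk >= 40:
        return "challenge"
    if not signals.mfa_passed:
        return "require_mfa"
    return "allow"


# Known device, no risk signals, MFA already passed.
print(decide(LoginSignals(True, False, False, False, True)))
# Breached credentials through a residential proxy, even with a valid OTP.
print(decide(LoginSignals(False, True, True, False, True)))
```

The design point is that proxy evidence and breached credentials feed a score instead of acting as a single allow-or-block switch, so a successful OTP is treated as one input among several.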

Controls Around MFA

Peakhour's Advanced Rate Limiting helps reduce reliance on IP address by grouping and limiting requests using signals such as HTTP/2 and TLS fingerprints, ASNs, countries, request headers, and route context. That matters when credential stuffing is spread across residential proxies.

Peakhour's Bot Management adds another layer by looking for automation, browser inconsistency, suspicious device patterns, and residential proxy use. The aim is to identify the machinery behind the attack before it becomes a clean-looking login attempt.

Peakhour's Account Protection brings those signals closer to the account decision. Breached credential checks, bot evidence, rate limits, proxy context, custom rules, and monitoring should all feed the decision to allow, challenge, rate limit, block, log, or review.

User education still has a place, especially around OTP sharing and unexpected calls. It should not be the main control. Users are asked to make security decisions at bad moments, often under pressure, with limited context. Technical controls should reduce the number of times an attacker can create that moment.

MFA Still Belongs in the Stack

The point is not to remove MFA. Strong MFA, especially phishing-resistant methods, raises the cost of account takeover and should remain part of the stack. The mistake is treating MFA as proof that the account is safe.

Account protection works better when MFA is one decision point inside a wider system. The login attempt, credential history, network path, device, session, account changes, and transaction behaviour all carry evidence. MFA is useful evidence. It is not the whole case.

Rate Limiting for API Security

2024-01-24T13:00:00+11:00

Rate limiting prevents servers from being overwhelmed by too many requests in a short period of time. Typically, rate limiting is configured using rules made up of a filter, for example a path like /login, and a limit on the number of requests a user can make in a given time, such as 10 requests in a minute. If a user exceeds this limit, they are usually blocked for a timeout period.
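
As a rough illustration of that rule shape (a filter, a limit per time window, and a timeout), here is a minimal in-memory sketch. The class and parameter names are hypothetical and this is not how any particular product implements it. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque


class RateLimitRule:
    """Allow at most `limit` matching requests per `window_s`, then block for `timeout_s`."""

    def __init__(self, path_prefix: str, limit: int, window_s: float, timeout_s: float):
        self.path_prefix = path_prefix
        self.limit = limit
        self.window_s = window_s
        self.timeout_s = timeout_s
        self._hits = defaultdict(deque)  # key -> timestamps inside the window
        self._blocked_until = {}         # key -> time at which the block expires

    def allow(self, key: str, path: str, now: float) -> bool:
        if not path.startswith(self.path_prefix):
            return True  # the filter does not match, so the rule does not apply
        if self._blocked_until.get(key, 0.0) > now:
            return False  # still inside the timeout period
        hits = self._hits[key]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop requests that have left the window
        hits.append(now)
        if len(hits) > self.limit:
            self._blocked_until[key] = now + self.timeout_s
            return False
        return True


# 10 requests a minute to /login, with a 5 minute timeout, keyed by client IP.
rule = RateLimitRule("/login", limit=10, window_s=60, timeout_s=300)
outcomes = [rule.allow("203.0.113.7", "/login", now=float(t)) for t in range(11)]
print(outcomes.count(True), outcomes[-1])
```

The `key` argument is where the choice of identity lives. Here it is an IP address, which is exactly the assumption questioned in the next paragraph.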

But how do you identify a user? Traditionally, rate limiting has used the IP address for grouping requests, assuming that requests from the same IP address come from the same user. That assumption is now weak. IP addresses are rarely static and are often shared. For example, an office network might have hundreds of individual computers in it but present a single IP address for all those computers to the internet. Mobile operators commonly use carrier-grade network address translation (CGNAT) to share the same IP across thousands of devices or users. Bot networks, seeking to avoid security controls like rate limiting, will rotate their requests through thousands of different IP addresses. This makes rate limiting based on IP addresses a poor choice from both a functional and a security perspective.

Introducing Advanced Rate Limiting

Peakhour's Advanced Rate Limiting service lets you create filters using any HTTP request characteristic, for example URI, request method, headers, cookies, country, network fingerprints and more. You can also use response headers and response codes, so a rule can count failed login attempts, repeated 404s from a scraper, or traffic that crosses an API threshold.

For counting requests you can use the following fields for grouping:

IP Address
ASN
Country Code
HTTP/2 Fingerprint
TLS Fingerprint
Any combination of Request Headers

You can use one of those fields, or a combination of them, to identify users with more control than IP address alone.
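
To make the grouping idea concrete, the sketch below builds a counter key from whichever fields are chosen. The field names (`ip`, `asn`, `tls_fingerprint`) and the `grouping_key` helper are hypothetical and only illustrate the concept; they are not Peakhour's configuration syntax.

```python
import hashlib
from typing import Mapping, Sequence


def grouping_key(request: Mapping[str, str], fields: Sequence[str]) -> str:
    """Build a stable counter key from the selected request fields."""
    parts = [f"{name}={request.get(name, '')}" for name in fields]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# Two requests from the same tool, sent through different IP addresses.
first = {"ip": "198.51.100.4", "asn": "64500", "tls_fingerprint": "fp-example"}
second = {"ip": "203.0.113.90", "asn": "64500", "tls_fingerprint": "fp-example"}

by_ip = ("ip",)
by_fingerprint_and_asn = ("tls_fingerprint", "asn")

print(grouping_key(first, by_ip) == grouping_key(second, by_ip))
print(grouping_key(first, by_fingerprint_and_asn) == grouping_key(second, by_fingerprint_and_asn))
```

Grouped by IP, the two requests land in separate counters. Grouped by fingerprint and ASN, they share one, which is what lets a limit follow the client rather than the address.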

You can also separate the filter expression from the mitigation expression. For example, excessive attempts to /login can result in the client being blocked across the entire site.

This matters because rate limiting is not just a request counter. In Peakhour it sits beside bot management, WAF, DDoS protection, traffic controls, and origin shielding on the same managed edge path. That gives operators a practical way to set different thresholds for verified crawlers, suspicious automation, authenticated API clients, and normal visitors without pushing every policy change into the application. It also gives them allowed, blocked, and threshold-hit evidence to tune the rule after it is deployed, whether Peakhour is the active edge or adding controls beside an existing CDN or cloud edge.

Putting it into action

Advanced Rate Limiting can help protect applications from attacks like Layer 7 DDoS, Account Takeovers, Credential Stuffing, and more. Here are some real-world examples you can configure using our dashboard and API.

Protect against general site abuse

Our example website is a medium-sized ecommerce store that has page URLs ending in /. It serves Australian clients and typically sees around 100 page requests a minute from non-search-engine traffic during peak traffic times. With that baseline, we can set up rate limiting to prevent general site abuse and protect against layer 7 DDoS attacks.

Peakhour rate limiting starts with zones. You specify your request limits in these zones.

Here we've specified a maximum of 45 requests in 1 minute. We're going to apply this limit to page loads only. Since the typical peak for all users on this website is around 100 page requests a minute, it seems reasonable that a single real user is not going to view 45 pages in 1 minute. We could also specify a value for error responses in a minute. An error could be a 404, which a scraper might typically get when looking for removed URLs.

Now let's define our filter and our counter. For our filter we mentioned that pages end in /, so we'll use that, but exclude verified bots to make sure they're not restricted when crawling the site. A verified bot is a crawler, like Google or Bing, that Peakhour has verified as legitimate by using reverse DNS to confirm it is who it says it is.
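
The exact verification steps are not described here, but a common way to confirm a crawler with reverse DNS is a forward-confirmed lookup: resolve the IP to a hostname, check the hostname belongs to the crawler operator, then resolve the hostname back and confirm it returns the same IP. The sketch below assumes that approach. The suffix list is an illustrative subset, and the lookup functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable

# Hostname suffixes used by crawler operators (illustrative subset, an assumption).
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Iterable[str]:
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Return True only if the PTR hostname is an expected crawler name
    and that hostname resolves back to the same IP address."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(CRAWLER_SUFFIXES):
            return False
        return ip in forward_lookup(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The forward step matters because anyone who controls reverse DNS for their own address space can publish a PTR record that merely looks like a crawler hostname.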

Attackers, scrapers, and others looking to abuse a site will launch an attack using a particular piece of software. That piece of software will have a TLS fingerprint (like JA3) that remains the same, even as the attacker rotates their user-agent, IP address, and other characteristics, so we'll use the TLS fingerprint as our request counter.
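
Putting the zone, filter, and counter from this example together, a simplified model might look like the following. It is a sketch in plain Python rather than Peakhour configuration. The request dictionary keys, the `verified_bot` flag, and the fingerprint value are assumptions for illustration.

```python
from collections import defaultdict, deque


def matches_filter(request: dict) -> bool:
    """Count page loads only (paths ending in "/") and skip verified crawlers."""
    return request["path"].endswith("/") and not request.get("verified_bot", False)


class Zone:
    """A counting zone: at most `limit` matching requests per `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # TLS fingerprint -> request timestamps

    def over_limit(self, request: dict, now: float) -> bool:
        if not matches_filter(request):
            return False
        hits = self._hits[request["tls_fingerprint"]]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # forget requests older than the window
        hits.append(now)
        return len(hits) > self.limit


zone = Zone(limit=45, window_s=60)
# One client rotating IP address and user-agent but keeping its TLS fingerprint.
results = [
    zone.over_limit(
        {"ip": f"198.51.100.{i}", "user_agent": f"agent-{i}",
         "path": "/products/", "tls_fingerprint": "fp-scraper"},
        now=float(i),
    )
    for i in range(46)
]
print(results.index(True))  # position of the first request over the limit
```

Every request in the loop arrives from a different IP address, so an IP-keyed counter would never exceed 1. Keyed on the fingerprint, the 46th page load in the minute crosses the limit.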

Rate Limit authenticated API Users

It is common for APIs to require an Authorization header as part of the request to authenticate access. By grouping requests on the value of this header, we can rate limit a specific API client even if it uses multiple applications, or if its credentials are stolen.
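
A small sketch of grouping on the Authorization header follows. Hashing the header value before using it as a counter key is a design choice assumed here, so that rate limit state never stores raw credentials. The helper name and sample tokens are hypothetical.

```python
import hashlib
from typing import Mapping, Optional


def api_client_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return a counter key derived from the Authorization header, or None if absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("authorization")
    if not value:
        return None
    return "auth:" + hashlib.sha256(value.encode("utf-8")).hexdigest()


# The same API credentials used from two different applications and networks.
mobile = {"Authorization": "Bearer example-token-1", "User-Agent": "mobile-app/2.1"}
script = {"authorization": "Bearer example-token-1", "User-Agent": "python-requests"}
other = {"Authorization": "Bearer example-token-2", "User-Agent": "mobile-app/2.1"}

print(api_client_key(mobile) == api_client_key(script))
print(api_client_key(mobile) == api_client_key(other))
```

Because the key follows the credential rather than the device or IP, a stolen token used from new infrastructure still draws from the same request budget as its legitimate owner.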

Protecting from Account Takeovers

Account Takeover attacks have been in the news recently, with several high-profile websites being victims. Credential Stuffing and Brute Force attacks rely on attempting lots of logins to identify valid credentials. Along with lots of attempts come lots of failures. Attackers rely on software like OpenBullet to carry out their attacks, using proxy networks to constantly rotate IP addresses and defeat traditional rate limiting.

The program the attacker is using will present a consistent TLS fingerprint. We can make a special rule for our login form that tracks failed login attempts by TLS Fingerprint, effectively tracking the attacker as they rotate IP address.

If the attack is low and slow, we can track failed attempts over a longer timeframe by using the response from the server when adding to our counting zone.
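
The failed-login approach can be sketched as a counter that only increments on failure responses and is keyed by TLS fingerprint. The status codes treated as failures, the threshold, and the one hour window are assumptions for illustration; a real login endpoint may signal failure differently.

```python
from collections import defaultdict, deque

FAILED_STATUSES = {401, 403}  # assumption: how the login endpoint reports a failure


class FailedLoginTracker:
    """Count failed logins per TLS fingerprint over a long window."""

    def __init__(self, max_failures: int, window_s: float):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures = defaultdict(deque)  # fingerprint -> failure timestamps

    def record(self, tls_fingerprint: str, status: int, now: float) -> bool:
        """Record a login response. Return True when this failed attempt
        takes the fingerprint over the limit."""
        if status not in FAILED_STATUSES:
            return False  # successful logins are not counted
        failures = self._failures[tls_fingerprint]
        while failures and now - failures[0] >= self.window_s:
            failures.popleft()
        failures.append(now)
        return len(failures) > self.max_failures


# Low and slow: one failed attempt every two minutes, each from a new IP address.
tracker = FailedLoginTracker(max_failures=20, window_s=3600)
flags = [tracker.record("fp-attack-tool", 401, now=i * 120.0) for i in range(25)]
print(flags.index(True))  # index of the first attempt that trips the rule
```

At one attempt every two minutes, a per-minute limit never fires. Counting failures over an hour does, and the IP address never enters the calculation.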

Final Thoughts

Advanced rate limiting is a practical response to the limits of IP-based controls. IP address rotation is the standard amongst attackers and scrapers, rendering the traditional approach obsolete. Useful protection now needs to identify the actor behind the requests, protect the origin before expensive application work is triggered, and give teams enough evidence to adjust the policy without guesswork. Counting requests against a combination of network fingerprints, request fields, response signals, and bot context is how you stop abuse from scrapers, SEO spiders, and layer 7 attackers without treating every visitor the same.

Google Chrome's "IP Protection" vs Apple Private Relay

2023-10-25T13:00:00+11:00

Google Chrome's "IP Protection" vs. Apple's iCloud Private Relay

Google and Apple are both pushing browser-level privacy features that reduce how much a website can infer from a user's IP address. Google's recent announcement of its "IP Protection" feature for Chrome follows Apple's iCloud Private Relay, but the two approaches are not the same.

Apple's iCloud Private Relay: A Closer Look

In 2021, Apple introduced iCloud Private Relay for paid iCloud+ subscribers. The feature encrypts traffic from the user's device and routes internet requests through two separate relays. The intention is to stop any single party, including Apple, from building a comprehensive user profile from IP address, location, and browsing activity.

However, this feature is specific to Apple's Safari browser. It is not a full VPN; it is a browser-centric service that protects Safari traffic on iOS, iPadOS, and macOS. The user's internet requests are routed first through an Apple server, then through a partner network like Akamai, Cloudflare, or Fastly, before reaching the intended destination. This dual-hop design means neither party has a complete view of both the user's IP address and the browsing destination.

Google's "IP Protection": Playing Catch-up?

Google's "IP Protection" for Chrome appears to be an answer to Apple's initiative. By masking users' IP addresses using proxy servers, Google aims to preserve user privacy while keeping essential web functions working. Unlike Apple's solution, which is limited to Safari, Google's feature potentially has wider application within the Chrome ecosystem.

However, Google's solution is still early, with phased implementation and limited domain application. Apple has already integrated and offered iCloud Private Relay to its users; Google is still testing its feature.

Can Apple Allow Google's Feature on Chrome?

Given the competitive nature of the technology industry, it remains uncertain whether Apple will allow Google's IP Protection feature on Chrome for Apple devices. With iCloud Private Relay already in place, Apple may see Google's feature as redundant or conflicting with its privacy objectives.

The Bigger Picture: Ad Tracking and Platform Control

Both companies present these changes as privacy improvements, but the platform context matters. Hiding IP addresses does not remove ad tracking, and privacy features can also reinforce platform control. By making privacy protections part of their own browsers and ecosystems, Google and Apple can reduce some third-party visibility while keeping users inside platforms they operate and measure.

Apple's iCloud Private Relay and Google's "IP Protection" both improve some aspects of user privacy, with different approaches and coverage. As Google plays catch-up to Apple in this area, users should understand what these features do and what they leave in place. The goal should be genuine online privacy, and as we've discussed in our article on TLS fingerprinting, network-based fingerprinting is becoming increasingly important for protecting services in this changing environment.

Google Chrome's "IP Protection" and Online Privacy

2023-10-24T13:00:00+11:00

Google plans to introduce an "IP Protection" feature in Chrome. The feature is intended to improve privacy by masking IP addresses through proxy servers. It may also affect ad tracking and who controls access to online platforms.

Understanding IP Addresses and Google's Strategy

IP addresses can let websites follow user activity across platforms. Over time, that can build detailed profiles and create real privacy concerns. Google's "IP Protection" is designed to reduce that signal by sending third-party traffic through proxies, hiding user IPs. The feature will start as optional, then focus on domains thought to track users.

At first, Google will use a dedicated proxy for its own domains. As testing continues, the system may change. Google is also considering a 2-hop proxy system for better privacy, with an outside CDN handling the second proxy.

Google plans to assign proxy connection IPs that reflect a user's broad location rather than their exact one. It will test this on platforms like Gmail and AdServices, in Chrome versions 119 to 225.

VPN Growth and Other Browsers

The growth of VPN use points to demand for online privacy. VPNs, like Google's IP Protection, hide user IP addresses. Firefox and Opera have added VPN features to their browsers. Apple, known for user privacy, has worked with CDN companies on similar privacy improvements.

This change has trade-offs. Sending traffic through servers run by Google or other operators can make it harder for security teams to identify and handle threats. Google has suggested mitigations such as requiring users to authenticate with the proxy and applying rate limits.

What It Means

Traditional security tools like IP reputation and GeoIP methods are becoming less reliable. This change makes network-based fingerprinting more important. For more on this, read our article on TLS fingerprinting.

While firms talk about hiding IP addresses, ad tracking is still common. These changes might also push users to certain platforms. Even if users think they're safe, big tech's tracking tools can still watch them. That can give users a false sense of safety. Real privacy still needs practical tools and clear public understanding.

Web Scraping Another Business' Website

2023-10-11T13:00:00+11:00

As businesses continue to build their presence online, screen scraping is becoming more prevalent. Screen scraping is the use of software or code to take data from another website. For example, popular platforms like Skyscanner or booking.com usually take price data on flights and accommodation and display it on their websites. However, Australian copyright laws or the website owner’s terms and conditions may forbid you from screen scraping. This article explains the legal aspects of scraping data from another business’ website and the precautions you should take.

Am I Violating the Law by Screen Scraping?

Australian copyright law safeguards ‘original creative works’, including:

written works;
visual images;
music; and
moving images.

Copyright can also protect documents such as government reports and legal forms. When determining whether copyright protects a creative work, the work does not need to be intricate or of high quality. It only needs to demonstrate originality and not be copied from another source.

Is Data an ‘Original Work’?

Data is usually fact-based and primarily consists of statistics or numbers. As a result, copyright usually does not protect data.

Examples of such data include:

the consumer price index for a particular quarter;
monthly house price increases in a city;
the number of students in a class; or
the count of films released in a year.

Generally, the law does not consider this an original work because it merely represents real-world information.

What Data is an ‘Original Work’?

However, data can be an original work in some circumstances. For example, if data is organised in a unique manner that reflects its compiler's creativity, the law might consider that data an ‘original work’.

Examples of organised data that copyright protects include:

accounting forms;
sequences of numbers or letters for a bingo game; or
a car parts catalogue.

Consequently, screen scraping data from a website is unlikely to infringe copyright unless it involves protected, creatively organised data. Infringing someone’s copyright means using their copyright-protected material without their permission.

Are There Exceptions to Copyright Law?

In the rare event that your screen scraping infringes copyright, your use could fall under an exception to copyright infringement. Australian copyright law refers to these exceptions as 'fair dealing.'

The four ‘fair dealing’ exceptions include using copyright-protected materials for:

research or study;
review or critique;
parody or satire; and
reporting the news.

For instance, a journalist scraping original data sets to report potential price-gouging among airlines could potentially rely on the exception for reporting the news. However, if you are scraping data for business purposes, the fair dealing exceptions may not apply.

What if a Website Explicitly Bans Screen Scraping?

Even if screen scraping is not always illegal under Australian copyright law, website owners can use their terms of use to prohibit data scraping. These terms of use often appear as website pop-ups. The pop-ups typically state that by continuing to use the website, you accept the terms of use.

These terms can explicitly forbid:

data scraping;
copying;
hacking; or
any form of data extraction.

Violating these terms would result in you breaching the website’s terms of use. As a result, the website owner may take legal action against you. If the data on the website qualifies as original work, copyright infringement claims may also arise.

Therefore, it is advisable not to screen scrape from websites with explicit terms of use against that activity. If you do engage in screen scraping, ensure you only extract factual information.

Key Takeaways

Screen scraping is generally lawful if you extract strictly factual information from other websites. However, if a website's terms of use prohibit screen scraping, even for factual data, it is advisable to avoid data scraping. Otherwise, you could face potential breach of contract and copyright infringement claims.

For assistance with your legal obligations, LegalVision’s experienced IT lawyers can help as part of the LegalVision membership. For a low monthly fee, you will have unlimited access to lawyers who can answer your questions and draft and review your documents. Call LegalVision today on 1800 296 912 or visit their membership page.

ZDNS - scan the entire internet

2023-06-20T13:00:00+10:00

The lack of a free Reverse DNS (rDNS) lookup database has made large-scale DNS research harder. To address this, we used ZDNS, an open-source, high-performance DNS toolkit developed by Stanford University, to create our own rDNS database. To reduce UDP timeout issues during rDNS operations, we devised a scan-ordering approach that randomised the IP space and improved the efficiency of the scanning process.

Leveraging ZDNS for rDNS Lookups Across the Internet

Understanding rDNS is useful for internet operations and research. Active DNS measurement helps us inspect how providers advertise the use of their IP address space. One of the components of this ecosystem is Reverse DNS (rDNS), which serves an important role in IP database categorisation and ASN (Autonomous System Number) classification. However, running rDNS across the entire internet is not a trivial task.
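As a minimal illustration of what an rDNS lookup actually asks for, the sketch below builds the PTR query name for an address using Python's standard `ipaddress` module. The function name `ptr_query_name` is ours, chosen for illustration; sending the query is left to a resolver or a tool such as ZDNS.

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Return the DNS name whose PTR record holds the reverse mapping for `ip`."""
    # IPv4 octets are reversed under in-addr.arpa; IPv6 nibbles under ip6.arpa.
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

Scanning the whole IPv4 space means issuing one such PTR query for every address, which is why throughput and timeout handling dominate the work.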

Previously, Rapid7 provided a free database for rDNS lookups, but it has discontinued the offering. This situation has prompted the need to create our own database, calling for a robust, efficient, and scalable tool to accomplish the task. ZDNS was the right fit.

Introducing ZDNS

ZDNS, a part of the ZMap.io project, is a capable tool developed by Stanford University to support scalable and reproducible DNS research. ZDNS is an open-source DNS measurement framework specifically optimised for large-scale DNS research on the public internet. It can resolve 50 million domains in 10 minutes and query the PTR records of the complete public IPv4 address space in approximately 12 hours.

This high-performance toolkit offers a modular interface, enabling researchers to safely implement new functionalities. Its architecture is designed to expose DNS lookup chains by performing recursive resolution. ZDNS supports a command-line interface and outputs results in JSON, a machine-parsable format.

Enhancements by ZDNS

ZDNS's architecture and feature set are tailored to the challenges of extensive DNS research. Its guiding principles are that the DNS lookup chain is exposed, and that the tool is safe, easy to use, and extensible.

ZDNS's performance optimisations make it a suitable tool for DNS experiments that require querying a large number of names. Parallelism, UDP socket reuse, and selective caching are some of the critical performance optimisations that enable ZDNS to efficiently handle large volumes of DNS queries.

ZDNS's scalability, execution time, and success rate have been evaluated against several existing tools. For instance, when exposing the DNS lookup chain, ZDNS is 85 times faster than dig. ZDNS also outperforms other high-performance tools, achieving 2.6 to 3.6 times more successful queries per second and about 30% less packet drop than MassDNS.

Our rDNS Journey

When we started scanning the whole internet with rDNS, we hit a practical roadblock: UDP timeouts made the scans slow. The system spent too much time waiting for responses from parts of the internet that were either empty or broken.

We made two changes. First, instead of scanning the internet's addresses in order, we mixed them up and scanned randomly. This spread out our requests and stopped the system from getting stuck on troublesome ranges. Second, we checked smaller sections of the internet first, so we did not waste time waiting for big chunks of the internet that weren't responding.

With these changes, we scanned the whole internet in 13 days, finding over a billion addresses. The main lesson was straightforward: scan order matters when timeout behaviour dominates runtime.
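The idea of randomising scan order can be sketched in a few lines of Python. This is not the code we ran, only an illustration under simple assumptions: it walks a single IPv4 CIDR block with an affine permutation, so every address is visited exactly once without holding the whole range in memory. The function name `shuffled_addresses` and the `seed` parameter are hypothetical.

```python
import ipaddress
import math
import random


def shuffled_addresses(cidr: str, seed: int = 0):
    """Yield every address in `cidr` exactly once, in a scattered order."""
    net = ipaddress.ip_network(cidr)
    n = net.num_addresses
    rng = random.Random(seed)

    # i -> (a * i + b) % n is a bijection on range(n) when gcd(a, n) == 1.
    a = 1
    if n > 2:
        while True:
            a = rng.randrange(1, n)
            if math.gcd(a, n) == 1:
                break
    b = rng.randrange(n)

    base = int(net.network_address)
    for i in range(n):
        yield str(ipaddress.ip_address(base + (a * i + b) % n))


for ip in shuffled_addresses("192.0.2.0/29", seed=7):
    print(ip)
```

An affine map is a weak shuffle, since consecutive addresses differ by a fixed stride, but it is enough to show the point: neighbouring queries land in different parts of the range, so one unresponsive block no longer stalls a long run of lookups.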

Wrapping Up

ZDNS has proven to be a valuable tool for DNS research, especially for substantial tasks like performing a reverse DNS scan of the entire internet. Our experience underscores the value of practical adjustments when dealing with large-scale challenges, like randomising the IP space to avoid delays caused by UDP timeouts.

As an open-source tool, ZDNS is available on GitHub. For more detail, read the award-winning paper presented at IMC 2022.

Our work with ZDNS shows its value in DNS research and the operational detail involved in large-scale DNS work. By randomising the scan order, we mitigated timeout issues and improved the efficiency of our scanning process.

Izhikevich, L., Akiwate, G., Berger, B., Drakontaidis, S., Ascheman, A., Pearce, P., Adrian, D., & Durumeric, Z. (2022). ZDNS: a fast DNS toolkit for internet measurement. In Proceedings of the 22nd ACM Internet Measurement Conference (pp. 33-43). https://doi.org/10.1145/3517745.3561434
ZMap Project. (n.d.). ZDNS. GitHub. Retrieved 2023-05-15 13:00, from https://github.com/zmap/zdns.

The Rise of the Dragon

2023-05-17T13:00:00+10:00

Camaro Dragon, a Chinese state-sponsored group, has developed a custom firmware implant for TP-Link routers. Once installed, it can turn compromised routers into residential proxies. That weakens traditional cyber-defences, including GeoIP blocking, because traffic can appear to come from ordinary local connections. This article looks at how the malware works, why residential proxies matter for enterprise security, and where GeoIP security measures fall short.

Understanding the New Malware

Check Point's research describes Camaro Dragon's sophisticated attacks on European foreign affairs entities. The group uses a custom firmware implant, known as 'Horse Shell', designed specifically for TP-Link routers. The malware includes a backdoor that grants the attackers continuous access to compromised networks and allows them to build anonymous infrastructure.

'Horse Shell' can execute arbitrary commands on the infected router, transfer files, and relay communications using SOCKS tunnelling. Its design can be adapted to different vendors' firmware, suggesting the possibility of a wider spread.

The People and Intentions Behind The Malware

Investigations into the origin of the 'Horse Shell' malware by Check Point Research, Avast, and ESET point to a well-known cyber threat actor: Mustang Panda. This advanced persistent threat (APT) group, linked to the Chinese government, is known for complex attacks that often exploit Internet-facing network devices.

The primary function of 'Horse Shell' is to relay traffic between an infected device and the attackers' command and control servers. This method obscures the true source and destination of the communication, making it difficult to trace back to the attackers.

Importantly, Mustang Panda appears to choose router implant targets indiscriminately. The infection of a home router doesn't imply that the homeowner is a direct target. Instead, each infected router becomes a node in a broader chain that connects main infections with command and control operations.

Researchers identified this approach when they found the 'Horse Shell' implant during an investigation of targeted attacks against European foreign affairs entities. The implant allows the attackers to maintain ongoing access, establish anonymous infrastructure, and move laterally within compromised networks.

The Implications of Residential Proxies

Residential proxies serve as intermediaries, using real IP addresses issued by Internet Service Providers (ISPs). They are used across a range of applications, including business web scraping and anonymising user online activity.

Residential proxies become more serious when malware such as 'Horse Shell' is involved. This malware infects routers, turning them into a network of residential proxies that can then be used for malicious activity, including data breaches and distributed denial-of-service (DDoS) attacks.

Most importantly, this use of residential IP space can make an attack look as if it originates from a domestic source within the target's location. That undermines traditional cyber-defences.

GeoIP Security Measures and Their Limitations

GeoIP blocking, a traditional cyber security tool, works by limiting access from specific geographical regions or networks frequently associated with cyber threats. However, this method is becoming less effective against the rising use of residential proxies.

Residential proxies can disguise the actual origin of a cyber attack, giving the illusion that it's originating from a trusted, usually local, location. This capability allows them to effectively bypass GeoIP blocking measures. Consequently, malicious actors using residential proxies can carry out their activities with less obvious attribution and often go undetected.

The key operational issue is the exploitation of home routers by malware like 'Horse Shell,' which turns these devices into unwitting participants in cyber attacks. This manipulation means an attack could appear to originate from a seemingly trusted domestic source, which can render GeoIP blocking ineffective.

This threat shows why cyber security needs a more layered approach. As malware evolves to exploit residential proxies, detection and defence strategies need to adapt. Relying solely on GeoIP blocking, whether by trusting apparently local connections or by deny-listing countries like Russia and China, can create a false sense of security.

Detecting Residential Proxies: The Role of Network Fingerprinting

The rise of residential proxy malware makes network fingerprinting important for identifying these threats. Five techniques can help detect residential proxies:

TCP Fingerprinting: Proxied requests may generate TCP fingerprints that don't match the expected device type. For example, a request from a residential IP address that bears the fingerprint of a server OS could be a strong signal of a proxy.
TLS and HTTP/2 Signatures: As with TCP fingerprints, unusual TLS and HTTP/2 signatures could reveal proxies. An incoming request using a version of TLS or HTTP/2 not commonly used in residential networks might indicate a proxy.
JavaScript-based Fingerprinting: This method identifies the specific browser in use. Discrepancies in JavaScript fingerprints, or the absence of a fingerprint, could suggest the presence of a residential proxy.
Timing Analysis: The timing of requests can also be a signal. Proxied requests might exhibit longer or inconsistent intervals between requests, indicating a residential proxy.
Port Scanning: This technique can detect open ports that could indicate the presence of SOCKS or other proxies, revealing possible exposure to threats.
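To make the list above concrete, here is a minimal sketch of how three of those signals (TCP fingerprint mismatch, missing JavaScript fingerprint, and open proxy ports) might be combined into a list of reasons for review. It assumes the signals have already been collected upstream; the `RequestSignals` fields and the `proxy_indicators` function are hypothetical names, not part of any particular product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RequestSignals:
    """Signals assumed to be gathered upstream; all field names are hypothetical."""
    residential_ip: bool           # IP belongs to a consumer ISP range
    tcp_os: Optional[str]          # OS family inferred from the TCP fingerprint
    browser_os: Optional[str]      # OS family claimed by the browser
    js_fingerprint_present: bool   # a JavaScript fingerprint was returned
    open_proxy_ports: List[int]    # ports found open by a scan, e.g. 1080


def proxy_indicators(s: RequestSignals) -> List[str]:
    """Return the reasons this request looks like it came through a proxy."""
    reasons = []
    # A residential IP whose TCP stack disagrees with the claimed browser OS.
    if s.residential_ip and s.tcp_os and s.browser_os and s.tcp_os != s.browser_os:
        reasons.append("tcp_os_mismatch")
    if not s.js_fingerprint_present:
        reasons.append("missing_js_fingerprint")
    if s.open_proxy_ports:
        reasons.append("open_proxy_port")
    return reasons


suspect = RequestSignals(True, "linux", "windows", False, [1080])
print(proxy_indicators(suspect))
```

Returning reasons rather than a single yes or no keeps each signal visible, which matters because none of them is conclusive on its own.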

While residential proxies have legitimate uses, such as web scraping, those applications sit beside a more serious risk: compromised trusted or local networks can be turned into proxy infrastructure at scale. Cyber threats like 'Horse Shell' use residential proxies to undermine traditional GeoIP security measures, which means defence strategies need to keep evolving.

In Part 1 of our series on residential proxies, we provide an overview of this topic and why it matters to security teams. From basic uses to their role in complicated cyber attacks, we cover the key points.

Learn how Peakhour's Application Security Platform protects against account takeovers and credential stuffing. Contact our team to secure your user accounts.

Cohen, I., Madej, R., & Threat Intelligence Team (2023). The Dragon Who Sold His Camaro: Analyzing Custom Router Implant. Check Point Research. Retrieved from https://research.checkpoint.com/2023/the-dragon-who-sold-his-camaro-analyzing-custom-router-implant/
Goodin, D. (2023, May 17). Malware turns home routers into proxies for Chinese state-sponsored hackers. Ars Technica. Retrieved from https://arstechnica.com/information-technology/2023/05/malware-turns-home-routers-into-proxies-for-chinese-state-sponsored-hackers/

Residential Proxies and MITRE Framework

2023-05-17T13:00:00+10:00

Residential proxies act as intermediaries, routing traffic through real-world IP addresses. That can mask user identity, bypass geographical restrictions, and improve privacy. The MITRE ATT&CK framework, a matrix of cyber adversary tactics and techniques, categorises proxy use under technique T1090. The classification helps explain how attackers use proxies to maintain command and control across target environments, including Linux, Windows, and macOS.

Residential proxies are useful and risky in equal measure. They support anonymity and data collection, but misuse creates ethical and security concerns, including credential stuffing and account takeovers. MITRE ATT&CK's treatment of proxy use gives security teams a clearer way to reason about those risks and plan mitigations.

Looking at residential proxies through the MITRE framework keeps the discussion grounded. It shows where proxy use fits into attacker tradecraft, and where defenders need practical controls rather than broad assumptions about intent.

From Credential Stuffing to Account Takeover and Data Exfiltration

Credential stuffing and account takeover incidents, including the Ubiquiti breach, show how exposed digital defences can be. Attackers use residential proxies to mask activity, which aligns with MITRE ATT&CK technique T1090. This technique describes proxy use for discreet command and control. In the Ubiquiti case, adversaries utilised proxies to test and apply stolen credentials across systems without revealing their true locations, a direct application of T1090's principles.

The Camaro Dragon malware demonstrates residential proxy exploitation for account takeovers. By infecting devices and incorporating them into a botnet, the malware facilitated remote control over victims' accounts, aligning with MITRE's T1090 for proxy-managed network communications. Camaro Dragon's operation reflects the tactic of maintaining anonymity while executing unauthorised access and control, a strategy documented within the MITRE framework.

Volt Typhoon's activities present a sophisticated use of residential proxies in data exfiltration. This group, known for targeting infrastructure, manipulated proxies to move data discreetly from compromised networks, a tactic that falls under MITRE's T1090. The operation shows how adversaries use residential proxies to obscure the digital footprint of data theft, complicating traceability and detection.

Viewed through MITRE ATT&CK, these examples show how residential proxies support credential stuffing, account takeovers, and data exfiltration. They also point to the need for integrated defence strategies that account for different forms of proxy misuse, rather than treating proxy traffic as a single problem.

The Role of Residential Proxies in Web Scraping

Residential proxies are common in web scraping because they let operators simulate requests from different geographic locations. That capability is especially useful when gathering data from websites with GeoIP restrictions or anti-scraping measures. In the MITRE ATT&CK framework, residential proxy use in web scraping aligns with several techniques that describe how adversaries gather information and evade detection.

Technique T1090, which details proxy use, illustrates how adversaries utilise residential proxies to disguise web scraping activity. By routing requests through proxies, they can avoid IP bans and rate limits, enabling the collection of large amounts of data without detection. This technique shows the practical advantage of residential proxies in bypassing network defences and aggregating targeted information discreetly.

Web scraping through residential proxies also intersects with the MITRE framework's emphasis on reconnaissance techniques. Adversaries use reconnaissance to gather valuable data about targets, and residential proxies help them do it discreetly. By presenting requests as coming from different residential IPs, attackers can compile detailed profiles on organisations, their operations, and vulnerabilities without revealing their intent or location.

For defenders, residential proxy use in web scraping creates a dual challenge. It can support legitimate data collection and market research, and it can also help adversaries gather intelligence before further attacks. That makes proxy handling a judgement problem as well as a blocking problem: organisations need to balance access to information with protection against unauthorised data extraction.

Understanding residential proxy use in web scraping through MITRE ATT&CK helps define the detection problem more precisely. Defenders need mechanisms that can distinguish legitimate proxy-backed activity from malicious use, and policies that can respond without over-blocking normal traffic.

Defending Against Proxy-Related Cyber Attacks Informed by MITRE

Defending against cyber attacks that use residential proxies requires layered controls informed by MITRE ATT&CK. Technique T1090, which focuses on proxy use for command and control activity, provides a useful base for designing those defences.

Network Monitoring and Analysis

A core defence is stronger network monitoring and analysis. By scrutinising network traffic, organisations can identify unusual patterns that may indicate malicious proxy use. This includes monitoring for excessive requests from varied geographic locations that do not align with normal user behaviour. The MITRE framework suggests network intrusion detection systems (NIDS) to detect suspicious activity, including anomalous residential proxy use.

Implementing Access Controls and Rate Limiting

To mitigate credential stuffing and account takeover through proxies, organisations need strict access controls and rate limiting. These measures can reduce automated attacks by limiting how many requests a user can make within a set timeframe, weakening distributed attempts to breach systems via residential proxies.
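A minimal sketch of the rate-limiting idea follows, using a sliding window held in memory. The class name `SlidingWindowLimiter` and its parameters are illustrative assumptions. The key is left to the caller: keying on the target account rather than the source IP is one option when attackers rotate through residential addresses, because the attempts then still accumulate against the same counter.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_attempts` per key within `window_seconds`."""

    def __init__(self, max_attempts: int, window_seconds: float):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps of allowed attempts

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60)
print([limiter.allow("alice", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

A production deployment would normally keep these counters in a shared store so that every application node sees the same state; the in-memory dictionary here is only for illustration.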

Application of Web Application Firewalls (WAFs)

Web Application Firewalls (WAFs) help defend against proxy-related attacks. When configured to recognise and block requests with patterns typical of proxy misuse, such as rapid request rates or known malicious IP addresses, WAFs provide a barrier against unauthorised data scraping and other proxy-facilitated intrusions.

Proxy Detection and Blocking

Advanced proxy detection tools help organisations identify and block traffic coming through known residential proxies. Techniques include analysing originating IP addresses for known proxies and using behaviour analysis to detect patterns indicative of proxy use. Once identified, these IP addresses can be blocked or subjected to additional scrutiny.

User Behavior Analytics (UBA)

User Behavior Analytics (UBA) helps detect anomalies that may signal a proxy-based attack. By establishing baselines of normal user activity, UBA systems can flag deviations that suggest malicious activity, such as multiple failed login attempts or unusual data access patterns, which are indicative of credential stuffing or data exfiltration attempts.
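The baseline idea can be illustrated with a simple z-score check. This is a sketch under stated assumptions, not how any particular UBA product works: `baseline` is a hypothetical list of historical counts (for example, failed logins per hour for one account), and the threshold of 3 standard deviations is an arbitrary starting point.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_spike(baseline: Sequence[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` if it sits more than `threshold` deviations above the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any increase is a deviation.
        return observed > mu
    return (observed - mu) / sigma > threshold


failed_logins_per_hour = [2, 3, 1, 2, 4, 2, 3]
print(is_spike(failed_logins_per_hour, 40))  # True
print(is_spike(failed_logins_per_hour, 3))   # False
```

The check is one-sided on purpose: an unusually quiet hour is not a credential stuffing signal, while a sharp rise in failures is.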

Educating Users on Security Hygiene

Educating users on security hygiene can help prevent inadvertent participation in malicious proxy networks. Users should understand the risks of downloading unverified software or browser extensions, which could turn their devices into nodes within a residential proxy network.

Informed by MITRE ATT&CK, these defence strategies give organisations a practical way to reduce exposure. Understanding the tactics and techniques used by adversaries helps teams strengthen controls against sophisticated residential proxy use in cyber attacks.

Detecting Malicious Use of Residential Proxies

Detecting malicious residential proxy use requires both technical controls and threat intelligence. The MITRE ATT&CK framework, particularly technique T1090, gives defenders a reference point for how adversaries use proxies and where detection mechanisms should focus.

Traffic Pattern Analysis

One primary method for detecting malicious residential proxy use is traffic pattern analysis. This includes monitoring for spikes in traffic from geographical locations that do not match the service's typical user profile. Anomalies in request rates or patterns that suggest automation, such as regular intervals between requests, can also indicate proxy abuse.
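One way to test for "regular intervals between requests" is to measure how much the gaps between requests vary. The sketch below computes the coefficient of variation of those gaps; values near zero mean metronome-like timing. The function names and the `max_cv` cut-off of 0.1 are assumptions for illustration and would need tuning against real traffic.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def interval_cv(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of gaps between requests, or None if too few samples."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return 0.0
    return pstdev(gaps) / avg


def looks_automated(timestamps: Sequence[float], max_cv: float = 0.1) -> bool:
    """True when request spacing is too regular to look like a person browsing."""
    cv = interval_cv(timestamps)
    return cv is not None and cv < max_cv


print(looks_automated([0, 5, 10, 15, 20]))  # True
print(looks_automated([0, 2, 30, 31, 90]))  # False
```

Automation that adds random jitter will pass this check, so it works best as one signal among several rather than as a verdict.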

Behavioural Anomaly Detection

Behavioural anomaly detection systems identify actions that deviate from normal activity. These systems can flag unusual behaviour that might indicate malicious residential proxy use, such as repeated login attempts from different IP addresses in a short period, which could signify a credential stuffing attack.
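The example in the paragraph above, repeated login attempts from different IP addresses in a short period, can be sketched as a count of distinct source IPs per account inside a moving time window. The event format `(timestamp, account, ip)`, the function name, and the default thresholds are all assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def accounts_with_ip_spread(
    events: Iterable[Tuple[float, str, str]],
    window_seconds: float = 300,
    min_distinct_ips: int = 5,
) -> Set[str]:
    """Return accounts that saw `min_distinct_ips` or more source IPs within one window."""
    by_account = defaultdict(list)
    for ts, account, ip in events:
        by_account[account].append((ts, ip))

    flagged = set()
    for account, items in by_account.items():
        items.sort()
        start = 0
        counts = defaultdict(int)  # ip -> attempts currently inside the window
        for ts, ip in items:
            counts[ip] += 1
            # Slide the window start forward past attempts that are too old.
            while ts - items[start][0] > window_seconds:
                old_ip = items[start][1]
                counts[old_ip] -= 1
                if counts[old_ip] == 0:
                    del counts[old_ip]
                start += 1
            if len(counts) >= min_distinct_ips:
                flagged.add(account)
                break
    return flagged


attempts = [(i * 10, "alice", f"198.51.100.{i}") for i in range(6)]
attempts += [(0, "bob", "203.0.113.1"), (20, "bob", "203.0.113.1")]
print(accounts_with_ip_spread(attempts))  # {'alice'}
```

Counting distinct IPs rather than raw attempts is the point of the design: a residential proxy pool keeps each address under any per-IP limit, but the spread of addresses against one account still stands out.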

IP Reputation and Proxy Lists

Utilising IP reputation databases and known proxy lists can help identify and block requests from suspicious sources quickly. These lists include IP addresses known to be part of residential proxy networks or previously implicated in malicious activity. Integrating this intelligence into security systems allows for real-time blocking or flagging of potentially harmful traffic.
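As a minimal sketch of the lookup side, the class below checks a client address against a list of CIDR ranges using Python's standard `ipaddress` module. The class name `IpBlocklist` is hypothetical, and the ranges shown are documentation addresses standing in for a real reputation feed.

```python
import ipaddress
from typing import Iterable


class IpBlocklist:
    """Membership test against a list of CIDR ranges (linear scan, fine for small lists)."""

    def __init__(self, cidrs: Iterable[str]):
        self._networks = [ipaddress.ip_network(c) for c in cidrs]

    def contains(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        return any(addr in net for net in self._networks)


known_proxy_ranges = IpBlocklist(["198.51.100.0/24", "2001:db8::/32"])
print(known_proxy_ranges.contains("198.51.100.7"))  # True
print(known_proxy_ranges.contains("203.0.113.9"))   # False
```

Residential addresses are reassigned often, so a match is better treated as a reason for extra scrutiny than as proof, and entries should expire if the feed does not refresh them.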

Endpoint Detection and Response (EDR) Systems

Endpoint Detection and Response (EDR) systems help spot compromised devices within an organisation that could unknowingly be part of a residential proxy network. By monitoring endpoints for signs of malware or unexpected network traffic, organisations can detect and isolate infected devices before they are used in cyber attacks.

Advanced Machine Learning Models

Advanced machine learning models can be trained to recognise subtle signs of proxy misuse. By analysing large datasets of network traffic, these models can identify patterns that human analysts might miss. This includes detecting sophisticated attempts to mimic legitimate user behaviour through proxies, which could indicate reconnaissance or data exfiltration efforts.

Collaboration and Information Sharing

Collaboration and information sharing among organisations and cybersecurity entities can improve detection of malicious proxy use. Sharing indicators of compromise (IoCs) and tactics, techniques, and procedures (TTPs) associated with proxy misuse can help develop stronger detection strategies across the board.

Incorporating these detection methods, informed by MITRE ATT&CK, helps organisations identify and mitigate risks associated with malicious residential proxy use. The goal is not to label every proxy request as hostile, but to detect the patterns that matter when residential proxies are exploited for cyber attacks.

Residential Proxy Detection

2023-05-17T13:00:00+10:00

Residential proxies are under increasing scrutiny, both for how their IP addresses are obtained and for how those networks are used. They also expose how heavily many online services rely on GeoIP data, from content customisation to security controls.

That scrutiny reveals a complicated reality. Residential proxies can help businesses, researchers, and individuals preserve anonymity or work around GeoIP-dependent restrictions. The same properties also create ethical problems, particularly when the networks are misused.

This article explains what residential proxies are, how they work, where they are useful, and where the risk sits. The same properties that make them attractive for legitimate monitoring and research also make them useful for abuse.

Demystifying Residential Proxies

Residential proxies connect automated software to the internet through IP addresses tied to real-world residential locations. That lets the software look closer to ordinary internet usage, which can help it bypass geographical and network restrictions while adding a layer of anonymity.

Residential proxies need a clear legal and ethical distinction. Their use can be lawful, including for web scraping and data gathering, while still enabling activity that may breach the intended usage policies of some online services. This could include mass consumption of data intended for general use, such as scraping websites for machine learning datasets. These actions may not be strictly illegal, but they raise substantial ethical questions and are often unwelcome to the data providers.

Applications of Residential Proxies

The defining characteristic of residential proxies is that requests can appear to originate from local residential networks. That supports a wide range of use cases, including:

Concealing True IP Addresses: Residential proxies allow third parties to hide genuine IP addresses and location, making identity and origin harder to determine. By routing internet traffic through residential IP addresses, they can evade detection, bypass security rules, and access geo-restricted content.
Research and Monitoring: Residential proxies are often used by researchers, analysts, and market intelligence professionals to gather data and monitor online activity. By utilising residential IP addresses, they can emulate real user IP addresses and bypass restrictions.
Web Scraping and Data Gathering: Residential proxies are central to many web scraping and large-scale data collection workflows. With the capacity to rotate IP addresses and access a wide range of residential locations, third parties can scrape valuable data from websites without triggering anti-scraping measures. Residential proxies can make data scraping more discreet, with fewer access interruptions and cleaner collection results.
Ad Verification: Residential proxies are widely used for ad verification. Ad verification companies utilise residential IP addresses to confirm the accuracy and legitimacy of online advertisements. By mimicking genuine residential connections, they can check that ads are correctly displayed and monitor the performance and integrity of advertising campaigns.
Ad Fraud: Residential proxies can also be misused for ad fraud. Competitors or their agents may utilise residential IP addresses to falsely inflate the views of a rival's online advertisements. By using genuine residential connections, these entities can manipulate advertising metrics, compromising the accuracy and integrity of the ad's performance data. This abuse of residential proxies for ad fraud poses a significant concern for the online advertising industry.
Last Mile Monitoring: Last mile monitoring is another application for residential proxies, allowing companies to assess the user experience from a residential viewpoint. By using residential IP addresses, they can monitor website loading speeds, test service availability, and evaluate the performance of online platforms more accurately. This helps organisations pinpoint and rectify issues that may negatively affect user satisfaction.

Navigating the Risks and Concerns

Residential proxies create material risks, particularly when users are unaware that they are hosting one. Their use can introduce practical limits and security vulnerabilities that are easy to miss.

Despite their valid uses, residential proxies can be used for cybercriminal activity. Malicious actors may exploit them for account takeovers, fraud, or other targeted attacks.

Using residential proxies without the knowledge or consent of residential users creates serious security issues. These users, unaware of how their connections are being utilised, could face legal exposure, compromised privacy, and cyber threats. Their devices could unwittingly participate in malicious activity, leaving them exposed to legal consequences and reputational damage.

Exploring the Creation of Residential Proxies and their Implications

Residential proxy providers build their networks in several ways, some of which can have significant security implications.

Providers can obtain residential proxies through partnerships with Internet Service Providers (ISPs) or by leasing IP addresses from legitimate residential users. At the same time, some providers or private groups may use questionable practices to obtain residential proxies.

SDKs: Certain applications may include Software Development Kits (SDKs) that gather and sell user data, including their IP addresses. In some instances, these SDKs can be exploited by residential proxy providers to acquire residential IPs without the explicit consent or knowledge of the users.
Malware Exploitation: Malware, including botnets, can infiltrate the devices of unsuspecting residential users. Attackers may then exploit these infected devices as part of a broader residential proxy network, without user awareness. This unauthorised use of residential IPs poses significant security threats to both the affected users and the wider internet ecosystem.
Free VPN Services: Some free Virtual Private Network (VPN) services, which promise anonymity and privacy, may use users' connections as part of their residential proxy networks. Users unknowingly become exit nodes for other users' internet traffic, potentially exposing their connections to malicious activities.

In each of these cases, the residential user carries the risk without knowing it: legal exposure, compromised privacy, and a device that may take part in malicious activity on someone else's behalf.

The Birth of 'Ethical' Proxies

An important part of the residential proxy discussion is the rise of providers claiming that their IP address pools are ethically sourced. These companies argue that they have obtained the consent of the original IP owners and provide transparency in how these connections are utilised. By positioning themselves as 'ethical' residential proxy providers, they aim to mitigate the associated risks and concerns.

Even where consent is obtained, the potential for misuse remains a significant issue. This is largely due to the inherent anonymity of residential proxies and the difficulty of tracing activity back to the original user. Despite claims of ethical sourcing, the complexity and opacity of the residential proxy environment mean that it remains a grey area, inviting scepticism and demanding further scrutiny.

The result is a nuanced market that consumers, providers, and regulators need to understand as the digital landscape continues to evolve.

From Hola VPN to the Camaro Dragon

Several publicised incidents show how residential proxies are formed and the impact they have had on the industry and users. These examples show the different ways residential proxies can be created and used, legitimately and otherwise.

Hola VPN is a well-known free VPN service that promises privacy, security, and access to blocked content. However, it fell under scrutiny when it was revealed that it was selling its users' bandwidth to its sister company, Luminati, which operates a residential proxy network. Users of Hola VPN unknowingly became part of a residential proxy network, with their connections being utilised by third parties. This raised significant ethical and security concerns, as users' devices could be implicated in illegal activities carried out using their IP addresses.
The residential proxy service known as 911 has been selling access to hundreds of thousands of Microsoft Windows computers for the past seven years. This service enables customers to route their internet traffic through these computers, allowing them to appear as if they are browsing from any country or city around the world. While 911 claims that its network comprises users who voluntarily install its "free VPN" software, recent research indicates that the proxy service has a history of obtaining installations through questionable "pay-per-install" affiliate marketing schemes, some of which were operated by 911 itself. The service primarily targets users in the United States but has a global user base. Residential proxy networks like 911 can serve legitimate business purposes, but they are often abused for cybercriminal activities due to the difficulty in tracing malicious traffic back to its source.
Cybercriminals are increasingly leveraging residential broadband and wireless data connections to anonymise their malicious traffic. One notable type of network, referred to as "bulletproof residential VPN services", has gained attention. These networks are constructed by acquiring discrete blocks of internet addresses from major internet service providers (ISPs) and mobile data providers. An investigation into one such company, Residential Networking Solutions LLC (also known as Resnet), unveiled that it had obtained a significant number of IP addresses, some of which were previously controlled by AT&T Mobility. Resnet leased these IP addresses, enabling it to resell data services for major providers such as AT&T, Verizon, and Comcast Cable. However, the precise nature of the relationship between Resnet and AT&T remains unclear, and the matter has been referred to law enforcement. Cases like this emphasise the potential abuse of IP addresses within residential proxy networks.
Infatica.io, a Singapore-based company, has developed a network of over 10 million web browsers that clients can rent to conceal their true internet addresses. The company achieved this by compensating browser extension developers to incorporate its code into their extensions. Many extension developers struggle to earn fair compensation for their work, making offers like these enticing. Infatica seeks extensions with at least 50,000 users and offers to pay developers between $15 and $45 per month for every 1,000 active users with the code included in their extensions. Infatica's code routes web traffic through the browsers of extension users, providing anonymity to the company's customers. The service's pricing depends on the volume of web traffic a customer wishes to anonymise. However, this approach raises concerns about privacy and the potential misuse of users' browsers for malicious activities. Developers, particularly those who author free software, can find the monetisation opportunity offered by residential proxies extremely tempting. The potential to earn revenue from their existing user base by incorporating such code into their extensions can present a persuasive proposition.
Camaro Dragon, a form of malware, provides a recent example of residential proxies being acquired through malicious means. This malware infects the devices of unsuspecting users, forming a botnet that can then be utilised as a residential proxy network. Infected devices can then be exploited for various cybercriminal activities without the knowledge or consent of the device owners. This example highlights the significant cybersecurity risks associated with residential proxies and emphasises the importance of robust protection measures.
Volt Typhoon is a state-sponsored actor based in China that typically focuses on espionage and information gathering. Volt Typhoon proxies all its network traffic to its targets through compromised SOHO network edge devices (including routers). Microsoft has confirmed that many of the devices, which include those manufactured by ASUS, Cisco, D-Link, NETGEAR, and Zyxel, allow the owner to expose HTTP or SSH management interfaces to the internet. Volt Typhoon has been active since mid-2021 and has targeted critical infrastructure organisations in Guam and elsewhere in the United States.

These examples illustrate the ethical, security, and legal issues surrounding residential proxies. They put transparency and consent at the centre of how proxy networks are acquired and used. The implications for users, the security industry, and the broader digital landscape are substantial, which is why regulation, user education, and responsible practices matter for protecting privacy, security, and the integrity of the internet.

Legal Consequences of Residential Proxies in Data Scraping Operations

Residential proxies are a concern because of their potential for misuse and their legal implications. Two notable cases, the Ticketmaster Case and the Meta vs Bright Data Case, have drawn attention to the challenges posed by the unauthorised use of residential proxies in commercial settings and data scraping operations. These cases show why the legal ramifications of residential proxy use need to be understood in real-world scenarios.

The Ticketmaster Case: In 2018, a major international case came to light when Ticketmaster sued Prestige Entertainment for using residential proxies to circumvent ticket-purchasing limits and scoop up large numbers of tickets for resale. This case underscores the potential misuse of residential proxies in commercial settings, and how they can be used to breach the terms of service of websites.
The Meta vs Bright Data Case: The legal case between Meta Platforms, Inc. (formerly Facebook) and Bright Data Ltd. demonstrates a contentious and potentially unlawful use of residential proxies in the real world. In this case, Meta accused Bright Data of operating a business designed to use automated software to scrape and sell data from various online platforms, including Facebook and Instagram. This scraping was allegedly facilitated using unauthorised tools and services that bypassed detection by Meta's security measures. Despite Meta's efforts to halt these activities, Bright Data purportedly continued its operations. The data involved included user profiles, follower counts, and shared posts. Bright Data was alleged not only to have scraped this information but also to have advertised the scraped data for sale. The scope of this operation was extensive, with the Instagram data set alone priced at $860,000.

These cases show how residential proxies are used in practice, the challenges they present, and why their use remains legally and commercially contested.

The Wider Implications for the Security Industry

The growth of residential proxies, and the way some networks are acquired, has broader implications for the security industry. It raises questions about transparency, ethical practices, and the responsibility of proxy providers.

Ethical and Regulatory Implications: The questionable practices some providers use to acquire residential proxies highlight the need for stronger regulation and industry standards. This would help ensure that residential proxies are obtained and used in a lawful and ethical manner, protecting users' privacy and the wider internet ecosystem. There is a clear demand for more transparency in how these services operate and procure their proxies.
Cybersecurity Implications: Residential proxies can enable malicious cyber activity, ranging from fraud to targeted attacks. This can increase the need for cybersecurity measures and protections, potentially reshaping strategies and priorities within the cybersecurity industry.
Legal and Reputational Implications: If individuals unknowingly become part of a proxy network, there could be legal repercussions for them if their connections are utilised for malicious activities. This could lead to greater scrutiny and liability for companies operating within this space.
State Actors and Residential Proxy Networks: State-sponsored actors have been known to establish their own residential proxy networks within foreign countries for campaigns including information warfare, disinformation, and surveillance, which adds another layer of complexity to the issue. These activities pose significant geopolitical and security risks, requiring increased international cooperation and robust defence mechanisms.

The rise of residential proxies exposes a weakness in common security models: the assumption that residential and mobile IPs are inherently more trustworthy, and that GeoIP is a reliable reputation or security control. Widespread proxy use has shown how brittle that assumption can be.

Uncertain or unethical sourcing makes that trust problem harder. It can make online interactions less reliable and introduce security risks.

Residential proxies are not just tools; they highlight a deeper issue in how we approach digital access and security. Understanding what is already known, questioning current practices, and building better controls are practical steps towards using residential proxies responsibly and ethically. Recognising the false sense of security GeoIP restrictions can provide is part of that work.

Part 1 ends here. In Part 2: the Camaro Dragon malware, we look more closely at a specific case. This sophisticated malware uses residential proxies in a way that shows their potential for misuse. The next article covers how Camaro Dragon works, its impact on cybersecurity, and practical protection measures.

Mi, X., Tang, S., Li, Z., Liao, X., Qian, F., & Wang, X. (2021). Your Phone is My Proxy: Detecting and Understanding Mobile Proxy Networks. Retrieved from https://xianghang.me/files/ndss21_mobile_proxy.pdf
Mi, X., Feng, X., Liao, X., Liu, B., Wang, X., Qian, F., Li, Z., Alrwais, S., Sun, L., & Liu, Y. (2019). Resident Evil: Understanding Residential IP Proxy as a Dark Service. Retrieved from https://www-users.cse.umn.edu/~fengqian/paper/rpaas_sp19.pdf
Krebs, B. (2019, August 19). The Rise of "Bulletproof" Residential Networks. Retrieved from https://krebsonsecurity.com/2019/08/the-rise-of-bulletproof-residential-networks/
Krebs, B. (2022, July 18). A Deep Dive Into the Residential Proxy Service '911'. Retrieved from https://krebsonsecurity.com/2022/07/a-deep-dive-into-the-residential-proxy-service-911/
Krebs, B. (2021, March 1). Is Your Browser Extension a Botnet Backdoor? Retrieved from https://krebsonsecurity.com/2021/03/is-your-browser-extension-a-botnet-backdoor/
Meta Platforms, Inc. v. Bright Data Ltd. Retrieved from https://unicourt.com/case/pc-db5-meta-platforms-inc-v-bright-data-ltd-1374026
Volt Typhoon targets US critical infrastructure with living-off-the-land techniques. Retrieved from https://www.microsoft.com/en-us/security/blog/2023/05/24/volt-typhoon-targets-us-critical-infrastructure-with-living-off-the-land-techniques/

When Bots Break Bad

2023-05-16T13:00:00+10:00

Bots account for a large share of web traffic. Recent studies put automated traffic at nearly 50% of all internet requests. Some bots are useful, such as search engine crawlers that index your site. Some are clearly harmful, such as scrapers and sneaker bots. Others sit in a grey area, including backlink and marketing bots from services such as Ahrefs and SEMrush. Even useful bots can create problems when they crawl too hard. This article looks at the main bot types and how to manage them with robots.txt and bot management tools.

Understanding the Different Types of Bots

'Good Bots'

Good bots perform legitimate work. Search engine crawlers like Googlebot and Bingbot index webpages so search results can stay current and relevant. Other examples include uptime and performance monitoring bots.

'Bad Bots'

Bad bots harm websites, users, or both. Common examples include:

Scrapers, copying and repurposing data from websites.
Sneaker bots, automatically purchasing limited-edition products (like sneakers) before human users can.
Spam bots, posting unsolicited messages and advertisements in comment sections or forums.
Vulnerability scanners, trying thousands of website URLs to find security vulnerabilities.
Account takeover bots, attempting to gain access to existing user/admin accounts using either credential stuffing or brute-force attacks.

'Grey Bots'

Grey bots sit between good and bad. They often serve a useful purpose and may follow crawling directives in robots.txt, but they can still cause problems when they crawl too aggressively. Common examples include:

AhrefsBot: A backlink analysis bot used by Ahrefs, an SEO tool.
SEMrushBot: A bot used by SEMrush, another popular SEO and digital marketing tool.
MJ12bot: A bot used by Majestic, a service that provides backlink data and analysis.
ScreamingFrog: An SEO analyser run from a local desktop.

When Grey Bots (and Even Good Bots) Go Bad

Left unattended, grey bots can create practical problems:

Slow page loading times, which affect user experience.
Strain on server resources, potentially causing crashes, downtime, and higher costs.
Distorted website analytics, when bot traffic is mistaken for human traffic.

Managing Grey Bots with Robots.txt

The robots.txt file is a simple text file that tells web crawlers which parts of your site they can or cannot access. You can use it to manage bot behaviour and protect your website from aggressive crawling. Useful controls include:

Disallowing specific bots: You can block specific bots from accessing your site by adding a "User-agent" and "Disallow" directive to your robots.txt file. For example:

User-agent: AhrefsBot
Disallow: /

Limiting crawl rate: You can ask bots to slow down their crawling by adding a "Crawl-delay" directive:

User-agent: SEMrushBot
Crawl-delay: 10

Not all bots will follow robots.txt. ScreamingFrog, for example, can be instructed to ignore robots.txt and crawl a site as quickly as possible. You would not want a competitor doing this to your site.
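To see how a compliant crawler would read the two rules above, Python's standard urllib.robotparser module can parse them. This is a minimal sketch: the www.example.com URL and the describe() helper are placeholders for illustration, and the result only shows what a well-behaved bot is told, not what an abusive one will actually do.

```python
from urllib.robotparser import RobotFileParser

# The two example rule groups from this article, combined into one file.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SEMrushBot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def describe(agent, url="https://www.example.com/products/"):
    """Summarise what robots.txt asks of a given user agent (hypothetical helper)."""
    return {
        "agent": agent,
        "allowed": parser.can_fetch(agent, url),
        "crawl_delay": parser.crawl_delay(agent),  # None when no delay is set
    }


for bot in ("AhrefsBot", "SEMrushBot", "Googlebot"):
    print(describe(bot))
```

Running a check like this before deploying a new robots.txt is a cheap way to catch a rule that blocks more, or less, than intended.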

Bot Management Tools

In addition to robots.txt, bot management tools (like those provided by Peakhour) can protect your website from abusive bots. Good bot management tools automatically block most unwanted traffic using a combination of Threat Intelligence, Fingerprinting techniques, Reverse DNS verification, and Header Inspection.

Advanced techniques like rate limiting and machine learning can help identify more sophisticated bad bots.
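Reverse DNS verification, one of the techniques listed above, can be sketched in a few lines of Python. This is an illustrative sketch, not how any particular product implements it: the suffix list is an assumption based on the hostnames search engines publish for their crawlers, and the function names are made up for this example. The idea is to look up the hostname for the connecting IP, check it belongs to the claimed crawler's domain, then resolve that hostname forward and confirm it maps back to the same IP.

```python
import socket

# Assumed allow-list of crawler hostname suffixes. Check each vendor's
# current documentation before relying on these values.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip):
    """PTR lookup: IP address -> hostname. Raises OSError on failure."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, claimed_bot, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a client claiming to be a known crawler."""
    suffixes = CRAWLER_SUFFIXES.get(claimed_bot.lower())
    if not suffixes:
        return False
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(suffixes):
            return False  # PTR record is outside the crawler's domain
        return ip in forward(hostname)  # hostname must resolve back to the same IP
    except OSError:
        return False  # lookup failed: treat the claim as unverified
```

The resolver functions are passed in as parameters so the check can be exercised without network access. In production the results would normally be cached, since a DNS round trip per request is expensive.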

Search Bots and Double Crawling

Search bots like Bingbot can sometimes blindly follow links and crawl the same page multiple times because of differing URL parameters. This double, triple, or worse crawling can increase server load and make indexing less efficient. eCommerce sites are especially exposed because product catalogues often have several filtering paths. We've seen Bing go haywire on a number of sites. Most recently, it was issuing around 50,000 requests per day to the search function of a Magento 2 store while cycling through parameters. Once fixed, this dropped to 2,000-3,000 requests per day. On a busy OpenCart store, Bing was responsible for nearly half of all page requests (40k page requests). Configuring it to ignore parameters dropped this to around 4,000 per day.
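A quick way to gauge how much of a crawler's traffic is duplicate work is to strip the parameters that only sort or filter a listing and count how many distinct pages remain. The sketch below uses Python's urllib.parse; the parameter names and the shop.example URLs are hypothetical and would need to match your own platform.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that reorder or resize a listing without
# changing which page it is.
IGNORED_PARAMS = {"sort", "order", "limit", "dir", "mode"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop ignorable query parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


crawled = [
    "https://shop.example/catalog?cat=5&sort=price",
    "https://shop.example/catalog?sort=name&cat=5",
    "https://shop.example/catalog?cat=5&limit=48&dir=desc",
    "https://shop.example/catalog?cat=5",
]
unique_pages = {canonical_url(url) for url in crawled}
print(f"{len(crawled)} crawled URLs map to {len(unique_pages)} distinct page(s)")
```

Run against a day of access logs for a single bot, the gap between the two counts gives a rough measure of how much crawling is wasted on parameter permutations.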

Configuring Search Bots to Ignore Query Parameters

Note: Since this article was published, both Google and Bing have removed the ability to ignore parameters when crawling via their webmaster/search console tools. See "Using robots.txt to instruct search engines to ignore query string parameters" instead.

To help search bots crawl your site efficiently, you can configure them to ignore specific query parameters. Use these methods:

Configuring Bing Webmaster Tools

Bing Webmaster Tools provides an option to specify URL parameters that should be ignored during the crawling process. To configure this setting, follow these steps:

Sign in to your Bing Webmaster Tools account and select the website you want to manage.
Navigate to the "Configure My Site" section and click on "URL Parameters."
Click on "Add Parameter" and enter the parameter name you want Bingbot to ignore.
Select "Ignore this parameter" from the dropdown menu and click on "Save."
Configuring Bing Webmaster Tools this way helps stop Bingbot double crawling pages with specific URL parameters, reducing server load and improving indexing efficiency.

Managing Other Search Bots

For other search engines like Google, use the relevant webmaster tools to manage URL parameters. In Google Search Console, follow these steps:

Sign in to your Google Search Console account and select the property you want to manage.
Navigate to the "Crawl" section and click on "URL Parameters."
Click on "Add Parameter" and enter the parameter name you want Googlebot to ignore.
Choose "No URLs" from the "Does this parameter change page content seen by the user?" dropdown menu.
Click on "Save."
Specifying the parameters you want search bots to ignore can prevent double crawling and make indexing more efficient.

Final Thoughts

When good or grey bots crawl too aggressively, they can cause the same operational problems as malicious bots: overloaded servers, slower pages, and worse user experience. Monitor website traffic and server load, set clear robots.txt rules, and use the major search engines' webmaster tools to control inefficient crawling. Done properly, this improves website performance and can lower infrastructure costs.

Double MAD?

2023-05-15T13:00:00+10:00

This article explores the use of Double Median Absolute Deviation (Double MAD) for [anomaly detection](/learning/threat-detection/what-is-anomaly-detection/) in time series
data, particularly in skewed or non-symmetric distributions. Double MAD, which calculates two median absolute
deviations — one for data below the median and one for data above — provides a more nuanced approach than traditional
MAD, allowing for accurate detection of anomalies even in skewed data distributions. We also delve into its application
in identifying slow abuse, like bots, by catching lower range anomalies. However, it's important to note Double MAD's
limitations such as not capturing seasonal data shape and trends over time. A comparison is also drawn with the Z-score
method, highlighting that the choice between the two depends on the nature of your data. The article provides insights
into the practical implementation of Double MAD and its potential to improve your data analysis toolkit.

Operational systems increasingly rely on time-series data for decisions. Anomaly detection is one practical use: by identifying patterns that deviate from the norm, businesses can investigate potential issues early or understand unexpected opportunities.

One useful technique for anomaly detection is the Median Absolute Deviation (MAD) and, more specifically, its extension, the Double MAD. This article explains where Double MAD fits in time-series anomaly detection and how it can help identify anomalous clients.

Understanding MAD and Double MAD

MAD, a robust measure of variability, is less susceptible to outliers than standard deviation. It calculates the median of absolute deviations from the data's median, often providing a better representation of 'normal' behaviour in datasets with skewed distributions or outliers.

Double MAD is an extension of MAD, where two MADs are calculated — one for the data below the median and another for the data above. This split gives the detection process a better fit for asymmetric data, which is common in real-world time series data.

Why Double MAD?

While MAD provides a robust way to understand the 'normal' range of a dataset, it assumes a symmetric distribution of data around the median, which may not always hold true. Double MAD is useful where that assumption breaks down, offering an improved anomaly detection process for skewed or asymmetric datasets.

In time-series analysis, especially with 24-hour cycles like web traffic or server usage, patterns can exhibit seasonality and trend components. These patterns can often be asymmetric, making Double MAD a valuable tool for capturing the variability in different parts of the data.

Using Double MAD in Anomaly Detection

The Double MAD implementation provided uses Rust, a systems programming language known for speed and memory safety. The code calculates the lower and upper MAD values, along with their respective thresholds. Anomalies can then be detected by comparing each data point to these thresholds.

An anomaly is defined as a data point that deviates significantly from the expected range. If a data point falls below the lower MAD threshold or above the upper one, it can be flagged as an anomaly. This approach is especially effective when handling datasets with high variability or extreme values.

Double MAD for Anomalous Client Detection

Beyond time-series data, Double MAD can also be instrumental in identifying anomalous behaviour among clients. By comparing each client's behaviour against the Double MAD of the time-series data, teams can pinpoint clients that deviate from the norm.

For instance, in the context of web service usage, an anomalous client might be one that is sending an unusually high or low number of requests. By using Double MAD, you can flag such outliers and take appropriate action, such as investigating potential misuse or reaching out to understand and address any issues they may be facing.

Detecting Lower-Range Anomalies: A Case of Slow Abuse

An interesting application of Double MAD is in detecting lower-range anomalies, a pattern often associated with slow abuse such as bots or Distributed Denial of Service (DDoS) attacks. These abuses are characterised by an unusually low frequency of activity that is consistent over a prolonged period. This consistent, low-level activity can fly under the radar of typical anomaly detection systems.

By setting a lower MAD threshold, Double MAD can effectively detect these lower-range anomalies, providing early warning of slow abuse. Its ability to detect both high and low anomalies makes Double MAD a flexible tool for anomaly detection.
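As a rough illustration of flagging both unusually busy and unusually quiet clients, here is a small Python sketch (not the Rust implementation this article refers to). The client addresses, request counts, the multiplier k = 3, and the choice to include values equal to the median in both halves are all assumptions made for the example.

```python
from statistics import median


def classify_clients(requests_per_client, k=3.0):
    """Label clients whose request counts fall outside the Double MAD thresholds."""
    counts = list(requests_per_client.values())
    mid = median(counts)
    # One MAD per side of the median (values equal to the median count in both).
    mad_lower = median([mid - c for c in counts if c <= mid])
    mad_upper = median([c - mid for c in counts if c >= mid])
    low_cut = mid - k * mad_lower
    high_cut = mid + k * mad_upper

    flagged = {}
    for client, count in requests_per_client.items():
        if count < low_cut:
            flagged[client] = "low"   # candidate for slow, low-rate abuse
        elif count > high_cut:
            flagged[client] = "high"  # candidate for aggressive crawling
    return flagged


# Hypothetical requests per hour for nine clients.
hourly_requests = {
    "198.51.100.1": 100, "198.51.100.2": 110, "198.51.100.3": 95,
    "198.51.100.4": 105, "198.51.100.5": 120, "198.51.100.6": 98,
    "198.51.100.7": 102, "203.0.113.8": 2, "203.0.113.9": 900,
}
print(classify_clients(hourly_requests))
```

With this data the median is 102, the lower MAD is 4 and the upper MAD is 8, so the thresholds are 90 and 126. Only the client with 2 requests and the client with 900 requests fall outside that range.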

The Math Behind Double MAD

To illustrate the power of Double MAD, let's consider a dataset from a right-skewed distribution. Applying the conventional MAD approach might lead to false positives where normal data points are marked as outliers. This is because MAD uses a symmetric interval around the median, which doesn't account for the skewed nature of our data.

With Double MAD, we instead calculate two MADs — one for the data below the median (MAD-lower) and another for the data above (MAD-upper). Outlier thresholds are then defined using these two MADs. The lower threshold is calculated as the median minus a multiplier (k) times MAD-lower. The upper threshold is the median plus k times MAD-upper.

This approach takes into account the asymmetric nature of our data, providing more accurate anomaly detection. For example, in a right-skewed distribution, Double MAD would correctly identify only the extreme right tail values as outliers without incorrectly flagging data points on the left tail.
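The threshold calculation described above can be written out in a few lines of Python. This is a sketch under stated assumptions rather than the Rust implementation mentioned earlier: the sample data is invented, k defaults to 3, and values equal to the median are counted in both halves.

```python
from statistics import median


def double_mad_thresholds(values, k=3.0):
    """Return (lower_threshold, upper_threshold) using one MAD per side of the median."""
    mid = median(values)
    mad_lower = median([mid - x for x in values if x <= mid])  # MAD-lower
    mad_upper = median([x - mid for x in values if x >= mid])  # MAD-upper
    # Note: if either MAD is zero, that threshold collapses onto the median.
    return mid - k * mad_lower, mid + k * mad_upper


# Invented right-skewed sample: most values are small, with a long right tail.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 40]
low, high = double_mad_thresholds(sample)
outliers = [x for x in sample if x < low or x > high]
print(low, high, outliers)
```

For this sample the median is 4, MAD-lower is 1 and MAD-upper is 2, so with k = 3 the thresholds are 1 and 10. Only 12 and 40 in the right tail are flagged, and nothing on the left is.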

Wrapping Up

Accurate anomaly detection matters when teams rely on time-series data to operate and investigate systems. The Double MAD approach provides a robust method for this, allowing businesses to better understand their data, spot potential issues early, and make more informed decisions.

Whether you're monitoring web traffic, server usage, or client behaviour, leveraging Double MAD can offer valuable insights and help ensure your operations continue to run smoothly. The ability to detect both high and low anomalies makes it especially powerful, providing protection against potential threats like slow abuse.

Understanding and implementing Double MAD gives your data analysis toolkit a more complete view of asymmetric data and helps you detect potential anomalies earlier.

Efficiently Generating and Printing All IPv4 Addresses in a Random Order

2023-05-15T13:00:00+10:00

In this article, we explored an efficient way to generate and print all possible IPv4 addresses in a random order using
a Linear Congruential Generator (LCG). The LCG, a pseudorandom number generator, helps generate the full range of IP
addresses without consuming vast amounts of memory, making this approach suitable for systems with memory constraints.
We also provided a Python script demonstrating the concept, along with a test case to verify its correctness.

We then delved into the importance of randomising IP addresses, highlighting its critical role in areas like security
testing, load balancing, enhancing privacy, and web scraping. However, while using this technique, it's essential to
respect privacy and legality, as misuse can lead to legal repercussions.

In summary, the ability to generate and print all IPv4 addresses in a random order is a powerful tool, especially in the
realm of networking and cybersecurity, and can be achieved efficiently using the LCG approach.

In networking, some tasks require generating and printing every possible IPv4 address. Doing that in random order without a large memory footprint is less straightforward. The IPv4 address space contains 2^32, or 4,294,967,296, values. Storing all of them in memory at once is not feasible for most systems.

This article uses a Linear Congruential Generator (LCG) to generate the full range without holding it in memory.

Linear Congruential Generator

A Linear Congruential Generator is a type of pseudorandom number generator that can run without storing the whole sequence. It generates each next value from a linear equation based on the previous value. The basic form of the LCG is:

X_(n+1) = (a*X_n + c) mod m

Here, a, c, and m are constants, and X_n is the nth number in the sequence. The initial seed or starting point of the sequence is X_0.

If we choose parameters such that the period of the LCG is maximum (equal to the modulus), and the modulus equals the range of numbers we're generating (the number of possible IPv4 addresses in this case), then the LCG should generate each number in the range exactly once before repeating.

Here is that idea in Python:

import ipaddress

def lcg(modulus, a, c, seed):
    """Linear congruential generator."""
    while True:
        seed = (a * seed + c) % modulus
        yield seed

start_ip_str = '0.0.0.0'
end_ip_str = '255.255.255.255'

start_ip = int(ipaddress.IPv4Address(start_ip_str))
end_ip = int(ipaddress.IPv4Address(end_ip_str))

modulus = end_ip - start_ip + 1
a = 1664525
c = 1013904223
seed = 1  # Arbitrary seed

generator = lcg(modulus, a, c, seed)

for _ in range(modulus):
    ip_int = start_ip + next(generator)
    ip = ipaddress.IPv4Address(ip_int)
    print(ip)

The script first defines the parameters of the LCG. a and c are set to the values used in the Numerical Recipes LCG, a well-known and widely used generator, and the seed is arbitrary. The modulus is set to the total number of possible IPv4 addresses.

The function lcg() is implemented as a Python generator, yielding the next number in the sequence each time next() is called on it.

The loop then generates and prints each IP address. It adds the output of the LCG to the start IP address, converts it back to an IP address string, and prints it.

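This script generates and prints each IP address in random (more precisely, pseudorandom) order using very little memory. Each IP address is printed exactly once because these parameters give the LCG its maximum period for a modulus of 2^32: c is odd and a - 1 is divisible by 4, which satisfies the Hull-Dobell conditions for a power-of-two modulus.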
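Walking all 2^32 values just to confirm that none repeats is slow, so a cheaper check is useful. The sketch below tests the Hull-Dobell conditions for a full-period LCG directly, then confirms the behaviour exhaustively on a scaled-down modulus. The helper names are made up for this example, and the smaller moduli are chosen only to keep the check fast.

```python
from math import gcd


def lcg(modulus, a, c, seed):
    """Linear congruential generator (same recurrence as the script above)."""
    while True:
        seed = (a * seed + c) % modulus
        yield seed


def prime_factors(n):
    """Set of prime factors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(modulus, a, c):
    """Hull-Dobell conditions for an LCG with non-zero increment c."""
    if gcd(c, modulus) != 1:
        return False  # c and the modulus must be coprime
    if any((a - 1) % p for p in prime_factors(modulus)):
        return False  # a - 1 must be divisible by every prime factor of the modulus
    if modulus % 4 == 0 and (a - 1) % 4 != 0:
        return False  # and by 4 when the modulus is divisible by 4
    return True


A, C = 1664525, 1013904223


def visits_every_value(modulus, a=A, c=C, seed=1):
    """Brute-force check: does one pass of `modulus` steps hit every value?"""
    gen = lcg(modulus, a, c, seed)
    return len({next(gen) for _ in range(modulus)}) == modulus


print(has_full_period(2**32, A, C))   # full IPv4 range
print(visits_every_value(2**16))      # exhaustive check on a small power of two
print(has_full_period(1000, A, C))    # an arbitrary range size
```

One practical consequence: these constants suit power-of-two moduli. If the start and end addresses are changed so that the range size is not a power of two, the same a and c may no longer cover every address, as the modulus of 1000 shows.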

The point is that a small pseudorandom number generator can walk a large range without materialising the whole list. The code can still be tweaked and optimised for specific requirements and constraints.

The Importance of Randomising IP Addresses

Randomising IP addresses has practical uses in several networking workflows:

1. Security Testing and Penetration Testing

In cybersecurity, randomising IP addresses can help simulate attacks on a network from various sources. By using a range of IP addresses in no particular order, penetration testers can mimic the unpredictable nature of real-world cyber threats and build more robust test scenarios.

2. Load Balancing and Network Traffic Simulation

Randomising IP addresses is also useful in network traffic simulations. Network engineers and administrators can use this approach to test network resilience and capacity. By sending requests to servers from randomised IP addresses, they can evaluate how well their load balancing strategies are functioning and whether the network can handle high traffic loads from various sources.

3. Anonymity and Privacy

In some cases, randomising IP addresses can help with privacy and anonymity. While it is not a foolproof method, using a different IP address for each request can make it more challenging for online trackers to monitor user activity. It is a common practice among privacy-focused internet users and is also used in some VPN (Virtual Private Network) services.

4. Web Scraping

Web scraping is another area where randomising IP addresses is useful. To prevent being blocked by anti-bot measures, web scrapers often need to rotate their IP addresses. By using a different IP address for each request, they can avoid being detected and blocked by the sites they are scraping.

Randomising IP addresses can be useful in these cases, but privacy and legality still matter. Unauthorised network scanning, privacy breaches, and cyberattacks are illegal and punishable under law.

Generating and printing every possible IPv4 address in a random order is a valuable technique with various applications, from network testing to privacy enhancement. With the Linear Congruential Generator approach, we can do it efficiently.

Origin shield

2022-06-10T13:00:00+10:00

CDN providers often promote the size of their network, and how many Points of Presence (POPs) they have. Higher capacity, more resilient networks are useful from a security point of view (think DDoS attacks), but more POPs can also work against what the CDN was designed to do: take load off an origin and improve performance for end users.

The POP Problem

Modern CDNs are what's called 'Pull' CDNs. That means the CDN won't store a resource until a user requests it. The first time a user requests a resource, the request goes to a CDN POP, which checks its local cache, gets a miss, and passes the request through to origin. As the resource is returned, the POP stores a copy for the next time someone wants it. If your CDN has 100 POPs, this process has to be repeated 100 times to fully 'warm' the CDN for that specific resource. That's 100 requests to origin. The more POPs your CDN has, the more likely you are to get a miss and hit the origin.

When the caches at POPs are fully populated, the effect on your application can be minimal. During a cache MISS event, typically either due to resource expiration or a manual purge, many requests can be sent to the origin server concurrently while the individual POPs rebuild their caches. The more POPs, the longer the process takes.

This can be a problem, especially when caching dynamic pages that need to be server-side rendered, large resources, or transformed resources. For example, take a busy ecommerce store running Magento during a sale. Magento will purge content when sales are made, forcing the cache to rebuild each time. During a busy period this can reduce your cache hit rate and degrade site performance.

Enter Origin Shield

CDN Origin Shield is a feature that lets you nominate the CDN Point of Presence closest to your server as a shield. All requests that hit other POPs and receive a cache miss will then go to the nominated shield before hitting the origin. The shield becomes a 'super cache' and can reduce the number of requests to your origin on a cache miss.
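The effect is easy to see in a toy model. The Python sketch below is a deliberately simplified simulation, not a description of how Peakhour.IO or any other CDN is built: every class and function name is invented for the example, and it ignores expiry, concurrency, and request collapsing. Each POP is a pull-through cache that asks its upstream on a miss, and the only difference between the two runs is whether that upstream is the origin itself or a single shield cache.

```python
class Origin:
    """Stands in for the origin server and counts the requests it receives."""

    def __init__(self):
        self.requests = 0

    def get(self, path):
        self.requests += 1
        return f"content for {path}"


class Cache:
    """A pull-through cache: on a miss it asks its upstream and keeps a copy."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.upstream.get(path)
        return self.store[path]


def origin_requests(pop_count, use_shield, path="/product/42"):
    """Warm every POP for one resource and report how often the origin was hit."""
    origin = Origin()
    upstream = Cache(origin) if use_shield else origin
    pops = [Cache(upstream) for _ in range(pop_count)]
    for pop in pops:
        pop.get(path)  # first request for this resource at each POP
    return origin.requests


print("without shield:", origin_requests(100, use_shield=False))
print("with shield:   ", origin_requests(100, use_shield=True))
```

In this model, warming 100 POPs costs the origin 100 requests without a shield and one request with it, because the other 99 misses are answered by the shield's copy.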

Peakhour.IO implements origin shield as a simple dropdown on an origin pool where you can select the geographic location that should be used as a shield. Requests to your origin are now routed through this geographic location before reaching your origin in a cache miss scenario.

Clients who use multiple geographic origins can also benefit from Origin shield. Peakhour.IO allows the specification of an origin shield per origin. For geographic load balancing, you will need to contact support for setup.

Seeing is believing

The Peakhour.IO summary now includes your edge cache hit rate (CHR), your shield CHR, and your overall CHR so that you can see the effect in action.

Some of our clients have seen typical increases of 10-20% in their overall cache hit rate, and greater than 40% for frequently flushed dynamic content.
Quicker cache convergence
Fewer hits to origin
Better end-user experience
Higher conversions

Final Thoughts

Origin Shield is an important feature for certain types of site, or when you're looking to maximise your cache hit rate. CMSs that offer built in full page caching, like Magento and Drupal, flush content often, and are susceptible to performance degradation as load increases. Minimising hits to the origin in these cases is vital.

If you are interested in getting more out of your CDN, need a bespoke CDN solution, or need a provider that offers performance, optimisation and security services, reach out to discuss the right setup.

Origin shield with request collapsing helps minimise origin hits, improve CHR and maintain user experience for your web application.
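Request collapsing means that when several requests for the same uncached URL arrive at once, only one goes to the origin and the rest wait for its answer. A minimal sketch of the idea using Python's asyncio follows. The `CollapsingFetcher` class and the `slow_origin` stand-in are hypothetical names for illustration, not part of any CDN's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class CollapsingFetcher:
    """Share one in-flight origin fetch between concurrent requests for a URL."""

    def __init__(self, fetch_from_origin: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch_from_origin
        self._in_flight: Dict[str, asyncio.Task] = {}

    async def get(self, url: str) -> str:
        task = self._in_flight.get(url)
        if task is None:
            # First request for this URL: start the single origin fetch.
            task = asyncio.ensure_future(self._fetch_once(url))
            self._in_flight[url] = task
        # Later requests simply wait on the same task.
        return await task

    async def _fetch_once(self, url: str) -> str:
        try:
            return await self._fetch(url)
        finally:
            del self._in_flight[url]  # the next miss starts a fresh fetch


origin_calls = 0


async def slow_origin(url: str) -> str:
    """Stand-in for a slow origin server; counts how often it is called."""
    global origin_calls
    origin_calls += 1
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def demo() -> List[str]:
    fetcher = CollapsingFetcher(slow_origin)
    return await asyncio.gather(*(fetcher.get("/sale") for _ in range(5)))


bodies = asyncio.run(demo())
print(origin_calls, len(bodies))
```

Five concurrent requests for `/sale` result in a single origin call, and all five callers receive the same body.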

Fastly Outage

2021-06-09T13:00:00+10:00

You may have heard that Fastly, one of the world’s largest providers of CDN services, had an outage of about 1 hour on the 8th June. Some of the world's largest websites and services were down, including Reddit, CNN, The Guardian, Shopify stores, Stripe and Spotify, to name a few.

According to Fastly themselves, the outage was caused by a 'service misconfiguration' (Update: Bug triggered by a client changing their configuration), which propagated globally and took websites offline. When users tried to access a website using the Fastly service, they were presented with a Varnish 503 Guru Meditation error (for those of us old enough to remember, Guru Meditation is a geek reference to the Commodore Amiga computer of the late 80s!). This generally occurs when there is an issue contacting the server that the website is actually hosted on. There were also some reports on Twitter saying 'unknown domain'.

Essentially, Fastly took down its own network with a bad software update. Similar problems have affected other online platforms in the recent past, including Google, Amazon, and Cloudflare.

Why wasn’t there a Plan B?

Fastly is an excellent service, with an enviable reliability record. There is a reason why they're trusted by some of the world's largest websites to improve reliability and load times. However, the vast majority of Fastly clients still had to sit tight and wait for Fastly to fix the issue. Luckily this was only an hour. It could have been much longer.

Just like death and taxes, software outages are a certainty. The real story is not that Fastly had an outage. It is why these large websites didn't have a contingency plan for a single point of failure. For sites at that scale, this is a major oversight in infrastructure planning.

How to handle a CDN failure

The simple solution is to have a backup CDN provider already configured and tested, ready to switch over to if your primary provider fails. You can then utilise short expiry of DNS records to redirect users when the failure happens. This needn't be very expensive or complicated, although individual circumstances vary.

A Quick Introduction To DNS (Domain Name System)

Modern CDNs, like Fastly, Cloudflare, and Peakhour, operate as ‘reverse proxies’. This means they sit between a website's end users and the website server itself. They achieve this through DNS configuration.

When someone types a domain url into a browser, eg fastly.com, a request is sent to a DNS server with the host name (eg fastly.com) to find the IP address of the server to retrieve the content from. CDNs, like Fastly, get website admins to list the address of the CDN on the DNS server. That means requests for a website go through the CDN first. The process is analogous to listing someone else’s number in the phone book so they take calls for you.

The DNS server has a TTL (Time To Live) associated with its records. This TTL tells whoever asked for an IP address, for a given hostname, to remember the answer and not ask again until after the TTL has passed. Typically DNS record TTLs will be 1 hour, but they can be shorter, eg 1 minute.
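The effect of the TTL can be sketched as a small cache that keeps an answer until it expires. This is only an illustration of the behaviour described above: the `TtlResolverCache` class, the hostnames and the IP addresses (taken from the documentation ranges) are hypothetical, and real resolvers are considerably more involved.

```python
import time
from typing import Callable, Dict, Tuple


class TtlResolverCache:
    """Remember a DNS answer for a hostname until its TTL has passed."""

    def __init__(self, lookup: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # returns (ip_address, ttl_seconds)
        self._clock = clock
        self._answers: Dict[str, Tuple[str, float]] = {}

    def resolve(self, hostname: str) -> str:
        now = self._clock()
        cached = self._answers.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the TTL: do not ask again
        ip, ttl = self._lookup(hostname)
        self._answers[hostname] = (ip, now + ttl)
        return ip


# A fake clock and a fake authoritative answer with a 60 second TTL.
fake_now = [0.0]
authoritative = {"www.example.com": ("192.0.2.10", 60)}
cache = TtlResolverCache(lambda host: authoritative[host], clock=lambda: fake_now[0])

first = cache.resolve("www.example.com")
authoritative["www.example.com"] = ("198.51.100.20", 60)  # the record is switched
fake_now[0] = 30.0
during_ttl = cache.resolve("www.example.com")  # old answer is still remembered
fake_now[0] = 61.0
after_ttl = cache.resolve("www.example.com")   # TTL passed, new answer is fetched
print(first, during_ttl, after_ttl)
```

The switched record only becomes visible once the remembered answer expires, which is why a short TTL matters for failover.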

Switching providers in case of an outage

By keeping a short TTL in DNS, webmasters can switch the answer for a DNS request to that of another provider, meaning users can quickly be directed to an alternative Cloud Provider. Once service has resumed on the primary provider, DNS can be switched again so normal traffic is resumed. The key is that the alternative provider is configured, tested, and ready to go.

This switch can even be automated to minimise outages. Premium DNS services, like Amazon’s Route 53, have optional health checking of DNS answers. This allows a switch to happen nearly instantly. The only downtime would be for people already on the site who have to wait for the TTL to expire before being directed to the backup Cloud Provider. In fact this is exactly what Peakhour.io does. In the event of a catastrophic outage we use DNS to switch to backup infrastructure so our clients are minimally affected.
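The decision behind an automated switch can be sketched in a few lines. The following is an illustration only, not Route 53's API or our own failover code: the function names, provider hostnames and the health-check callback are all hypothetical, and the timing estimate is a simplification.

```python
from typing import Callable, Sequence


def choose_dns_target(providers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy provider, in priority order."""
    for target in providers:
        if is_healthy(target):
            return target
    # Nothing is healthy: keep answering with the primary rather than nothing.
    return providers[0]


def worst_case_switch_seconds(check_interval: int, failures_before_switch: int, ttl: int) -> int:
    """Rough upper bound: time to detect the failure plus one full TTL."""
    return check_interval * failures_before_switch + ttl


providers = ["primary-cdn.example.net", "backup-cdn.example.net"]
down = {"primary-cdn.example.net"}  # pretend the primary is failing health checks

target = choose_dns_target(providers, lambda name: name not in down)
worst_case = worst_case_switch_seconds(check_interval=30, failures_before_switch=3, ttl=60)
print(target, worst_case)
```

With a 30 second check interval, three failed checks before switching and a 60 second TTL, a visitor who cached the old answer just before the failure could wait around 150 seconds. Shortening either the TTL or the check interval reduces that window.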

Backup provider options

Now that we've shown how switching CDN providers can be done, let's compare the major players and how they might serve as a backup CDN for Fastly. The three things we'll look at are Cost, Features and Integration.

Simply route traffic to the origin

This would be the simplest and most cost effective option, assuming your origin server can handle the increased load that removing its CDN would entail. It also assumes that it's ok to lose any features that you may have been relying on, eg load balancing, WAF, edge scripting, image optimisation etc.

Cloudflare

Many people use Fastly because it uses Varnish, a richly featured, programmable cache with several advanced features. If you rely on those features, eg cache tags, cache on cookie value, custom cache tags, then you have to be on Cloudflare's top plan, which is not cheap.

The other major drawback of Cloudflare is that, unless you are on the most expensive plans, you have to cede control of DNS to them by delegating your domain. Cloudflare DNS is a great service, however it has the major drawback of caching negative DNS requests for an hour. If you were switching from an A record to a CNAME record or vice versa, you could be down for an hour regardless. Not ideal.

Akamai

Akamai has a highly respected, fully featured, and very expensive product. Maintaining a backup option with them will run into the $1000s a month. Only you can decide whether it’s worth it.

Cloudfront

Amazon's CDN offering is the third of the big three alternatives. Since it uses volume based billing, it could be an attractive CDN option as a standby, as long as you don't mind missing out on cache by tag (sorry Magento and Drupal). It is also complicated to configure for dynamic content and could miss features that you need. In fact most people use Cloudfront for static content, eg images, CSS, etc and run a Varnish instance within AWS to provide easier to configure full page caching.

This is what the BBC did with the Fastly outage. They had their backup infrastructure on Cloudfront and, as of time of writing, hadn't switched back to Fastly.

Peakhour.io

Peakhour is also volume based billing with a minimum monthly charge of $20. We provide all the advanced caching features that Fastly does, as well as WAF and image optimisation as standard, all in the one service fee. We don't require you to cede control of DNS to us and we're Australian owned and based.

Final Thoughts

CDNs, no matter how big, can fail. If your website is important then it needs a Plan B. This is how that Plan B works, and it doesn't have to be expensive when using a provider like Peakhour.io.

The important part is having it configured and tested before you need it.

Setting Up A Chia Hobby Farm

2021-04-30T13:00:00+10:00

Here at Peakhour, when we're not making websites faster and more secure, we like new tech and we like a good scheme. We ran Seti@home while at uni, and mined some bitcoin back in its early days (unfortunately we don’t have them anymore). Just recently we decided to set up a Chia farm, not the super-food Chia, but the new crypto coin Chia!

What is Chia?

Chia is not just a cryptocurrency; it is a brand new blockchain and smart transaction platform that implements the first new Nakamoto consensus algorithm since Bitcoin. It was invented by the engineer behind BitTorrent, Bram Cohen, who set out to address the shortcomings of Bitcoin.

The Chia network is set to officially launch on May 3rd, and the crypto world is going crazy getting ready.

I thought Bitcoin was great, what’s wrong with it?

The major flaws that Chia sets out to address are:

The environmental impact

Without getting too technical, Bitcoin relies on very intensive computations to verify transactions (Proof of work). These computations are carried out by 'miners' who are rewarded for their efforts from an ever decreasing pool of possible bitcoin. As the blockchain gets older, the verification gets harder, and as a result the Bitcoin network is now consuming as much electricity as a mid-sized country like Argentina. Huge mining operations have been set up in China, and some even have dedicated power plants. One poster child for the environmental impacts of bitcoin is an Australian startup looking to reopen a decommissioned coal power plant to power its mining operations.

Possibility of manipulation

The huge energy requirements have led to massive server farms in cool regions near cheap electricity, concentrating mining in the hands of a few large players. This centralisation opens up Bitcoin to the possibility of manipulation, as anyone controlling more than 50% of the network's mining power can effectively change the blockchain.

How does Chia address these issues with Bitcoin?

Chia has implemented a new consensus algorithm called proof of space and time. It relies on unused hard disk space, which lots of people have and can use free of charge. Again, without getting too technical, 'Farmers' seed unused space on their hard drive/SSD with 'plots' of cryptographic numbers. When verifying transactions, the network issues a challenge to the farmers, who then scan their plots for the closest answer. The farmer passes this answer back to a server on the network known as a 'timelord'. The farmer with the closest answer is rewarded with a coin.

The more 'plots' a farmer has, the higher the chance of winning a coin.

Setting up the Farm

We got excited about the idea of Chia being the next big thing and decided to hitch a ride on the bandwagon. We had a spare old computer lying around, so we decided to fill it up with as much storage as we could find and farm some Chia!

To set up a farm you need as much space for plots as you can get your hands on. The speed of this space is not critical, so you can use spinning drives. We found 12-terabyte NAS drives to be the sweet spot for bang for buck, and opted for 4x Seagate Ironwolf NAS drives from Scorptec. (Note: they’ve gone up $40 since we bought them!)

Seeding the plots, however, is VERY disk intensive, so you need speedy and reliable SSDs. Since they don't have moving parts you'd think that SSDs would be very reliable, but just like spinning drives, they wear out and eventually die. SSDs come with a TBW (Terabytes Written) rating which estimates how much data you can write before the drive dies. Popular consumer SSDs like a 500GB Samsung 870 EVO have a TBW rating of 300. Chia recommends getting server-grade SSDs that have ratings into the Petabytes, but of course they come with a price to match.

We were limited by the age of our available motherboard, so we could only choose from SATA3-compatible drives. Appropriate enterprise SSDs were also unavailable, so in the end we settled on 500GB Seagate Firecuda 120s that are rated at 700 TBW (also from Scorptec). We decided on two so we could double the plotting rate.
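To see why the TBW rating matters, here is a back-of-the-envelope sketch. The figure of 1.6 TB written per plot is a hypothetical value chosen for illustration, not a number from this article or a measurement from our farm, so substitute your own. The `plots_before_wearout` function is likewise made up for the example.

```python
def plots_before_wearout(tbw_rating_tb: float, tb_written_per_plot: float) -> int:
    """How many whole plots a drive can create before reaching its TBW rating."""
    return int(tbw_rating_tb // tb_written_per_plot)


# ASSUMPTION: temporary data written to the SSD per plot, in terabytes.
ASSUMED_TB_WRITTEN_PER_PLOT = 1.6

consumer_drive = plots_before_wearout(300, ASSUMED_TB_WRITTEN_PER_PLOT)  # 300 TBW rating
our_drive = plots_before_wearout(700, ASSUMED_TB_WRITTEN_PER_PLOT)       # 700 TBW rating
two_drives = 2 * our_drive
print(consumer_drive, our_drive, two_drives)
```

Under that assumption a 300 TBW drive would be rated for fewer than 200 plots, while a pair of 700 TBW drives would be rated for several hundred between them.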

Now we had our hands on the drives, we just had to install everything. Within a few hours of transferring components and wiring it up we were good to go and started plotting.

Our Chia Farm!

Final Thoughts

Our old hardware limits the speed of the SSDs and therefore the number of plots we generate. We're managing around 10 plots a day and will need close to 500 before we’ve filled the available storage.

When we bought our equipment (28th April) the chia calculator showed that we’d be earning around a coin a day when fully plotted. However, with the official launch of Chia imminent, the network has exploded in growth, passing 1 Exabyte (1,000 Petabytes) just one day ago. It's now up to 1.68 Exabytes! So unfortunately our estimated earnings are down to one coin every 7 days. That’s still pretty good though, and if Chia does end up supplanting Bitcoin we might just make back the setup costs. It has been a fun exercise, even if we did spend too long on it, and if it does end up being a flash in the pan we can always use the drives for something else….

Why Manage Bots?

2020-11-30T13:00:00+11:00

Modern sophisticated bad bots often work around traditional security controls. They disrupt websites, mobile applications, and APIs. Malicious bot tactics include scraping user and pricing data, creating fake accounts, running advertising click fraud, exhausting online inventories, and taking websites offline with automated DDoS attacks.

About one-quarter of all website traffic in 2019 originated from bad bots, an increase of 18 percent over 2018. Advanced persistent bots (APBs) made up seventy-five percent of that bad bot traffic as they attempted to evade detection by cycling through random IP addresses, using anonymous/residential proxies, and changing their identities (user agent). The industries hit hardest by bad bots in 2019 included financial services, education, ecommerce, and government as well as media and airlines.

“Bot attack campaigns have become big business for threat actors, and major organizations are now fighting to support legitimate users and prospects while keeping attackers out of online applications and services,” says Paula Musich, Research Director, Enterprise Management Associates.

Bots have moved from simple scripts to distributed networks of automated agents that can mimic human interactions with machine learning techniques. They can avoid detection by network security technologies that have not kept pace with the way automated agents now operate.

Reducing the damage from bad bots means using security countermeasures that detect automated traffic and make attacks uneconomic, not just visible.

Bot Countermeasure Best Practices:

The following bad bot countermeasure practices cover network security, machine learning, and behavioural analysis. The aim is to reduce the economic harm that malicious bots inflict on businesses and end-users.

Web Application Firewalls

Web Application Firewalls (WAF) are a common first line of defence that filter out harmful Layer 7 web application (HTTP) traffic using rules or policies that protect organisations against Distributed Denial of Service (DDoS) bot attacks. WAFs also protect against cross-site request forgery (CSRF), cross-site scripting (XSS), file inclusion, and SQL injection attacks. A WAF is considered a reverse proxy that protects servers and can be deployed as an appliance, server plug‑in, or filter, and customised by application type or use case. WAF rules can be updated or changed based on the type of bot attack.

IP Tracking and Reputation

Sophisticated bots can be detected with network forensics by inspecting web traffic and assessing whether requests come from actual users or bad bots. Requests can be analysed using data sources including Tor/proxy IPs, IP addresses, IP geo-location information, ISP information, and IP owners. Additional sources for real-time and near-time malicious IP threat data can come from network data, CERTs, MITRE and cooperating competitors.
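One small piece of this, checking a client IP against a list of known-bad networks, can be sketched with Python's standard `ipaddress` module. The `match_reputation` function and the blocklist entries (documentation address ranges) are hypothetical. A real reputation feed would carry far more context than a list of CIDR ranges.

```python
import ipaddress
from typing import Iterable, Optional


def match_reputation(client_ip: str, bad_networks: Iterable[str]) -> Optional[str]:
    """Return the first listed network containing client_ip, or None."""
    address = ipaddress.ip_address(client_ip)
    for cidr in bad_networks:
        network = ipaddress.ip_network(cidr)
        # Only compare like with like (IPv4 against IPv4, IPv6 against IPv6).
        if address.version == network.version and address in network:
            return cidr
    return None


# Hypothetical blocklist, eg known proxy or Tor exit ranges.
blocklist = ["203.0.113.0/24", "2001:db8::/32"]

listed_v4 = match_reputation("203.0.113.77", blocklist)
clean_v4 = match_reputation("198.51.100.5", blocklist)
listed_v6 = match_reputation("2001:db8::1", blocklist)
print(listed_v4, clean_v4, listed_v6)
```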

Client/Device Fingerprinting

Fingerprinting attempts to identify devices, including PCs, Internet of Things (IoT) devices, mobile devices and servers, using data attributes that create real-time risk profiles to stop bot attacks. Using web page access data, a bot detection fingerprinting engine generates unique fingerprints for each end-user device and checks them against bad bots that use evasion techniques, including dynamic IP addresses and anonymous web proxies.
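As a simplified illustration, a fingerprint can be thought of as a stable hash over attributes the client reveals, deliberately leaving out the IP address so that a bot rotating through proxies still produces the same value. The attribute names and the `device_fingerprint` function below are assumptions for the sketch. Production engines use many more signals and tolerate small variations.

```python
import hashlib
import json
from typing import Mapping


def device_fingerprint(attributes: Mapping[str, str]) -> str:
    """Hash a set of client attributes into a stable hex fingerprint."""
    # Sorting the keys makes the result independent of attribute order.
    canonical = json.dumps(dict(attributes), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two requests from different IPs but with identical (hypothetical) client attributes.
request_a = {"user_agent": "ExampleBrowser/1.0", "accept_language": "en-AU", "screen": "1920x1080"}
request_b = {"screen": "1920x1080", "accept_language": "en-AU", "user_agent": "ExampleBrowser/1.0"}
request_c = {"user_agent": "ExampleBrowser/1.0", "accept_language": "en-AU", "screen": "1366x768"}

fp_a = device_fingerprint(request_a)
fp_b = device_fingerprint(request_b)
fp_c = device_fingerprint(request_c)
print(fp_a == fp_b, fp_a == fp_c)
```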

Machine Learning

Artificial Intelligence (AI) and machine learning algorithms are increasingly used to analyse malicious bot activity and make mitigation recommendations using data from sources such as user activity history, behavioural patterns and meta-data. Machine learning can use custom-tailored algorithms to target bots and iteratively process user data and identities to discern emerging bot attack patterns from very large amounts of real-time information.

Tarpitting

Tarpitting is a bot countermeasure that delays and slows down incoming malicious traffic from suspect connections. The technique is used to increase the financial and resource costs of bot attacks in an attempt to discourage malicious actors. Bad bot tar pits can delay bot request responses or take the bad bot IP address attack source offline completely. Innovative tarpitting techniques include requiring bad bots to solve computationally complex maths challenges to access resources or websites, thereby slowing down or stopping bot activity.
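The maths challenge idea can be sketched as a hashcash-style proof of work: the client must find a number that, hashed together with a server-issued challenge, produces a digest with a given number of leading zero bits. This is one possible form of such a challenge, named here as an assumption rather than a description of any specific product. The function names and the challenge string are hypothetical.

```python
import hashlib
import itertools


def is_valid_solution(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: does the digest start with enough zero bits?"""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - difficulty_bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Expensive client-side search: try nonces until one passes the check."""
    for nonce in itertools.count():
        if is_valid_solution(challenge, nonce, difficulty_bits):
            return nonce


# 12 bits means roughly 4,000 hash attempts on average for the client,
# while the server verifies the answer with a single hash.
nonce = solve("session-abc", difficulty_bits=12)
print(is_valid_solution("session-abc", nonce, 12))
```

Raising `difficulty_bits` by one doubles the expected work for the client, so the cost can be scaled up for traffic that looks more suspicious while legitimate visitors see a short delay.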

User Behavior Analysis

User interaction behaviour and identifying characteristics on a web page or mobile app differ from the behaviour of an automated malicious bot. Factors such as number of pages visited per session, time spent on each web page or within a mobile app and repeat visit frequency all help differentiate authentic users from bad bots. Defeating bad bots using Behavior Analysis involves creating a user model for individual sites with historical visitor data, then checking for anomalies that may indicate bad bot activity.
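A very simple version of this anomaly check compares one session against the site's historical data using a z-score. The sketch below uses pages per session as the only signal. The threshold of 3 standard deviations, the sample history and the `is_anomalous` function are assumptions for illustration, and a real model would combine many signals.

```python
import statistics
from typing import Sequence


def is_anomalous(value: float, history: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag a value that sits more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # no variation in history: anything different stands out
    return abs(value - mean) / stdev > threshold


# Hypothetical history of pages visited per session on one site.
pages_per_session = [4, 6, 5, 7, 5, 6, 4, 5]

typical_visitor = is_anomalous(6, pages_per_session)
suspected_bot = is_anomalous(180, pages_per_session)
print(typical_visitor, suspected_bot)
```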

Intent-based Deep Behavior Analysis (IDBA)

Compared with Behavior Analysis, Intent-based Deep Behavior Analysis (IDBA) conducts behavioural analysis at the user intent level rather than the commonly used interaction-based behaviour analysis. IDBA consists of intent encoding, intent analysis, and adaptive learning. It also employs machine learning techniques to detect bad bots emulating on-site human behaviour interactions. Bad bot mitigation techniques include limiting attempts on login pages, web authentication pages and API call authentication pages.

Rate Limiting

Rate Limiting mitigates bad bots and DDoS attacks by restricting the amount of incoming traffic accepted by specific applications and API endpoints using pre-defined bandwidth limitation policies. Requests to web applications, API queries and login attempts can all be blocked when a client, IP address, or IP and user-agent pair violates a Rate Limiting rule, and separate limits can be applied to GET and POST requests. Intellectual property can also be protected from scraping by Rate Limiting policies that restrict repeated image or digital downloads.
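A minimal sketch of a rate limiting rule keyed on IP and user-agent pairs follows, using a sliding window of request timestamps. The `SlidingWindowLimiter` class, the limit of 3 requests per 60 seconds and the sample clients are assumptions chosen for the example, not a recommended policy.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most max_requests per window for each (IP, user-agent) pair."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, user_agent: str, now: float) -> bool:
        timestamps = self._seen[(ip, user_agent)]
        # Forget requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # rule violated: block this request
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)

# One client sends four requests in quick succession (times in seconds).
decisions = [limiter.allow("203.0.113.9", "example-bot/1.0", t) for t in (0, 1, 2, 3)]
other_client = limiter.allow("198.51.100.7", "ExampleBrowser/1.0", 3)
after_window = limiter.allow("203.0.113.9", "example-bot/1.0", 61)
print(decisions, other_client, after_window)
```

The fourth request inside the window is blocked, a different client is unaffected, and the first client is allowed again once its earlier requests have aged out.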

JavaScript Injection

JavaScript Injection techniques can help mitigate bad bot attacks in several ways. Scripts can be placed into web applications that “fingerprint” a user’s browser to distinguish humans from bad bots emulating “human-like” mouse movements, keystrokes or clicks. Fingerprinting detection may also involve user agent identification, HTML5 canvas and audio fingerprinting, and protocol-level fingerprinting with TLS and HTTP/2. JavaScript combined with browser cookies can also be used to identify anomalous behaviour from unwanted traffic or bad bots trending over time.

Anycast DDoS Mitigation

Anycast is an IP addressing method that routes incoming traffic requests to the nearest location or “node.” Using Anycast for selective routing enables network load resilience against DDoS attacks by routing high traffic across multiple servers and data centres. This prevents network resources from becoming overwhelmed with malicious or irrelevant traffic.

Alternative Content Serving

Serving Alternate and Cached Content when a bad bot is detected gives organisations a way to mislead bots without blocking them altogether. For instance, e-commerce sites may fool price scraping bots by serving alternative web pages that look like legitimate pages but with higher prices. Serving Cached Content when a bot is detected also minimises load on servers without affecting site performance.

Challenges

Requests from suspected bots can be redirected to Challenges or puzzles such as a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to help distinguish a bad bot from a human. Online puzzles, such as letter matching, are easy for humans to solve but difficult for automated bots. reCAPTCHA, offered free by Google, is an advanced version of CAPTCHA that requires users to identify text from real-world images such as street address signs, printed books or text from paper newspapers.

Final Thoughts

Bad bots hijack user accounts, create fake accounts, scrape websites for data and personal information, flood websites with traffic through automated distributed denial of service attacks and attack public-facing APIs using constantly changing techniques. They hide behind dynamic IP addresses, change their attack signatures, mimic human behaviours, and take over vast networks of hosts and IoT devices, creating zombie machines that distribute malware across the internet. Countermeasures ranging from Web Application Firewalls to sophisticated Machine Learning algorithms form an organisation's primary line of defence against bad bots.

Malicious Bot Threats

2020-08-12T13:00:00+10:00

Bots are software applications that automate repetitive tasks without human interaction. They have become part of the normal infrastructure of the internet. Some bots are useful; others are bad bots. The latter are the concern for application and security teams.

Bad bots keep changing and are increasingly difficult to detect. They can cause significant financial damage to organisations by disrupting online operations, overwhelming websites with traffic, and stealing information such as web content and ecommerce pricing data.

Bad Bot Types

Bad bots span a wide range of attack capabilities and scenarios. The following are the main categories these attacks fall into:

Spam Bots

Spam bots typically target blog comment sections, community portals and lead generation forms with 'garbage' or fake content. They can also insert unwanted ads, malicious phishing links and banners into real-time conversations to disrupt the service and attack users.

Scraping Bots

Price, content and inventory scraping bots steal prices and product listings. This can damage an ecommerce site's revenue stream and harm SEO rankings when duplicate content appears on competitor and bogus sites. These bots also scrape product reviews, news, product catalogues and user-generated content. Scraper bots can harvest email addresses, images and text from victim websites, then repurpose that material to pose as legitimate web pages.

Credential Stuffing Bots

Credential Stuffing Bots attempt to use login details from other sites, or run brute force guessing attacks against customer and admin accounts. If successful, they can make purchases, harvest personal information and purchase histories, make unauthorised cryptocurrency transactions, and transfer reward points and money to gift cards and air miles.
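One tell-tale sign of this kind of attack is a single source failing logins against many different usernames, rather than one user mistyping their own password. A minimal detection sketch follows. The `suspected_stuffing_sources` function, the threshold and the sample log entries are hypothetical, and because attackers spread attempts across many IP addresses, a per-IP count like this is only one signal among several.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def suspected_stuffing_sources(failed_logins: Iterable[Tuple[str, str]],
                               max_distinct_usernames: int) -> List[str]:
    """Return source IPs whose failed logins span too many distinct usernames."""
    usernames_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for source_ip, username in failed_logins:
        usernames_by_ip[source_ip].add(username)
    return sorted(ip for ip, names in usernames_by_ip.items()
                  if len(names) > max_distinct_usernames)


# Hypothetical failed-login log entries: (source IP, attempted username).
failed_logins = [
    ("203.0.113.50", "alice"), ("203.0.113.50", "bob"),
    ("203.0.113.50", "carol"), ("203.0.113.50", "dave"),
    ("198.51.100.8", "erin"), ("198.51.100.8", "erin"), ("198.51.100.8", "erin"),
]

suspects = suspected_stuffing_sources(failed_logins, max_distinct_usernames=3)
print(suspects)
```

The first address is flagged because it tried four different accounts, while the second, which failed three times on the same account, looks more like a forgotten password.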

Ad Click Fraud Bots

Ad Click Fraud Bots can sabotage competitors by clicking on their ads to drive costs up and exhaust budget caps. They can also be used to scam advertisers with fake websites and ad clicks that pay the fraudster directly. In both scenarios, bots automatically generate interactions or 'clicks' with ads, promotions and media.

Credit Card Stuffing Bots

Carding bots make repeated attempts to authorise stolen credit card credentials. This can leave merchant payment processors with chargebacks and penalties, and may ultimately result in the victim merchant being prevented from accepting credit cards altogether.

Inventory Denial Bots

Cart Abandonment and Inventory Exhaustion bots automatically add hundreds of products to ecommerce shopping carts, then abandon them. This can block consumers from buying products, reduce sales, manipulate conversion rates and damage a brand’s reputation.

DDoS Bots and Botnets

Distributed Denial of Service (DDoS) attack bots and botnets are made up of thousands of compromised computers or Internet of Things (IoT) devices called "zombies". They can slow down a website or take it offline completely by flooding sites with massive amounts of artificially generated traffic. Researchers have found cybercriminals advertising DDoS services on the dark web with basic fees to attack unprotected sites ranging from $50 to $100, while an attack on a protected site can reach $400 or more.

Ticket Scalping Bots

Ticket scalping bots automatically buy tickets, enabling malicious users to resell them at a higher price. Examples include using a bot to purchase concert tickets for major events the minute they go on sale.

Fake Account Creation Bots

Fake Account Creation bots create fake accounts for criminal activities such as content spam, cryptocurrency laundering and malware distribution. Fake accounts can compromise brands and attack users with malware such as ransomware.

Hacker Bots

Hacker bots can distribute malware, attack websites and compromise entire networks by exploiting security vulnerabilities and injecting code into victim sites. Hacker bots can also perform DDoS attacks across web proxies with browser-like signatures to disrupt business operations.

Impersonator Bots

Impersonator bots copy human computer interactions and behaviours to fool users and bot mitigation defences while they conduct malicious activity. Impersonator bots also include propaganda bots that influence political opinions on platforms such as Facebook and Twitter. According to researchers at the University of Southern California who studied bot use during the 2016 U.S. Presidential election, “the presence of social media bots can indeed negatively affect democratic political discussion rather than improving it, which in turn can potentially alter public opinion.”

The Growing Threat

A report from Imperva found that roughly one-quarter of all website traffic in 2019 originated from bad bots, an increase of 18% over 2018. Advanced persistent bots (APBs), which attempt to evade detection by cycling through random IP addresses, using anonymous proxies, and changing their identities, made up 75% of that bad bot traffic. The industries hardest hit by bad bots in 2019 included financial services, education, ecommerce and government, as well as media and airlines.

Companies offering "Bad Bots as-a-Service" are also gaining ground. These data scraping services sell bots as easy-to-use packaged products that provide pricing and competitive intelligence, alternative data for finance, or competitive insights managed by Web Data Extraction Specialists and Data Scraping Specialists.

Malicious bot-for-hire services also offer personal and financial data harvesting, brute-force login services, ad click fraud, spamming services, transaction fraud services, and Distributed Denial of Service (DDoS) attacks.

Final Thoughts

Bad bot activity continues to increase, so websites need security controls that can identify and stop them. Our next article on bots will go over the common countermeasures used to combat bad bots.