Cloudflare outage proves Plan B depends on controlling DNS

Tue 18 November 2025

Co-Founder


On Tuesday, 18 November 2025, Cloudflare’s own status page lit up bright red: every major service—CDN, Firewall, WARP, Workers, even the dashboard—was marked as degraded for most of the day while engineering teams fought an internal control-plane failure. Their timeline flipped from “Investigating” at 11:48 UTC to “Monitoring” after 14:42 UTC, and the incident wasn’t officially resolved until 19:28 UTC. During the worst of it, Cloudflare disabled WARP in London, bot scores seesawed, and customers were told to wait while remediation continued.

Waiting was the only option for many teams because their Plan B lived behind the same dashboard that was timing out. The top comment on the runaway Hacker News thread was literally a cheat sheet for curl commands to rip domains off Cloudflare’s proxy edge. Admins sat in 2FA queues trying to fetch an API token, or dug around for Terraform credentials just to toggle a proxied flag. That is not a resilience strategy.
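For reference, the kind of call those cheat sheets passed around is a PATCH to Cloudflare’s v4 API that sets a DNS record’s `proxied` flag to false, so the name resolves straight to your origin. The Python sketch below only builds that request. The zone ID, record ID and token are placeholders you would already need to hold, and sending it still depends on Cloudflare’s API answering, which is exactly the weakness described above.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"


def build_unproxy_request(zone_id: str, record_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that turns off Cloudflare proxying
    for one DNS record, so it resolves directly to the origin address."""
    url = f"{CF_API}/zones/{zone_id}/dns_records/{record_id}"
    body = json.dumps({"proxied": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # API token, not the dashboard login
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder identifiers: substitute your own zone ID, record ID and token.
    req = build_unproxy_request("ZONE_ID", "RECORD_ID", "API_TOKEN")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it, if the API is reachable.
```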

We learned this lesson the hard way and wrote about it after the 2021 Fastly outage in “How to have a Plan B”. The rule still stands: you cannot let the platform you are escaping be the only place that can change where your DNS points.

Detect: understand what’s actually broken

Incidents like Tuesday’s evolve quickly. Cloudflare’s own feed showed different failure domains every 30 minutes: bot management, dashboard auth, Access, WARP. The first thing you need is impartial telemetry that tells you what your users experience, not what the provider thinks is happening. At Peakhour we stream real user monitoring, synthetic checks, and control-plane health from multiple CDNs and DNS partners. That lets us distinguish “cache errors in Hong Kong” from “global auth outage” and choose the right lever.
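The telemetry pipeline itself isn’t described here, so what follows is only a minimal sketch of the classification idea, assuming a hypothetical `Probe` record that tags each check with a region, a component and a pass/fail result. A component failing in every observed region is treated as global; one failing in only some regions is treated as regional.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic or RUM observation (hypothetical schema)."""
    region: str     # e.g. "syd", "hkg", "lon"
    component: str  # e.g. "cache", "auth", "dashboard"
    ok: bool


def classify(probes: list[Probe], threshold: float = 0.5) -> dict[str, str]:
    """Label each component 'healthy', 'regional:<regions>' or 'global'.
    A region counts as failing when its error rate exceeds `threshold`."""
    # component -> region -> [failures, total]
    counts: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for p in probes:
        tally = counts[p.component][p.region]
        tally[1] += 1
        if not p.ok:
            tally[0] += 1

    verdict: dict[str, str] = {}
    for component, regions in counts.items():
        failing = [r for r, (bad, total) in regions.items() if bad / total > threshold]
        if not failing:
            verdict[component] = "healthy"
        elif len(failing) == len(regions):
            verdict[component] = "global"
        else:
            verdict[component] = "regional:" + ",".join(sorted(failing))
    return verdict


if __name__ == "__main__":
    probes = [
        Probe("hkg", "cache", False),
        Probe("syd", "cache", True),
        Probe("lon", "auth", False),
        Probe("syd", "auth", False),
    ]
    print(classify(probes))  # {'cache': 'regional:hkg', 'auth': 'global'}
```

Here cache errors confined to Hong Kong call for steering one region, while auth failing everywhere points to a full diversion.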

Decide: keep DNS authority in neutral territory

When your domain delegation lives with agnostic providers—Route 53, NS1, Azure DNS, or the enterprise registrar your legal team already approved—you can make failover decisions without pleading with a failing control plane. Peakhour doesn’t replace those vendors; we orchestrate them. We set short-but-safe TTLs, keep secondary answers staged, and continuously audit API access so we can flip traffic with one signed request. The minute you outsource DNS authority to a proxy CDN, you’ve surrendered the switch that makes Plan B real.
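Route 53 is one example of what “one signed request” can look like in practice: its `ChangeResourceRecordSets` API takes a change batch, and boto3 signs the call with your AWS credentials. The sketch assumes a boto3 Route 53 client is passed in; the hosted zone ID, record name and standby hostname are placeholders, and this is an illustration rather than Peakhour’s own tooling.

```python
def failover_cname(client, zone_id: str, name: str, standby_target: str, ttl: int = 60) -> dict:
    """Repoint `name` at the standby edge with a single UPSERT.

    `client` is expected to be a boto3 Route 53 client, e.g. boto3.client("route53").
    A CNAME cannot sit at the zone apex, so `name` should be a subdomain
    such as www.example.com.
    """
    change_batch = {
        "Comment": "Plan B: divert traffic to standby CDN",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": standby_target}],
                },
            }
        ],
    }
    return client.change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=change_batch)


# Usage (placeholders, requires AWS credentials):
# import boto3
# failover_cname(boto3.client("route53"), "Z_PLACEHOLDER",
#                "www.example.com.", "www.example.com.standby-cdn.example.net.")
```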

Divert: run the playbook in minutes, not hours

A workable Plan B has three moves:

Pre-stage alternate edges. Your secondary CDN, origin, or transit provider must be in sync with the active one—certificates, cache rules, WAF policies, everything. We keep them hot by replaying production configs across vendors.
Wire DNS automation. We integrate with multiple third-party DNS APIs at once so we can update apex A/AAAA, flattened CNAMEs, and geo/latency rules in a single workflow. Because the automation lives off the impacted platform, we can execute even while Cloudflare’s dashboard is returning HTTP 500 errors.
Drill humans on the handoff. Our SOC sits in Sydney and Melbourne, but we cover global hours. During an incident we line up Slack/Teams bridges with your SREs, confirm business impact, and keep execs in the loop while traffic drains to the healthy provider.
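The multi-provider DNS step can be sketched as a fan-out: every change goes to every authoritative provider, and an error at one vendor is recorded rather than aborting the run. `DNSProvider`, `divert` and `InMemoryProvider` are hypothetical names for illustration; real adapters would wrap each vendor’s own API, and the in-memory provider stands in for them in drills and tests.

```python
from typing import Protocol


class DNSProvider(Protocol):
    """Hypothetical adapter interface: one implementation per DNS vendor."""

    name: str

    def upsert(self, record: str, rtype: str, values: list[str], ttl: int) -> None:
        ...


def divert(
    providers: list[DNSProvider],
    changes: list[tuple[str, str, list[str]]],
    ttl: int = 60,
) -> dict[str, list[str]]:
    """Apply every (record, type, values) change at every provider.

    Errors are collected per provider instead of aborting, so one vendor's
    API being down does not block updates at the others.
    """
    errors: dict[str, list[str]] = {}
    for provider in providers:
        for record, rtype, values in changes:
            try:
                provider.upsert(record, rtype, values, ttl)
            except Exception as exc:  # any vendor/API failure is reported, not raised
                errors.setdefault(provider.name, []).append(f"{rtype} {record}: {exc}")
    return errors


class InMemoryProvider:
    """Stand-in provider for drills and tests; can simulate an API outage."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.records: dict[tuple[str, str], tuple[list[str], int]] = {}

    def upsert(self, record: str, rtype: str, values: list[str], ttl: int) -> None:
        if self.fail:
            raise ConnectionError("API unreachable")
        self.records[(record, rtype)] = (values, ttl)


if __name__ == "__main__":
    primary = InMemoryProvider("primary-dns")
    secondary = InMemoryProvider("secondary-dns", fail=True)
    changes = [
        ("example.com.", "A", ["192.0.2.10"]),  # documentation-range addresses
        ("example.com.", "AAAA", ["2001:db8::10"]),
    ]
    print(divert([primary, secondary], changes))
```

When a zone is served by several authoritative providers at once, each of them must carry the same answers, which is why the loop pushes to all of them rather than stopping at the first success.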

With that in place we routinely hit sub-five-minute diversion times, including DNS propagation, because the decision, the tooling, and the people are ready before the outage hits.
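A rough way to check whether a five-minute target is achievable is to add up the stages. For resolvers that honour TTLs, the worst case is one that cached the old answer just before the change, so it keeps serving it for up to one full TTL after the API call completes. The stage timings below are illustrative assumptions, not Peakhour measurements.

```python
def worst_case_diversion_s(detect_s: float, decide_s: float, apply_s: float, ttl_s: float) -> float:
    """Upper bound for TTL-honouring resolvers: detection + decision +
    API apply time + one full TTL of stale cached answers."""
    return detect_s + decide_s + apply_s + ttl_s


def max_ttl_for_budget(budget_s: float, detect_s: float, decide_s: float, apply_s: float) -> int:
    """Largest TTL (seconds) that still fits the diversion budget, or 0 if none does."""
    return max(0, int(budget_s - detect_s - decide_s - apply_s))


if __name__ == "__main__":
    # Illustrative stage timings in seconds (assumptions, not measurements).
    detect, decide, apply_ = 60, 60, 30
    print(max_ttl_for_budget(300, detect, decide, apply_))      # 150
    print(worst_case_diversion_s(detect, decide, apply_, 60))   # 210
```

Resolvers that clamp or ignore TTLs can lag beyond this bound, which is one reason to measure real diversion times in drills rather than rely on the arithmetic alone.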

What Peakhour brings to your Plan B

Independent authority, familiar vendors. We leverage multiple market-leading DNS providers instead of locking you into ours. You keep your contracts; we bring the automation and guardrails.
Unified multi-CDN config. Cache rules, image optimisation, WAF, and routing policies stay aligned across providers so you don’t lose capabilities when you switch.
Real drills, not just runbooks. Quarterly failover exercises prove that certificates, APIs, and humans are ready. We share the post-mortems so your execs see clear RTO/RPO numbers.
People you can phone. 24×7 Australian-based engineers who know your stack and can execute the play while your own team communicates with customers.
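Keeping configurations aligned starts with noticing when they drift. Vendors express caching and WAF rules differently, so this sketch assumes each provider’s config has already been translated into a common nested dictionary (a hypothetical normalised shape). It then reports every setting that differs between primary and standby, or exists on only one side.

```python
MISSING = "<missing>"


def flatten(cfg: dict, prefix: str = "") -> dict[str, object]:
    """Flatten nested config into dotted paths, e.g. {'waf.xss': 'block'}."""
    flat: dict[str, object] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def drift(primary: dict, standby: dict) -> dict[str, tuple[object, object]]:
    """Return path -> (primary value, standby value) for every setting that
    differs or is present on only one side."""
    a, b = flatten(primary), flatten(standby)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


if __name__ == "__main__":
    primary = {
        "cache": {"ttl": 3600, "bypass_cookies": ["session"]},
        "waf": {"sqli": "block", "xss": "block"},
        "tls": {"min_version": "1.2"},
    }
    standby = {
        "cache": {"ttl": 3600, "bypass_cookies": ["session"]},
        "waf": {"sqli": "block", "xss": "log"},
    }
    print(drift(primary, standby))
    # {'tls.min_version': ('1.2', '<missing>'), 'waf.xss': ('block', 'log')}
```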

Book a resilience review

If Tuesday felt like waiting in a ticket queue while revenue leaked, let’s fix that. Book a 30-minute Resilience Review with Peakhour and we’ll:

Map who really controls your DNS today.
Identify the gaps between your primary and standby CDNs.
Outline the automations we can layer on top of your existing DNS and hosting vendors.
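For the first item, a rough self-check is to list the NS records your registrar delegates to (for example with `dig +short NS example.com`) and see whose nameservers they are. The sketch below groups nameserver hostnames by operator using a small, non-exhaustive table of hostname patterns for a few common providers; anything it does not recognise is reported as unknown. If every nameserver belongs to your proxy CDN, that CDN holds the only switch.

```python
# Hostname fragments for a few well-known providers (not exhaustive).
MARKERS = [
    (".ns.cloudflare.com", "Cloudflare"),
    (".awsdns-", "Route 53"),
    (".nsone.net", "NS1"),
    (".azure-dns.", "Azure DNS"),
]


def who_controls(nameservers: list[str]) -> dict[str, list[str]]:
    """Group a zone's NS hostnames by the operator they appear to belong to."""
    owners: dict[str, list[str]] = {}
    for ns in nameservers:
        host = ns.lower().rstrip(".")  # normalise case and the trailing root dot
        owner = next((label for marker, label in MARKERS if marker in host), "unknown")
        owners.setdefault(owner, []).append(host)
    return owners


if __name__ == "__main__":
    print(who_controls(["ada.ns.cloudflare.com.", "bob.ns.cloudflare.com."]))
    # {'Cloudflare': ['ada.ns.cloudflare.com', 'bob.ns.cloudflare.com']}
```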

You’ll leave with a concrete Plan B, a drill schedule, and a team that can execute it the next time a global provider blinks.
