May 14, 2026 · 5 min read

Cloudflare DNS-Only Behind GCP: Keep Webhook Signatures Alive

Put Cloudflare in proxy mode in front of a webhook origin and you will spend a quarter chasing "missed" Twilio webhooks that aren't actually missed. Here is why we ran gray-cloud instead.

The stake: put Cloudflare in proxy mode in front of a webhook origin and you will spend a quarter chasing "missed" Twilio webhooks that aren't actually missed. You'll go round and round with provider support before someone realizes the proxy was rewriting the request just enough to invalidate the HMAC.

A typical "I'm migrating to GCP" architecture diagram has Cloudflare in front of the Google Load Balancer in orange-cloud (proxied) mode. You get DDoS protection, WAF, CDN, bot fight mode — all the things Cloudflare is famous for — sitting in front of your origin.

For most applications, that's the right answer. For ours, it was the wrong answer. We ran Cloudflare in gray-cloud (DNS-only) mode and put Cloud Armor at the GCP edge instead. The deciding factor is the same one that bites every team running third-party webhook integrations behind a proxy: signature validation breaks in ways that look intermittent and look like the provider's fault.

Here is the trade-off, the failure mode, and the architecture we landed on.

What "proxied" actually does to a request

When Cloudflare is in orange-cloud mode for a hostname, every request terminates at Cloudflare's edge. Cloudflare opens a new connection to your origin and re-sends the request. From your origin's perspective, the request arrives with:

A different source IP (Cloudflare's edge, not the original client).
A different Host header in some configurations.
Re-canonicalized headers — Cloudflare reorders, lowercases, sometimes drops headers it doesn't recognize.
Potentially-rewritten URLs if any Workers or page rules are in play.

For a normal browser request, none of this matters. The application reads cf-connecting-ip for the real client IP, and everything else "just works." For a webhook request from Twilio, Stripe, or any other provider that signs its requests, the story is different.
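
As a rough sketch of that "just works" path, here is how an origin might recover the client IP behind a proxy. The function name and the peer_is_trusted_proxy flag are hypothetical; deciding whether the TCP peer really is Cloudflare is left to the caller.

```python
def client_ip(headers: dict[str, str], peer_ip: str, peer_is_trusted_proxy: bool) -> str:
    """Return the best guess at the original client IP."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("cf-connecting-ip")
    # Only believe the header when the TCP peer is a proxy we trust;
    # otherwise any client could set it and spoof its address.
    if peer_is_trusted_proxy and forwarded:
        return forwarded.strip()
    return peer_ip
```

This recovers the address, but nothing comparable exists for a signature: no header can restore a URL or body that has changed in transit.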

Why webhook signature validation breaks

Webhook signatures are computed over the request as the provider sent it: the exact URL, the exact body, sometimes a subset of headers, with the provider's secret as the HMAC key. The signature ships in a header (X-Twilio-Signature, Stripe-Signature, etc.). Your origin re-computes the signature over the request it received and compares.

If anything in the request changed between the provider and your origin, the signatures don't match. The request gets rejected as forged.
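
To make that concrete, here is a minimal sketch of a Twilio-style check: HMAC-SHA1 over the full URL followed by the POST parameters sorted by name, Base64-encoded. The token, URL, and parameters are made-up test values, and in production you would normally use the provider's own helper library rather than hand-rolling this.

```python
import base64
import hashlib
import hmac


def twilio_style_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    # Signed payload: the full URL, then each POST parameter's name and
    # value appended with no separators, in order of parameter name.
    payload = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_is_valid(auth_token: str, url: str, params: dict[str, str], received: str) -> bool:
    expected = twilio_style_signature(auth_token, url, params)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected, received)


# Hypothetical values for illustration only.
token = "test-auth-token"
params = {"From": "+15550100", "Body": "hello"}
configured_url = "https://api.example.com/webhooks/sms"
signature = twilio_style_signature(token, configured_url, params)

print(signature_is_valid(token, configured_url, params, signature))        # True
print(signature_is_valid(token, configured_url + "/", params, signature))  # False
```

A single trailing slash is enough to make the second check fail, even though the request is otherwise identical.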

What can change:

URL canonicalization. Cloudflare may normalize trailing slashes or encoded characters, and a Worker or rule can rewrite the URL outright. Twilio computes its signature over the exact URL you configured, followed by the sorted POST parameters. A slightly different URL at the origin means an invalid signature.
Body decoding. Twilio sends application/x-www-form-urlencoded. Anything on the proxy that rewrites or re-encodes the request body, such as a Worker, changes what gets signed. Stripe sends raw JSON, signed over the byte-exact body, so any whitespace normalization is fatal.
Header subset. Each provider expects specific casing and ordering. Pass-through is mostly safe, not guaranteed.
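
The byte-exact point is easy to underestimate, so here is a small sketch of a Stripe-style scheme: HMAC-SHA256 over the timestamp, a dot, and the raw body. The secret, timestamp, and event are invented for the example.

```python
import hashlib
import hmac
import json


def stripe_style_signature(secret: str, timestamp: int, raw_body: bytes) -> str:
    # Signed payload is "<timestamp>.<raw body bytes>", hex-encoded HMAC-SHA256.
    signed_payload = str(timestamp).encode("ascii") + b"." + raw_body
    return hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()


# Hypothetical values for illustration only.
secret = "whsec_test"
timestamp = 1700000000
raw_body = b'{"id":"evt_123","type":"charge.succeeded"}'
sent_signature = stripe_style_signature(secret, timestamp, raw_body)

# Parse and re-serialize: same JSON document, different bytes
# (json.dumps adds spaces after ":" and "," by default).
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

print(json.loads(raw_body) == json.loads(reserialized))                              # True
print(stripe_style_signature(secret, timestamp, reserialized) == sent_signature)     # False
```

This is also why the origin has to verify against the raw request bytes, not against a body that a framework has already parsed and re-encoded.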

The result: signature validation that worked when Twilio talked directly to your origin starts failing intermittently the day you move behind a proxy. The failures are not 100%, which makes the problem look like a flaky provider rather than an architectural mismatch. I have lived this. I will not live it twice.

The fix: gray-cloud mode

Cloudflare can manage your DNS without proxying the traffic. Every DNS record has a cloud icon in the dashboard: orange = proxied, gray = DNS-only. Gray-cloud means Cloudflare's nameservers answer with the real origin IP and the client connects to the origin directly. Cloudflare never sees the request.
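
One way to confirm a record really is gray-cloud is to check whether the hostname resolves to a Cloudflare address. The sketch below is an assumption-heavy illustration: the two CIDR blocks are only a sample of Cloudflare's published IPv4 ranges, and a real check should load the current list from Cloudflare.

```python
import ipaddress
import socket

# Sample only: two of Cloudflare's published IPv4 ranges (assumption:
# a real check loads the full, current list instead of hard-coding it).
CLOUDFLARE_SAMPLE_RANGES = ("104.16.0.0/13", "172.64.0.0/13")


def looks_proxied(resolved_ip: str, cloudflare_ranges=CLOUDFLARE_SAMPLE_RANGES) -> bool:
    """True if the resolved address falls inside a known Cloudflare range."""
    address = ipaddress.ip_address(resolved_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cloudflare_ranges)


def hostname_looks_proxied(hostname: str) -> bool:
    # Performs a live DNS lookup (IPv4 only), so results depend on your resolver.
    return looks_proxied(socket.gethostbyname(hostname))
```

For a gray-cloud record, hostname_looks_proxied should return False and the resolved address should be your load balancer's static IP.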

You lose:

DDoS protection from Cloudflare (replaceable by Cloud Armor + GCP's edge).
Cloudflare's WAF (replaceable by Cloud Armor's OWASP rule presets).
Cloudflare's caching CDN (irrelevant for an API; minor loss for a UI; if you need it later, run a separate hostname through orange-cloud for static assets only).
Cloudflare's bot management (acceptable trade for an internal-CRM use case).

You gain:

Webhook signature validation that works consistently against every provider.
A direct path from client to origin — one fewer hop in every trace.
Predictable TLS: the client sees the Google-managed certificate directly, not a separate certificate served from Cloudflare's edge.

For an internal CRM with low traffic volume and high integration count, that's the right side of the trade.

What we actually configured

Cloudflare: owns the DNS zone. Every record is gray-cloud. A records for app.optil3ads.com and api.optil3ads.com point at the static IP of the GCP HTTPS load balancer.
GCP HTTPS Load Balancer: single LB, one URL map with a host rule per hostname, two serverless NEG backends (UI and API on Cloud Run).
Cert: Google-managed SSL certificate covering both hostnames. Provisioning takes ~15 minutes after the A record propagates, because Google can only validate domain control once each hostname resolves to the load balancer's IP.
Cloud Armor: attached to the LB backend service. OWASP SQLi/XSS preset in preview-mode on staging (logs but doesn't block while we tune false positives), enforcing on prod. Per-IP rate limit of 30/min on /api/users/login, 1000/min globally.
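
To show what a limit like "30/min per IP" means in practice, here is a toy fixed-window counter. It illustrates the semantics only; it is not Cloud Armor's implementation, and the class and method names are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client per fixed time window."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # Keyed by (client_ip, window index). Old windows are never evicted
        # here; a real limiter would expire them.
        self._counts = defaultdict(int)

    def allow(self, client_ip: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        key = (client_ip, window)
        self._counts[key] += 1
        return self._counts[key] <= self.limit


login_limiter = FixedWindowLimiter(limit=30)
results = [login_limiter.allow("198.51.100.4", now=float(second)) for second in range(31)]
print(results.count(True), results[-1])  # 30 False
```

The 31st request inside the same minute is rejected, while a different client, or the same client in the next window, starts from zero.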
HTTP → HTTPS: separate forwarding rule on port 80, redirects to the HTTPS listener.

Twilio webhooks point directly at https://api.optil3ads.com. The request hits our LB. The LB hands it to our API. The signature, computed against the exact URL configured in Twilio, matches the signature the API recomputes. No intermediary, no drift.

The escape hatch if we change our minds

Gray-cloud is the right first answer because flipping a record to orange-cloud later is a one-click change in the Cloudflare dashboard. If we do flip, the right second answer is to keep the webhook hostname on its own DNS record that stays gray-cloud while the rest of the traffic goes orange. So app.optil3ads.com would be orange (the UI gets the CDN), and api.optil3ads.com would remain gray (webhooks keep their signature integrity).

For now the volume doesn't justify the complexity. We're gray-cloud across the board and we'll revisit when there's a reason.

What this costs you if you skip it

A quarter of intermittent missed-webhook tickets, a loss of trust in your integration layer, and an architecture review six months from now where someone — possibly me — points at the proxy and asks "why is this here?"

Three questions before flipping to orange-cloud

Do you have webhook integrations with signature validation? If yes, start gray-cloud. The proxy can be added later when you've built out the testing to confirm signature validation still passes.
Is your origin already protected by a cloud-native edge? GCP Cloud Armor, AWS WAF, Azure Front Door — all of them give you the WAF and rate-limiting features the Cloudflare proxy was going to provide. Running both is paying twice.
Do you need the CDN? For an internal application with a small user base, no. For a public site, yes — and you can have it on a separate hostname.
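
For the first question, "the testing" can start small: record what a test sender transmitted, record what the origin received, and compare the two. The sketch below assumes you can capture both sides yourself; RequestSnapshot and find_drift are hypothetical names, not part of any provider's tooling.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSnapshot:
    url: str
    body: bytes


def find_drift(sent: RequestSnapshot, received: RequestSnapshot) -> list[str]:
    """List every difference that would invalidate a signature."""
    drift = []
    if sent.url != received.url:
        drift.append(f"url: {sent.url!r} -> {received.url!r}")
    if sent.body != received.body:
        # Short hashes keep the report readable for large bodies.
        sent_hash = hashlib.sha256(sent.body).hexdigest()[:12]
        received_hash = hashlib.sha256(received.body).hexdigest()[:12]
        drift.append(f"body: sha256 {sent_hash} -> {received_hash}")
    return drift
```

An empty list on every replayed request, through the proxy, is the bar to clear before moving a webhook hostname to orange-cloud.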

Run the audit → /audit-checklist — the same questions in worksheet form.

Next in the series: The 14-Day Soak: What We Monitored, What We Ignored — what to watch after cutover, and what to deliberately ignore.
