The Leaked Service Account Key, the Public-IP Database, and Other Handoff Sins
A leaked service account key, a public-IP database, and the other handoff sins I find on every contractor-built stack. Here is the audit, by category.
"Can we just rotate them this weekend?"
The mechanics are simple. The blast radius is what separates a quiet Saturday from a Monday-morning incident review with the customers cc'd. Some credentials you can rotate at lunch. One of them, rotated wrong, will render every encrypted column in your database as ciphertext until you finish a re-encryption pass you haven't planned yet.
This is the order I do them in, and the questions I ask before I touch any of them.
The two questions to ask about every credential
Before you touch anything, sort every secret you find into two buckets:
- Does rotating it change anyone's experience? Some credentials are 100% backend — the application talks to a managed service, no user is in the loop. Rotate at will. Others are user-facing: rotating the JWT signing secret logs every user out instantly. Rotating the application-layer encryption key turns every encrypted column into garbage unless you re-encrypt the data first.
- Is the same secret value also in use somewhere you don't control? If the contractor pasted the leaked secret into a Cloud Run env var, into a Twilio Console field, into a Zoho mailbox config, you have to rotate the value in all of those places at once. Forget one and the application breaks.
The reason most founders try to negotiate the rotation step is that they don't realize "the secret is in git" is the easy part. The hard part is reconstructing every place that secret was copy-pasted to over the last three years.
The quiet rotations
These you can do today, by yourself, off-hours, with zero customer impact. Do them first to build confidence and to shrink the blast radius before you touch anything user-facing.
The cloud service-account JSON key. That service-account-key.json file you found under config/? Find the IAM page for the service account it belongs to. Note every key ID. Create a new key. Update the application's env var (or, better, switch to Workload Identity Federation — more on that in the next post). Confirm the new key is in use. Then go back and delete the old key. The moment you delete it, anyone holding a copy of the leaked one gets a 401 the next time they try to authenticate. The application keeps running because it has the new key.
There's one gotcha: if the service account is shared by two services and only one of them was updated with the new key, deleting the old key bricks the other. List every Cloud Run service, every CI workflow, and every personal laptop .env file that uses it before you delete.
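For concreteness, here's roughly what that sequence looks like with the gcloud CLI. The service account, secret, Cloud Run service, and region names are all placeholders; swap in your own, and do the inventory step before anything else.

```bash
# Placeholder names throughout: adjust the account, secret, service, and region.
SA="app-runner@my-project.iam.gserviceaccount.com"

# Inventory first: every key on the account, every Cloud Run service that
# might be using it (CI workflows and laptop .env files you check by hand).
gcloud iam service-accounts keys list --iam-account="$SA"
gcloud run services list

# Create a replacement key and switch the application to it.
gcloud iam service-accounts keys create new-key.json --iam-account="$SA"
gcloud secrets versions add app-sa-key --data-file=new-key.json
gcloud run services update api --region=us-central1 \
  --update-secrets=SA_KEY_JSON=app-sa-key:latest
rm -f new-key.json   # don't leave yet another copy of a key on a laptop

# Only after confirming the new key is in use: delete the leaked one by its ID.
gcloud iam service-accounts keys delete LEAKED_KEY_ID --iam-account="$SA"
```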
The database credentials. This one's a little louder, because once the password changes on the database side, every new connection the application tries to open with the old credentials is refused. But you can sequence it: rotate the password, update Secret Manager (or the env var), then trigger a rolling restart of the application. New connections use the new password; the old revision's connections terminate gracefully when their in-flight requests finish. Done in under a minute.
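A sketch of that sequence, assuming MySQL and a Secret Manager secret the application already reads. Host, user, secret, and service names are placeholders.

```bash
# 1. Set a new password on the database side (MySQL shown; names are placeholders).
NEW_PW="$(openssl rand -base64 32)"
mysql -h 10.0.0.5 -u root -p -e "ALTER USER 'app'@'%' IDENTIFIED BY '${NEW_PW}';"

# 2. Publish it as a new version of the secret the application reads.
printf '%s' "$NEW_PW" | gcloud secrets versions add db-password --data-file=-

# 3. Roll the service: the new revision reads the latest secret version, the
#    old revision drains its in-flight requests on its existing connections.
gcloud run services update api --region=us-central1 \
  --update-secrets=DB_PASSWORD=db-password:latest
```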
If you have the option, this is also the moment to stop using root. Create an app_user with grants only on the one schema your application owns. Switch the application to that user. Rotate root's password separately and put it in a safe nobody touches. The next leak is scoped to a single schema.
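A minimal version of that in MySQL; app_db and app_user are placeholder names, and the grants should match what your application actually does, nothing more.

```bash
mysql -h 10.0.0.5 -u root -p <<'SQL'
-- Least-privilege login for the application only.
CREATE USER 'app_user'@'%' IDENTIFIED BY 'use-a-generated-password-here';
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'%';

-- Rotate root separately; its new password goes in a safe, not in an env file.
ALTER USER 'root'@'%' IDENTIFIED BY 'new-root-password-stored-offline';
SQL
```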
Webhook provider keys (Twilio, Stripe, SendGrid, etc.). Most providers let you have multiple active API keys at once. Create a new one. Update Secret Manager. Roll the application. Confirm everything still works. Then revoke the old key in the provider's console. Total user impact: zero. There's no equivalent of "log everyone out" for these — they're machine-to-machine.
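The overlap between keys is what keeps this quiet: confirm the new key authenticates before you revoke the old one in the console. With Stripe, for example, that check is a single curl (the key value is a placeholder):

```bash
# 200 means the new key authenticates; anything else, stop before revoking.
curl -s -o /dev/null -w '%{http_code}\n' \
  https://api.stripe.com/v1/balance \
  -u "sk_live_NEW_KEY_PLACEHOLDER:"
```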
SMTP and IMAP credentials. Same pattern. Provider supports key/password rotation in the console. Update Secret Manager. Roll the application. Revoke old. The only customer-facing artifact is that outbound email may pause briefly during the rollout — under a minute on Cloud Run.
If you stop here, you've done most of the work. The blast radius of every contractor's copy of .env is now mostly zero. They have stale credentials.
The loud rotations
These you have to plan around.
The JWT signing secret. Every existing user session is signed with the old secret. The moment you rotate, every existing session token becomes invalid and every logged-in user gets bounced to the login screen.
That's not a bug, that's the point. You want every existing session invalidated, because some of them may be in the hands of someone who pulled the secret out of your git history. But you also want to choose when it happens.
Pick an off-peak hour. Send a one-line in-app banner 24 hours ahead: "We'll be performing a security update at X — you may be asked to sign in again." Rotate the secret. Roll the application. Users see a login prompt. They sign in. They forget about it within ten minutes.
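The rotation itself is two commands; the planning is the banner and the hour you pick. A sketch, assuming the secret lives in Secret Manager under a placeholder name like jwt-signing-secret:

```bash
# Generate a fresh signing secret and publish it as a new version.
openssl rand -base64 48 | gcloud secrets versions add jwt-signing-secret --data-file=-

# Roll the service at the hour you announced; every session signed with the
# old secret is invalid the moment the new revision starts serving traffic.
gcloud run services update api --region=us-central1 \
  --update-secrets=JWT_SECRET=jwt-signing-secret:latest
```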
Don't ship this on a Friday afternoon, ever. Half of the support burden from a JWT rotation is "I think your site is broken, it logged me out." Customers want to email you about that during business hours.
The application-layer encryption key. This is the dangerous one.
If your application encrypts PII at the database layer — emails, phone numbers, addresses, anything in an AES_* column — there is a single secret somewhere that decrypts all of it. Rotating that key without first re-encrypting the data turns every encrypted field into garbage. The dashboard renders ciphertext. The lead detail page renders ciphertext. Dedup logic that hashes decrypted values produces wrong hashes silently, and you discover the next morning that your unique_email index has been collecting duplicates for hours.
I have seen this happen on an attempted "quick rotation." The contractor told the founder, "I'll just rotate the encryption key this weekend." It was Tuesday before anybody understood what they were looking at.
The correct sequence is:
- Keep the leaked key in use. Inherit it from the old stack. The exposure stays put for a few weeks; the application keeps working.
- Write a one-off migration that iterates over every encrypted column, decrypts with the old key, encrypts with a fresh key, and writes back (a sketch follows this list).
- Run that migration with the application in maintenance mode or with row-level locks. Measure the time on staging first; for a database with 1M encrypted rows, expect 10–30 minutes.
- Rotate the secret to the new key value. Roll the application. Verify a known record decrypts correctly.
- Destroy the old key.
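A deliberately minimal sketch of the step-2 migration, assuming the columns were written with MySQL's AES_* functions. Table, column, and key values are placeholders; if the application encrypts in its own code rather than in SQL, the migration has to go through the application instead, and a real pass should batch by primary key rather than run one giant UPDATE.

```bash
mysql -h 10.0.0.5 -u root -p app_db <<'SQL'
-- Placeholders: leads.email is one encrypted column; repeat per column.
SET @old_key = 'the-leaked-key-still-in-use';
SET @new_key = 'the-fresh-key-not-yet-live';

UPDATE leads
SET email = AES_ENCRYPT(AES_DECRYPT(email, @old_key), @new_key)
WHERE email IS NOT NULL;

-- Spot-check a few known records before cutting the application over.
SELECT id, AES_DECRYPT(email, @new_key) FROM leads LIMIT 5;
SQL
```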
This is a week of planning for what looks like a one-line command. It is also the only safe way to do it. The leaked encryption key in your git history is your problem until you finish step 5.
Anything signed with the encryption key for verification purposes. If the application uses the same key to sign tokens, validate webhooks, or hash dedup columns, every artifact in the database that depends on those signatures or hashes is also invalidated by a key rotation. Plan a re-hashing pass alongside the re-encryption.
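If those dedup hashes are keyed, the same maintenance window has to recompute them. A hedged sketch with a hypothetical email_hash column, using SHA2 over key-plus-plaintext as a stand-in for whatever keyed hash the application actually uses:

```bash
mysql -h 10.0.0.5 -u root -p app_db <<'SQL'
SET @new_key = 'the-fresh-key-not-yet-live';

-- Recompute the hypothetical keyed dedup hash from the re-encrypted value.
UPDATE leads
SET email_hash = SHA2(CONCAT(@new_key, AES_DECRYPT(email, @new_key)), 256)
WHERE email IS NOT NULL;
SQL
```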
The "do this last" item: purge git history
Once you've rotated everything, the secret values in git history are stale. They still shouldn't be there, but they no longer give anyone production access. You have time to do the purge properly.
Use git filter-repo (or BFG) to remove the file from every commit on every branch. Force-push to the remote. Then have every contributor delete their stale clones and re-clone, and revoke and re-issue their GitHub PATs: a stale clone still contains the old history, and a still-valid PAT is all it takes to push it back up.
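The mechanics, assuming the leaked file sits at config/service-account-key.json (the path from earlier in this post) and the repo lives on GitHub; the URL is a placeholder:

```bash
# filter-repo wants a fresh clone and strips the origin remote when it runs.
git clone https://github.com/your-org/your-repo.git repo-purge && cd repo-purge

# Remove the file from every commit on every branch and tag.
git filter-repo --invert-paths --path config/service-account-key.json

# Re-add the remote and force-push the rewritten history everywhere.
git remote add origin https://github.com/your-org/your-repo.git
git push --force --all origin
git push --force --tags origin
```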
This is the step founders most often skip, because the application keeps working without it. Do it anyway. The day you hire a new engineer and they grep through history during onboarding, you don't want them to find the old secrets and assume they're still valid.
The order, in summary
- Quiet rotations: SA keys, DB password, webhook provider keys, SMTP, IMAP. Today.
- JWT rotation, off-peak, with a banner. This week.
- Encryption key: plan the re-encryption migration, dry-run on staging, schedule the cutover. Two to four weeks.
- Purge git history and revoke PATs. After all of the above.
If you do these in any other order, particularly if you rotate the encryption key first, your week gets dramatically worse. The contractor's continued cooperation is not a prerequisite for any of this. Most founders treat it as one. Don't.
Next in the series: why I never gcloud run deploy from my laptop — and how Workload Identity Federation removes the JSON-key problem from your future entirely.
Run the same audit on your own stack: a 30-question self-audit, severity-ranked P0/P1/P2, takes about an hour. Open the checklist →