5 Things I Find When I Take Over a Contractor-Built SaaS
The same five problems show up inside every contractor-built SaaS I've inherited. None are exotic. All are P0.
The login page is throwing 502s. The founder isn't technical. The contractor stopped answering Slack a week ago, and the only person with the production password is over the Atlantic with one bar of signal.
That's the call. It's almost always the same five problems underneath, in roughly the same order — and the order matters more than the fixes.
1. The secrets are in git history. And probably a service-account key.
The first thing I do on every audit is `git log --all -- .env` and `git log --all --name-only -- '**/*.json' | grep -i service-account`.
I have not yet run those two commands and come up empty.
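Those two openers rarely need help, but on a messy repo I widen the sweep. A sketch that assumes nothing about the layout (the filename patterns are just the usual suspects):

```bash
# Every filename that ever existed in any commit, filtered for the usual suspects
git log --all --full-history --name-only --format= \
  | sort -u | grep -iE '\.env|\.pem|service.?account|credentials'

# Any blob anywhere in history containing raw key material (slow on big repos)
git grep -l "BEGIN PRIVATE KEY" $(git rev-list --all) 2>/dev/null | head
```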
It is usually the very first commit. A `.env` file with the database root password, the JWT signing secret, the application-layer encryption key, every webhook provider's API key and secret, the SMTP password, the inbound-mail IMAP password — committed in plaintext, on day one of the project, often years ago. Even after the contractor's careful `git rm` in a later commit, the file is still in the history, still in every clone that ever happened.
The bonus prize is the cloud service-account JSON key, usually under config/ or secrets/, also in the first commit. That key is a standing cloud credential. Anyone who has ever cloned the repo — including every contractor who ever rotated through the dev shop — has production access to your cloud project, today.
The fix is not "delete the file and add it to `.gitignore`." That changes nothing. The fix is: rotate every credential, purge the file from every commit on every branch with `git filter-repo`, force-push, and revoke repo access for every past contributor (the rewrite cleans the remote, but every clone that already exists still carries the old history). Then, separately, move every secret into a managed secrets store so the next contractor can't put them back.
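A minimal sketch of that sequence, assuming the leaked files are `.env` and `config/service-account.json` and using a placeholder service-account address. Rotation comes first, because a history rewrite revokes nothing:

```bash
# 1. Kill the leaked cloud key before touching git history
gcloud iam service-accounts keys list \
  --iam-account=app-sa@my-project.iam.gserviceaccount.com
gcloud iam service-accounts keys delete KEY_ID \
  --iam-account=app-sa@my-project.iam.gserviceaccount.com

# 2. Purge the files from every commit on every branch
git filter-repo --invert-paths --path .env --path config/service-account.json

# 3. filter-repo deliberately drops the origin remote as a safety measure;
#    re-add it, then force-push the rewritten history everywhere
git remote add origin git@github.com:org/repo.git
git push --force --all
git push --force --tags
```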
None of those steps are reversible, none of them look like writing software, and most founders do not know they exist. That is why the contractor never did them.
2. The contractor still owns the front door.
I open the registrar lookup on the production domain. The registrant is the contractor's personal name, with their personal Gmail as the admin contact.
Then I dig the production hostname. It resolves to a single IP at a VPS provider neither you nor I have a billing relationship with — a Hostinger box, a personal DigitalOcean droplet, a Linode under someone's personal account. Inside that VPS is a hand-configured nginx terminating TLS with a Let's Encrypt cert renewed by a cron job nobody documented, then reverse-proxying to the Cloud Run service.
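The whole check is three commands (hostnames here are placeholders):

```bash
# Who actually owns the domain?
whois example.com | grep -iE 'registrant|admin'

# Where does production actually resolve to?
dig +short app.example.com

# Who issued the cert, and when does it lapse?
openssl s_client -connect app.example.com:443 -servername app.example.com \
  </dev/null 2>/dev/null | openssl x509 -noout -issuer -enddate
```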
Read that again. The contractor controls:
- Where your domain points (they can repoint it anywhere)
- The TLS certificate the customer's browser trusts (they can MITM all traffic, whenever they choose)
- The webhook URL your phone provider sends SMS and call data to
- The hostname your application expects in its CORS allowlist
The "cloud migration" you paid for was a lift of two services into a managed cloud, fronted by the contractor's personal VPS. The serverless platform that is supposed to be your foundation is sitting behind one box that the contractor is going to forget to pay for in October.
When that box 502s — and it will — you will pay me to figure out how to take back the domain, the certs, and the webhook destinations without telling your phone provider's compliance team that you don't currently know who owns your phone numbers.
3. Cloud Run is --allow-unauthenticated. The proxy is a suggestion.
I curl the `*.run.app` URL directly. It returns 200. I curl the API with no headers. It returns 401 — "missing origin." I add `-H "Origin: https://app.contractor-vps-hostname.com"`. It returns 200 with a full customer record.
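The same probe, reproduced with a placeholder run.app hostname and a hypothetical endpoint path:

```bash
# Direct hit on the Cloud Run URL: should be blocked, returns 200
curl -s -o /dev/null -w '%{http_code}\n' https://api-xyz-uc.a.run.app/

# API call with no Origin header: 401 "missing origin"
curl -s -o /dev/null -w '%{http_code}\n' https://api-xyz-uc.a.run.app/customers/1

# Same request plus one header anyone can type: 200, full customer record
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Origin: https://app.contractor-vps-hostname.com" \
  https://api-xyz-uc.a.run.app/customers/1
```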
The contractor's "security model" is that they check the HTTP Origin header in middleware, on the assumption that real attackers do not own a copy of curl.
The VPS in front of Cloud Run is not a perimeter. It is, charitably, a vanity URL. The Cloud Run services are deployed `--allow-unauthenticated`, which means every dollar you are paying for that VPS is buying you nothing except the contractor's continued involvement.
The fix is real, and almost nobody does it on a first build: set Cloud Run ingress to `internal-and-cloud-load-balancing` so the `*.run.app` URL stops answering the open internet, stand up a managed Google load balancer with a Google-managed certificate, and attach Cloud Armor for WAF and rate limiting. The contractor never did this because none of it is in the tutorial they followed. Cloud Armor also shows up as a line on the cloud invoice, which prompts uncomfortable questions.
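The single highest-leverage line in that list, assuming a service named `api` in `us-central1` (the load balancer and Cloud Armor policy are their own setup):

```bash
# After this, the *.run.app URL stops answering the public internet;
# only traffic arriving through a Google Cloud load balancer is accepted.
gcloud run services update api \
  --region=us-central1 \
  --ingress=internal-and-cloud-load-balancing
```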
4. Webhook signatures are off, and the database is on a public IP.
Inside the deployed environment, I find `VALIDATE_WEBHOOK_SIGNATURE=false`. This means anyone on the internet who knows the webhook URL — and that URL is in the contractor's git history, see (1) — can POST a forged payload and have it ingested as a real event. The contractor turned validation off years ago when it failed during local development and never turned it back on.
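The sixty-second test, against a hypothetical `/webhooks/telephony` endpoint: POST a payload signed by nobody and watch what comes back.

```bash
# A forged "call completed" event, carrying no signature header at all.
# Anything other than a 401/403 means the endpoint trusts the open internet.
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"event":"call.completed","from":"+15555550100"}' \
  https://app.example.com/webhooks/telephony
```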
The managed database is on a public IP. The connection options in the application include `ssl: { rejectUnauthorized: false }`, which means the traffic is encrypted but the server's identity is never verified. Anyone who can get into the network path — and with a public IP, "the network path" includes the office wifi of whoever is signed in with a database client — can impersonate the server and capture the database credentials in flight.
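Assuming the database speaks Postgres, the honest version is one connection-string change: `verify-ca` makes the client refuse any server that cannot prove itself against the CA bundle the managed database hands you (the IP is a placeholder).

```bash
# TLS with identity verification: the client checks the server's certificate
# against the instance's CA bundle instead of trusting whoever answers the IP.
psql "host=203.0.113.10 dbname=app user=app sslmode=verify-ca sslrootcert=server-ca.pem"
```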
These are the two findings that wake up the founder's lawyer at 11pm.
5. Everything is in-process. The application is the bottleneck.
The API process is running an in-process inbound-mail watcher. It is also running an in-process cron library for the monthly billing job and the customer-reminder job. It is also writing file uploads to local disk inside the container. It is also writing logs to a file with `fs.appendFileSync`.
This means you have exactly one production instance. Forever. Two instances would race for the same emails, fire the cron job twice, lose uploads to whichever container the user did not land on, and split logs across ephemeral disks that get destroyed on every redeploy.
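Until that changes, the only honest stopgap is to admit the constraint in the deployment itself. A sketch, again assuming a Cloud Run service named `api`:

```bash
# Pin the service to exactly one instance so the in-process mail watcher,
# cron library, local uploads, and file logs can never race a twin.
gcloud run services update api \
  --region=us-central1 \
  --min-instances=1 --max-instances=1
```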
The "scaling" the cloud platform was supposed to give you is gated by an application architecture the contractor chose in week one. You cannot horizontally scale until the inbound-mail listener moves to its own service, cron moves to a managed scheduler firing managed jobs, uploads move to object storage via signed URLs, and logs move to a centralized log sink.
That is not a refactor. That is a re-platforming.
What to do with this list
If you are a founder reading this, do not panic. None of these findings are uncommon. None require a rewrite. The application code is usually salvageable; the problem is uniformly with how the application was deployed.
The order of operations is what matters:
- Rotate everything, today. Treat the credentials as already compromised.
- Take back the domain, the DNS, and the registrar, this week.
- Stand up an in-house edge (LB + WAF + managed cert) and put your application behind it.
- Move the database to a private IP. Turn webhook signatures back on.
- Plan the dismantling of "everything in the application process" as a follow-up project, not a heroic sprint.
Next in the series: credential rotation, in order — which secrets you can rotate quietly tonight, which ones log every user out the instant you click the button, and the encryption-key trap that turns "rotate this weekend" into a Tuesday-morning incident review.
Run the audit on your own stack
A 30-question self-audit. P0/P1/P2 severity. Takes about an hour.
Open the checklist →