Cloud Migration Strategies: A CTO's Playbook

Most cloud migrations don't fail loudly. They fail slowly. The cutover works, the demo goes fine, everyone exhales, and then three months later the bill is double the projection and the same database that was slow in your data center is now slow in someone else's. We've cleaned up enough of these to have opinions, so here's how we actually approach moving production systems to the cloud without the usual mess.

Start with why, then be ruthless about scope

Before anyone touches Terraform, we want one sentence on why you're migrating. "Our lease is up." "We can't scale for the holiday spike." "We need to be in three regions for compliance." Those are real reasons with clear success criteria. "Everyone's doing it" is not, and it tends to produce migrations that cost a lot and prove nothing.

The why dictates the how. A hardware-lease deadline pushes you toward speed. A scaling problem pushes you toward re-architecting the bottleneck. Compliance pushes you toward specific regions and managed services with the right certifications. Pick the reason, write down what "done" looks like, and let that filter every decision after it.

The 6 Rs, and when each actually makes sense

The "6 Rs" framework gets thrown around a lot, usually as a checklist nobody applies with any rigor. It's genuinely useful if you treat it as a decision per workload, not a strategy for the whole estate. You'll use several of these in the same migration.

Rehost (lift and shift): move the VM as-is. Fast, low risk, zero improvement. Good when you're racing a deadline or the app is a black box nobody understands anymore.

Replatform (lift and reshape): keep the app mostly intact but swap a piece for a managed service, say moving your self-hosted Postgres to a managed database. Small effort, real operational payoff.

Refactor: rewrite parts to be cloud-native. Highest cost, highest reward. Reserve it for the workloads that are actually strategic.

Repurchase: drop the thing and buy SaaS instead. Self-hosted email, wikis, CI runners, your homegrown CRM. Why are you maintaining that?

Retire: turn it off. In almost every estate we audit, 10 to 20 percent of "running" things are serving no one. Find them.

Retain: leave it where it is, for now. A mainframe with a 2027 contract or a workload with latency needs the cloud can't meet stays home.

How to actually decide

We score each workload on two axes: business value and migration difficulty. High value plus low difficulty goes first. Those are your easy wins, and they build momentum. High value plus high difficulty is where refactoring earns its keep, but you schedule it deliberately, not in the first sprint. Low value plus high difficulty is a strong candidate for retire or retain. You'd be surprised how often the "critical" legacy system nobody will let you touch turns out to have eight active users.
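
The two-axis scoring can be sketched in a few lines of Python. This is a minimal illustration, not a scoring model: the 1-to-5 scale, the threshold of 3, the `Workload` and `triage` names, and the handling of the low-value, low-difficulty quadrant (which the text above does not cover) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    value: int       # business value, 1 (low) to 5 (high); hypothetical scale
    difficulty: int  # migration difficulty, 1 (easy) to 5 (hard); hypothetical scale


def triage(w: Workload, threshold: int = 3) -> str:
    """Map a workload to a quadrant of the value/difficulty grid."""
    high_value = w.value >= threshold
    hard = w.difficulty >= threshold
    if high_value and not hard:
        return "migrate first"
    if high_value and hard:
        return "schedule a deliberate refactor"
    if hard:
        return "retire or retain"
    # Assumption: the low-value, low-difficulty quadrant is not covered in the text.
    return "migrate opportunistically"


# Hypothetical estate, for illustration only.
estate = [
    Workload("checkout-api", value=5, difficulty=2),
    Workload("billing-engine", value=5, difficulty=5),
    Workload("legacy-reports", value=1, difficulty=4),
]
for w in estate:
    print(f"{w.name}: {triage(w)}")
```

The scores themselves are the hard part. The point of writing the grid down is that every workload gets an explicit decision instead of a default.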

Lift and shift moves your problems to a pricier address

Rehosting has its place. The trap is treating it as the finish line. If you take a chatty, over-provisioned app running on a maxed-out box and drop it onto an equally large cloud instance, you've changed nothing except the invoice. Worse, you've added network latency and egress fees the data center never charged you for.

We've seen a "simple" lift and shift land at 40 percent over the on-prem cost because the team sized cloud instances to match physical servers that were themselves wildly over-provisioned. That 32-core box was running at 8 percent CPU. You don't need a 32-core instance. You need to look at actual utilization first.
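
As a rough sketch of what "look at actual utilization first" means in numbers: size for measured peak load plus headroom, not for the physical box. The instance sizes, the 60 percent target, and the `right_size` name are assumptions, and the sketch treats one physical core as roughly one vCPU, which is a simplification. It also assumes the 8 percent figure is a peak reading; if it is an average, measure the peak before sizing.

```python
import math

# Hypothetical instance sizes (vCPU counts); real catalogs differ by provider.
AVAILABLE_VCPUS = (2, 4, 8, 16, 32, 64)


def right_size(physical_cores: int, peak_utilization: float,
               target_utilization: float = 0.6) -> int:
    """Return the smallest vCPU size that keeps peak load at or under the target."""
    if not 0 < peak_utilization <= 1 or not 0 < target_utilization <= 1:
        raise ValueError("utilization values must be in (0, 1]")
    cores_in_use = physical_cores * peak_utilization
    needed = math.ceil(cores_in_use / target_utilization)
    for size in AVAILABLE_VCPUS:
        if size >= needed:
            return size
    raise ValueError("no single instance size is large enough")


print(right_size(physical_cores=32, peak_utilization=0.08))  # prints 8
```

Under those assumptions the 32-core box fits on an 8-vCPU instance with room to spare. Memory, disk, and network need the same treatment before you commit to a size.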

So lift and shift is fine as a first move, as long as everyone agrees it's move one of several. Get it stable in the cloud, then replatform and right-size from there. The danger is calling it done and walking away, because the "we'll optimize later" ticket never gets prioritized once the thing is technically working.

Go incremental, keep a rollback path

Big-bang cutovers are where the disasters live. One weekend, everything moves, and if anything's wrong you're debugging production at 3am with no way back. We don't do those unless the workload is genuinely trivial.

Instead we move in slices. Stand up the new environment alongside the old one. Route a small fraction of traffic over, watch it, then widen. The strangler-fig pattern works well here: wrap the old system, redirect functionality piece by piece, and the legacy app shrinks until there's nothing left to migrate.
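
A minimal sketch of the "route a small fraction, then widen" step, assuming you control a routing layer that can run a function like this per request. The `bucket` and `route` names and the user-ID key are hypothetical. Hashing the ID keeps each user on the same side between requests, and anyone already on the new environment stays there as the dial goes up.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, new_env_percent: int) -> str:
    """Send the user to the new environment if their bucket is under the dial."""
    return "new" if bucket(user_id) < new_env_percent else "old"


# Hypothetical user IDs, just to show the observed split at each dial setting.
users = [f"user-{i}" for i in range(1000)]
for percent in (0, 5, 25, 100):
    share = sum(route(u, percent) == "new" for u in users) / len(users)
    print(f"dial={percent:>3}%  observed share on new: {share:.1%}")
```

Setting the dial back to 0 sends every user to the old environment, which is what makes this useful as a rollback lever as well as a rollout one.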

The rule we don't break: every step has a rollback that takes minutes, not hours. That usually means:

Data syncing both ways during the transition, or at least a tested restore path

Feature flags or a routing layer so you can shift traffic back instantly

Keeping the old environment alive and warm until the new one has proven itself for a couple of weeks, not a couple of days

Yes, running both costs more for a while. It's cheap insurance against a multi-day outage that torches customer trust.
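
One way to make "rollback in minutes" mechanical is to let a health signal drive the traffic dial. The sketch below is an assumption-heavy illustration: the ramp steps, the 1 percent error threshold, and the `next_percent` name are all hypothetical, and a real setup would watch more than one metric.

```python
# Hypothetical ramp schedule: percent of traffic on the new environment.
RAMP_STEPS = (1, 5, 25, 50, 100)


def next_percent(current: int, error_rate: float,
                 max_error_rate: float = 0.01) -> int:
    """Widen the dial one step while healthy; drop to 0 on a bad reading."""
    if error_rate > max_error_rate:
        return 0  # roll back: all traffic returns to the old environment
    for step in RAMP_STEPS:
        if step > current:
            return step
    return current  # already at the last step


dial = 0
for observed_error_rate in (0.002, 0.003, 0.04, 0.001):
    dial = next_percent(dial, observed_error_rate)
    print(f"error rate {observed_error_rate:.1%} -> dial at {dial}%")
```

The rollback branch only helps if the old environment is still alive and its data is current, which is why the items above are not optional.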

FinOps, or why the bill ambushed you

The surprise cloud bill is real, and it's almost always self-inflicted. The cloud will happily rent you anything you ask for and never once suggest you might not need it. Cost discipline has to be built in, not bolted on after finance starts asking questions.

The biggest line items we see wasted:

Idle resources. Dev and staging environments running 24/7 when nobody works weekends or nights. Run them 12 hours a day on weekdays only and they're up 60 of the week's 168 hours, which cuts their compute cost by roughly two thirds.

Over-provisioning. Most instances are bought a size or two too big "to be safe." Right-size based on real metrics, not gut feel.

On-demand for steady workloads. If a baseline of compute runs all year, reserved instances or savings plans can cut that cost by 40 to 70 percent, depending on the term and how much you pay upfront. Pay on-demand only for the spiky, unpredictable part.

Forgotten everything. Orphaned volumes, unattached IPs, old snapshots, that load balancer from a deleted project. They quietly bill forever.
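
The arithmetic behind the first and third items is easy to check. A minimal sketch, with every number hypothetical: a 12-hour weekday schedule, ten baseline instances, 500 burst instance-hours, a $0.10 hourly rate, and a 40 percent commitment discount. The function names are made up for illustration. Stopped instances usually still pay for attached storage, so read the first figure as compute only.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of always-on compute cost avoided by running on a schedule."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


def monthly_compute_cost(baseline_instances: int, burst_instance_hours: float,
                         on_demand_hourly: float, commitment_discount: float,
                         hours_in_month: int = 730) -> float:
    """Committed pricing for the always-on baseline, on-demand for the bursts."""
    committed = (baseline_instances * hours_in_month
                 * on_demand_hourly * (1 - commitment_discount))
    burst = burst_instance_hours * on_demand_hourly
    return committed + burst


print(f"12h x 5d schedule saves {schedule_savings(12, 5):.0%}")  # 64%
all_on_demand = monthly_compute_cost(10, 500, 0.10, commitment_discount=0.0)
with_commitment = monthly_compute_cost(10, 500, 0.10, commitment_discount=0.4)
print(f"all on-demand: ${all_on_demand:,.2f}, with commitment: ${with_commitment:,.2f}")
```

With these inputs the monthly figure drops from $780 to $488. The discount applies only to the baseline you are sure will run all year, so the burst hours cost the same either way.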

Tag everything from day one so you can actually tell which team and which product owns which dollar. An untagged account is a bill you can't argue with. And put cost in front of engineers, not just finance. When the people spinning up resources can see what they cost, behavior changes fast.
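
To show why tags matter, here is a minimal sketch that rolls a billing export up by owner. The row shape, the `team` tag key, and the `cost_by_tag` name are assumptions; real exports differ by provider and carry many more fields.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum cost per tag value; anything missing the tag lands in UNTAGGED."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing export rows, for illustration only.
bill = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "vol-9", "cost": 45.0, "tags": {}},
    {"resource": "lb-3", "cost": 30.0},
]
for owner, cost in sorted(cost_by_tag(bill).items(), key=lambda kv: -kv[1]):
    print(f"{owner:<10} ${cost:,.2f}")
```

The UNTAGGED bucket is the number to watch. If it keeps growing, nobody owns that spend and nobody will cut it.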

Sometimes the answer is don't migrate that workload

We'll say this plainly because not enough vendors will: not every workload belongs in the cloud, and some shouldn't move at all.

A predictable, steady, high-volume workload that's been running fine on owned hardware for years can be meaningfully cheaper on-prem. The cloud's pricing rewards elasticity. If your load doesn't vary, you're paying a premium for flexibility you're not using. Data-heavy systems with constant egress, ultra-low-latency trading, certain regulated data, that genuinely ancient app that works and that nobody understands: think hard before you move these.
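
The elasticity argument can be made concrete with a toy comparison. Every number here is hypothetical, including the 1.5x cloud premium per unit-hour, and the model ignores staffing, egress, and hardware refresh. It only shows the shape of the trade-off: fixed capacity is paid for at peak every hour, while elastic capacity is paid for per hour used.

```python
def fixed_capacity_cost(hourly_demand: list[float], rate_per_unit_hour: float) -> float:
    """Capacity sized for peak and paid for every hour, used or not."""
    return max(hourly_demand) * rate_per_unit_hour * len(hourly_demand)


def elastic_cost(hourly_demand: list[float], rate_per_unit_hour: float) -> float:
    """Pay only for the units actually used each hour."""
    return sum(hourly_demand) * rate_per_unit_hour


OWNED_RATE = 1.0  # hypothetical cost per unit-hour on owned hardware
CLOUD_RATE = 1.5  # hypothetical cloud price for the same unit-hour

steady = [10.0] * 24              # flat load all day
spiky = [2.0] * 20 + [10.0] * 4   # low baseline with a four-hour peak

for name, demand in (("steady", steady), ("spiky", spiky)):
    owned = fixed_capacity_cost(demand, OWNED_RATE)
    cloud = elastic_cost(demand, CLOUD_RATE)
    print(f"{name}: owned={owned:.0f} cloud={cloud:.0f}")
```

In this toy model the steady load costs 240 on owned hardware against 360 in the cloud, while the spiky load costs 240 against 120. The premium only pays off when demand actually varies.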

Being honest about this is the difference between a co-founder and a vendor. We'd rather tell you to keep three workloads where they are and migrate the other forty well than charge you to migrate everything and pretend it all made sense.

In the cloud versus cloud-native

Here's the distinction that matters once the dust settles. Being "in the cloud" means your servers run on someone else's hardware. Being cloud-native means you've built for the way the cloud actually works: services that scale horizontally, infrastructure defined in code, automated recovery, paying for what you use instead of what you provisioned.

Lift and shift gets you in the cloud. It does not get you cloud-native, and the gap between the two is where most of the promised value lives. The elasticity, the resilience, the cost efficiency: none of that shows up just because your VM has a new IP address. It shows up when you re-architect for the platform.

You don't have to get there in one jump, and you shouldn't try. Migrate pragmatically, get stable, then earn cloud-native one workload at a time where the payoff justifies the work. That's the whole playbook: clear reasons, honest trade-offs, small reversible steps, and a tight grip on the bill. Do that and the cloud delivers. Skip it and you've just bought a more expensive version of the problems you already had.
