
Cheap Compute Is Not Cost Optimization, Architecture Is

  • Writer: Pairoj Ruamviboonsuk
  • 4 hours ago
  • 3 min read

The Scenario

A fast-scaling SaaS company reviews its cloud bill.

Kubernetes clusters are running across regions. Autoscaling is enabled. Traffic is healthy.

But monthly infrastructure costs continue rising.

The team notices something obvious:

Spot instances (AWS), preemptible VMs (GCP), and spot VMs (Azure) offer up to 90% savings compared to on-demand pricing.

The discount is real.

The risk is real too.

Instances can be terminated with little notice.

So the question becomes: Can we capture the discount without sacrificing reliability?

Why On-Demand Worked

On-demand infrastructure is predictable.

No sudden termination. No surprise evictions. No capacity volatility.

It feels safe. But safety without optimization compounds cost.

As clusters scale, over-provisioning becomes invisible. Idle capacity hides inside auto-scaled environments. Replication multiplies inefficiency.

The system works. But it is not economically intentional.

Where Constraint Emerges

Spot capacity introduces a structural trade-off.

Cloud providers can terminate:

  • Spot instances (AWS)

  • Preemptible VMs (GCP)

  • Spot VMs (Azure)

Often with minimal notice: two minutes on AWS, and as little as 30 seconds for GCP preemptible VMs.

If architecture does not account for interruption:

  • Stateful services fail

  • Critical workloads collapse

  • Single-instance apps go offline

  • Customer trust is affected

Cheap compute without architectural discipline becomes fragility.

Cost optimization must be engineered.

The Architectural Principle

Interruptible capacity is not unreliable. Undesigned systems are unreliable.

The goal is not to run everything on spot. The goal is to design a fault-tolerant architecture that can absorb interruption without impact.

Cost efficiency is not a pricing decision.

It is an architectural property.

The Design Discipline

Optimizing Kubernetes infrastructure costs requires structured separation of reliability tiers.

1. Separate Node Pools by Reliability

Create distinct node groups:

  • On-demand nodes for critical services

  • Spot/preemptible nodes for fault-tolerant workloads

This enforces architectural intent. Critical system pods do not compete with interruptible capacity.
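One way to express this intent in the workload spec (names are illustrative; how the node groups themselves are created varies by provider) is to label each pool and pin critical services to the on-demand pool:

```yaml
# Critical service pinned to the on-demand pool via a node label.
# Assumes the on-demand node group carries the label node-pool=on-demand.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api            # illustrative critical service
spec:
  replicas: 3
  selector:
    matchLabels: { app: payments-api }
  template:
    metadata:
      labels: { app: payments-api }
    spec:
      nodeSelector:
        node-pool: on-demand    # never schedules onto spot capacity
      containers:
        - name: api
          image: payments-api:latest   # illustrative image
```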

2. Target the Right Workloads

Spot capacity is appropriate for:

  • Stateless services

  • Batch processing

  • CI/CD pipelines

  • Horizontally replicated services

It is not suitable for:

  • Stateful, single-instance workloads

  • Mission-critical services without redundancy

Architecture determines placement — not cost pressure.

3. Enforce Placement with Taints and Tolerations

Apply Kubernetes taints to spot nodes. Use tolerations only on workloads designed for interruption.

This prevents accidental scheduling of sensitive workloads onto unstable capacity.
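A minimal sketch of the pairing (taint key, node name, and workload are illustrative): the taint repels everything by default, and only workloads that explicitly tolerate it can land on spot nodes.

```yaml
# Taint every spot node, e.g.:
#   kubectl taint nodes <spot-node> capacity=spot:NoSchedule
# Then add the matching toleration only to interruption-tolerant pods:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # illustrative fault-tolerant workload
spec:
  replicas: 3
  selector:
    matchLabels: { app: batch-worker }
  template:
    metadata:
      labels: { app: batch-worker }
    spec:
      tolerations:
        - key: capacity
          operator: Equal
          value: spot
          effect: NoSchedule    # matches the taint on spot nodes
      containers:
        - name: worker
          image: batch-worker:latest   # illustrative image
```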

Guardrails preserve discipline.

4. Automate with Intelligent Autoscaling

Cost optimization only works when scaling is dynamic.

Use:

  • Horizontal Pod Autoscaler (HPA)

  • Vertical Pod Autoscaler (VPA)

  • Node autoscaling via Cluster Autoscaler or Karpenter
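An HPA is the simplest of these to sketch. Here it scales an illustrative Deployment on CPU utilization (target name and thresholds are assumptions, not prescriptions):

```yaml
# Hedged sketch: scale the "web" Deployment between 2 and 20 replicas,
# targeting 70% average CPU utilization across pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # illustrative target workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```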

A smart node autoscaler can:

  • Prioritize spot capacity first

  • Fall back to on-demand when unavailable

Karpenter, for example, dynamically provisions the right instance types at the right time and gracefully handles spot fallback.
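The spot-first, on-demand-fallback behavior can be expressed in a Karpenter NodePool. A hedged sketch (field layout follows the v1 API and may vary by Karpenter version; the NodePool and EC2NodeClass names are illustrative):

```yaml
# Allow both capacity types; Karpenter prefers spot when available
# and falls back to on-demand when spot capacity cannot be obtained.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default           # illustrative EC2NodeClass
```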

Automation is not convenience.

It is economic control.

5. Handle Interruption Gracefully

Spot termination notices must trigger safe draining and rescheduling.

Tools like the AWS Node Termination Handler (successor to kube-spot-termination-notice-handler) watch for the termination notice, then cordon and drain the node so pods can migrate before the instance disappears.

Interruption becomes a controlled event — not an outage.
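The workload itself must also shut down cleanly inside the notice window. A hedged pod-spec fragment (grace period and preStop command are illustrative assumptions):

```yaml
# Pod spec fragment: finish in-flight work well inside the ~2-minute notice.
spec:
  terminationGracePeriodSeconds: 90   # must fit inside the interruption window
  containers:
    - name: worker
      image: batch-worker:latest      # illustrative image
      lifecycle:
        preStop:
          exec:
            # Illustrative: signal "draining" so the app stops accepting
            # new work, then allow time for in-flight tasks to complete.
            command: ["/bin/sh", "-c", "touch /tmp/draining && sleep 20"]
```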

6. Distribute Risk with Topology Spread Constraints

Spread replicas across:

  • Availability zones

  • Nodes

  • Instance types

This prevents a single interruption event from cascading into service degradation.
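Zone and node spreading can be declared directly in the pod template. A minimal sketch (the `app: web` label is an illustrative selector):

```yaml
# Pod template fragment: spread replicas across zones (hard requirement)
# and across nodes (best effort).
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels: { app: web }     # illustrative app label
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels: { app: web }
```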

Resilience is distribution by design.

7. Monitor and Right-Size Continuously

Autoscaling only works when resource requests and limits are accurate.

Over-requesting resources defeats optimization. Under-requesting creates instability.

Continuous monitoring ensures:

  • Accurate scaling triggers

  • Cost visibility

  • Prevention of over-provisioning
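Right-sizing ultimately lands in the container spec. The values below are illustrative; the point is that explicit requests drive bin-packing (and therefore cost), while limits protect neighboring workloads:

```yaml
# Container fragment: explicit requests/limits keep autoscaling signals honest.
resources:
  requests:
    cpu: "250m"       # what the scheduler reserves; drives bin-packing and cost
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"   # cap to protect neighbors without starving the pod
```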

Cost control requires measurement discipline.

The Multi-Layer Outcomes

When spot capacity is architected correctly:

Technical

  • Fault-tolerant workload design

  • Automated scaling behavior

  • Graceful interruption handling

Operational

  • No service disruption during spot termination

  • Clear workload classification

  • Controlled scaling events

Commercial

  • Up to 90% savings versus on-demand for eligible workloads

  • Reduced over-provisioning

  • Improved infrastructure ROI

Strategic

  • Economic resilience during traffic spikes

  • Flexibility in multi-cloud strategy

  • Freedom to scale without runaway cost

Cost becomes intentional rather than reactive.

Executive Translation

In boardrooms, this conversation is not about HPA or Karpenter.

It is about unit economics.

Can we scale without letting infrastructure cost grow linearly?

Spot capacity alone does not solve this.

Architecture does.

The Architectural Close

Cheap compute is easy to buy.

Resilient cheap compute is designed.

Spot instances are not risky.

Undifferentiated infrastructure is risky.

Cost efficiency in Kubernetes is not a pricing trick.

It is architecture applied to economics. And when economics are engineered, scale becomes sustainable.
