
Cloud GPU Pricing Shifts in Q1 2026

On-demand H100 pricing dropped 15-20% across hyperscalers. Spot markets are volatile but still offer 50-65% discounts. Reserved 1-year commitments remain the best value at 35-40% off on-demand. Tier-2 providers are undercutting on price but trailing on availability guarantees.

Digiteria Labs · 9 min read

Key Signals

  • AWS dropped on-demand pricing for p5.48xlarge (8x H100 SXM) by 18%, from $98.32/hr to $80.62/hr, effective January 15. GCP and Azure followed with 15-20% reductions on equivalent instances within two weeks.
  • Spot pricing on H100 instances is now 50-65% below on-demand across all three hyperscalers, but availability windows have narrowed to 2-4 hours in peak regions (us-east-1, europe-west4) before preemption.
  • 1-year reserved (committed use) pricing has settled at 35-40% off on-demand for H100 instances — the best risk-adjusted deal in GPU compute right now.
  • Tier-2 providers (CoreWeave, Lambda, Crusoe) are pricing H100 on-demand at $2.49-2.85/GPU/hr, undercutting hyperscalers by 20-30%, but with weaker SLAs and no multi-region failover.
  • The A100 80GB generation is entering clearance pricing territory: on-demand rates down 30-40% year-over-year, making it the budget option for inference workloads that do not require H100 memory bandwidth.

What Happened

I've been tracking GPU pricing across the major clouds for the past year, and what happened in January felt inevitable. H100 supply finally caught up with demand after eighteen months of allocation constraints. TSMC's expanded CoWoS packaging capacity, combined with NVIDIA shipping over 500,000 H100 units in Q4 2025, means the hyperscalers are no longer supply-constrained on current-generation hardware. When supply normalizes, prices fall. And in January, they fell hard.

AWS moved first on January 15, cutting p5.48xlarge on-demand rates by 18%. The thing most people miss is that this was not altruistic. Google Cloud had been quietly offering aggressive committed-use discounts to large accounts since November 2025, pulling workloads off AWS. (I talked to two teams that got unsolicited GCP pricing offers they described as "too good to ignore.") Within ten days, GCP formalized the cuts across their a3-highgpu-8g (H100) instance family, and Azure followed with reductions on ND H100 v5 series. The net result: on-demand H100 pricing converged to roughly $8.00-10.10 per GPU-hour across all three providers, down from $10.50-12.30 in Q4 2025.

But here is why the headline numbers are misleading. The on-demand price drop makes for a nice press release, but that is not where the real action is. Spot markets and reserved commitments are where the economics get genuinely interesting — and where most teams are leaving staggering amounts of money on the table.

Note: The gap between on-demand and optimized procurement (reserved + spot mix) is now $3.50-4.80 per GPU-hour on H100 instances. For a team running 64 GPUs continuously, that is roughly $165,000-225,000 per month in avoidable spend. I want to be blunt: procurement strategy is no longer a finance concern — it is an engineering architecture decision. If you are not treating it that way, you are probably overpaying by more than your next hire costs.
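That avoidable-spend figure is easy to sanity-check yourself. A minimal sketch, assuming a 730-hour month (8,760 hours / 12) and the $3.50-4.80/GPU-hour gap quoted above — swap in your own fleet size:

```python
# Back-of-envelope check on avoidable monthly spend from buying
# on-demand instead of an optimized reserved + spot mix.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months (assumption)
GPUS = 64              # fleet size from the example above

def monthly_overspend(gap_per_gpu_hr: float) -> float:
    """Extra spend per month at a given $/GPU-hr gap versus an optimized mix."""
    return GPUS * HOURS_PER_MONTH * gap_per_gpu_hr

low = monthly_overspend(3.50)
high = monthly_overspend(4.80)
print(f"${low:,.0f} - ${high:,.0f} per month")
```

At 64 GPUs this lands in the low-to-mid six figures per month, which is the point: the gap compounds with every GPU-hour you burn.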

Spot Markets: High Reward, Higher Variance

Spot pricing has always been the cheapest way to access GPU compute, but I'm seeing a new dynamic in Q1 2026 that changes the calculus: volatility. As more teams adopt spot-aware orchestration frameworks (SkyPilot, Kubernetes Karpenter with GPU-aware provisioning), the competition for spot capacity has intensified. Prices now swing 30-40% within a single day in popular regions. That is a lot of variance to manage.

The numbers are compelling — when you can actually get the instances. AWS p5.48xlarge spot instances have traded as low as $28.50/hr (71% off on-demand) during off-peak windows, and the weekly median sits around $33-36/hr (55-59% off). GCP and Azure spot markets show similar patterns, though with less liquidity and faster preemption — average instance lifetime before reclamation is 2.1 hours on GCP versus 3.4 hours on AWS for H100 instances. (That GCP number is tight. Really tight.)

My take on what this means practically: spot is excellent for fault-tolerant workloads — fine-tuning with checkpoint-resume, batch inference, offline evaluation suites — but it is not a viable primary strategy for real-time serving. Here is what I think a lot of teams are about to learn the hard way: those who built their entire inference stack on spot during the 2024-2025 scarcity (when spot prices were paradoxically high and stable) are now getting hammered with frequent preemptions as the market normalizes. Stable spot was an aberration, not the baseline.

Builder Breakdown

Pricing Comparison: H100 Instances Across Providers (February 2026)

Let me walk through the current pricing landscape, because the provider-to-provider differences are more nuanced than the headlines suggest.

| Provider  | Instance Type     | GPUs        | On-Demand ($/hr) | 1yr Reserved ($/hr) | Spot Median ($/hr) |
|-----------|-------------------|-------------|------------------|---------------------|--------------------|
| AWS       | p5.48xlarge       | 8x H100 SXM | $80.62           | $52.40 (35% off)    | $33.86 (58% off)   |
| GCP       | a3-highgpu-8g     | 8x H100     | $78.17           | $47.68 (39% off)    | $31.27 (60% off)   |
| Azure     | ND H100 v5        | 8x H100     | $80.88           | $51.77 (36% off)    | $35.49 (56% off)   |
| CoreWeave | H100-80GB-SXM     | 8x H100     | $22.76*          | $16.56 (27% off)    | N/A                |
| Lambda    | gpu_8x_h100_sxm5  | 8x H100     | $19.92*          | $15.14 (24% off)    | N/A                |

* Per-GPU pricing multiplied by 8 for comparison. CoreWeave and Lambda price per GPU: $2.85 and $2.49 respectively.
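The table is most useful when you collapse it into a blended effective rate. A quick sketch using the AWS row above and the 60/25/15 mix I recommend later in the piece (prices are per 8-GPU instance-hour):

```python
# Blended effective $/GPU-hr for a reserved/spot/on-demand capacity mix,
# using the AWS p5.48xlarge figures from the pricing table.
MIX = {"reserved": 0.60, "spot": 0.25, "on_demand": 0.15}     # capacity shares
AWS_P5 = {"on_demand": 80.62, "reserved": 52.40, "spot": 33.86}  # $/instance-hr

def blended_per_gpu_hr(prices: dict, mix: dict, gpus_per_instance: int = 8) -> float:
    """Capacity-weighted average $/GPU-hr across procurement tiers."""
    return sum(mix[tier] * prices[tier] for tier in mix) / gpus_per_instance

rate = blended_per_gpu_hr(AWS_P5, MIX)
print(f"${rate:.2f}/GPU-hr blended vs ${AWS_P5['on_demand'] / 8:.2f} pure on-demand")
```

The blend comes out around $6.50/GPU-hr against roughly $10.08 pure on-demand — which is where the "35-60% premium" for on-demand-only teams comes from.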

Optimizing Your Procurement Mix

Here is how I would think about structuring the buy, and I think this framework holds regardless of scale.

Baseline load (60-70% of capacity): Reserved instances. Look at your steady-state GPU utilization over the past 90 days. That floor is your reserved commitment target. At 35-40% off on-demand, a 1-year commitment breaks even against pure on-demand once you actually run the capacity for roughly 60-65% of the year — about 7-8 months of continuous use — and every hour beyond that is pure savings. GCP currently offers the deepest reserved discount (39%) on a3-highgpu-8g, which is worth noting even if you are an AWS shop.
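The breakeven arithmetic is worth internalizing, because it tells you how confident you need to be in your utilization floor before committing. A minimal sketch, assuming a commitment billed for the full year regardless of use:

```python
# Breakeven for a 1-year commitment: the number of months of continuous
# on-demand usage whose cost equals the full-year committed cost.
# Assumes the commitment is billed whether or not the capacity is used.
def breakeven_months(discount: float) -> float:
    """Months of continuous use at on-demand rates that match the committed cost."""
    return 12 * (1 - discount)

print(breakeven_months(0.35))  # 35% discount -> breakeven near 7.8 months
print(breakeven_months(0.39))  # GCP's 39% discount -> closer to 7.3 months
```

In other words: if your 90-day utilization floor is solid, the commitment is nearly free money; if your workload might evaporate by month six, it is a bet.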

Burst and batch (15-25% of capacity): Spot instances. Configure your orchestrator for multi-region spot with automatic failover. Key settings for Karpenter:

# Karpenter v1 NodePool for GPU spot instances
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-spot-h100
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["p5.48xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-spot
      expireAfter: 4h  # in the v1 API this lives under template.spec, not disruption
  disruption:
    consolidationPolicy: WhenEmpty
  limits:
    cpu: "768"
    nvidia.com/gpu: "64"

Emergency headroom (10-20% of capacity): On-demand. Keep a small on-demand allocation for traffic spikes and spot preemption recovery. Think of this as your insurance policy. I know it feels wasteful. Over-provisioning here by 5% is still cheaper than the latency spike from scrambling for capacity during a preemption cascade — and I have watched teams learn that lesson in production.

A100 as budget tier. This is the one most people overlook. If your models fit in 80GB and you do not need the H100's higher memory bandwidth (3.35 TB/s vs 2.0 TB/s), A100 instances are now priced at $4.30-5.10/GPU/hr on-demand — 40-50% cheaper than H100. For inference on models under 30B parameters, the cost-per-token difference is marginal. Sometimes the boring option is the right one.
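To see why the cost-per-token difference is marginal, run the division yourself. A hedged sketch — the throughput numbers below are hypothetical placeholders, not benchmarks, so substitute your own measured tokens/sec:

```python
# Illustrative cost-per-million-tokens comparison, A100 vs H100 inference.
# Throughput figures are HYPOTHETICAL placeholders; measure your own model.
def cost_per_million_tokens(gpu_hr_rate: float, tokens_per_sec: float) -> float:
    """Dollars per 1M generated tokens at a given $/GPU-hr and throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hr_rate / tokens_per_hour * 1_000_000

a100 = cost_per_million_tokens(4.70, 1500)   # mid-range A100 rate, assumed throughput
h100 = cost_per_million_tokens(10.00, 2800)  # H100 on-demand rate, assumed throughput
print(f"A100: ${a100:.2f}/M tok, H100: ${h100:.2f}/M tok")
```

With those assumptions the A100 actually comes out slightly cheaper per token — the H100's throughput advantage on small models does not always keep pace with its price premium.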

Economic Analysis

Winners and Losers

I always find it useful to think about pricing shifts in terms of who benefits and who gets squeezed. Here is what I'm seeing.

Winners:

  • Mid-scale inference operators running 100-500 GPUs. The 15-20% on-demand cut plus optimized procurement can reduce their annual GPU bill by $1-4M without any migration or architectural changes. Pure operational leverage. These teams just got a raise they did not have to ask for.
  • Spot-native ML platforms (Anyscale, Modal, SkyPilot users). The widening gap between spot and on-demand rewards teams that invested early in preemption-tolerant infrastructure. Their effective cost basis just dropped to $4.00-4.50/GPU/hr. That is a structural advantage.
  • A100 holdouts. I'll admit I was skeptical of teams that dragged their feet on the H100 migration. They were right. A100 clearance pricing makes the 80GB SKU the best price-performance option for inference workloads under 30B parameters. No migration needed. Sometimes procrastination pays off.

Losers:

  • Teams locked into H100 reserved instances from mid-2025. This one stings. Those contracts were priced at Q3 2025 rates — 20-30% above current market. Depending on the provider, there may be limited options to renegotiate. Azure allows reserved instance exchanges; AWS does not for Savings Plans already purchased. (If this is you, it is not the end of the world — you are still ahead of on-demand buyers. But it is frustrating.)
  • Tier-2 providers competing purely on price. CoreWeave and Lambda's value proposition was "H100s cheaper than hyperscalers." With the hyperscaler price cuts, the gap has narrowed from 40% to 15-20%. Their remaining edge is availability and bare-metal access, not cost. I'm not sure that is enough to hold enterprise customers long-term.
  • On-demand-only teams. I want to be honest about this: any organization still running production GPU workloads entirely on on-demand pricing is now paying a 35-60% premium over an optimized mix. At scale, this is the difference between a viable unit economics model and one that does not close. Full stop.

"GPU pricing is no longer a supply story — it is a procurement strategy story. The difference between the best and worst buyer of the same H100 hour is now 3x. That gap is wider than the performance difference between an A100 and an H100."

Note: Here is what worries me most right now: spot preemption rates on H100 instances have increased 2.4x since November 2025. If your spot-based training runs do not checkpoint at least every 20 minutes, you are statistically likely to lose work within a single session. I have seen this bite teams who tested their checkpoint-resume path against simulated failures but never against real preemption timing. Set checkpoint_interval aggressively and test your resume-from-checkpoint path under actual preemption conditions. The difference between those two scenarios is larger than you think.
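The checkpoint discipline above reduces to a simple loop shape. A minimal sketch of a preemption-tolerant training loop — `train_step`, `save_checkpoint`, and `load_checkpoint` are placeholders for your framework's equivalents, and I am assuming the provider delivers its interruption notice as SIGTERM (verify the mechanism for your cloud):

```python
# Sketch: checkpoint on a timer AND on the provider's preemption signal,
# so a reclaimed spot instance loses at most CHECKPOINT_INTERVAL of work.
import signal
import time

CHECKPOINT_INTERVAL = 20 * 60  # seconds -- the 20-minute bound from the note
preempted = False

def on_preemption(signum, frame):
    # Spot instances get a short warning before reclamation; use it to
    # flush one final checkpoint instead of dying mid-step.
    global preempted
    preempted = True

signal.signal(signal.SIGTERM, on_preemption)

def run(train_step, save_checkpoint, load_checkpoint):
    step = load_checkpoint()       # resume from last durable state
    last_save = time.monotonic()
    while True:
        step = train_step(step)
        if preempted or time.monotonic() - last_save >= CHECKPOINT_INTERVAL:
            save_checkpoint(step)
            last_save = time.monotonic()
            if preempted:
                break              # exit cleanly before the instance is reclaimed
    return step
```

Testing the resume path means sending your process a real SIGTERM mid-run, not just calling `load_checkpoint` in a unit test — the timing of the signal is exactly what the simulated-failure tests miss.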

Note: One thing I'm keeping a close eye on: GCP's new Flex CUD (Committed Use Discount) option, launched in January 2026. Unlike standard 1-year or 3-year commitments, Flex CUDs allow monthly re-allocation across GPU instance families. The discount is shallower (20-25% vs 35-40%), but the flexibility to shift between A100 and H100 — or between regions — makes it a strong hedge if you are uncertain about your workload mix over the next year. The data here is thin since it just launched, but the mechanism is sound.

Recommendation

What I'd Do

If you're a CTO: Audit your current GPU procurement mix this week. Not next quarter. This week. If more than 30% of your GPU spend is on-demand, you are overpaying by six figures annually — and I am being conservative with that estimate. Set a target of 60% reserved, 25% spot, 15% on-demand. Have your infra lead model the 1-year reserved commitment against your trailing 90-day utilization baseline. For hyperscaler selection, GCP currently offers the best reserved pricing on H100; AWS has the deepest and most liquid spot market. Pick based on where your workloads already live unless the delta is enormous.

If you're a founder: GPU cost should not be a black box in your financial model. Ask your technical team for the blended effective rate per GPU-hour — not just the list price. If your team is paying the on-demand rate because "reserved is complicated," I would push back on that framing. That is an engineering leadership gap, not a technical constraint. The procurement optimization I described above requires zero code changes to your application — it is purely an infrastructure configuration exercise. There is no good reason to leave this money on the table.

If you're an infra lead: Implement a three-tier procurement strategy this quarter. First, lock in 1-year reserved instances covering your P50 utilization (the level you exceed 50% of the time). Second, deploy Karpenter or SkyPilot for spot-based batch and training workloads with multi-region failover across at least three availability zones. Third, maintain a small on-demand pool sized to handle your P95 burst minus your reserved capacity. Run a cost simulation against the last 90 days of actual usage to validate the mix before committing. If you are on AWS, evaluate whether Savings Plans (compute-flexible) or Reserved Instances (instance-specific) better match your workload patterns — Savings Plans offer more flexibility but slightly shallower discounts. Honestly, either one beats doing nothing. Start there.
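The cost simulation in that last step can be a few lines of code. A sketch under stated assumptions: reserved capacity is billed whether used or not, overflow above the reserved floor is served by a spot/on-demand split, and the per-GPU rates are the AWS figures from the pricing table (swap in your own trace and rates):

```python
# Simulate a three-tier procurement mix against an hourly GPU-demand trace.
RATES = {"reserved": 6.55, "spot": 4.23, "on_demand": 10.08}  # $/GPU-hr (AWS, per table)

def simulate(hourly_demand, reserved_gpus, spot_fraction=0.6):
    """Total cost of serving the demand trace with a reserved floor plus
    a spot/on-demand split for overflow. Reserved is always billed."""
    total = 0.0
    for demand in hourly_demand:
        total += reserved_gpus * RATES["reserved"]        # committed, used or not
        overflow = max(0, demand - reserved_gpus)
        total += overflow * spot_fraction * RATES["spot"]
        total += overflow * (1 - spot_fraction) * RATES["on_demand"]
    return total

# Toy trace: a steady 40-GPU baseline with a 4-hour burst to 64 GPUs.
trace = [40] * 20 + [64] * 4
print(f"${simulate(trace, reserved_gpus=40):,.0f}")
```

Run this over your actual trailing-90-day trace and sweep `reserved_gpus` across your P40-P60 utilization range; the minimum of that curve is your commitment target.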

Sources

  1. "Amazon EC2 P5 Instance Pricing Update — January 2026," AWS Pricing Page, aws.amazon.com/ec2/pricing/on-demand
  2. "GPU Instance Committed Use Discounts and Flex CUDs," Google Cloud Blog, cloud.google.com/blog/products/compute (January 2026)
  3. "Azure ND H100 v5 Series Pricing Adjustments — Q1 2026," Microsoft Azure Pricing, azure.microsoft.com/pricing/details/virtual-machines
  4. "Spot Instance Preemption Rates and Availability Trends," The Cloud GPU Price Index, Vantage.sh (February 2026)
  5. "H100 Supply Normalization and Market Impact," SemiAnalysis GPU Market Report, semianalysis.com (January 2026)
