Cheapest H100: $1.90/hr on Hyperstack
Total GPUs: 17,721
Available: 795

Compare Cloud GPU Prices Across 28 Providers

GPUPerHour tracks 17,721 cloud GPU offers from 28 providers, with prices updated every 60 seconds. We only show actually available instances that you can provision right away. The table below covers 87+ GPU models across 7 continents: filter by VRAM, pricing type, provider, or location to find the lowest-cost option for your workload. Offers from select providers can be deployed directly through DeployGPU, a unified console for provisioning GPU instances across multiple providers without creating separate accounts.

How Cloud GPU Pricing Works

Cloud GPU pricing spans several orders of magnitude depending on the hardware and provider. GPUPerHour currently tracks prices ranging from $0.01/hr for entry-level GPUs to over $280.00/hr for multi-GPU clusters with high-end accelerators. The primary factors that determine cost per hour are the GPU model, the amount of VRAM, the number of GPUs in the instance, the cloud provider, and the data center region. Prices also differ significantly between provider categories: hyperscalers like AWS and Google Cloud typically charge 2-3x more than GPU-first providers like RunPod and Lambda for equivalent hardware.

Three pricing models dominate the cloud GPU market. On-demand pricing is the most common: users pay a fixed hourly rate with no commitment and can terminate at any time. Spot pricing (also called interruptible or preemptible) offers discounts of 50-80% compared to on-demand rates, but instances can be reclaimed by the provider when GPU demand spikes. Reserved pricing locks in a rate for a fixed term, typically 1-3 months, at savings of 20-40% compared to on-demand. The right model depends on the workload: on-demand for development and short-lived jobs, spot for fault-tolerant batch processing, and reserved for sustained production workloads where cost predictability matters.
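To make the tradeoff concrete, here is a minimal sketch that compares one month of GPU use under each model. The $2.50/hr rate, the discounts, and the `rework` fraction (extra hours re-run after spot interruptions) are hypothetical inputs, and it assumes a reserved term is billed for every hour whether or not the GPU is busy, which is common for committed terms but worth checking per provider.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def on_demand_cost(rate: float, used_hours: float) -> float:
    """Pay the full hourly rate, but only for hours actually used."""
    return rate * used_hours


def spot_cost(rate: float, used_hours: float, discount: float, rework: float = 0.0) -> float:
    """Discounted rate; `rework` is an assumed fraction of hours repeated after interruptions."""
    return rate * (1 - discount) * used_hours * (1 + rework)


def reserved_cost(rate: float, term_months: int, discount: float) -> float:
    """Assumes every hour of the term is billed, busy or idle."""
    return rate * (1 - discount) * term_months * HOURS_PER_MONTH


def reserved_breakeven_utilization(discount: float) -> float:
    """Busy fraction of the term above which reserved beats on-demand."""
    return 1 - discount


if __name__ == "__main__":
    rate = 2.50  # hypothetical on-demand $/hr
    used = 500   # GPU-hours needed this month
    print(f"on-demand: ${on_demand_cost(rate, used):,.2f}")
    print(f"spot (60% off, 10% rework): ${spot_cost(rate, used, 0.60, 0.10):,.2f}")
    print(f"reserved (30% off, 1 month): ${reserved_cost(rate, 1, 0.30):,.2f}")
    print(f"reserved breaks even above {reserved_breakeven_utilization(0.30):.0%} utilization")
```

With these inputs, 500 busy hours falls just short of the 70% break-even (about 511 of 730 hours), so on-demand ($1,250.00) edges out the reserved term ($1,277.50), while spot ($550.00) is cheapest if the job can checkpoint and resume.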

The advertised hourly GPU price is not the full cost. Data egress fees add $0.05-$0.12 per GB on most providers, which can exceed the GPU cost for data-intensive workloads that transfer large datasets or model weights. Storage charges for attached volumes typically run $0.10-$0.20/GB/month. Networking costs, minimum billing increments (some providers bill by the hour even for jobs lasting minutes), and setup fees on reserved instances all contribute to the real price. GPUPerHour's cost calculator factors in egress, storage, and ingress to show the true cost of each offer beyond the headline hourly rate.
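A rough version of that calculation fits in a few lines. This is a hypothetical sketch, not GPUPerHour's actual calculator: all rates are caller-supplied assumptions, and runtime is rounded up to the provider's billing increment in whole seconds (3600 for hourly billing, 1 for per-second billing).

```python
def true_cost(
    rate_per_hour: float,
    runtime_seconds: int,
    billing_increment_seconds: int,
    egress_gb: float = 0.0,
    egress_per_gb: float = 0.0,
    storage_gb: float = 0.0,
    storage_per_gb_month: float = 0.0,
    storage_months: float = 0.0,
) -> dict:
    """Estimate a job's cost beyond the headline hourly rate."""
    # Round the runtime up to a whole number of billing increments (ceiling division).
    increments = -(-runtime_seconds // billing_increment_seconds)
    billed_seconds = increments * billing_increment_seconds
    compute = rate_per_hour * billed_seconds / 3600
    egress = egress_gb * egress_per_gb
    storage = storage_gb * storage_per_gb_month * storage_months
    return {"compute": compute, "egress": egress, "storage": storage,
            "total": compute + egress + storage}


if __name__ == "__main__":
    # A 10-minute job on a hypothetical $2.50/hr GPU that then downloads 1,000 GB at $0.09/GB.
    hourly = true_cost(2.50, 600, 3600, egress_gb=1000, egress_per_gb=0.09)
    per_second = true_cost(2.50, 600, 1, egress_gb=1000, egress_per_gb=0.09)
    print(f"hourly billing: ${hourly['total']:.2f}, per-second billing: ${per_second['total']:.2f}")
```

Here hourly billing charges a full $2.50 for 10 minutes of work, versus about $0.42 with per-second billing, yet both totals (about $92.50 and $90.42) are dominated by the $90 egress charge.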

Cloud GPU prices are not static. Spot market rates change multiple times per hour as supply and demand shift across providers and regions. Even on-demand prices update when providers adjust their catalogs, launch promotions, or respond to competitor pricing. Static comparison pages and annual blog posts go stale within days. GPUPerHour addresses this by updating prices every 60 seconds across all 28 tracked providers, monitoring 17,721 individual offers for price changes and availability status. The metrics strip at the top of this page shows the current lowest H100 price and total available inventory in real time.
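The source does not describe how GPUPerHour's update pipeline works internally. As a hypothetical illustration, the core of any 60-second poller is a comparison between consecutive snapshots: which listings appeared, which disappeared (for example, no longer available), and which changed price. The `(provider, gpu_model, region)` key and the prices below are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical offer key: (provider, gpu_model, region) -> price per hour.
OfferKey = Tuple[str, str, str]
Snapshot = Dict[OfferKey, float]


def diff_snapshots(old: Snapshot, new: Snapshot):
    """Compare two consecutive price snapshots, e.g. taken 60 seconds apart."""
    added: List[OfferKey] = sorted(new.keys() - old.keys())
    removed: List[OfferKey] = sorted(old.keys() - new.keys())
    changed = {
        key: (old[key], new[key])
        for key in sorted(old.keys() & new.keys())
        if old[key] != new[key]
    }
    return added, removed, changed


if __name__ == "__main__":
    before = {("provider-a", "H100", "us-east"): 2.49, ("provider-b", "A100", "eu-north"): 1.10}
    after = {("provider-a", "H100", "us-east"): 2.29, ("provider-c", "L40S", "asia-se"): 0.79}
    added, removed, changed = diff_snapshots(before, after)
    print("added:", added)
    print("removed:", removed)
    print("changed:", changed)
```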

How to Rent a Cloud GPU

Renting a GPU in the cloud means provisioning a virtual machine or container with one or more GPUs attached, paying by the hour (or by the second on some providers), and releasing the resources when finished. There is no hardware to purchase, no maintenance overhead, and no upfront capital expenditure. The economics are straightforward: purchasing an NVIDIA RTX 4090 costs approximately $1,600, while renting one in the cloud costs as little as $0.20/hr. An H100 SXM costs over $25,000 to buy but can be rented for $1.50-$3.50/hr depending on the provider. Renting makes sense for any workload that does not require 24/7 GPU utilization over multiple months.
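The buy-versus-rent comparison reduces to a break-even point: purchase price divided by hourly rental rate. The sketch below uses the paragraph's figures, taking $2.50/hr as a representative H100 rate within the quoted $1.50-$3.50 range. It counts hardware cost only and ignores power, cooling, hosting, and resale value, which move the real break-even point in both directions.

```python
HOURS_PER_DAY = 24


def breakeven_hours(purchase_price: float, rental_rate_per_hour: float) -> float:
    """Rental hours whose total cost equals the purchase price (hardware only)."""
    return purchase_price / rental_rate_per_hour


if __name__ == "__main__":
    for name, price, rate in [("RTX 4090", 1_600, 0.20), ("H100 SXM", 25_000, 2.50)]:
        hours = breakeven_hours(price, rate)
        print(f"{name}: {hours:,.0f} h, about {hours / HOURS_PER_DAY:,.0f} days of 24/7 use")
```

At these prices, renting an RTX 4090 only matches the purchase price after about 8,000 hours (roughly 11 months of continuous use), and an H100 SXM after about 10,000 hours (roughly 14 months).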

The process of renting a cloud GPU follows six steps. First, choose a GPU model based on the workload's VRAM and compute requirements: 12-24GB for inference and image generation, 40-80GB for training medium models, 80GB+ for large language model training. Second, compare prices across providers using GPUPerHour to find the best rate for the chosen GPU. Third, create an account with the selected provider. Fourth, provision an instance through the provider's console, API, or CLI. Fifth, connect to the instance via SSH or web terminal and run the workload. Sixth, terminate the instance when the job completes to stop billing. The entire process from account creation to a running GPU instance typically takes under 10 minutes with GPU-first providers. DeployGPU simplifies this further by offering a single console that provisions instances across multiple providers: compare prices on GPUPerHour, then deploy without creating separate accounts on each provider.
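Steps one and two amount to a filter followed by a sort. Here is a minimal sketch over a hypothetical list of offers: the `Offer` fields, provider names, and prices are illustrative assumptions, VRAM is compared per GPU, and prices are per instance, so multi-GPU offers would need per-GPU normalization in a real comparison.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Offer:
    provider: str
    gpu_model: str
    vram_gb_per_gpu: int
    gpu_count: int
    price_per_hour: float  # price for the whole instance
    available: bool


def cheapest_offer(
    offers: Iterable[Offer], min_vram_gb: int, gpu_model: Optional[str] = None
) -> Optional[Offer]:
    """Keep available offers with enough VRAM (optionally one model), then take the lowest price."""
    candidates = [
        o
        for o in offers
        if o.available
        and o.vram_gb_per_gpu >= min_vram_gb
        and (gpu_model is None or o.gpu_model == gpu_model)
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


if __name__ == "__main__":
    offers = [
        Offer("provider-a", "RTX 4090", 24, 1, 0.34, True),
        Offer("provider-b", "A100 80GB", 80, 1, 1.29, True),
        Offer("provider-c", "H100 SXM", 80, 1, 2.19, True),
        Offer("provider-d", "A100 80GB", 80, 1, 1.09, False),  # listed but not provisionable
    ]
    # Cheapest available offer with at least 80GB per GPU.
    print(cheapest_offer(offers, min_vram_gb=80))
```

Skipping unavailable listings matters: the lowest headline price here (provider-d) cannot actually be provisioned.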

The cloud GPU provider landscape divides into three categories. Hyperscalers (AWS, Google Cloud, Azure, Oracle Cloud) offer enterprise-grade security, compliance certifications, global region coverage, and deep integration with their broader cloud ecosystems, at premium prices. GPU-first providers (Lambda, RunPod, CoreWeave, TensorDock, Jarvis Labs, Vultr) are purpose-built for AI and compute workloads, offering lower prices, faster GPU availability, and ML-specific features like pre-installed frameworks and Jupyter notebook access. Peer-to-peer marketplaces (Vast.ai community cloud, Salad) aggregate GPU capacity from distributed sources, offering the lowest prices but with variable hardware quality and fewer enterprise guarantees. GPUPerHour tracks 28 providers across all three categories.

Several strategies can reduce GPU rental costs significantly. Spot instances save 50-80% for workloads that tolerate interruptions: training jobs with checkpointing, batch inference, and data preprocessing are all good candidates. Region selection matters because GPU prices vary by geography: Nordic and Southeast Asian regions often run 15-25% below US East pricing for the same hardware. Right-sizing the GPU avoids paying for capacity the workload does not use: an RTX 4090 at $0.20/hr is a better choice than an H100 at $2.50/hr for inference on models under 13B parameters. Monitoring egress fees is critical for data-heavy workloads, as transferring 1TB of data at $0.09/GB adds $90 to the bill regardless of GPU hours. GPUPerHour's filters make it possible to compare these tradeoffs across all providers simultaneously, and the GPU rental directory lists every available configuration by model.
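The sketch below shows how these levers interact for a single job. It assumes, as a simplification, that the spot and region discounts stack multiplicatively, which real providers may not do, and it treats egress as a fixed add-on that no compute discount reduces. The 20 GPU-hours and $2.50/hr rate are hypothetical.

```python
def job_cost(
    gpu_hours: float,
    on_demand_rate: float,
    spot_discount: float = 0.0,
    region_discount: float = 0.0,
    egress_gb: float = 0.0,
    egress_per_gb: float = 0.0,
) -> float:
    """Compute cost after stacked discounts (assumed multiplicative) plus egress."""
    rate = on_demand_rate * (1 - spot_discount) * (1 - region_discount)
    return gpu_hours * rate + egress_gb * egress_per_gb


if __name__ == "__main__":
    hours, rate = 20, 2.50  # hypothetical job on a hypothetical $2.50/hr GPU
    print(f"on-demand, US East:        ${job_cost(hours, rate):.2f}")
    print(f"spot 50% + region 20% off: ${job_cost(hours, rate, 0.50, 0.20):.2f}")
    print(f"same, plus 1TB egress:     ${job_cost(hours, rate, 0.50, 0.20, 1000, 0.09):.2f}")
```

The discounts cut compute from $50.00 to $20.00, but moving 1TB out at $0.09/GB adds $90.00, which is more than the entire discounted compute bill.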

Choosing the Right GPU for Your Workload

GPUs accelerate AI and compute workloads because their parallel processing architecture is designed for the matrix operations that underpin deep learning, scientific simulation, and rendering. A modern GPU like the NVIDIA H100 contains 16,896 CUDA cores and 528 Tensor Cores, enabling it to perform 1,979 TFLOPS of FP8 computation: orders of magnitude faster than any CPU for these workloads. Cloud GPU compute eliminates the need to purchase and maintain this hardware. GPUPerHour tracks 87 GPU models from 28 cloud providers, covering hardware from entry-level consumer GPUs to the latest data center accelerators.

Training and inference are the two primary AI workload types, and they have fundamentally different GPU requirements. Training needs the most VRAM because it must hold the model weights, gradients, and optimizer states: fine-tuning a 70B parameter LLM with QLoRA (LoRA on a 4-bit quantized base model) requires at least 40GB, while full-parameter training at that scale needs on the order of 1TB or more of aggregate GPU memory (roughly 16 bytes per parameter for FP16 weights and gradients plus FP32 master weights and Adam moment estimates), spread across many GPUs. High memory bandwidth (HBM3 on the H100 SXM delivers 3,350 GB/s) accelerates training by reducing the time spent moving data between GPU memory and the compute cores. Inference, by contrast, needs enough VRAM to hold the model weights and a working cache but does not store optimizer states: a 7B parameter model needs about 14GB for its FP16 weights and fits on a 16GB GPU, and even 70B models can run on a single 80GB GPU with quantization. This means inference workloads can run on cheaper hardware.
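A common rule of thumb for inference memory is parameters multiplied by bytes per parameter (2 for FP16, 1 for 8-bit, 0.5 for 4-bit). The sketch below applies it with an assumed 1.1x `overhead` multiplier for KV cache, activations, and runtime buffers. Real KV-cache needs grow with context length and batch size, and 4-bit formats carry some extra scale metadata, so treat the result as a rough first filter. It covers inference only, since training adds gradients and optimizer states on top.

```python
# Approximate bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, vram_gb: float, overhead: float = 1.1) -> bool:
    """`overhead` is an assumed multiplier for KV cache, activations, and buffers."""
    return weights_gb(params_billion, dtype) * overhead <= vram_gb


if __name__ == "__main__":
    cases = [(7, "fp16", 16), (13, "fp16", 24), (13, "int8", 24), (70, "int4", 48), (70, "fp16", 80)]
    for params, dtype, vram in cases:
        print(f"{params}B {dtype}: {weights_gb(params, dtype):.1f} GB of weights, "
              f"fits in {vram} GB: {fits(params, dtype, vram)}")
```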

GPU models divide into three pricing tiers. Entry-level GPUs (RTX 3060, RTX 3080, RTX A4000) offer 10-16GB VRAM and start at under $0.10/hr: they are suitable for experimentation, small model inference, image generation with Stable Diffusion or Flux, and learning. Mid-range GPUs (RTX 4090, L40S, A100 40GB) offer 24-48GB VRAM and typically cost $0.50-$2.50/hr: they handle fine-tuning medium models, batch inference at production scale, and video processing. High-end GPUs (A100 80GB, H100 SXM, H200) offer 80-141GB of HBM memory and cost $1.50-$4.00+/hr: they are required for training large models and high-throughput inference serving. GPUPerHour's VRAM filter makes it straightforward to narrow results to the right tier for any workload.

Multi-GPU configurations become necessary when a workload exceeds what a single GPU can handle: models that do not fit in one GPU's VRAM, training jobs that need to parallelize across multiple GPUs for speed, or inference at scale that requires multiple GPUs serving concurrent requests. NVLink is a high-bandwidth GPU-to-GPU interconnect that is essential for efficient multi-GPU training: it provides 900 GB/s of total bidirectional bandwidth per GPU on the H100, compared to about 128 GB/s bidirectional (64 GB/s in each direction) over a PCIe 5.0 x16 link. PCIe-connected multi-GPU setups are cheaper and sufficient for inference parallelism where GPUs operate independently on separate requests. GPUPerHour tracks 2x, 4x, and 8x GPU configurations and includes an NVLink filter for users who need high-bandwidth interconnect.
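The interconnect matters for training because data-parallel jobs synchronize gradients on every step. A standard back-of-the-envelope model is ring all-reduce, in which each of N GPUs sends 2(N-1)/N times the gradient size. The sketch below plugs in per-direction bandwidth (half of the H100's 900 GB/s NVLink total, and 64 GB/s per direction for PCIe 5.0 x16) for a hypothetical 7B-parameter model with FP16 gradients on 8 GPUs. It is idealized: latency, protocol overhead, topology, and overlap with compute are all ignored.

```python
def ring_allreduce_seconds(num_gpus: int, payload_gb: float, link_gb_per_s: float) -> float:
    """Idealized ring all-reduce time: each GPU sends 2(N-1)/N of the payload.

    `link_gb_per_s` is per-GPU bandwidth in one direction.
    """
    if num_gpus < 2:
        return 0.0
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


if __name__ == "__main__":
    grads_gb = 7 * 2  # hypothetical 7B-parameter model, 2 bytes per FP16 gradient
    links = [("NVLink, 450 GB/s per direction", 450), ("PCIe 5.0 x16, 64 GB/s per direction", 64)]
    for name, bw in links:
        ms = ring_allreduce_seconds(8, grads_gb, bw) * 1000
        print(f"{name}: {ms:.0f} ms of gradient sync per step")
```

Under these assumptions each step spends about 54 ms on gradient synchronization over NVLink versus about 383 ms over PCIe, a gap that repeats on every one of thousands of training steps.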

Concrete recommendations by use case: large language model training at 70B+ parameters requires H100 SXM 80GB or H200 in 4-8 GPU configurations with NVLink. LLM fine-tuning with LoRA or QLoRA works well on a single A100 80GB or H100. LLM inference for models up to 13B parameters runs efficiently on an RTX 4090 at a fraction of the H100 cost, although a 13B model's FP16 weights (about 26GB) need 8-bit quantization to fit in 24GB; for 70B+ models, the A100 80GB or L40S (48GB) is the cost-effective choice. Image and video generation with Stable Diffusion, Flux, or ComfyUI runs well on an RTX 4090 or RTX 3090 with 24GB VRAM. General machine learning training finds its cost-performance optimum on the A100 40GB, which balances VRAM, compute throughput, and price per hour.

Cloud GPU Providers Compared

The cloud GPU market in 2026 includes 28 providers tracked by GPUPerHour, offering a combined 17,721 GPU configurations across 7 continents. The market has expanded rapidly as demand for AI compute has grown: new providers launch regularly, and existing providers continuously add GPU models and regions. Provider choice affects not just price but also availability (popular GPUs like the H100 can have multi-week wait times on some providers), security posture, compliance certifications, and the quality of support.

Hyperscalers (AWS, Google Cloud, Azure, Oracle Cloud) offer enterprise-grade infrastructure with SOC 2, HIPAA, and ISO 27001 compliance, global region coverage spanning 60+ data center locations, and deep integration with storage, networking, and managed ML services. The tradeoff is premium pricing: hyperscaler GPU instances typically cost 2-3x more than equivalent hardware from GPU-first providers. GPU-first providers (Lambda, RunPod, CoreWeave, TensorDock, Jarvis Labs, Vultr) are purpose-built for AI workloads, with lower prices, faster provisioning, pre-installed ML frameworks, and features like persistent volumes and Jupyter notebook access. Peer-to-peer marketplaces (Vast.ai community cloud, Salad) aggregate GPU capacity from distributed sources and offer the lowest prices on the market, but with variable hardware quality and fewer enterprise security guarantees. GPUPerHour categorizes providers by security tier so users can filter based on their requirements.

Price per GPU hour is the headline metric, but several other cost factors determine the real expense of cloud GPU usage. Data egress fees range from free (some GPU-first providers include egress) to $0.12/GB on hyperscalers, which adds up quickly when transferring large datasets or model checkpoints. Storage costs for attached volumes typically run $0.10-$0.20/GB/month. Minimum billing increments vary: some providers bill by the second (ideal for short experiments), while others bill by the hour (which penalizes jobs that finish in 10 minutes). GPU availability is another hidden cost factor, as waiting days for an H100 allocation has a real opportunity cost. GPUPerHour's cost calculator factors in egress, storage, and ingress to show the true cost of each offer, and the availability filter shows only GPUs that can be provisioned immediately.

GPUPerHour maintains detailed profile pages for every tracked provider with real-time pricing, available GPU models, region coverage, and provider metadata. The provider directory lists all 28 providers with current offer counts and pricing ranges. For head-to-head analysis, the comparison tool covers provider matchups across GPU models, pricing, and availability, making it possible to evaluate specific provider pairs against each other on the metrics that matter for a given workload.

Frequently Asked Questions

How much does a cloud GPU cost?

Cloud GPU pricing ranges from $0.01/hr for entry-level GPUs to over $30.00/hr for multi-GPU H100 and H200 clusters. The cost per hour depends on the GPU model, VRAM capacity, number of GPUs in the instance, the cloud provider, and the data center region. On-demand pricing is the most common model, but spot instances can reduce costs by 50-80% for workloads that tolerate interruptions. GPUPerHour tracks 17,721 offers across 28 providers to help find the lowest price for any configuration.

What is the cheapest cloud GPU provider?

The cheapest provider depends on the GPU model and pricing type. For spot pricing, peer-to-peer marketplaces like Vast.ai and community cloud providers often offer the lowest rates, with consumer GPUs available for under $0.10/hr. For on-demand pricing, GPU-first providers like TensorDock, RunPod, and Lambda tend to undercut hyperscalers by 30-60% on equivalent hardware. Prices change throughout the day as supply and demand shift: GPUPerHour's sort-by-price feature shows the current cheapest option across all 28 providers in real time.

Which GPU is best for AI training?

The best GPU for AI training depends on the model size and budget. The NVIDIA H100 SXM 80GB is the standard for large-scale training: it offers 80GB of HBM3 memory, 3,350 GB/s memory bandwidth, and NVLink interconnect for multi-GPU scaling. The A100 80GB remains a strong option at lower cost, particularly for fine-tuning and mid-scale training. For smaller models and LoRA/QLoRA fine-tuning, a single A100 40GB or L40S (48GB) provides good cost-performance. GPUPerHour tracks availability and pricing for all of these across 28 providers.

Which GPU is best for inference?

Inference workloads are typically less VRAM-intensive than training, which means smaller and cheaper GPUs are often the right choice. The NVIDIA RTX 4090 (24GB GDDR6X) handles most inference tasks at a fraction of the H100 price: it is well suited for serving models up to 13B parameters (8-bit quantized at the upper end, since 13B FP16 weights take about 26GB) and for image generation workloads. For large language models with 70B+ parameters, the A100 80GB or L40S (48GB) is the cost-effective choice. The H100 is only necessary for the highest-throughput production inference serving very large models or handling thousands of concurrent requests.

What is the difference between on-demand and spot GPU pricing?

On-demand pricing is pay-as-you-go with guaranteed availability: the instance runs until the user terminates it, with no long-term commitment. Spot pricing (also called interruptible or preemptible) offers discounts of 50-80% compared to on-demand rates, but the provider can reclaim the instance with short notice when demand for GPU capacity increases. Reserved pricing locks in a fixed rate for a term of 1-3 months, typically at 20-40% savings compared to on-demand. GPUPerHour's pricing type filter allows comparison within each category across all 28 tracked providers.

How do I rent a GPU in the cloud?

Renting a cloud GPU involves six steps: choose a GPU model that fits the workload's VRAM and compute requirements, compare prices across providers using a comparison tool like GPUPerHour, create an account with the chosen provider, provision an instance through their console or API, connect via SSH or web terminal, and terminate when the workload completes. Most providers bill by the second or minute with no long-term commitment required. The entire process from account creation to a running GPU instance typically takes under 10 minutes with GPU-first providers like RunPod or Lambda.

What is a cloud GPU server?

A cloud GPU server is a remote machine equipped with one or more GPUs that users access over the network. There are three main options. Shared virtual machines are the most common and lowest-cost option: multiple users share the physical host, but each gets dedicated GPU access. Dedicated instances provide single-tenant hardware with guaranteed performance and no resource contention. Bare metal servers give full control of the physical machine, including direct hardware access and custom driver configurations. Shared GPU instances start at under $0.10/hr, while dedicated GPU servers typically run $2.00-$10.00+/hr depending on the hardware configuration.

What is GPU as a service?

GPU as a service (GPUaaS) is a cloud computing model where providers offer GPU compute resources on demand without requiring users to own or maintain hardware. The model encompasses several delivery formats: on-demand instances where users provision full GPU virtual machines, serverless GPU endpoints where the platform handles scaling and infrastructure, and managed inference APIs where users submit requests without managing any infrastructure. GPUPerHour focuses on the instance-level market, tracking 17,721 configurations across 28 providers that offer direct GPU access.

What is a GPU virtual machine?

A GPU virtual machine is a cloud-hosted VM with one or more physical GPUs attached via PCIe passthrough or vGPU technology. The user gets full access to the GPU's compute capabilities through standard drivers and frameworks such as CUDA for NVIDIA GPUs or ROCm for AMD GPUs. Most cloud GPU providers offer VM-based instances with pre-installed machine learning frameworks including PyTorch, TensorFlow, and JAX, which cuts setup time. GPUPerHour tracks availability across 17,721 GPU VM configurations spanning 87 different GPU models.
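After connecting to an NVIDIA GPU VM, a quick way to confirm that the attached GPUs are visible to the driver is to query `nvidia-smi` with its `--query-gpu` and `--format=csv,noheader,nounits` options (AMD instances would use ROCm's tools instead). The Python wrapper below is one possible sketch; it assumes the two requested fields come back as comma-separated values with memory in MiB, and the parser can be checked against sample text without a GPU.

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_list(csv_text: str) -> List[Tuple[str, int]]:
    """Parse 'name, memory.total' lines (memory in MiB) into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so the memory field is always separated cleanly.
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib)))
    return gpus


def list_gpus() -> List[Tuple[str, int]]:
    """Run nvidia-smi on the instance (requires NVIDIA drivers) and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_list(out)


if __name__ == "__main__":
    try:
        for name, mib in list_gpus():
            print(f"{name}: {mib / 1024:.0f} GiB")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"nvidia-smi unavailable: {err}")
```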

What is serverless GPU computing?

Serverless GPU computing abstracts away instance management: users submit code or API requests and the platform automatically provisions GPU resources, scales to match demand, and bills per request or per second of GPU time. Providers like RunPod Serverless and Vast.ai Autoscaler offer this model. Serverless is best suited for inference workloads with variable or unpredictable traffic, not for long-running training jobs. Some platforms also offer fractional GPU access, where multiple users share a single physical GPU to reduce costs for lightweight workloads that do not require full GPU memory.
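The "variable traffic" guidance comes down to utilization. An always-on instance bills every hour, while serverless bills only busy GPU time, usually at a higher effective hourly rate, so serverless is cheaper whenever the busy fraction is below the ratio of the two rates. The sketch below uses hypothetical rates ($1.00/hr dedicated, $1.60/hr-equivalent serverless) and ignores cold-start time and per-request fees.

```python
def monthly_costs(
    busy_hours: float, dedicated_rate: float, serverless_rate: float, hours_in_month: float = 730
):
    """Dedicated bills every hour; serverless bills only busy GPU time (cold starts ignored)."""
    dedicated = dedicated_rate * hours_in_month
    serverless = serverless_rate * busy_hours
    return dedicated, serverless


def breakeven_utilization(dedicated_rate: float, serverless_rate: float) -> float:
    """Below this busy fraction, serverless is cheaper than an always-on instance."""
    return dedicated_rate / serverless_rate


if __name__ == "__main__":
    dedicated, serverless = monthly_costs(busy_hours=100, dedicated_rate=1.00, serverless_rate=1.60)
    print(f"dedicated: ${dedicated:,.2f}, serverless: ${serverless:,.2f}")
    print(f"serverless wins below {breakeven_utilization(1.00, 1.60):.1%} utilization")
```

With 100 busy hours a month, serverless costs $160.00 against $730.00 for an always-on instance; the crossover sits at 62.5% utilization (about 456 busy hours), which is why steady, high-utilization workloads such as training favor dedicated instances.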

How often are GPU prices updated on GPUPerHour?

GPUPerHour updates prices every 60 seconds across all 28 tracked providers. The system monitors 17,721 individual GPU offers for price changes, availability status, and new listings. This update frequency captures spot market price fluctuations that static comparison pages and blog posts miss: spot GPU prices can change multiple times per hour as supply and demand shift. The metrics strip at the top of the page shows the current lowest H100 price and total available GPU inventory in real time.

Which cloud GPU providers does GPUPerHour track?

GPUPerHour currently tracks 28 cloud GPU providers spanning three categories. Hyperscalers include AWS, Google Cloud, Azure, and Oracle Cloud, which offer enterprise-grade security and compliance at premium prices. GPU-first providers include Lambda, RunPod, CoreWeave, TensorDock, Jarvis Labs, and Vultr, which are purpose-built for AI workloads and typically offer lower prices. Marketplace platforms include Vast.ai and Salad, which aggregate GPU capacity from distributed sources. Each provider has a detailed profile page with real-time pricing, available GPU models, and region coverage. The provider list is updated as new providers enter the market.