What Modal Does
Modal is a serverless GPU and CPU platform designed so that Python developers can push compute-heavy workloads — model inference, training, batch pipelines, scheduled jobs, web endpoints — to the cloud without writing Dockerfiles, Kubernetes manifests, or YAML. The entire deployment surface is Python decorators: annotate a function with @app.function(gpu='H100'), call it from a local script, and Modal packages the code, builds the container, provisions a GPU, runs the function, and tears it down when idle. The result is a developer experience where 'local script' and 'cloud endpoint' are the same file.
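A minimal sketch of that pattern, assuming Modal's documented App/function API; the app name and the square function are illustrative, not from Modal's docs:

```python
import modal

app = modal.App("example-app")

@app.function(gpu="H100")
def square(x: int) -> int:
    # Runs in a cloud container with an H100 attached.
    return x * x

@app.local_entrypoint()
def main():
    # .remote() ships the call to Modal's cloud; the same file
    # defines both the local script and the cloud function.
    print(square.remote(4))
```

Running the file with `modal run` executes main() locally and square() remotely — the "same file" experience the paragraph describes.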
Cold Starts, GPU Catalog, and Runtime
Cold starts are Modal's signature technical claim. The platform uses a custom container runtime that snapshots prepared images and mounts them directly, reaching sub-four-second cold starts for typical workloads and sub-one-second warm starts for frequently used functions. For inference APIs this is the difference between a sluggish first request and a responsive one, and it erases most of the usual argument for reserved GPU capacity on interactive endpoints.
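The latency knobs involved can be sketched as decorator parameters. This is a hedged sketch: enable_memory_snapshot and min_containers follow Modal's documented API at the time of writing, but verify the current parameter names against the docs before relying on them:

```python
import modal

app = modal.App("low-latency-inference")

@app.function(
    gpu="H100",
    enable_memory_snapshot=True,  # snapshot the initialized container for fast cold starts
    min_containers=1,             # keep one warm replica for sub-second warm starts
)
def infer(prompt: str) -> str:
    # Model loading done at import/snapshot time pays off here:
    # the snapshot restores a ready container instead of re-initializing.
    ...
```

Keeping min_containers at 0 gives pure pay-per-use with snapshot-accelerated cold starts; raising it buys warm-start latency at the cost of idle capacity.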
The GPU menu is broad and current: NVIDIA B200, H200, H100, A100 40/80GB, L40S, A10, L4, and T4 are all available on demand, with multi-GPU configurations for larger models. CPU-only functions are first-class too, which matters for data-prep steps that do not need a GPU. Concurrency, timeouts, keep-warm pools, volumes, secrets, and scheduled cron jobs are all controlled from the same Python decorators, so teams ship end-to-end pipelines without learning a separate orchestration language.
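An end-to-end pipeline controlled entirely from those decorators might look like the following sketch. The volume and secret names are hypothetical, and the GPU string and schedule follow Modal's documented conventions:

```python
import modal

app = modal.App("nightly-pipeline")
data = modal.Volume.from_name("training-data", create_if_missing=True)

@app.function(timeout=300)  # CPU-only prep step: no gpu= argument means no GPU
def prepare():
    ...

@app.function(
    gpu="A100-80GB",
    timeout=3600,
    volumes={"/data": data},                              # shared persistent storage
    secrets=[modal.Secret.from_name("wandb-credentials")],  # injected as env vars
    schedule=modal.Cron("0 3 * * *"),                     # run nightly at 03:00 UTC
)
def train():
    prepare.remote()  # fan out to the CPU step from inside the GPU step
    ...
```

Scheduling, storage, secrets, and hardware selection all live in the decorator arguments, which is the "no separate orchestration language" point made above.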
Pricing and the Multiplier Question
Modal's pricing is per-second with no minimum commitment and a $30/month free-compute allowance on every account. Base GPU rates are competitive — H100 at roughly $3.95/hour, A100 80GB around $2.10/hour — and CPU compute is priced aggressively. The platform charges only while a function is executing, not while containers are idle or cold-starting, which for bursty workloads often lands cheaper than always-on GPU reservations.
The complication is the regional multiplier system. Non-preemptible US workloads run at a 3.75x multiplier on top of base rates, meaning the advertised H100 price is not what most production deployments actually pay. Preemptible and EU regions are cheaper, and the multipliers are documented, but teams used to RunPod's flat pricing sometimes get surprised by the first invoice. For cost-sensitive batch training or long-running jobs, benchmarking Modal against RunPod, Paperspace, and Lambda Labs on your real workload is worth the afternoon.
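The multiplier arithmetic is worth making explicit. A back-of-envelope cost model, using the figures quoted above (check Modal's pricing page for current rates and multipliers):

```python
def effective_hourly_rate(base_rate: float, multiplier: float = 3.75) -> float:
    """Advertised base rate adjusted by the regional multiplier."""
    return base_rate * multiplier

def job_cost(base_rate: float, seconds: float, multiplier: float = 3.75) -> float:
    """Per-second billing: you pay only for execution time, not idle time."""
    return effective_hourly_rate(base_rate, multiplier) * seconds / 3600

# A non-preemptible US H100 at the quoted $3.95/hr base rate:
rate = effective_hourly_rate(3.95)   # 14.8125 — roughly $14.81/hr effective
cost = job_cost(3.95, seconds=90)    # cost of a 90-second inference batch
```

That gap between the $3.95 headline and the ~$14.81 effective rate is exactly the invoice surprise described above, and it is the number to use when benchmarking against flat-priced providers.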
Developer Experience and Integrations
The developer experience is Modal's strongest asset and the reason it dominates AI infra in indie and startup circles. modal run executes a function in the cloud straight from your terminal, modal deploy ships it as a persistent endpoint, modal shell drops into a live container for debugging, and the web dashboard shows logs, GPU usage, and billing in one place. Images, mounts, volumes, and NFS-style shared storage all compose cleanly in Python.
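What modal deploy actually ships can be sketched as a persistent web endpoint defined in the same file as the rest of the app. The fastapi_endpoint decorator and debian_slim image builder follow Modal's documented API, but the app name, image contents, and handler are illustrative assumptions:

```python
import modal

app = modal.App("inference-endpoint")
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image, gpu="L4")
@modal.fastapi_endpoint(method="POST")
def predict(item: dict) -> dict:
    # `modal deploy this_file.py` turns this into a persistent HTTPS URL;
    # `modal run` would invoke it once; `modal shell` opens its container.
    return {"echo": item}
```

The same function object serves all three workflows, which is why the run/deploy/shell loop feels like editing one script rather than operating a service.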