Benchmarks

Run the benchmark harness and compare results.

Modal vs E2B vs Daytona vs Cloudflare vs Vercel vs Beam vs Blaxel vs StateSet Sandbox

Whether you're building a coding assistant, running AI-generated scripts, or powering code evaluation pipelines, choosing the right sandbox provider is critical for both developer experience and cost efficiency. In this benchmark, we evaluate eight leading AI code sandbox providers across two key dimensions: developer experience (DX) and pricing.

Benchmark Scope

  • Criteria and tables align with the benchmark images in docs/benchmarks/.
  • Provider positioning and feature support are based on published documentation.
  • StateSet cold-start and exec latency are measured via the public API with a 1 vCPU / 2 GiB sandbox and a simple echo command.

TL;DR Summary

  • Blaxel leads for ultra-fast resume; Daytona and E2B are fast general-purpose options.
  • Modal and Beam are best suited for GPU-heavy workloads.
  • Cloudflare and Vercel integrate tightly with their respective ecosystems.
  • StateSet is strongest for self-hosted Kubernetes deployments with multi-language SDKs and Claude Code integration.

Provider Overview

| Provider | Best For | SDK Languages | Max Runtime | Cold Start | Starting Price |
| --- | --- | --- | --- | --- | --- |
| Modal | Python ML workloads, high scale | Python, JS/TS (beta), Go (beta) | 24 hours | Sub-second | $0.000014/core/sec |
| E2B | Quick AI agent integration | Python, JS/TS | 24 hours (Pro) | ~150ms | $100 free credits |
| Daytona | Full-featured dev environments | Python, TypeScript | Unlimited | ~90ms | $200 free credits |
| Cloudflare | Edge execution, global distribution | TypeScript | Configurable | 2-3 seconds | $5/month base |
| Vercel | Next.js ecosystem integration | TypeScript | 5 hours (Pro) | Fast | $0.128/CPU hour |
| Beam | Serverless GPU + sandboxes | Python, TS (beta) | Unlimited | 2-3 seconds | 15 hours free |
| Blaxel | Ultra-fast standby resume | Python, TypeScript | Unlimited | ~25ms | $200 free credits |
| StateSet Sandbox | Self-hosted Kubernetes sandboxes + Claude Code | TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Free plan; usage-based |

SDK & Language Support

| Provider | Python | TypeScript/JS | Go | Other |
| --- | --- | --- | --- | --- |
| Modal | Primary | Beta | Beta | No |
| E2B | Yes | Yes | No | No |
| Daytona | Yes | Yes | No | No |
| Cloudflare | No | Primary | No | No |
| Vercel | No | Primary | No | Python (limited) |
| Beam | Primary | Beta | No | No |
| Blaxel | Yes | Yes | No | No |
| StateSet Sandbox | Yes | Yes | Yes | Java, Kotlin, PHP, Ruby, Rust, Swift |

Runtime & Performance

| Provider | Max Runtime | Cold Start | Scale to Zero |
| --- | --- | --- | --- |
| Modal | 24 hours | Sub-second | Yes |
| E2B | 24 hours | ~150ms | Yes |
| Daytona | Unlimited | ~90ms | Yes |
| Cloudflare | Configurable | 2-3s | Yes |
| Vercel | 5 hours | Fast | Yes |
| Beam | Unlimited | 2-3s | Yes |
| Blaxel | Unlimited | ~25ms | Yes |
| StateSet Sandbox | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Yes |

Key Features

| Provider | GPU Support | Self-Host | Persistence | Custom Images |
| --- | --- | --- | --- | --- |
| Modal | Extensive | No | Snapshots | SDK-defined |
| E2B | No | Open-source | Yes | Templates |
| Daytona | Yes | Enterprise | Yes | Docker/OCI |
| Cloudflare | No | No | Limited | Docker |
| Vercel | No | No | Ephemeral | Yes |
| Beam | Extensive | Open-source | Volumes | Docker |
| Blaxel | No | No | Snapshots | Yes |
| StateSet Sandbox | Not documented | Yes (Kubernetes) | Checkpoints | Docker/OCI |

Pricing Comparison (Normalized)

*1 vCPU + 2GB RAM for 1 hour. Modal prices per physical core (= 2 vCPU). Cloudflare requires $5/mo base plan. StateSet rates from docs/GETTING_STARTED.md.*

| Provider | Hourly Cost | CPU Rate | RAM Rate |
| --- | --- | --- | --- |
| E2B | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Daytona | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Blaxel | $0.0828 | Bundled | $0.0000115/GB/s |
| Cloudflare | $0.0900 | $0.000020/vCPU/s | $0.0000025/GiB/s |
| Modal | $0.1193 | $0.00003942/core/s* | $0.00000672/GiB/s |
| Vercel | $0.1492 | $0.128/CPU-hr | $0.0106/GB-hr |
| StateSet Sandbox | $0.1492 | $0.128/vCPU-hr | $0.0106/GB-hr |
| Beam | $0.2300 | $0.190/core/hr | $0.020/GB/hr |

Note: StateSet pricing also includes $0.001 per sandbox creation and $0.10/GB network egress.
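The normalized hourly figures above follow directly from the per-second rates. A minimal sketch of the arithmetic (rates copied from the table; the Modal per-core rate is halved because one vCPU is half a physical core):

```python
# Normalize per-second CPU/RAM rates to an hourly cost for 1 vCPU + 2 GiB RAM.
SECONDS_PER_HOUR = 3600
VCPUS, RAM_GIB = 1, 2

def hourly_cost(cpu_rate_per_s: float, ram_rate_per_gib_s: float) -> float:
    """Cost of running VCPUS vCPUs and RAM_GIB GiB of RAM for one hour."""
    cpu = cpu_rate_per_s * VCPUS * SECONDS_PER_HOUR
    ram = ram_rate_per_gib_s * RAM_GIB * SECONDS_PER_HOUR
    return cpu + ram

# Rates from the pricing table above.
e2b = hourly_cost(0.000014, 0.0000045)
cloudflare = hourly_cost(0.000020, 0.0000025)
modal = hourly_cost(0.00003942 / 2, 0.00000672)  # per-core rate, 1 core = 2 vCPU

print(f"E2B        ${e2b:.4f}/hr")        # ≈ $0.0828
print(f"Cloudflare ${cloudflare:.4f}/hr") # ≈ $0.0900
print(f"Modal      ${modal:.4f}/hr")      # ≈ $0.1193
```

The same formula applied to the $0.128/vCPU-hr and $0.0106/GB-hr rates yields the Vercel and StateSet rows; per-creation and egress fees are excluded from the normalization.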

StateSet Sandbox Overview

  • Self-hosted Kubernetes pods per sandbox with security controls and auto-cleanup.
  • Multi-language SDKs: TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift.
  • Built-in Claude Code CLI and full file read/write + command execution APIs.
  • Checkpoints, artifacts, and webhooks for persistence and integration workflows.

StateSet Sandbox Benchmarks

Sample: 10 sequential runs, 1 vCPU / 2 GiB, command `echo benchmark`, base URL https://api.sandbox.stateset.app.

| Metric | P50 | P95 |
| --- | --- | --- |
| Create API (end-to-end) | 2.26s | 3.08s |
| Startup metrics (server) | 1.93s | 2.56s |
| Execute API | 0.85s | 1.53s |
| Stop API | 0.58s | 1.08s |
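The sequential-run numbers above can be reproduced with a small timing harness against the public API. The sketch below is illustrative only: the endpoint path and request-body fields are assumptions inferred from the load-test settings, not documented routes. The percentile helper itself is plain Python and independent of the API:

```python
import math
import time

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def timed(fn) -> float:
    """Run fn() and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_create_benchmark(api_key: str, runs: int = 10):
    """Time `runs` sequential sandbox creations; returns (p50, p95) in seconds.

    NOTE: the URL path and JSON fields below are hypothetical placeholders
    for illustration, not the documented StateSet API.
    """
    import requests
    url = "https://api.sandbox.stateset.app/api/v1/sandbox"
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {"cpu": 1, "memory_gib": 2, "command": "echo benchmark"}
    times = [
        timed(lambda: requests.post(url, headers=headers, json=body, timeout=60))
        for _ in range(runs)
    ]
    return percentile(times, 50), percentile(times, 95)
```

Calling `run_create_benchmark("YOUR_API_KEY")` should produce numbers comparable to the Create API row; the Execute and Stop rows would need analogous timers wrapped around their respective calls.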

Load test harness:

Run `benchmarks/k6-sandbox-create.js` with `API_URL=https://api.sandbox.stateset.app`, `API_PATH=/api/v1/sandbox`, and `API_KEY` set to capture cold-start and exec latency under load:

```bash
API_URL=https://api.sandbox.stateset.app \
API_PATH=/api/v1/sandbox \
API_KEY=YOUR_API_KEY \
k6 run benchmarks/k6-sandbox-create.js
```

Load test sample: 5 VUs, ramp 10s, hold 20s, ramp down 10s, 1 vCPU / 2 GiB, echo benchmark.

| Metric | P50 | P95 |
| --- | --- | --- |
| Create API (end-to-end) | 9.51s | 27.79s |
| Execute API | 0.71s | 1.19s |
| HTTP request duration (all requests) | 1.08s | 23.45s |

Notes: 0% error rate across 11 iterations; the k6 thresholds for `http_req_duration` and `sandbox_create_ms` were exceeded at p95 under load.
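To pull these percentiles out of a run programmatically, k6 can write a machine-readable summary with `--summary-export=summary.json`. A small parser sketch (it assumes k6's default trend stats, which report `med` and `p(95)` rather than `p(50)`, and a custom `sandbox_create_ms` trend metric defined in the script):

```python
import json

def trend_stats(summary: dict, metric: str):
    """Return (median, p95) for a k6 trend metric from --summary-export JSON."""
    stats = summary["metrics"][metric]
    return stats["med"], stats["p(95)"]

# In practice: summary = json.load(open("summary.json"))
# Synthetic summary shaped like k6's export, using the load-test numbers above:
summary = {"metrics": {
    "http_req_duration": {"med": 1080.0, "p(95)": 23450.0},
    "sandbox_create_ms": {"med": 9510.0, "p(95)": 27790.0},
}}
med, p95 = trend_stats(summary, "sandbox_create_ms")
print(f"create: p50≈{med / 1000:.2f}s  p95={p95 / 1000:.2f}s")
# create: p50≈9.51s  p95=27.79s
```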

Recommendations by Use Case

  • Blaxel: Best for stateful agents needing fast resume.
  • E2B: Best for quick integration, with polished Python and JS/TS SDKs.
  • Modal and Beam: Best for ML/AI inference workloads with GPU requirements.
  • Cloudflare: Best for global edge distribution.
  • Vercel Sandbox: Best for Next.js/Vercel ecosystems.
  • Daytona: Best for full-featured dev environments with LSP support.
  • StateSet Sandbox: Best for self-hosted Kubernetes deployments and Claude Code agents.

Conclusion

The AI sandbox space is rapidly maturing, with each provider carving out distinct niches. Choose based on your primary SDK language, runtime requirements, cold-start sensitivity, and existing platform ecosystem.