Benchmarks
Run the benchmark harness and compare results.
Modal vs E2B vs Daytona vs Cloudflare vs Vercel vs Beam vs Blaxel vs StateSet Sandbox
Whether you're building a coding assistant, running AI-generated scripts, or powering code evaluation pipelines, choosing the right sandbox provider is critical for both developer experience and cost efficiency. In this benchmark, we evaluate eight leading AI code sandbox providers across two key dimensions: developer experience (DX) and pricing.
Benchmark Scope
- Criteria and tables align with the benchmark images in docs/benchmarks/.
- Provider positioning and feature support are based on published documentation.
- StateSet cold-start and exec latency are intended to be measured via the public API with a 1 vCPU / 2 GiB sandbox and a simple `echo` command.
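The measurement described in the last bullet can be sketched in Python. Only the base URL and the `/api/v1/sandbox` path appear in this document; the payload fields, the response field `id`, and the `/exec` sub-endpoint are assumptions and may differ from the real StateSet API, so treat this as a sketch of the methodology rather than a working client.

```python
import time

API_URL = "https://api.sandbox.stateset.app"
API_PATH = "/api/v1/sandbox"


def create_payload(cpu: int = 1, memory_gib: int = 2) -> dict:
    # Hypothetical payload shape for a 1 vCPU / 2 GiB sandbox.
    return {"cpu": cpu, "memory": f"{memory_gib}Gi"}


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for one API call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def measure_once(session, api_key: str) -> dict:
    """One create -> exec(echo) -> stop cycle; returns per-step latencies.

    `session` is any HTTP client with requests-style post/delete methods.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    resp, create_s = timed(
        session.post, f"{API_URL}{API_PATH}", json=create_payload(), headers=headers
    )
    sandbox_id = resp.json()["id"]  # assumed response field
    _, exec_s = timed(
        session.post,
        f"{API_URL}{API_PATH}/{sandbox_id}/exec",  # assumed endpoint
        json={"command": "echo benchmark"},
        headers=headers,
    )
    _, stop_s = timed(
        session.delete, f"{API_URL}{API_PATH}/{sandbox_id}", headers=headers
    )
    return {"create_s": create_s, "exec_s": exec_s, "stop_s": stop_s}
```

Repeating `measure_once` over N runs and taking P50/P95 of each field reproduces the shape of the tables below.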
TL;DR Summary
- Blaxel leads for ultra-fast resume; Daytona and E2B are fast general-purpose options.
- Modal and Beam are best suited for GPU-heavy workloads.
- Cloudflare and Vercel integrate tightly with their respective ecosystems.
- StateSet is strongest for self-hosted Kubernetes deployments with multi-language SDKs and Claude Code integration.
Provider Overview
| Provider | Best For | SDK Languages | Max Runtime | Cold Start | Starting Price |
|---|---|---|---|---|---|
| Modal | Python ML workloads, high-scale | Python, JS/TS (beta), Go (beta) | 24 hours | Sub-second | $0.000014/core/sec |
| E2B | Quick AI agent integration | Python, JS/TS | 24 hours (Pro) | ~150ms | $100 free credits |
| Daytona | Full-featured dev environments | Python, TypeScript | Unlimited | ~90ms | $200 free credits |
| Cloudflare | Edge execution, global distribution | TypeScript | Configurable | 2-3 seconds | $5/month base |
| Vercel | Next.js ecosystem integration | TypeScript | 5 hours (Pro) | Fast | $0.128/CPU hour |
| Beam | Serverless GPU + sandboxes | Python, TS (beta) | Unlimited | 2-3 seconds | 15 hours free |
| Blaxel | Ultra-fast standby resume | Python, TypeScript | Unlimited | ~25ms | $200 free credits |
| StateSet Sandbox | Self-hosted Kubernetes sandboxes + Claude Code | TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Free plan; usage-based |
SDK & Language Support
| Provider | Python | TypeScript/JS | Go | Other |
|---|---|---|---|---|
| Modal | Primary | Beta | Beta | No |
| E2B | Yes | Yes | No | No |
| Daytona | Yes | Yes | No | No |
| Cloudflare | No | Primary | No | No |
| Vercel | No | Primary | No | Python (limited) |
| Beam | Primary | Beta | No | No |
| Blaxel | Yes | Yes | No | No |
| StateSet Sandbox | Yes | Yes | Yes | Java, Kotlin, PHP, Ruby, Rust, Swift |
Runtime & Performance
| Provider | Max Runtime | Cold Start | Scale to Zero |
|---|---|---|---|
| Modal | 24 hours | Sub-second | Yes |
| E2B | 24 hours | ~150ms | Yes |
| Daytona | Unlimited | ~90ms | Yes |
| Cloudflare | Configurable | 2-3s | Yes |
| Vercel | 5 hours | Fast | Yes |
| Beam | Unlimited | 2-3s | Yes |
| Blaxel | Unlimited | ~25ms | Yes |
| StateSet Sandbox | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Yes |
Key Features
| Provider | GPU Support | Self-Host | Persistence | Custom Images |
|---|---|---|---|---|
| Modal | Extensive | No | Snapshots | SDK-defined |
| E2B | No | Open-source | Yes | Templates |
| Daytona | Yes | Enterprise | Yes | Docker/OCI |
| Cloudflare | No | No | Limited | Docker |
| Vercel | No | No | Ephemeral | Yes |
| Beam | Extensive | Open-source | Volumes | Docker |
| Blaxel | No | No | Snapshots | Yes |
| StateSet Sandbox | Not documented | Yes (Kubernetes) | Checkpoints | Docker/OCI |
Pricing Comparison (Normalized)
*1 vCPU + 2 GB RAM for 1 hour. Modal prices per physical core (= 2 vCPU). Cloudflare requires a $5/mo base plan. StateSet rates from docs/GETTING_STARTED.md.*
| Provider | Hourly Cost | CPU Rate | RAM Rate |
|---|---|---|---|
| E2B | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Daytona | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Blaxel | $0.0828 | Bundled | $0.0000115/GB/s |
| Cloudflare | $0.0900 | $0.000020/vCPU/s | $0.0000025/GiB/s |
| Modal | $0.1193 | $0.00003942/core/s* | $0.00000672/GiB/s |
| Vercel | $0.1492 | $0.128/CPU-hr | $0.0106/GB-hr |
| StateSet Sandbox | $0.2000 | $0.128/vCPU-hr | $0.0106/GB-hr |
| Beam | $0.2300 | $0.190/core/hr | $0.020/GB/hr |
Note: StateSet pricing also includes $0.001 per sandbox creation and $0.10/GB network egress.
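The normalized hourly figures follow directly from the per-second rates in the table; a minimal check for the per-second providers (recall that Modal bills per physical core, which covers 2 vCPU, so 1 vCPU bills at half a core):

```python
def hourly_cost(cpu_rate_per_s: float, ram_rate_per_s: float,
                vcpus: float = 1, ram_gib: float = 2) -> float:
    """Normalized cost of running vcpus + ram_gib for one hour."""
    return 3600 * (vcpus * cpu_rate_per_s + ram_gib * ram_rate_per_s)

# Rates taken from the pricing table above.
e2b = hourly_cost(0.000014, 0.0000045)             # -> 0.0828
cloudflare = hourly_cost(0.000020, 0.0000025)      # -> 0.0900
# Modal: 1 vCPU = half a physical core.
modal = hourly_cost(0.5 * 0.00003942, 0.00000672)  # -> ~0.1193
```

The same formula with per-hour rates (e.g. Vercel: 0.128 + 2 × 0.0106 = $0.1492) reproduces the rest of the Hourly Cost column.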
StateSet Sandbox Overview
- Self-hosted Kubernetes pods per sandbox with security controls and auto-cleanup.
- Multi-language SDKs: TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift.
- Built-in Claude Code CLI and full file read/write + command execution APIs.
- Checkpoints, artifacts, and webhooks for persistence and integration workflows.
StateSet Sandbox Benchmarks
Sample: 10 sequential runs, 1 vCPU / 2 GiB, command `echo benchmark`, base URL https://api.sandbox.stateset.app.
| Metric | P50 | P95 |
|---|---|---|
| Create API (end-to-end) | 2.26s | 3.08s |
| Startup metrics (server) | 1.93s | 2.56s |
| Execute API | 0.85s | 1.53s |
| Stop API | 0.58s | 1.08s |
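Reducing raw per-run latencies to the P50/P95 columns above takes only a nearest-rank percentile helper; the sample values below are illustrative, not the actual benchmark data.

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) over a list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank: ceil(p/100 * n), 1-indexed (ceil via floor-division trick).
    rank = max(1, -(-len(ordered) * p // 100))
    return ordered[int(rank) - 1]

# Illustrative latencies (seconds) for 10 sequential Create API calls.
create_s = [2.1, 2.2, 2.3, 2.2, 2.4, 2.3, 2.1, 2.5, 3.0, 2.26]
p50 = percentile(create_s, 50)  # -> 2.26
p95 = percentile(create_s, 95)  # -> 3.0
```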
Load test harness:
Run benchmarks/k6-sandbox-create.js with API_URL=https://api.sandbox.stateset.app, API_PATH=/api/v1/sandbox, and API_KEY to capture cold-start and exec latency under load:

```shell
API_URL=https://api.sandbox.stateset.app \
API_PATH=/api/v1/sandbox \
API_KEY=YOUR_API_KEY \
k6 run benchmarks/k6-sandbox-create.js
```

Load test sample: 5 VUs, ramp 10s, hold 20s, ramp down 10s, 1 vCPU / 2 GiB, echo benchmark.
| Metric | P50 | P95 |
|---|---|---|
| Create API (end-to-end) | 9.51s | 27.79s |
| Execute API | 0.71s | 1.19s |
| HTTP request duration (all requests) | 1.08s | 23.45s |
Notes: 0% error rate across 11 iterations; the k6 thresholds for http_req_duration and sandbox_create_ms were exceeded at p95 under load.
Recommendations by Use Case
- Blaxel: Best for stateful agents needing fast resume.
- E2B: Best for quick integration and great SDKs.
- Modal and Beam: Best for ML/AI inference workloads with GPU requirements.
- Cloudflare: Best for global edge distribution.
- Vercel Sandbox: Best for Next.js/Vercel ecosystems.
- Daytona: Best for full-featured dev environments with LSP support.
- StateSet Sandbox: Best for self-hosted Kubernetes deployments and Claude Code agents.
Conclusion
The AI sandbox space is rapidly maturing, with each provider carving out distinct niches. Choose based on your primary SDK language, runtime requirements, cold-start sensitivity, and existing platform ecosystem.