Benchmarks

Run the benchmark harness and compare results.

Modal vs E2B vs Daytona vs Cloudflare vs Vercel vs Beam vs Blaxel vs StateSet Sandbox

Whether you're building a coding assistant, running AI-generated scripts, or powering code evaluation pipelines, choosing the right sandbox provider is critical for both developer experience and cost efficiency. In this benchmark, we evaluate eight leading AI code sandbox providers across two key dimensions: developer experience (DX) and pricing.

Benchmark Scope

  • Criteria and tables align with the benchmark images in docs/benchmarks/.
  • Provider positioning and feature support are based on published documentation.
  • StateSet cold-start and exec latency are measured via the public API with a 1 vCPU / 2 GiB sandbox and a simple echo command.

TL;DR Summary

  • Blaxel leads for ultra-fast resume; Daytona and E2B are fast general-purpose options.
  • Modal and Beam are best suited for GPU-heavy workloads.
  • Cloudflare and Vercel integrate tightly with their respective ecosystems.
  • StateSet is strongest for self-hosted Kubernetes deployments with multi-language SDKs and Claude Code integration.

Provider Overview

| Provider | Best For | SDK Languages | Max Runtime | Cold Start | Starting Price |
| --- | --- | --- | --- | --- | --- |
| Modal | Python ML workloads, high scale | Python, JS/TS (beta), Go (beta) | 24 hours | Sub-second | $0.000014/core/sec |
| E2B | Quick AI agent integration | Python, JS/TS | 24 hours (Pro) | ~150ms | $100 free credits |
| Daytona | Full-featured dev environments | Python, TypeScript | Unlimited | ~90ms | $200 free credits |
| Cloudflare | Edge execution, global distribution | TypeScript | Configurable | 2-3 seconds | $5/month base |
| Vercel | Next.js ecosystem integration | TypeScript | 5 hours (Pro) | Fast | $0.128/CPU hour |
| Beam | Serverless GPU + sandboxes | Python, TS (beta) | Unlimited | 2-3 seconds | 15 hours free |
| Blaxel | Ultra-fast standby resume | Python, TypeScript | Unlimited | ~25ms | $200 free credits |
| StateSet Sandbox | Self-hosted Kubernetes sandboxes + Claude Code | TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Free plan; usage-based |

SDK & Language Support

| Provider | Python | TypeScript/JS | Go | Other |
| --- | --- | --- | --- | --- |
| Modal | Primary | Beta | Beta | No |
| E2B | Yes | Yes | No | No |
| Daytona | Yes | Yes | No | No |
| Cloudflare | No | Primary | No | No |
| Vercel | No | Primary | No | Python (limited) |
| Beam | Primary | Beta | No | No |
| Blaxel | Yes | Yes | No | No |
| StateSet Sandbox | Yes | Yes | Yes | Java, Kotlin, PHP, Ruby, Rust, Swift |

Runtime & Performance

| Provider | Max Runtime | Cold Start | Scale to Zero |
| --- | --- | --- | --- |
| Modal | 24 hours | Sub-second | Yes |
| E2B | 24 hours | ~150ms | Yes |
| Daytona | Unlimited | ~90ms | Yes |
| Cloudflare | Configurable | 2-3s | Yes |
| Vercel | 5 hours | Fast | Yes |
| Beam | Unlimited | 2-3s | Yes |
| Blaxel | Unlimited | ~25ms | Yes |
| StateSet Sandbox | Configurable (default 10 min; up to 5 hours) | p50 2.3s (API) | Yes |

Key Features

| Provider | GPU Support | Self-Host | Persistence | Custom Images |
| --- | --- | --- | --- | --- |
| Modal | Extensive | No | Snapshots | SDK-defined |
| E2B | No | Open-source | Yes | Templates |
| Daytona | Yes | Enterprise | Yes | Docker/OCI |
| Cloudflare | No | No | Limited | Docker |
| Vercel | No | No | Ephemeral | Yes |
| Beam | Extensive | Open-source | Volumes | Docker |
| Blaxel | No | No | Snapshots | Yes |
| StateSet Sandbox | Not documented | Yes (Kubernetes) | Checkpoints | Docker/OCI |

Pricing Comparison (Normalized)

*1 vCPU + 2GB RAM for 1 hour. Modal prices per physical core (= 2 vCPU). Cloudflare requires $5/mo base plan. StateSet rates from docs/GETTING_STARTED.md.*

| Provider | Hourly Cost | CPU Rate | RAM Rate |
| --- | --- | --- | --- |
| E2B | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Daytona | $0.0828 | $0.000014/vCPU/s | $0.0000045/GiB/s |
| Blaxel | $0.0828 | Bundled | $0.0000115/GB/s |
| Cloudflare | $0.0900 | $0.000020/vCPU/s | $0.0000025/GiB/s |
| Modal | $0.1193 | $0.00003942/core/s* | $0.00000672/GiB/s |
| Vercel | $0.1492 | $0.128/CPU-hr | $0.0106/GB-hr |
| StateSet Sandbox | $0.1492 | $0.128/vCPU-hr | $0.0106/GB-hr |
| Beam | $0.2300 | $0.190/core/hr | $0.020/GB/hr |

Note: StateSet pricing also includes $0.001 per sandbox creation and $0.10/GB network egress.
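The normalized hourly figures above follow directly from the per-second rates. A minimal sketch of the arithmetic (rates copied from the table; the Modal per-core rate is halved because one vCPU is half a physical core):

```python
# Normalize per-second CPU/RAM rates to an hourly cost for 1 vCPU + 2 GiB RAM.
SECONDS_PER_HOUR = 3600
VCPUS, RAM_GIB = 1, 2

def hourly_cost(cpu_rate_per_s: float, ram_rate_per_gib_s: float) -> float:
    """Cost of running VCPUS vCPUs and RAM_GIB GiB of RAM for one hour."""
    cpu = cpu_rate_per_s * VCPUS * SECONDS_PER_HOUR
    ram = ram_rate_per_gib_s * RAM_GIB * SECONDS_PER_HOUR
    return cpu + ram

# Rates from the pricing table above.
e2b = hourly_cost(0.000014, 0.0000045)
cloudflare = hourly_cost(0.000020, 0.0000025)
modal = hourly_cost(0.00003942 / 2, 0.00000672)  # per-core rate, 1 core = 2 vCPU

print(f"E2B        ${e2b:.4f}/hr")        # ≈ $0.0828
print(f"Cloudflare ${cloudflare:.4f}/hr") # ≈ $0.0900
print(f"Modal      ${modal:.4f}/hr")      # ≈ $0.1193
```

The same formula applied to the $0.128/vCPU-hr and $0.0106/GB-hr rates yields the Vercel and StateSet rows; per-creation and egress fees are excluded from the normalization.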

StateSet Sandbox Overview

  • Self-hosted Kubernetes pods per sandbox with security controls and auto-cleanup.
  • Multi-language SDKs: TypeScript/JS, Python, Go, Java, Kotlin, PHP, Ruby, Rust, Swift.
  • Built-in Claude Code CLI and full file read/write + command execution APIs.
  • Checkpoints, artifacts, and webhooks for persistence and integration workflows.

StateSet Sandbox Benchmarks

Sample: 10 sequential runs, 1 vCPU / 2 GiB, command `echo benchmark`, base URL https://api.sandbox.stateset.app.

| Metric | P50 | P95 |
| --- | --- | --- |
| Create API (end-to-end) | 2.26s | 3.08s |
| Startup metrics (server) | 1.93s | 2.56s |
| Execute API | 0.85s | 1.53s |
| Stop API | 0.58s | 1.08s |
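The sequential-run numbers above can be reproduced with a small timing harness against the public API. The sketch below is illustrative only: the endpoint path and request-body fields are assumptions inferred from the load-test settings, not documented routes. The percentile helper itself is plain Python and independent of the API:

```python
import math
import time

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def timed(fn) -> float:
    """Run fn() and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def run_create_benchmark(api_key: str, runs: int = 10):
    """Time `runs` sequential sandbox creations; returns (p50, p95) in seconds.

    NOTE: the URL path and JSON fields below are hypothetical placeholders
    for illustration, not the documented StateSet API.
    """
    import requests
    url = "https://api.sandbox.stateset.app/api/v1/sandbox"
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {"cpu": 1, "memory_gib": 2, "command": "echo benchmark"}
    times = [
        timed(lambda: requests.post(url, headers=headers, json=body, timeout=60))
        for _ in range(runs)
    ]
    return percentile(times, 50), percentile(times, 95)
```

Calling `run_create_benchmark("YOUR_API_KEY")` should produce numbers comparable to the Create API row; the Execute and Stop rows would need analogous timers wrapped around their respective calls.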

Load test harness:

Run `benchmarks/k6-sandbox-create.js` with `API_URL=https://api.sandbox.stateset.app`, `API_PATH=/api/v1/sandbox`, and `API_KEY` set to capture cold-start and exec latency under load:

```bash
API_URL=https://api.sandbox.stateset.app \
API_PATH=/api/v1/sandbox \
API_KEY=YOUR_API_KEY \
k6 run benchmarks/k6-sandbox-create.js
```

Load test sample: 5 VUs, ramp 10s, hold 20s, ramp down 10s, 1 vCPU / 2 GiB, echo benchmark.

| Metric | P50 | P95 |
| --- | --- | --- |
| Create API (end-to-end) | 9.51s | 27.79s |
| Execute API | 0.71s | 1.19s |
| HTTP request duration (all requests) | 1.08s | 23.45s |

Notes: 0% error rate across 11 iterations; the k6 thresholds for `http_req_duration` and `sandbox_create_ms` were exceeded at p95 under load.
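To pull these percentiles out of a run programmatically, k6 can write a machine-readable summary with `--summary-export=summary.json`. A small parser sketch (it assumes k6's default trend stats, which report `med` and `p(95)` rather than `p(50)`, and a custom `sandbox_create_ms` trend metric defined in the script):

```python
import json

def trend_stats(summary: dict, metric: str):
    """Return (median, p95) for a k6 trend metric from --summary-export JSON."""
    stats = summary["metrics"][metric]
    return stats["med"], stats["p(95)"]

# In practice: summary = json.load(open("summary.json"))
# Synthetic summary shaped like k6's export, using the load-test numbers above:
summary = {"metrics": {
    "http_req_duration": {"med": 1080.0, "p(95)": 23450.0},
    "sandbox_create_ms": {"med": 9510.0, "p(95)": 27790.0},
}}
med, p95 = trend_stats(summary, "sandbox_create_ms")
print(f"create: p50≈{med / 1000:.2f}s  p95={p95 / 1000:.2f}s")
# create: p50≈9.51s  p95=27.79s
```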

Recommendations by Use Case

  • Blaxel: Best for stateful agents needing fast resume.
  • E2B: Best for quick integration, with polished Python and JS/TS SDKs.
  • Modal and Beam: Best for ML/AI inference workloads with GPU requirements.
  • Cloudflare: Best for global edge distribution.
  • Vercel Sandbox: Best for Next.js/Vercel ecosystems.
  • Daytona: Best for full-featured dev environments with LSP support.
  • StateSet Sandbox: Best for self-hosted Kubernetes deployments and Claude Code agents.

Conclusion

The AI sandbox space is rapidly maturing, with each provider carving out distinct niches. Choose based on your primary SDK language, runtime requirements, cold-start sensitivity, and existing platform ecosystem.