sandrun benchmark --all
$ sandrun test --all --free-tier

Every sandbox,
benchmarked.

Real benchmarks of every remote execution environment for AI agents. Same task, every platform, side-by-side.

Benchmarked: 12
Tracked: 39+
Fastest exec: 3ms
Isolation types: 7
All free tier: $0

Leaderboard

Ranked by total end-to-end time. Same standardized task on every platform.

1 Cloudflare Workers: 105ms
2 Miniflare (local): 157ms
3 Godbolt (JS/V8): 349ms
4 Deno: 555ms
5 Godbolt (Python): 633ms
6 Codapi: 974ms
7 Go Playground: 1.1s
8 Codapi (Go): 1.4s
9 Rust Playground: 1.5s
10 GKE Agent Sandbox: 3.2s
11 Wandbox: 5.0s
12 Cloud Run (Job): 90s

Total time = network + cold start + execution. Measured from GCE us-central1-a. Lower is better.
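
The ranking can be reproduced from the totals listed above. The sketch below normalizes the mixed `ms` / `s` figures to milliseconds and sorts them; the helper name `to_ms` and the choice of four sample rows are illustrative assumptions, not part of the benchmark harness.

```python
import re


def to_ms(text: str) -> float:
    """Convert a duration such as '105ms', '1.1s' or '~90s' to milliseconds."""
    match = re.fullmatch(r"~?(\d+(?:\.\d+)?)(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "ms" else value * 1000.0


# A few totals copied from the leaderboard, deliberately out of order.
totals = {
    "Cloud Run (Job)": "90s",
    "Deno": "555ms",
    "Go Playground": "1.1s",
    "Cloudflare Workers": "105ms",
}

# Lower total time ranks higher.
ranked = sorted(totals, key=lambda name: to_ms(totals[name]))
for rank, name in enumerate(ranked, start=1):
    print(rank, name, totals[name])
```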

Benchmark details

Code snippets and real outputs from each platform.

Cloudflare Workers

V8 isolate 0ms cold start
105ms total
worker.js
export default {
  async fetch(request) {
    const start = Date.now();
    const result = {
      runtime: "Cloudflare-Workers",
      math_test: Array.from(
        {length: 100},
        (_, i) => i + 1
      ).reduce((a, b) => a + b),
      dns_works: (await fetch(
        "https://1.1.1.1"
      )).ok,
      exec_ms: Date.now() - start,
    };
    return Response.json(result);
  }
};
output
{
  "runtime": "Cloudflare-Workers",
  "math_test": 5050,
  "dns_works": true,
  "file_io_works": false,
  "exec_ms": 3
}
Cold start: ~20ms
Execution: 3ms
Network: Yes

GKE Agent Sandbox

gVisor K8s-native
3.2s total
test_gke_sandbox.py
from k8s_agent_sandbox import SandboxClient

with SandboxClient(
    template_name="python-runtime-template",
    namespace="default",
) as sandbox:
    sandbox.write("task.py", code)
    result = sandbox.run("python3 task.py")
    print(result.stdout)
output
{
  "python_version": "3.11.15",
  "platform": "Linux-4.4.0 (gVisor)",
  "math_test": 5050,
  "file_io_works": true,
  "dns_works": true,
  "pip_available": true,
  "execution_time_ms": 1240.66
}
Cold start: 1.44s
Execution: 1.51s
Kernel: gVisor 4.4

Deno

V8 isolate open-source
555ms total
test_deno.ts
const result = {
  runtime: `Deno ${Deno.version.deno}`,
  v8: Deno.version.v8,
  typescript: Deno.version.typescript,
  math_test: Array.from(
    {length: 100},
    (_, i) => i + 1
  ).reduce((a, b) => a + b),
};

await Deno.writeTextFile("/tmp/test", "hello");
console.log(JSON.stringify(result));
output
{
  "runtime": "Deno 2.7.9",
  "v8": "14.7.173.7-rusty",
  "typescript": "5.9.2",
  "math_test": 5050,
  "file_io_works": true,
  "dns_works": true
}
Cold start: 86ms
Execution: 469ms
Runtime: V8 14.7

Codapi

Container no auth
974ms total
curl (REST API)
$ curl -s https://api.codapi.org/v1/exec \
  -d '{
    "sandbox": "python",
    "command": "run",
    "files": {
      "": "import json, platform\nprint(json.dumps({\n  \"python\": platform.python_version(),\n  \"math_test\": sum(range(1,101)),\n  \"file_io\": True\n}))"
    }
  }'
output
{
  "python_version": "3.14.2",
  "platform": "Linux-6.1.0-amd64",
  "math_test": 5050,
  "file_io_works": true,
  "dns_works": false,
  "execution_time_ms": 11.92
}
Execution: 12ms
Runtime: Py 3.14.2
Network: No

Google Cloud Run

gVisor 2-layer managed
~90s total
deploy + execute
# Build + deploy container
$ gcloud builds submit \
    --tag gcr.io/project/sandrun-test

# Execute as a Cloud Run Job
$ gcloud run jobs create test \
    --image gcr.io/project/sandrun-test
$ gcloud run jobs execute test --wait
output
{
  "python_version": "3.12.13",
  "platform": "Linux-6.9.12 (gVisor)",
  "math_test": 5050,
  "file_io_works": true,
  "dns_works": true,
  "pip_available": true
}
Build: 13s
Execution: 1.55s
Kernel: gVisor 6.9

Godbolt Compiler Explorer

Container no auth
349-633ms
curl (REST API)
$ curl -s https://godbolt.org/api/compiler/python312/compile \
  -H "Content-Type: application/json" \
  -d '{
    "source": "import json, platform\nprint(json.dumps({\n  \"version\": platform.python_version(),\n  \"math\": sum(range(1,101))\n}))",
    "options": {
      "executeParameters": { "args": [] }
    }
  }'
output (Python 3.12)
{
  "python_version": "3.12.1",
  "platform": "Linux-6.8.0-AWS",
  "math_test": 5050,
  "file_io_works": true,
  "dns_works": false,
  "execution_time_ms": 59.03
}
Python: 633ms
JS (V8): 349ms
Languages: 40+

More results

Go Playground
Total: 1.1s
Runtime: go1.26.1
CPUs: 8
Network: No
Auth: None
{"runtime":"go1.26.1","math_test":5050,"cpus":8}

Rust Playground
Total: 1.5s
Compile: 0.66s
Mode: Release
Network: No
Auth: None
{"runtime":"Rust stable","math_test":5050}

Wandbox
Total: 5.0s
Execution: 172ms
Runtime: Py 3.10.15
Network: Yes
Auth: None
{"python":"3.10.15","math_test":5050,"dns":true}

Full comparison

All 12 benchmarked platforms at a glance.

# | Platform | Isolation | Total | Exec | File I/O | Network | Auth
1 | Cloudflare Workers | V8 isolate | 105ms | 3ms | No | Yes | API key
2 | Miniflare (local) | V8 (workerd) | 157ms | 132ms | No | Yes | None
3 | Godbolt (JS) | Container | 349ms | 13ms | No | No | None
4 | Deno | V8 + perms | 555ms | 469ms | Yes | Yes | None
5 | Godbolt (Python) | Container | 633ms | 59ms | Yes | No | None
6 | Codapi | Container | 974ms | 12ms | Yes | No | None
7 | Go Playground | Container | 1.1s | ~0ms | No | No | None
8 | Codapi (Go) | Container | 1.4s | ~0ms | No | No | None
9 | Rust Playground | Container | 1.5s | ~0ms | No | No | None
10 | GKE Agent Sandbox | gVisor | 3.2s | 1.5s | Yes | Yes | K8s
11 | Wandbox | Container | 5.0s | 172ms | Yes | Yes | None
12 | Cloud Run (Job) | gVisor 2-layer | ~90s | 1.5s | Yes | Yes | gcloud

The sandbox universe

Every remote execution environment we're tracking. Benchmarks coming for platforms marked pending.

AI Agent-Native Sandboxes

GKE Agent Sandbox: gVisor, K8s CRDs, Python SDK, WarmPool
E2B: Firecracker, ~200ms cold start, BYOC
Modal: gVisor, GPU (A100/H100), $30 free/mo
Daytona: Docker, ~90ms creation, MCP support
Runloop: Custom hypervisor, ~100ms, VPC deploy
Morph Cloud: microVM, instant fork, snapshot/restore
Microsandbox: libkrun microVM, self-hosted, OSS
Upstash Box: Container, infinite lifespan, freeze/wake
OpenSandbox: Kata/gVisor, self-hosted, OSS

Serverless & Edge

Cloudflare Workers: V8 isolate, 0ms cold start, 105ms total
Deno: V8 + permissions, 555ms total, TS native
Cloud Run: gVisor 2-layer, any container, ~90s total
Vercel Sandbox: Firecracker, v0/AI SDK integration
AWS Lambda: Firecracker, ~125ms cold start
Fly.io / Sprites: Firecracker, GPU support, global edge

Code Playgrounds (Free, No Auth)

Codapi: 12ms exec, Python 3.14.2, Go, 30+ langs
Godbolt: 349ms (JS), 633ms (Py), 40+ languages
Go Playground: 1.1s total, go1.26.1, 8 CPUs, official
Rust Playground: 1.5s total, stable/nightly, official
Wandbox: 5.0s total, DNS works, Python 3.10
Val Town: Eval API deprecated (404)

Cloud Development Environments

CodeSandbox: Firecracker, 500ms snapshot resume
Gitpod: Container, CDEs, VS Code/JetBrains
Bunnyshell: Firecracker, ~100ms, BYOC
Koyeb: Firecracker, 250ms cold start
Northflank: Kata/FC/gVisor, GPU, BYOC
Vertex AI / Bedrock: Managed code exec for LLM agents

Status legend: Benchmarked, Pending, Failed/Deprecated

Methodology

The task

Every platform runs the same standardized task: compute sum(1..100), test file I/O, test DNS resolution, and report platform info. Results are returned as JSON.
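
For Python-based platforms, the task described above could look roughly like the sketch below. This is a reconstruction under stated assumptions, not the exact script used for the benchmarks: the function name `run_task`, the probe hostname `example.com`, and the use of a temporary file for the I/O check are all illustrative choices. The JSON keys follow the outputs shown in the benchmark details.

```python
import json
import platform
import socket
import tempfile
import time


def run_task() -> dict:
    """Run the four checks and return the results as a JSON-serializable dict."""
    start = time.perf_counter()
    result = {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "math_test": sum(range(1, 101)),  # sum(1..100) should be 5050
    }

    # File I/O: write a temporary file and read it back.
    try:
        with tempfile.NamedTemporaryFile("w+") as handle:
            handle.write("hello")
            handle.seek(0)
            result["file_io_works"] = handle.read() == "hello"
    except OSError:
        result["file_io_works"] = False

    # DNS: try to resolve a public hostname (assumed here: example.com).
    try:
        socket.getaddrinfo("example.com", 443)
        result["dns_works"] = True
    except OSError:  # socket.gaierror is a subclass of OSError
        result["dns_works"] = False

    result["execution_time_ms"] = round((time.perf_counter() - start) * 1000, 2)
    return result


if __name__ == "__main__":
    print(json.dumps(run_task(), indent=2))
```

On a sandbox without outbound network access, `dns_works` comes back `false` rather than raising, which matches how the no-network platforms are reported.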

The environment

All benchmarks were run on March 29, 2026, from a single GCE VM (e2-standard-4) in us-central1-a. Total time includes network latency.

What we measure

Total time = wall clock from request to response. Exec time = self-reported code execution. Cold start = total minus warm average (where applicable).
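
The definitions above translate into very little code. The following is a minimal sketch, assuming the first request hits a cold instance and later requests hit a warm one; the function names and the sample figures are hypothetical and are not measurements from this page.

```python
import statistics
import time
from typing import Callable, Sequence


def wall_clock_ms(request: Callable[[], object]) -> float:
    """Wall-clock time of one request/response round trip, in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def cold_start_ms(first_total_ms: float, warm_totals_ms: Sequence[float]) -> float:
    """Cold start = first (cold) total minus the average of the warm totals."""
    return first_total_ms - statistics.mean(warm_totals_ms)


# Hypothetical figures for illustration only.
print(cold_start_ms(105.0, [80.0, 90.0]))  # 20.0
```

In practice `request` would wrap the HTTP call or CLI invocation for one platform, and `wall_clock_ms` would be called once for the cold total and several more times for the warm totals.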

Limitations

Single-region, single-run benchmarks. They don't cover concurrent load, GPU workloads, or long-running sessions. Platforms that require OAuth aren't benchmarked yet.