Sandboxes

Rollouts of containerized agentic tasks can be slow because of container startup and teardown overhead, LLM API latency, and command execution time. Because most of that time is spent waiting, horizontal scaling is the most practical way to accelerate experimentation, so we recommend using a cloud sandbox provider such as Daytona.

Using a cloud sandbox provider shifts command execution to the cloud, so on your own machine trials become I/O-bound rather than compute-bound. This means you can typically run far more trials in parallel than you have CPU cores.
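To illustrate why I/O-bound work parallelizes well past the core count, here is a minimal sketch that is not part of Harbor. It assumes each trial spends its time waiting on a remote service, simulated with `asyncio.sleep`. The names `fake_trial`, `run_trials`, and `timed` are hypothetical and exist only for this example.

```python
import asyncio
import time


async def fake_trial(trial_id: int, wait_s: float) -> int:
    # Hypothetical stand-in for a trial that mostly waits on a remote sandbox
    # or an LLM API. asyncio.sleep yields the CPU while it waits.
    await asyncio.sleep(wait_s)
    return trial_id


async def run_trials(n_trials: int, n_parallel: int, wait_s: float) -> list[int]:
    # The semaphore caps how many trials are in flight at once.
    limit = asyncio.Semaphore(n_parallel)

    async def bounded(trial_id: int) -> int:
        async with limit:
            return await fake_trial(trial_id, wait_s)

    # gather returns results in the order the trials were submitted.
    return await asyncio.gather(*(bounded(i) for i in range(n_trials)))


def timed(n_trials: int, n_parallel: int, wait_s: float = 0.05) -> tuple[list[int], float]:
    # Run all trials and report the wall-clock time taken.
    start = time.perf_counter()
    results = asyncio.run(run_trials(n_trials, n_parallel, wait_s))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for n_parallel in (10, 100):
        _, elapsed = timed(n_trials=100, n_parallel=n_parallel)
        print(f"n_parallel={n_parallel}: {elapsed:.2f}s")
```

Because the simulated trials only wait, raising the limit from 10 to 100 cuts the wall-clock time by roughly the same factor, independent of how many cores the machine has. Real trials also do some local work, so the practical ceiling depends on your machine and on provider rate limits.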

Using a cloud sandbox provider

There are many cloud sandbox providers to choose from. Good options are Daytona, Modal, E2B, Runloop, Tensorlake, Islo, CoreWeave Sandboxes, W&B Sandboxes, and LangSmith.

harbor run -d "<org/name>" \
  -m "<model>" \
  -a "<agent>" \
  -e daytona \
  -n "<n-parallel-trials>"

We run up to 100 trials in parallel on a MacBook Pro with 14 cores.

Removing internet restrictions on Daytona

By default, Daytona accounts have internet access restrictions that can prevent many benchmarks from running correctly. Use the coupon code HARBOR_NETWORK on your Daytona account to remove these restrictions.

Multi-container deployments

Daytona, Islo, and LangSmith support multi-container deployments. To use multi-container tasks, include an environment/docker-compose.yaml file in your task definition.

Other cloud sandbox providers (Modal, E2B, Runloop, Tensorlake, CoreWeave Sandboxes, and W&B Sandboxes) do not currently support multi-container environments. For those providers, you will need to use single-container tasks or switch to Daytona, Islo, LangSmith, or the local Docker environment.
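The compatibility rule above can be sketched as a small preflight check. This is an illustrative helper, not part of Harbor. The function names are hypothetical, and the lowercase environment identifiers other than `daytona` are assumptions, since only `daytona` appears as an `-e` value on this page.

```python
import tempfile
from pathlib import Path

# Environments this page lists as supporting multi-container deployments.
# Assumption: identifiers are lowercase provider names; "docker" stands for
# the local Docker environment. Only "daytona" is confirmed by the page.
MULTI_CONTAINER_ENVS = {"daytona", "islo", "langsmith", "docker"}


def is_multi_container(task_dir: Path) -> bool:
    # A task counts as multi-container when it ships environment/docker-compose.yaml.
    return (task_dir / "environment" / "docker-compose.yaml").is_file()


def check_environment(task_dir: Path, env_name: str) -> None:
    # Raise early if a multi-container task is paired with an environment
    # that the page says cannot run it.
    if is_multi_container(task_dir) and env_name.lower() not in MULTI_CONTAINER_ENVS:
        supported = ", ".join(sorted(MULTI_CONTAINER_ENVS))
        raise ValueError(
            f"{task_dir.name!r} is a multi-container task, but {env_name!r} "
            f"does not support them; use one of: {supported}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = Path(tmp)
        (task / "environment").mkdir()
        (task / "environment" / "docker-compose.yaml").write_text("services: {}\n")
        check_environment(task, "daytona")  # passes silently
        try:
            check_environment(task, "modal")
        except ValueError as err:
            print(err)
```

Running a check like this before launching a large batch surfaces the mismatch immediately instead of after sandboxes have been provisioned.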
