Sandboxes

Horizontal scaling using cloud sandboxes

Rollouts of containerized agentic tasks can be slow. Much of the time goes to container startup and teardown, waiting on LLM API calls, and waiting for commands to execute. Horizontal scaling is therefore the only viable way to substantially speed up experimentation, so we recommend using a cloud sandbox provider such as Daytona.

Using a cloud sandbox provider moves command execution to the cloud, so trials become I/O-bound rather than compute-bound. As a result, you can typically run far more trials in parallel than you have CPU cores.
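A quick way to see why waiting-bound work parallelizes past the core count is to simulate it locally. The sketch below is not Harbor code: it uses sleep as a stand-in for a trial that spends its time waiting on a remote sandbox or an LLM API, and the 4x oversubscription factor is an arbitrary choice for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: each "trial" is simulated with sleep, standing in for
# time spent waiting on a remote sandbox or LLM API instead of using local CPU.
set -euo pipefail

cores=$(getconf _NPROCESSORS_ONLN)   # online CPU cores (works on Linux and macOS)
trials=$((cores * 4))                 # launch 4x more concurrent trials than cores
wait_seconds=1                        # simulated waiting time per trial

start=$(date +%s)
for i in $(seq 1 "$trials"); do
  sleep "$wait_seconds" &             # each simulated trial waits without burning CPU
done
wait                                  # block until every background trial finishes
elapsed=$(( $(date +%s) - start ))

echo "cores=$cores trials=$trials elapsed=${elapsed}s"
```

Run sequentially, the simulated trials would take at least four seconds; run concurrently, they finish in roughly the time of a single trial, because sleeping processes do not compete for CPU. Real trials still do some local work, so the practical ceiling depends on your machine, network, and provider.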

Using a cloud sandbox provider

There are many cloud sandbox providers to choose from. Good options include Daytona, Modal, E2B, Runloop, Tensorlake, Islo, CoreWeave Sandboxes, W&B Sandboxes, LangSmith, Blaxel, Novita Sandbox, Amazon EC2, OpenSandbox, and Beam. Select the provider with -e and set the number of parallel trials with -n. For example, with Daytona:

harbor run -d "<org/name>" \
  -m "<model>" \
  -a "<agent>" \
  -e daytona \
  -n "<n-parallel-trials>"

We run up to 100 trials in parallel on a MacBook Pro with 14 cores.
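That figure works out to roughly 7 concurrent trials per core. The sketch below turns the ratio into a starting value for -n on your own machine. PER_CORE and DRY_RUN are hypothetical variable names used only by this script, not Harbor settings, and the factor of 7 is simply extrapolated from the 100-trials-on-14-cores figure. By default the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: derive a value for harbor run's -n flag from the local core count.
# PER_CORE=7 is an assumption extrapolated from ~100 trials on 14 cores;
# tune it for your own machine, network, and provider quotas.
set -euo pipefail

per_core="${PER_CORE:-7}"
cores=$(getconf _NPROCESSORS_ONLN)
n_parallel=$((cores * per_core))

# Placeholders copied from the example command; replace them before running.
cmd=(harbor run -d "<org/name>" -m "<model>" -a "<agent>" -e daytona -n "$n_parallel")

# Print the command by default; set DRY_RUN=0 to actually launch it.
if [[ "${DRY_RUN:-1}" == "1" ]]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

On a 14-core machine the default factor gives -n 98, close to the 100 parallel trials mentioned above. Treat the result as a starting point and adjust it based on observed throughput and any concurrency limits on your provider account.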

Removing internet restrictions on Daytona

By default, Daytona accounts have internet access restrictions that can prevent many benchmarks from running correctly. To remove these restrictions, apply the coupon code HARBOR_NETWORK to your Daytona account.

Multi-container deployments

Daytona, EC2, Islo, LangSmith, Blaxel, Novita Sandbox, and Beam support multi-container deployments. To use multi-container tasks, include an environment/docker-compose.yaml file in your task definition.
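As an illustration of the layout, the script below writes a minimal environment/docker-compose.yaml with two services into a task directory. The directory name my-task, the service names main and db, and the postgres:16 sidecar are all hypothetical, and the script assumes the task's Dockerfile already sits in environment/. Check Harbor's task documentation for which service the agent is expected to run in and what else a multi-container task definition needs.

```shell
#!/usr/bin/env bash
# Sketch: add a two-container environment to a task directory.
# "my-task", the service names, and the postgres image are hypothetical.
set -euo pipefail

task_dir="${1:-my-task}"
mkdir -p "$task_dir/environment"

# The quoted 'EOF' stops the shell from expanding anything inside the YAML.
cat > "$task_dir/environment/docker-compose.yaml" <<'EOF'
services:
  main:                 # container the agent works in (name is an assumption)
    build:
      context: .        # resolved relative to the environment/ directory
    depends_on:
      - db
  db:                   # hypothetical sidecar service
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
EOF

echo "Wrote $task_dir/environment/docker-compose.yaml"
```

Compose resolves the relative build context against the directory containing the compose file, so keeping the file in environment/ next to the Dockerfile lets context: . point at it.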

Other cloud sandbox providers (Modal, E2B, Runloop, Tensorlake, CoreWeave Sandboxes, W&B Sandboxes, and OpenSandbox) do not currently support multi-container environments. For those providers, you will need to use single-container tasks or switch to Daytona, EC2, Islo, LangSmith, Blaxel, Novita Sandbox, Beam, or the local Docker environment.
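The support matrix above can be encoded as a pre-flight check so that a multi-container task is not launched on a provider that cannot run it. This is a sketch, not part of Harbor: the function names are hypothetical, a task is treated as multi-container when it ships environment/docker-compose.yaml, and every provider identifier except daytona is an assumption about the values -e accepts, so verify them against your Harbor version before relying on this.

```shell
#!/usr/bin/env bash
# Sketch: refuse to launch a multi-container task on a provider without
# multi-container support. Identifiers other than "daytona" are assumptions.
set -euo pipefail

# Providers listed as supporting multi-container deployments, plus local Docker.
MULTI_CONTAINER_ENVS=(daytona ec2 islo langsmith blaxel novita beam docker)

# Succeeds if the given provider identifier is in the list above.
supports_multi_container() {
  local env="$1" candidate
  for candidate in "${MULTI_CONTAINER_ENVS[@]}"; do
    [[ "$candidate" == "$env" ]] && return 0
  done
  return 1
}

# Succeeds if the task directory ships a docker-compose.yaml.
is_multi_container_task() {
  [[ -f "$1/environment/docker-compose.yaml" ]]
}

# Fails with a message when a multi-container task targets an unsupported provider.
check_task_env() {
  local task_dir="$1" env="$2"
  if is_multi_container_task "$task_dir" && ! supports_multi_container "$env"; then
    echo "error: $task_dir is multi-container but '$env' only runs single-container tasks" >&2
    return 1
  fi
  return 0
}
```

A wrapper script could call check_task_env with the task directory and provider before invoking harbor run, and stop early if it fails.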
