Sandboxes

Horizontal scaling using cloud sandboxes

Rollouts of containerized agentic tasks can be slow because of container startup and teardown overhead, waiting on LLM API calls, and waiting on command execution. Since little of that time can be removed from a single trial, horizontal scaling becomes the only viable way to accelerate experimentation, so we recommend using a cloud sandbox provider like Daytona.

Using a cloud sandbox provider shifts command execution to the cloud, making trials I/O-bound rather than compute-bound on your local machine. This means you can typically run far more trials in parallel than you have CPU cores.

Using a cloud sandbox provider

There are many cloud sandbox providers to choose from. Good options are Daytona, Modal, E2B, Runloop, Tensorlake and Islo. For example, to run a job on Daytona:

harbor run -d "<org/name>" \
  -m "<model>" \
  -a "<agent>" \
  -e daytona \
  -n "<n-parallel-trials>"

For reference, we run up to 100 trials in parallel from a MacBook Pro with 14 cores.
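
The figure above works out to roughly seven trials per local core. As a minimal sketch, assuming that ratio is a reasonable starting point on your machine (it may not be; provider limits and LLM API quotas can cap it lower), the following derives a starting value for -n from the local core count. TRIALS_PER_CORE and the other variable names are illustrative and not part of Harbor; only the flags shown earlier on this page are used, and the command is printed rather than run so the placeholders can be filled in first.

```shell
#!/bin/sh
# Sketch: pick a starting value for -n from the local core count.
# TRIALS_PER_CORE=7 is an assumption derived from "100 trials on 14 cores";
# tune it for your machine, provider limits, and LLM API quotas.
TRIALS_PER_CORE=7

# getconf _NPROCESSORS_ONLN is available on Linux and macOS;
# fall back to 1 if the core count cannot be read.
CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
N_PARALLEL=$((CORES * TRIALS_PER_CORE))

# Print the command instead of running it; replace the <...> placeholders first.
echo "harbor run -d \"<org/name>\" -m \"<model>\" -a \"<agent>\" -e daytona -n \"$N_PARALLEL\""
```

If trials start failing or stalling at that level, lower TRIALS_PER_CORE and try again.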

Removing internet restrictions on Daytona

By default, Daytona accounts have internet access restrictions that can prevent many benchmarks from running correctly. Use the coupon code HARBOR_NETWORK on your Daytona account to remove these restrictions.

Multi-container deployments

Daytona and Islo support multi-container deployments. To use multi-container tasks, include an environment/docker-compose.yaml file in your task definition.

Other cloud sandbox providers (Modal, E2B, Runloop and Tensorlake) do not currently support multi-container environments. For those providers, you will need to use single-container tasks or switch to Daytona, Islo or the local Docker environment.
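
The compatibility rule above can be sketched as a small pre-flight check. This is an illustration only, not a Harbor feature: the function names are hypothetical, and the lowercase provider names are labels for this sketch (only daytona is shown as an -e value on this page). The provider list reflects the support described above and would need updating if that changes.

```shell
#!/bin/sh
# Sketch: check whether a task can run on a given sandbox provider.
# Function names and provider labels are hypothetical, for illustration only.

# Succeeds if the task directory ships a compose file (multi-container task).
is_multi_container() {
  [ -f "$1/environment/docker-compose.yaml" ]
}

# Succeeds if the provider supports multi-container tasks,
# per the support described on this page.
supports_multi_container() {
  case "$1" in
    daytona|islo|docker) return 0 ;;
    *) return 1 ;;
  esac
}

# check_task TASK_DIR PROVIDER
# Prints "unsupported" for a multi-container task on a single-container
# provider, and "ok" otherwise.
check_task() {
  if is_multi_container "$1" && ! supports_multi_container "$2"; then
    echo "unsupported"
  else
    echo "ok"
  fi
}
```

For example, `check_task ./my-task modal` would print "unsupported" if ./my-task (a hypothetical path) contains environment/docker-compose.yaml, signalling that the task should be run on Daytona, Islo or local Docker instead.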
