Harbor Cookbook
March 27, 2026
Introducing the Harbor cookbook: recipes for building Harbor tasks and optimization loops
If you've built a Harbor task before, you've probably spent time figuring out the same things everyone else does: multi-container setups, simulated users, adding MCP tools, implementing computer use environments, and more.
We built Harbor to make agent evals simple. That’s why we’re releasing the Harbor Cookbook. It contains a collection of realistic, ready-to-run examples of how to build evals and optimize agents with Harbor.
What’s inside
We recommend giving your coding agent the recipe closest to what you're building as context, then adapting from there.
| Recipe | What it does |
|---|---|
| simple-task | Minimal single-container task |
| multi-container | Docker Compose task where the agent interacts with a locally hosted REST API |
| mcp-tools | Giving the agent custom tools via a locally hosted FastMCP server |
| skills | Recipes for including skills in a Harbor task |
| multi-reward | Multiple independent verifiers each producing their own score |
| simulated-user | Agent discovers requirements by talking to a simulated user |
| computer-use-ubuntu | Computer use reference implementation on an Ubuntu virtual desktop |
| computer-use-windows | Computer use reference implementation on a remote Windows desktop (Daytona) |
| dns-blacklisting | Network-level hostname blacklisting with exact, wildcard, and regex rules |
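To make the three rule types in the dns-blacklisting row concrete, here is a minimal Python sketch of hostname matching against exact, wildcard, and regex rules. It is not taken from the recipe: the rule values, the `is_blocked` helper, and the way rules are stored are all assumptions for illustration, and the recipe itself enforces rules at the network level rather than in application code.

```python
import fnmatch
import re

# Hypothetical rule set for illustration; the recipe's real rule format may differ.
EXACT_RULES = {"example.com"}
WILDCARD_RULES = ["*.tracker.example"]
REGEX_RULES = [re.compile(r"^ads\d+\.example\.net$")]


def is_blocked(hostname: str) -> bool:
    """Return True if the hostname matches any exact, wildcard, or regex rule."""
    # Normalize: hostnames are case-insensitive and may carry a trailing dot.
    host = hostname.lower().rstrip(".")

    # Exact rules match the full hostname only.
    if host in EXACT_RULES:
        return True

    # Wildcard rules use shell-style globbing, so "*.tracker.example"
    # matches subdomains but not "tracker.example" itself.
    if any(fnmatch.fnmatchcase(host, pattern) for pattern in WILDCARD_RULES):
        return True

    # Regex rules are searched against the normalized hostname.
    return any(rule.search(host) is not None for rule in REGEX_RULES)
```

Under these assumed rules, `is_blocked("a.tracker.example")` is true while `is_blocked("www.example.com")` is false, because the exact rule covers only the bare domain.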
Beyond evals: optimizing agents
Harbor tasks produce a reward, which means the same datasets you use for evals can also serve as training environments. The cookbook includes two recipes that demonstrate this. One pairs Harbor with GEPA to optimize an agent harness on MedAgentBench. The other is the Harbor integration contributed by Thinking Machines, which uses Harbor tasks as RL environments through the Tinker SDK.
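The idea that a reward turns an eval dataset into an optimization target can be sketched in a few lines of Python. This is a toy illustration, not the GEPA or Tinker recipe: `run_task` is a hypothetical stand-in for "run one Harbor task with this agent configuration and return its reward", and the candidate names and stub rewards are made up.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def select_best(
    candidates: Sequence[str],
    tasks: Sequence[str],
    run_task: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by its mean reward over all tasks and pick the best."""
    scores = {
        candidate: mean(run_task(candidate, task) for task in tasks)
        for candidate in candidates
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hypothetical stub: in practice this would launch a task run and read the
# reward it produces. Here rewards are hard-coded for illustration.
STUB_REWARDS = {
    ("prompt-v1", "task-a"): 1.0,
    ("prompt-v1", "task-b"): 0.0,
    ("prompt-v2", "task-a"): 1.0,
    ("prompt-v2", "task-b"): 1.0,
}


def stub_run_task(candidate: str, task: str) -> float:
    return STUB_REWARDS[(candidate, task)]


best, scores = select_best(
    ["prompt-v1", "prompt-v2"], ["task-a", "task-b"], stub_run_task
)
```

A real optimizer would propose new candidates from the scores instead of choosing from a fixed list, but the loop structure (run tasks, collect rewards, compare) is the same.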
We welcome feedback on which examples to build next and how to improve Harbor. We’re actively developing the Harbor framework to improve its ability to integrate into optimization loops.
Get started
Clone the cookbook and then run:

    uv tool install harbor
    harbor run -p harbor_cookbook/recipes/simple-task -a "<agent>" -m "<model>"

We welcome community contributions and will keep adding examples of interesting use cases for Harbor. Our goal is to make this the starting point for all things Harbor.
The Harbor Team