Harbor

Tutorial

In this tutorial, we will walk through creating a simple Harbor task.

Step 0: Install Harbor

Follow our installation instructions to install Harbor, which involves installing the package and its dependencies.

Step 1: Create your task

Now that Harbor is installed, run the following command to create a new task directory with the required files:

harbor tasks init ssh-key-pair

This will generate a task directory with the following structure:

ssh-key-pair/
├── instruction.md         # Task instructions
├── task.toml              # Configuration and metadata
├── environment/
│   └── Dockerfile         # Container definition
├── solution/
│   └── solve.sh           # Solution script
└── tests/
    │── test_outputs.py    # Pytest unit tests
    └── test.sh            # Test verification script

Step 2: Write the task instructions

Open the instruction.md file in your task directory and add the task description:

ssh-key-pair/instruction.md
# SSH Key Pair Generation

Generate an SSH key pair in the files `~/.ssh/id_rsa` and `~/.ssh/id_rsa.pub`.

Don't make them password protected.

Step 3: Configure task metadata

Open the task.toml file and configure your task metadata:

ssh-key-pair/task.toml
version = "1.0"

[metadata]
author_name = "Your Name"
author_email = "your.email@example.com"
difficulty = "easy"
category = "system-administration"
tags = ["ssh", "cryptography", "linux"]

[verifier]
timeout_sec = 120.0

[agent]
timeout_sec = 120.0

[environment]
build_timeout_sec = 600.0
cpus = 1
memory = "2G"
storage = "10G"

Step 4: Create the task environment

Open the Dockerfile in the environment/ directory that was generated:

ssh-key-pair/environment/Dockerfile
FROM ubuntu:24.04

# Create working directory
WORKDIR /app

# Install openssh-client for the task
RUN apt-get update && apt-get install -y openssh-client && rm -rf /var/lib/apt/lists/*

This Dockerfile defines the environment an agent will interact with through the terminal. Add any dependencies your task requires here.

Step 5: Test your solution idea

Before writing the automated solution, you'll want to manually verify your approach works. Build and run the container interactively:

harbor tasks start-env -p ssh-key-pair -e docker -a -i # or use daytona or modal

Inside the container, test that the following command solves the task without requiring interactive input:

ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""

Verify the keys were created correctly:

ls -l ~/.ssh/id_rsa*

You should see:

~/.ssh/
├── id_rsa      (-rw-------  600  private key)
└── id_rsa.pub  (-rw-r--r--  644  public key)

Exit the container with exit or Ctrl+D.

Step 6: Write the solution script

Take the command you verified in the previous step and create the solution script. This file will be used by the Oracle agent to ensure the task is solvable.

Update the solution/solve.sh file:

ssh-key-pair/solution/solve.sh
#!/bin/bash

ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""

Make sure the script is executable:

chmod +x ssh-key-pair/solution/solve.sh

Step 7: Create the test script

The test script verifies whether the agent successfully completed the task. It must produce a reward file in /logs/verifier/.

Update the tests/test.sh file:

ssh-key-pair/tests/test.sh
#!/bin/bash

apt-get update
apt-get install -y curl

curl -LsSf https://astral.sh/uv/0.9.5/install.sh | sh

source $HOME/.local/bin/env

# Run pytest tests
uvx \
  --python 3.12 \
  --with pytest==8.4.1 \
  pytest /tests/test_outputs.py

# Check exit code and write reward
if [ $? -eq 0 ]; then
  echo 1 > /logs/verifier/reward.txt
else
  echo 0 > /logs/verifier/reward.txt
fi

Now create the Python test file:

ssh-key-pair/tests/test_outputs.py
import os
from pathlib import Path


def test_key_files_exist() -> None:
    """Test that both private and public key files exist."""
    private_key = Path.home() / ".ssh" / "id_rsa"
    public_key = Path.home() / ".ssh" / "id_rsa.pub"

    assert private_key.exists(), "Private key file does not exist"
    assert public_key.exists(), "Public key file does not exist"


def test_key_file_permissions() -> None:
    """Test that the key files have correct permissions."""
    private_key = Path.home() / ".ssh" / "id_rsa"
    public_key = Path.home() / ".ssh" / "id_rsa.pub"

    private_perms = oct(os.stat(private_key).st_mode)[-3:]
    public_perms = oct(os.stat(public_key).st_mode)[-3:]

    assert private_perms == "600", (
        f"Private key has incorrect permissions: {private_perms}"
    )
    assert public_perms == "644", (
        f"Public key has incorrect permissions: {public_perms}"
    )


def test_key_format() -> None:
    """Test that the public key has the correct RSA format."""
    public_key = Path.home() / ".ssh" / "id_rsa.pub"

    with open(public_key, 'r') as f:
        content = f.read()

    assert content.startswith("ssh-rsa "), "Public key does not start with 'ssh-rsa'"
    assert len(content.split()) >= 2, "Public key format is invalid"

Step 8: Test your task with the Oracle agent

Run the following command to verify your task is solved by the solution script:

harbor run -p ssh-key-pair -a oracle

If successful, you should see output indicating the task was completed and the reward was 1.

Troubleshooting

If the Oracle agent fails, check:

  • The solution script has execute permissions
  • The Dockerfile installs all required dependencies
  • The test script correctly writes to /logs/verifier/reward.txt
  • The paths in your tests match the paths in your solution

Step 9 (Optional): Test with a real agent

Test your task with an actual AI agent to see if it can solve the task. For example, using Terminus with Claude:

harbor run \
  -p ssh-key-pair \
  -a terminus-2 \
  -m anthropic/claude-haiku-4-5

Step 10: Contribute your task!

Congratulations! You've created your first Harbor task. Your task is now ready to be used for benchmarking AI agents!