RL
Reinforcement learning on Harbor tasks
Performing RL on containerized agentic tasks is a common use case but difficult to implement and scale. Harbor provides a simple interface for training on containerized agentic environments.
We plan to integrate with most popular RL frameworks. For now, we have worked with the SkyRL team to implement their interface for RL on Harbor tasks.
In general, performing RL on Harbor tasks requires generating rollouts and recording tokens and rewards. This usually involves implementing an RL framework's rollout interface.
We recommend using the TrialConfig or JobConfig classes to configure the rollouts. We also recommend using cloud sandboxes for scaling rollouts and reducing startup/teardown overhead.
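For example, pointing trials at a cloud sandbox provider is a one-line change in the environment config. A minimal sketch using the Daytona environment type, which also appears in the full example below:

from harbor.models.environment_type import EnvironmentType
from harbor.models.trial.config import EnvironmentConfig

# Run each trial in a Daytona cloud sandbox rather than a local container environment.
environment = EnvironmentConfig(type=EnvironmentType.DAYTONA)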
Rollout interfaces typically receive batches of data to be used for the rollouts. We recommend using the TaskConfig class as the batch object. These configs can point to remote or local tasks.
from harbor.models.trial.config import TaskConfig

task_configs = [
    TaskConfig(
        path="cancel-async-tasks",
        git_url="https://<version-control-url>/<org>/<repo>.git",
        git_commit_id="<hash>",
    ),  # Remote task, fetched from version control
    TaskConfig(
        path="tasks/fix-dockerfile-syntax",
    ),  # Local task
]

Example of a rollout interface
from pathlib import Path

from harbor.job import Job
from harbor.models.environment_type import EnvironmentType
from harbor.models.job.config import JobConfig, OrchestratorConfig
from harbor.models.trial.config import EnvironmentConfig, TaskConfig

# AgentConfig, AgentName, and TaskId are also required; their import paths are
# not shown here and may vary by Harbor / RL framework version.
from your_rl_framework import SomeRLFrameworkRolloutInterface, Rollout


class HarborRolloutInterface(SomeRLFrameworkRolloutInterface):
    async def run(self, task_ids: list[TaskId]) -> list[Rollout]:
        # Configure a job that runs the batch of tasks in cloud sandboxes.
        # task_configs is built as in the snippet above (e.g., from task_ids).
        job = Job(
            config=JobConfig(
                jobs_dir=Path("jobs"),
                environment=EnvironmentConfig(
                    type=EnvironmentType.DAYTONA,
                ),
                agents=[
                    AgentConfig(
                        name=AgentName.TERMINUS_2,
                        model_name="hosted_vllm/<model-name>",
                        kwargs={
                            "base_url": "https://<vllm-server-url>",
                        },
                    )
                ],
                orchestrator=OrchestratorConfig(
                    n_concurrent_trials=32,
                ),
                tasks=task_configs,
            ),
        )

        result = await job.run()

        # Convert each trial result into the RL framework's Rollout object.
        rollouts = []
        for trial_result in result.trial_results:
            reward = (
                trial_result.verifier_result.rewards.get("reward", 0)
                if trial_result.verifier_result and trial_result.verifier_result.rewards
                else 0
            )

            if (
                trial_result.agent_result
                and trial_result.agent_result.metadata
                and "token_ids" in trial_result.agent_result.metadata
                and "mask_ids" in trial_result.agent_result.metadata
            ):
                token_ids = trial_result.agent_result.metadata["token_ids"]
                mask_ids = trial_result.agent_result.metadata["mask_ids"]
            else:
                raise ValueError(
                    f"Missing token_ids or mask_ids for trial {trial_result.trial_name}"
                )

            rollout = Rollout(
                reward=reward,
                token_ids=token_ids,
                mask_ids=mask_ids,
            )
            rollouts.append(rollout)

        return rollouts

Collecting tokens from agents
There are two strategies for collecting tokens:
1. Intercepting tokens from a vLLM server
2. Returning tokens as part of the agent result metadata
Strategy (1) assumes you are using a vLLM server to generate tokens and an RL framework that handles the interception. Strategy (2) assumes your agent is configured to return tokens as part of the agent result metadata. We are working to add flags to Terminus 2 to include this information. If you plan to train with a custom harness, be sure to include this information in the agent result metadata.
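For strategy (2), the rollout interface above expects token_ids and mask_ids keys in the agent result metadata. The sketch below shows one way a custom harness might build that dictionary; build_rollout_metadata, the message format, and the per-role masking scheme are illustrative assumptions rather than part of Harbor's API, and a Hugging Face tokenizer is assumed for the trained model:

from transformers import AutoTokenizer


def build_rollout_metadata(messages: list[dict], tokenizer) -> dict:
    """Tokenize a conversation and build the metadata dict the rollout interface reads."""
    token_ids: list[int] = []
    mask_ids: list[int] = []
    for message in messages:
        ids = tokenizer.encode(message["content"], add_special_tokens=False)
        token_ids.extend(ids)
        # Mark assistant tokens with 1 (trained on) and prompt/tool tokens with 0.
        mask_ids.extend([1 if message["role"] == "assistant" else 0] * len(ids))
    return {"token_ids": token_ids, "mask_ids": mask_ids}


# Attach the result to your agent's result metadata so that
# trial_result.agent_result.metadata contains token_ids and mask_ids.
tokenizer = AutoTokenizer.from_pretrained("<model-name>")
metadata = build_rollout_metadata(
    [
        {"role": "user", "content": "Fix the Dockerfile syntax error."},
        {"role": "assistant", "content": "cat Dockerfile"},
    ],
    tokenizer,
)

In practice, tokenize with the same chat template your vLLM server applies so the recorded token stream matches what the model actually generated.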