The Harbor registry is getting an upgrade

One of Harbor's core principles is that environments should be portable and flow easily between parties. We designed the Harbor task format and released the open-source package to make it easy for users to create environments and run rollouts. We created the Harbor registry to simplify distributing Harbor tasks and datasets.

Today we're announcing our next step on that journey: a self-service Harbor registry.

The Harbor registry makes distributing Harbor tasks and datasets simple. Using the Harbor CLI, you can now publish your data and share it with anyone. As with all other Harbor features, the task is the atomic unit of the registry. Datasets are collections of tasks at specific versions. Registered data can be private or public.

Publishing a task or dataset is a simple three-step process: create it, publish it to the registry, and run it.

For a full walkthrough, see Publishing a dataset.

Create a task or dataset

To create a task, run

harbor init --task hello/task

You can then edit the task files to implement your task.

You can then initialize a dataset in the same directory by running

harbor init --dataset hello/dataset

which creates a dataset.toml manifest and automatically adds any tasks from the directory to the manifest.

You can add other tasks from the registry or locally by running

harbor add org/task --to "<path/to/dataset>" # or harbor add "<path/to/task>"

Publish to the registry

First, log in or create an account using

harbor auth login

Then publish the task or dataset using

harbor publish "<path>" # optionally add --public to make it public

Harbor automatically publishes any tasks and datasets it finds at the path.

Run the task or dataset

Once your task or dataset is published, anyone in your org (or, if it's public, any Harbor user) can run

harbor run -d hello/dataset

harbor run -t hello/task

to run the dataset or task.
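If you want to script the publish and run steps, for example from CI, the commands above can be assembled programmatically. The following is a minimal sketch rather than an official Harbor integration. It uses only the commands and flags shown in this post, and the helper names (publish_cmd, run_cmd, execute) are hypothetical. It assumes that the harbor CLI is on your PATH and that you have already logged in with harbor auth login.

```python
import shlex
import subprocess
from typing import List


def publish_cmd(path: str, public: bool = False) -> List[str]:
    """Build `harbor publish <path>`, adding --public when requested."""
    cmd = ["harbor", "publish", path]
    if public:
        cmd.append("--public")
    return cmd


def run_cmd(name: str, kind: str = "dataset") -> List[str]:
    """Build `harbor run -d <name>` for a dataset or `harbor run -t <name>` for a task."""
    flags = {"dataset": "-d", "task": "-t"}
    if kind not in flags:
        raise ValueError(f"kind must be 'dataset' or 'task', got {kind!r}")
    return ["harbor", "run", flags[kind], name]


def execute(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is published by accident.
    execute([publish_cmd("."), run_cmd("hello/dataset")])
```

Building each command as an argument list avoids shell quoting problems with paths, and the dry-run default lets you review the exact commands before anything reaches the registry.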

We encourage Harbor users to distribute benchmarks and datasets through the Harbor registry.

We don't anticipate that task development itself will happen in the registry. Instead, we expect tasks to be developed on existing version control platforms and then published to the registry, similar to how Docker and PyPI work.

Every published task or dataset is versioned by its digest, a revision number, and optional tags. This maximizes reproducibility and emphasizes that registered data are snapshots of tasks and datasets to be distributed, and that the registry is not a development platform.
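To make the three version identifiers concrete, here is a toy model of how a digest, a revision number, and tags can relate to one another. It is an illustrative sketch only and does not describe Harbor's actual implementation. The hashing scheme, the ToyRegistry class, and the rule that republishing identical content reuses the existing revision are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def directory_digest(root: Path) -> str:
    """Hash every file's relative path and bytes in sorted order (assumed scheme)."""
    h = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    for path in sorted(files, key=lambda p: p.relative_to(root).as_posix()):
        rel = path.relative_to(root).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix each field so different file layouts cannot collide.
        h.update(len(rel).to_bytes(8, "big"))
        h.update(rel)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return "sha256:" + h.hexdigest()


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry: digests are immutable, tags are movable."""
    # name -> digests in publish order; revision N is stored at index N - 1
    revisions: Dict[str, List[str]] = field(default_factory=dict)
    # (name, tag) -> digest the tag currently points to
    tags: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def publish(self, name: str, digest: str, tags: Iterable[str] = ()) -> int:
        """Record a snapshot and return its revision number (starting at 1)."""
        history = self.revisions.setdefault(name, [])
        if digest not in history:
            # New content gets the next revision; identical content reuses the old one.
            history.append(digest)
        for tag in tags:
            self.tags[(name, tag)] = digest
        return history.index(digest) + 1
```

In this model the digest identifies exact content, the revision number gives a readable ordering of snapshots, and a tag is a convenient name that can later point to a newer snapshot. Pinning a run to a digest is what makes it reproducible.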

We welcome feedback on the registry and will continue to develop tools that make environments easier to create, more usable, and more portable.