simpleqa
v1.0
SimpleQA: 4,326 short, fact-seeking questions from OpenAI for evaluating language model factuality. Uses LLM-as-a-judge grading. Source: https://openai.com/index/introducing-simpleqa/
uvx harbor run -d simpleqa@1.0
Tasks (4326)
simpleqa-976
uvx harbor run -d simpleqa@1.0 -t simpleqa-976b8f0fa9
simpleqa-977
uvx harbor run -d simpleqa@1.0 -t simpleqa-977b8f0fa9
simpleqa-978
uvx harbor run -d simpleqa@1.0 -t simpleqa-978b8f0fa9
simpleqa-979
uvx harbor run -d simpleqa@1.0 -t simpleqa-979b8f0fa9
simpleqa-98
uvx harbor run -d simpleqa@1.0 -t simpleqa-98b8f0fa9
simpleqa-980
uvx harbor run -d simpleqa@1.0 -t simpleqa-980b8f0fa9
simpleqa-981
uvx harbor run -d simpleqa@1.0 -t simpleqa-981b8f0fa9
simpleqa-982
uvx harbor run -d simpleqa@1.0 -t simpleqa-982b8f0fa9
simpleqa-983
uvx harbor run -d simpleqa@1.0 -t simpleqa-983b8f0fa9
simpleqa-984
uvx harbor run -d simpleqa@1.0 -t simpleqa-984b8f0fa9
simpleqa-985
uvx harbor run -d simpleqa@1.0 -t simpleqa-985b8f0fa9
simpleqa-986
uvx harbor run -d simpleqa@1.0 -t simpleqa-986b8f0fa9
simpleqa-987
uvx harbor run -d simpleqa@1.0 -t simpleqa-987b8f0fa9
simpleqa-988
uvx harbor run -d simpleqa@1.0 -t simpleqa-988b8f0fa9
simpleqa-989
uvx harbor run -d simpleqa@1.0 -t simpleqa-989b8f0fa9
simpleqa-99
uvx harbor run -d simpleqa@1.0 -t simpleqa-99b8f0fa9
simpleqa-990
uvx harbor run -d simpleqa@1.0 -t simpleqa-990b8f0fa9
simpleqa-991
uvx harbor run -d simpleqa@1.0 -t simpleqa-991b8f0fa9
simpleqa-992
uvx harbor run -d simpleqa@1.0 -t simpleqa-992b8f0fa9
simpleqa-993
uvx harbor run -d simpleqa@1.0 -t simpleqa-993b8f0fa9
simpleqa-994
uvx harbor run -d simpleqa@1.0 -t simpleqa-994b8f0fa9
simpleqa-995
uvx harbor run -d simpleqa@1.0 -t simpleqa-995b8f0fa9
simpleqa-996
uvx harbor run -d simpleqa@1.0 -t simpleqa-996b8f0fa9
simpleqa-997
uvx harbor run -d simpleqa@1.0 -t simpleqa-997b8f0fa9
simpleqa-998
uvx harbor run -d simpleqa@1.0 -t simpleqa-998b8f0fa9
simpleqa-999
uvx harbor run -d simpleqa@1.0 -t simpleqa-999b8f0fa9