simpleqa

v1.0

SimpleQA: 4,326 short, fact-seeking questions released by OpenAI for evaluating the factuality of language models. Answers are graded by an LLM acting as a judge (LLM-as-a-judge). Source: https://openai.com/index/introducing-simpleqa/
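The judge's per-question verdicts still have to be rolled up into summary scores. Below is a minimal sketch of one way to do that. It assumes the three-way CORRECT / INCORRECT / NOT_ATTEMPTED labels and the "correct", "correct given attempted" and F-score summaries described in OpenAI's SimpleQA release. Harbor's own verifier may label and report results differently, and the helper name `summarize_grades` is hypothetical.

```python
from collections import Counter

# Assumed label set, following OpenAI's SimpleQA grading scheme (not confirmed for Harbor).
LABELS = ("CORRECT", "INCORRECT", "NOT_ATTEMPTED")


def summarize_grades(grades: list[str]) -> dict[str, float]:
    """Aggregate judge verdicts into SimpleQA-style metrics (hypothetical helper)."""
    unknown = set(grades) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(grades)
    total = len(grades)
    attempted = counts["CORRECT"] + counts["INCORRECT"]

    # Share of all questions answered correctly.
    overall_correct = counts["CORRECT"] / total if total else 0.0
    # Share of attempted questions answered correctly (abstentions excluded).
    correct_given_attempted = counts["CORRECT"] / attempted if attempted else 0.0
    # Harmonic mean of the two, rewarding accuracy without penalizing abstention too little.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    print(summarize_grades(["CORRECT", "CORRECT", "INCORRECT", "NOT_ATTEMPTED"]))
```

With two correct answers, one incorrect answer and one abstention, the overall score is 0.5, the attempted-only score is 2/3, and the F-score is their harmonic mean, 4/7.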

uvx harbor run -d simpleqa@1.0

Tasks (4326)

simpleqa-976
uvx harbor run -d simpleqa@1.0 -t simpleqa-976
b8f0fa9
simpleqa-977
uvx harbor run -d simpleqa@1.0 -t simpleqa-977
b8f0fa9
simpleqa-978
uvx harbor run -d simpleqa@1.0 -t simpleqa-978
b8f0fa9
simpleqa-979
uvx harbor run -d simpleqa@1.0 -t simpleqa-979
b8f0fa9
simpleqa-98
uvx harbor run -d simpleqa@1.0 -t simpleqa-98
b8f0fa9
simpleqa-980
uvx harbor run -d simpleqa@1.0 -t simpleqa-980
b8f0fa9
simpleqa-981
uvx harbor run -d simpleqa@1.0 -t simpleqa-981
b8f0fa9
simpleqa-982
uvx harbor run -d simpleqa@1.0 -t simpleqa-982
b8f0fa9
simpleqa-983
uvx harbor run -d simpleqa@1.0 -t simpleqa-983
b8f0fa9
simpleqa-984
uvx harbor run -d simpleqa@1.0 -t simpleqa-984
b8f0fa9
simpleqa-985
uvx harbor run -d simpleqa@1.0 -t simpleqa-985
b8f0fa9
simpleqa-986
uvx harbor run -d simpleqa@1.0 -t simpleqa-986
b8f0fa9
simpleqa-987
uvx harbor run -d simpleqa@1.0 -t simpleqa-987
b8f0fa9
simpleqa-988
uvx harbor run -d simpleqa@1.0 -t simpleqa-988
b8f0fa9
simpleqa-989
uvx harbor run -d simpleqa@1.0 -t simpleqa-989
b8f0fa9
simpleqa-99
uvx harbor run -d simpleqa@1.0 -t simpleqa-99
b8f0fa9
simpleqa-990
uvx harbor run -d simpleqa@1.0 -t simpleqa-990
b8f0fa9
simpleqa-991
uvx harbor run -d simpleqa@1.0 -t simpleqa-991
b8f0fa9
simpleqa-992
uvx harbor run -d simpleqa@1.0 -t simpleqa-992
b8f0fa9
simpleqa-993
uvx harbor run -d simpleqa@1.0 -t simpleqa-993
b8f0fa9
simpleqa-994
uvx harbor run -d simpleqa@1.0 -t simpleqa-994
b8f0fa9
simpleqa-995
uvx harbor run -d simpleqa@1.0 -t simpleqa-995
b8f0fa9
simpleqa-996
uvx harbor run -d simpleqa@1.0 -t simpleqa-996
b8f0fa9
simpleqa-997
uvx harbor run -d simpleqa@1.0 -t simpleqa-997
b8f0fa9
simpleqa-998
uvx harbor run -d simpleqa@1.0 -t simpleqa-998
b8f0fa9
simpleqa-999
uvx harbor run -d simpleqa@1.0 -t simpleqa-999
b8f0fa9
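Every per-task command above follows one pattern, `uvx harbor run -d simpleqa@1.0 -t <task-id>`, so a subset of tasks can be scripted rather than typed one by one. The following is a minimal sketch that uses only the `-d` and `-t` flags shown on this page. The helper names `harbor_command` and `run_tasks` are hypothetical, and running tasks one invocation at a time is an illustrative choice. The listed `uvx harbor run -d simpleqa@1.0` runs the whole dataset instead. By default the sketch only prints the commands.

```python
import shlex
import subprocess

DATASET = "simpleqa@1.0"  # dataset@version, as used in the commands on this page


def harbor_command(task_id: str, dataset: str = DATASET) -> list[str]:
    """Build the per-task command from the listing: uvx harbor run -d <dataset> -t <task>."""
    return ["uvx", "harbor", "run", "-d", dataset, "-t", task_id]


def run_tasks(task_ids: list[str], dry_run: bool = True) -> list[int]:
    """Run tasks sequentially; with dry_run=True, print each command and report 0."""
    codes = []
    for task_id in task_ids:
        cmd = harbor_command(task_id)
        if dry_run:
            print(shlex.join(cmd))
            codes.append(0)
        else:
            # check=False so one failing task does not stop the rest of the batch.
            codes.append(subprocess.run(cmd, check=False).returncode)
    return codes


if __name__ == "__main__":
    # The tasks shown above: simpleqa-98, simpleqa-99 and simpleqa-976 to simpleqa-999.
    ids = [f"simpleqa-{n}" for n in (98, 99, *range(976, 1000))]
    run_tasks(ids, dry_run=True)
```

The listing sorts task IDs as strings, which is why simpleqa-98 and simpleqa-99 appear among the 976 to 999 entries. Building the IDs from integers, as the sketch does, avoids depending on that ordering.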