New Sources Log — 2026-04-07

| Title | URL | Tags | Status | Canonical Page | Notes |
| --- | --- | --- | --- | --- | --- |
| MMLU | https://github.com/hendrycks/test?update=2026-04-07 | tool | integrated | mmlu | Discovered in humanitys-last-exam.md |
| AlpacaEval | https://github.com/tatsu-lab/alpaca_eval?update=2026-04-07 | tool | integrated | alpaca-eval | Discovered in chatbot-arena.md |
| MT-Bench | https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge?update=2026-04-07 | tool | integrated | mt-bench | Discovered in chatbot-arena.md |
| InterCode | https://github.com/princeton-nlp/intercode?update=2026-04-07 | tool | integrated | intercode | Discovered in terminal-bench.md |
| Simple time command with curl | https://github.com/ollama/ollama/blob/main/docs/api.md?update=2026-04-07 | tool | integrated | ollama-benchmark-cli | Discovered in ollama-benchmark-cli.md |
| MATH Benchmark | https://github.com/hendrycks/math?update=2026-04-07 | tool | integrated | math-benchmark | Discovered in gsm8k.md |
| ASDiv | https://github.com/chiahsuan/ASDiv?update=2026-04-07 | tool | integrated | asdiv | Discovered in gsm8k.md |
| ARC (AI2 Reasoning Challenge) | https://github.com/allenai/ARC-benchmark?update=2026-04-07 | tool | integrated | arc | Discovered in gpqa.md |
| BigCodeBench | https://github.com/bigcode-project/bigcodebench?update=2026-04-07 | tool | integrated | bigcodebench | Discovered in human-eval.md |
| EvalPlus | https://github.com/evalplus/evalplus?update=2026-04-07 | repository | integrated | evalplus | Staged from previous logs. |
| HELM | https://crfm.stanford.edu/helm/lite/?update=2026-04-07 | tool | integrated | helm | Staged from previous logs. |
| OpenCompass | https://opencompass.org.cn/?update=2026-04-07 | tool | integrated | opencompass | Staged from previous logs. |