New Sources Log — 2026-04-07

Title	URL	Tags	Status	Canonical Page	Notes
MMLU	https://github.com/hendrycks/test?update=2026-04-07	tool	integrated	mmlu	Discovered in humanitys-last-exam.md
AlpacaEval	https://github.com/tatsu-lab/alpaca_eval?update=2026-04-07	tool	integrated	alpaca-eval	Discovered in chatbot-arena.md
MT-Bench	https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge?update=2026-04-07	tool	integrated	mt-bench	Discovered in chatbot-arena.md
InterCode	https://github.com/princeton-nlp/intercode?update=2026-04-07	tool	integrated	intercode	Discovered in terminal-bench.md
Simple `time` command with `curl`	https://github.com/ollama/ollama/blob/main/docs/api.md?update=2026-04-07	tool	integrated	ollama-benchmark-cli	Discovered in ollama-benchmark-cli.md
MATH Benchmark	https://github.com/hendrycks/math?update=2026-04-07	tool	integrated	math-benchmark	Discovered in gsm8k.md
ASDiv	https://github.com/chiahsuan/ASDiv?update=2026-04-07	tool	integrated	asdiv	Discovered in gsm8k.md
ARC (AI2 Reasoning Challenge)	https://github.com/allenai/ARC-benchmark?update=2026-04-07	tool	integrated	arc	Discovered in gpqa.md
BigCodeBench	https://github.com/bigcode-project/bigcodebench?update=2026-04-07	tool	integrated	bigcodebench	Discovered in human-eval.md
EvalPlus	https://github.com/evalplus/evalplus?update=2026-04-07	repository	integrated	evalplus	Staged from previous logs.
HELM	https://crfm.stanford.edu/helm/lite/?update=2026-04-07	tool	integrated	helm	Staged from previous logs.
OpenCompass	https://opencompass.org.cn/?update=2026-04-07	tool	integrated	opencompass	Staged from previous logs.