Measuring AI models needs an overhaul.
I often mention AI model benchmarks in posts, but Kevin Roose at The New York Times said the quiet part out loud: AI benchmark tests don’t do much to help compare models, and they need to change.
Benchmarks cover only a small slice of human knowledge, and as Roose points out, AI models easily surpass it. Training datasets sometimes include the answers to benchmark tests, so, of course, the models beat the tests.