Measuring AI models needs an overhaul.

April 15, 2024

I often mention AI model benchmarks in posts, but Kevin Roose at The New York Times said the quiet part out loud: AI benchmark tests don't really help us compare models, and they need to change.

Benchmarks cover only a small slice of human knowledge, and as Roose points out, AI models have easily outgrown them. Training datasets sometimes include the answers to benchmark tests, so, of course, models beat the tests.