Generative AI
Measuring AI models needs an overhaul.
I often mention AI model benchmarks in posts, but Kevin Roose at The New York Times said the quiet part out loud: AI benchmark tests don’t help much in comparing models, and they need to change.
Benchmarks cover only a small slice of human knowledge, and as Roose points out, AI models now easily clear that bar. Training datasets sometimes include answers from the benchmark tests themselves, so, of course, models beat the tests.


