The Download: Rethinking AI benchmarks, and the ethics of AI agents

Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company’s latest model in several tests.

The problem is that these benchmarks are poorly designed, the results are hard to replicate, and the metrics they use are frequently arbitrary, according to new research. That matters because AI models’ scores against these benchmarks determine the level of scrutiny they receive.

AI companies frequently cite benchmarks as testament to a new model’s success, and those benchmarks already form part of some governments’ plans for regulating AI. But right now, they might not be good enough to use that way, and researchers have some ideas for how they should be improved.

—Scott J Mulligan

We need to start wrestling with the ethics of AI agents

Generative AI models have become remarkably good at conversing with us, and creating images, videos, and music for us, but they’re not all that good at doing things for us.

AI agents promise to change that. Last week researchers published a new paper explaining how they trained simulation agents to replicate 1,000 people’s personalities with stunning accuracy.

AI models that mimic you could go out and act on your behalf in the near future. If such tools become cheap and easy to build, it will raise lots of new ethical concerns, but two in particular stand out. Read the full story.

—James O’Donnell
