Friday, July 25, 2025
Over Drive Journal

A new AI coding challenge just published its first results – and they aren’t pretty

by Hifinis
July 24, 2025
in Tech

A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
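
The timed-entry idea can be illustrated with a short sketch. This is not the organizers' actual pipeline, which the article does not describe. The `Issue` record, the `contamination_free` helper, the sample repositories, and the deadline year (2025) are all assumptions made for illustration. The only rule taken from the article is that the test uses issues flagged after the submission deadline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue (not a real API type)."""
    repo: str
    number: int
    flagged_on: date


# Round-one deadline from the article ("March 12th"); the year is assumed.
SUBMISSION_DEADLINE = date(2025, 3, 12)


def contamination_free(issues, deadline=SUBMISSION_DEADLINE):
    """Keep only issues flagged strictly after the submission deadline.

    A model frozen on or before the deadline cannot have trained on them.
    """
    return [issue for issue in issues if issue.flagged_on > deadline]


# Made-up sample data: one issue before, one on, and one after the deadline.
candidates = [
    Issue("example/alpha", 101, date(2025, 2, 28)),
    Issue("example/beta", 202, date(2025, 3, 12)),
    Issue("example/gamma", 303, date(2025, 4, 2)),
]

test_set = contamination_free(candidates)
print([(issue.repo, issue.number) for issue in test_set])
```

With the sample data above, only the issue flagged in April survives the filter, because the comparison is strict and an issue flagged on the deadline day itself is excluded. Whether the real cutoff is strict or inclusive is not stated in the article.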

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

© 2024 Overdrivejournal.com. All rights reserved.
