On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, whereas no other model has exceeded 2 percent, suggesting a leap in mathematical reasoning capabilities over the previous model.
Benchmarks vs. real-world value
Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.
The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone, indicating significant business interest despite the costs.
Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.
News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at relatively low costs. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly, both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is relatively small compared to the new proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.
Despite their benchmark performances, these simulated reasoning models still struggle with confabulations: instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.
In response to the news, several people quipped on social media that companies could hire an actual PhD student for much less. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs, aren’t paid $20K / month.”
While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.