OpenAI’s New o1 Model Thinks Hard Before Responding

By Elets News Network - 14 September 2024


OpenAI has introduced a new series of AI models designed to improve reasoning and tackle more complex tasks in science, coding, and mathematics. These models, which take more time to think before responding, are expected to outperform previous versions on difficult problems.

The first model in the series is being released as a preview in ChatGPT and via OpenAI’s API, with regular updates anticipated. Alongside this release, OpenAI has also shared evaluations for the next update, which is still in development.
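For developers, the preview is reachable through the same chat completions interface as earlier OpenAI models. The snippet below is a minimal sketch of such a call using the official openai Python package; the model identifier "o1-preview" is an assumption, since the article does not name the exact identifier.

    # Minimal sketch: calling the new reasoning model via OpenAI's API.
    # The model identifier "o1-preview" is assumed, not confirmed by this
    # article; OPENAI_API_KEY must be set in the environment.
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY automatically

    response = client.chat.completions.create(
        model="o1-preview",
        messages=[
            {"role": "user", "content": "How many distinct primes divide 2024?"},
        ],
    )

    print(response.choices[0].message.content)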

These models are trained to approach problems more deliberately, mimicking human reasoning by refining their thinking, testing different strategies, and recognising their mistakes. In tests, the upcoming version performed comparably to PhD students on challenging tasks in physics, chemistry, and biology, and achieved high scores in mathematics and coding. Specifically, it solved 83% of problems on a qualifying exam for the International Mathematics Olympiad (IMO), compared with just 13% for GPT-4o. In coding, it reached the 89th percentile in Codeforces competitions.

While the initial version lacks some features commonly associated with ChatGPT, such as web browsing and file uploads, it represents a significant advance in handling complex reasoning tasks. The new series, named OpenAI o1, resets the model numbering to reflect this leap in AI capability.

Safety remains a key focus in the development of these models. OpenAI has introduced a new safety training approach that utilises the enhanced reasoning capabilities to adhere more closely to safety and alignment guidelines. In a rigorous test of resistance to jailbreaking (user attempts to bypass safety measures), the new model scored 84 out of 100, significantly outperforming GPT-4o, which scored 22.


To support these advances, OpenAI has strengthened its safety work, internal governance, and collaboration with federal agencies. This includes rigorous testing and evaluation under its Preparedness Framework, as well as formal agreements with the AI Safety Institutes in the U.S. and U.K. that grant them early access to research versions of the model.
