Blog Credit : Trupti Thakur
Image Courtesy : Google
Project StrawBerry
OpenAI has launched a new AI model called o1 under a secret initiative called ‘Project Strawberry.’ This model is designed to handle more complex reasoning in areas like science, coding, and mathematics. It’s part of a series of models aimed at making AI think more deeply and tackle tougher problems.
Model Development
The o1 model is built to process questions in a way that mimics how humans think. It can:
- Look at problems from different angles.
- Learn from its mistakes by reviewing its answers and improving.
In early tests, the o1 model showed impressive results. It can solve complex problems in physics, chemistry, and biology at a level similar to a PhD student. It performed especially well in math and coding, successfully solving 83% of questions in a tough math competition—far better than older AI models.
Cost-Effective Options
OpenAI also introduced a cheaper and faster version called o1-Mini, which is 80% less expensive than the full o1 model. Even though it costs less, o1-Mini still provides strong reasoning skills, making it a good option for tasks like coding without breaking the bank.
Safety Measures
To ensure user safety, OpenAI has developed a new training method for the o1 model. This includes:
- Making sure the model follows safety guidelines better than before, improving its safety score from 22 to 84.
- Working with the UK and US governments to test the model’s safety thoroughly.
- Giving safety experts early access to review and improve the model.
Implications for Jobs and Research
The o1 model could shake up industries, especially in fields like software development and data analysis. Some jobs might be affected, pushing people to focus more on creative thinking and problem-solving skills. On the other hand, the rise of AI like o1 could create new job opportunities in areas like AI safety and ethics.
For researchers, o1 can be a valuable tool, helping them solve complicated problems faster, particularly in fields like healthcare and other scientific areas. It can also process large amounts of data quickly, making research more efficient.
How to Access OpenAI o1?
The o1 model is available to ChatGPT Plus and Team users, with a feature called the model picker to choose between versions. For now, users can send up to 30 messages to the o1-preview and 50 messages to o1-mini each day. In the future, OpenAI plans to increase these limits and automatically choose the best model based on user needs. ChatGPT Enterprise and Edu users will also get access next week.
How It Works ?
We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.
In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. You can read more about this in our technical research post.
As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term.
But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.
Safety
As part of developing these new models, we have come up with a new safety training approach that harnesses their reasoning capabilities to make them adhere to safety and alignment guidelines. By being able to reason about our safety rules in context, it can apply them more effectively.
One way we measure safety is by testing how well our model continues to follow its safety rules if a user tries to bypass them (known as “jailbreaking”). On one of our hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0-100) while our o1-preview model scored 84. You can read more about this in the system card and our research post.
To match the new capabilities of these models, we’ve bolstered our safety work, internal governance, and federal government collaboration. This includes rigorous testing and evaluations using our Preparedness Framework(opens in a new window), best-in-class red teaming, and board-level review processes, including by our Safety & Security Committee.
To advance our commitment to AI safety, we recently formalized agreements with the U.S. and U.K. AI Safety Institutes. We’ve begun operationalizing these agreements, including granting the institutes early access to a research version of this model. This was an important first step in our partnership, helping to establish a process for research, evaluation, and testing of future models prior to and following their public release.
Whom it’s for
These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.
OpenAI o1-mini
The o1 series excels at accurately generating and debugging complex code. To offer a more efficient solution for developers, we’re also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.
How to use OpenAI o1ChatGPT Plus and Team users will be able to access o1 models in ChatGPT starting today. Both o1-preview and o1-mini can be selected manually in the model picker, and at launch, weekly rate limits will be 30 messages for o1-preview and 50 for o1-mini. We are working to increase those rates and enable ChatGPT to automatically choose the right model for a given prompt.
ChatGPT Enterprise and Edu users will get access to both models beginning next week.
Developers who qualify for API usage tier 5(opens in a new window) can start prototyping with both models in the API today with a rate limit of 20 RPM. We’re working to increase these limits after additional testing. The API for these models currently doesn’t include function calling, streaming, support for system messages, and other features. To get started, check out the API documentation
Blog By : Trupti Thakur