OpenAI on new AI models that can reason

18:34
14.09.2024
kr23_ka
200

OpenAI's o1 series models are new large language models trained with reinforcement to perform complex reasoning. The o1 models think before answering and can create a long internal chain of reasoning before responding to the user.

o1 models excel at scientific reasoning, ranking in the 89th percentile on competitive programming questions (Codeforces), placing among the top 500 students in the USA in the qualifying round of the USA Mathematical Olympiad (AIME), and exceeding human PhD-level accuracy on physics, biology, and chemistry tasks (GPQA).

Two reasoning-capable models are available in the API:

o1-preview: an early preview version of our o1 model designed for reasoning about complex problems using general world knowledge.
o1-mini: a faster and cheaper version of o1, particularly effective in coding, math, and science tasks where extensive general knowledge is not required.

o1 models show significant progress in reasoning, but they are not intended to replace GPT-4o in all use cases.

For applications requiring image input, function calling, or consistently fast response times, GPT-4o and GPT-4o mini models will still be the right choice. However, if you plan to develop applications requiring deep reasoning and designed for longer response times, o1 models may be an excellent choice. We can't wait to see what you create with them!

o1 models are currently in beta.

Access is restricted to 5th level developers (check your usage level here), with low rate limits (20 RPM). We are working on adding new features, increasing rate limits, and expanding access to more developers in the coming weeks.

Quick Start

Both o1-preview and o1-mini are available through the chat completions endpoint.

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user", 
            "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."
        }
    ]
)

print(response.choices[0].message.content)

Depending on the amount of reasoning required by the model to solve the problem, these requests can take anywhere from a few seconds to a few minutes.

Beta Limitations

During the beta phase, many parameters of the Chat Completions API are not yet available. Specifically:

Modalities: only text, images are not supported.
Message Types: only user and assistant messages, system messages are not supported.
Streaming: not supported.
Tools: tools, function calls, and response format parameters are not supported.
Logprobs: not supported.
Other: temperature, top_p, and n are fixed at 1, while presence_penalty and frequency_penalty are fixed at 0.
Assistants and Batch: these models are not supported in the Assistants API and Batch API.

We will add support for some of these parameters in the coming weeks as we exit beta. Features such as multimodality and tool usage will be included in future o1 series models.

How Reasoning Works

The o1 models feature reasoning tokens. Models use them to "think," breaking down their understanding of the prompt and considering multiple approaches to formulating a response. After generating reasoning tokens, the model outputs the response in the form of visible completion tokens and discards the reasoning tokens from its context.

Here is an example of a multi-step interaction between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.

OpenAI introduces new AI models capable of reasoning and logical thinking.

Although reasoning tokens are not visible through the API, they still occupy space in the model's context window and are called output tokens.

Managing the Context Window

The o1-preview and o1-mini models offer a context window of 128,000 tokens. Each completion has an upper limit on the maximum number of output tokens—this includes both invisible reasoning tokens and visible completion tokens. The maximum output token limits are as follows:

o1-preview: up to 32,768 tokens
o1-mini: up to 65,536 tokens

When creating completion tokens, it is important to ensure that there is enough space in the context window for reasoning tokens. Depending on the complexity of the task, models can generate from several hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used can be seen in the usage object of the chat completion response object in the completion_tokens_details section:

usage: {
  total_tokens: 1000,
  prompt_tokens: 400,
  completion_tokens: 600,
  completion_tokens_details: {
    reasoning_tokens: 500
  }
}

Cost Control

To manage costs in the o1 series models, you can limit the total number of tokens generated by the model (including reasoning and completion tokens) using the max_completion_tokens parameter.

In previous models, the max_tokens parameter controlled both the number of tokens generated and the number of tokens visible to the user, which were always equal. However, in the o1 series, the total number of generated tokens may exceed the number of visible tokens due to internal reasoning tokens.

Since some applications may rely on the max_tokens parameter matching the number of tokens received from the API, the o1 series introduces max_completion_tokens to explicitly control the total number of tokens generated by the model, including both reasoning and visible completion tokens. This explicit choice ensures that existing applications will not break when using the new models. The max_tokens parameter continues to work as before for all previous models.

Allocating Space for Reasoning

If the number of generated tokens reaches the limit of the context window or the value of max_completion_tokens set by you, you will receive a response about the completion of work in the chat with the finish_reason parameter set to length. This can happen before visible completion tokens are created, meaning you may incur input and reasoning costs without receiving a visible response.

To prevent this, make sure there is enough space in the context window, or change the max_completion_tokens value to a higher one. OpenAI recommends reserving at least 25,000 tokens for reasoning and conclusions when you start experimenting with these models. As you become familiar with the number of reasoning tokens required by your prompts, you can adjust this buffer accordingly.

Prompting Tips

These models work best with direct prompts. Some prompt engineering techniques, such as multi-frame prompts or instructing the model to "think step by step," not only do not improve performance but sometimes hinder it. Here are some best practices:

Keep prompts simple and straightforward: models understand and respond well to brief and clear instructions that do not require detailed specifications.
Avoid chain-of-thought prompts: since these models conduct reasoning internally, there is no need to prompt them to "think step by step" or "explain their reasoning".
Use delimiters for clarity: use delimiters such as triple quotes, XML tags, or section titles to clearly delineate different parts of the input, helping the model to correctly interpret the various sections.
Limit additional context in generation with extended search (RAG): when providing additional context or documents, include only the most important information so that the model does not overcomplicate its response.

Prompt Examples

Programming (refactoring)

OpenAI o1 series models are capable of implementing complex algorithms and producing code. In this task, o1 is asked to refactor a React component based on specific criteria.

from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red
  text. 
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- For formatting, use four space tabs, and do not allow any lines of code to 
  exceed 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    
      {book.title}
    
  );

  return (
    {listItems}
  );
}
"""

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

Programming (planning)

Models of the OpenAI o1 series can also create multi-step plans. In this example, o1 asks to create a file system structure for a complete solution, as well as Python code that implements the desired use case.

Examples of usage

Some examples of using o1 in real situations can be found in the cookbook.

OpenAI on new AI models that can reason

OpenAI's o1 series models are new large language models trained with reinforcement to perform complex reasoning. The o1 models think before answering and can create a long internal chain of reasoning before responding to the user.

Quick Start

Beta Limitations

How Reasoning Works

Managing the Context Window

Cost Control

Allocating Space for Reasoning

Prompting Tips

Prompt Examples

Examples of usage

Write comment

Relevant news on the topic "AI"

Life After Achieving AGI: Total Happiness or the Decline of Civilization?

We have implemented a Telegram bot with AI in a federal company

Cognitive Traps of Humans and AI

Getting the Most Out of ChatGPT: Tips for Choosing the Right Model

Full-cycle engineering: TAPP Group experience

Everything is generated by GPT! A guide on how to recognize AI text and how to make it indistinguishable from human writing.

What do engineers at OpenAI, Microsoft, and AWS think about the future of AI: honest answers from the AI Engineer World's Fair 2025

Also read

ChatGPT is still out of reach: what’s happening on the AI market by mid-2025?

The Godfather" AI accuses new models of lying to users: how to avoid problems with LLM

Also read