OpenAI’s o1 and o1-mini models have much more powerful reasoning/thinking capabilities than the earlier models. This makes it possible to develop more intelligent agents that can plan up-front more effectively, break down a task into sub-tasks, and execute actions. At the same time, we can also build these agents more easily due to several in-built features in these models. In this blog, we break down some key features of o1 related to Prompting, Planning, and Image Reasoning. Here goes:
Prompting:
Chain of Thought: o1 generates inherent Chain of Thought (COT) to explore the various pathways and verify its answers. It includes behavior like error correction, breaking down a problem into smaller tasks, and trying multiple strategies. It can reason through complex tasks in diverse domains including math, science, coding, logistics, and strategy.
The feature that we find most interesting is that it requires less prompting and context while giving thoughtful responses. If you have experimented with prompt engineering to optimize the outputs and Retrieval Augmented Generation (RAG) for context retrieval, you will particularly appreciate this feature of o1—it has built in COT which makes our lives easier. No explicit COT is required-you can skip step by step reasoning prompts. In fact, prompts should be direct and simple. Also, It is better to give examples instead of giving detailed breakdowns.
Reasoning tokens: The o1 models compute at inference time by producing tokens to reason through a problem. Beware that reasoning tokens (after several turns) can sometimes be as high as ten times the output tokens, which presents the risk of output tokens being truncated, given the limited size of the context window (128k at this time).
Planning
The o1 models are powerful enough that you can plan and execute complex business tasks. Below is an example scenario:
Business Challenge: You received an order to manufacture 100 widgets. You have to plan the production and ship the goods.
AI Agent Challenge: You are a Supply Chain Management Assistant. Please generate a plan that takes into account the new orders received and how best to fulfill this demand. Pl. check for inventory, order necessary parts from suppliers, schedule production based on available capacity, and ship these goods to our retailer in Texas. Inform the customer about the status.
AI Agent does the work: The AI agent can then use 01/o1-mini to plan all these tasks and use GPT 40-mini to execute them. We have to give the agent access to the various functions like “get inventory status”, “check available suppliers” and then let the agent get to work.
Image Reasoning
Reasoning with Images is another strong capability of o1. For example, you can feed it an image of an Organizational Chart and ask questions about it. You can ask about how many reports belong to an individual or who has the most reports.
When not to use o1
O1 models are best used for complex problems where the higher costs and latency of using these models make sense. They should not be used for simple tasks.
When to use o1
It is best to use o1 for scenarios where you need intelligence and the business benefits outweigh the higher costs and latency. In many cases, it makes sense to use a combination of models-like using o1/o1-mini for Planning and GPT 4-0/4-0 mini for task execution. Some good use cases would be Coding, Planning, and domain specific reasoning like law, mathematics, and science. For example, data analysis, research synthesis, and problem solving are good scenarios for using o1/o1-mini.
This is just the tip of the iceberg. As reasoning models become more sophisticated, we will be able to design ever intelligent and evolving agents that make our lives more productive and easier. At Unvired, we are experimenting with developing intelligent agents and suggest that you do likewise.