AI news
July 29, 2024

OpenAI's New "Project Strawberry" Hints at a New and Powerful Model

OpenAI's undisclosed new AI model aims to make AI think more like humans.

by Jim Clyde Monge

OpenAI is once again pushing the boundaries of AI with its undisclosed new AI model, “Project Strawberry.” This initiative, previously known as Q*, aims to dramatically enhance the reasoning capabilities of AI models.

According to recent leaks reported by Bloomberg and Reuters, OpenAI is making progress toward AI models that can plan ahead, navigate the internet autonomously, and carry out what OpenAI refers to as “deep research.”

At an internal all-hands meeting on Tuesday, OpenAI showed a demo of a research project that it claimed had new human-like reasoning skills, according to Bloomberg.

Essentially, OpenAI is striving to make AI think more like humans.

According to some sources, OpenAI has tested a model internally that achieved a 90 percent score on a challenging AI math skills test; however, the sources couldn’t confirm whether that model was related to Project Strawberry.

Meanwhile, two other sources reported seeing demos from the Q* project, showcasing models solving advanced math and science questions beyond the capabilities of today’s leading commercial AIs.

The exact methods OpenAI used to enhance these capabilities are still unclear. The Reuters report mentions that Project Strawberry involves fine-tuning OpenAI’s existing large language models, which have already been trained on extensive datasets.

This approach is reportedly similar to Self-Taught Reasoner (STaR), a method outlined in a 2022 paper from Stanford researchers.

An overview of STaR and a STaR-generated rationale on CommonsenseQA

In the figure above, the fine-tuning outer loop is indicated with a dashed line. The questions and ground truth answers are expected to be present in the dataset, while the rationales are generated using STaR.
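
To make that loop concrete, below is a minimal sketch of the STaR procedure as the paper describes it; the function names (generate_rationale, fine_tune) are hypothetical placeholders, not real API calls. The model writes a rationale for each question, rationales that reach the correct answer are kept, failures get a second "rationalization" pass with the correct answer provided as a hint, and the model is fine-tuned on the surviving examples before the loop repeats.

```python
# Sketch of the STaR bootstrapping loop (Zelikman et al., 2022).
# `model.generate_rationale` and `fine_tune` are hypothetical
# placeholders for a real LLM sampler and training routine.

def star(base_model, dataset, num_rounds=5):
    model = base_model
    for _ in range(num_rounds):
        keep = []
        for question, answer in dataset:
            # 1. Ask the model to reason its way to an answer.
            rationale, predicted = model.generate_rationale(question)
            if predicted != answer:
                # 2. "Rationalization": provide the correct answer as a
                #    hint and ask the model to justify it after the fact.
                rationale, predicted = model.generate_rationale(
                    question, hint=answer
                )
            if predicted == answer:
                keep.append((question, rationale, answer))
        # 3. The paper restarts fine-tuning from the *base* model each
        #    round (rather than stacking rounds) to limit drift.
        model = fine_tune(base_model, keep)
    return model
```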

So, what’s the deal with Strawberry?

A few weeks ago, in an interview with Dartmouth Engineering, OpenAI CTO Mira Murati discussed the next generation of AI, which she described as having intelligence akin to someone with a Ph.D.

Is Strawberry the model she’s talking about?

“The most important areas of progress will be around reasoning ability.” — Sam Altman, OpenAI CEO

Current AI models excel at generating text and performing specific tasks, but they struggle with complex reasoning, long-term planning, and autonomous decision-making.

However, it’s important to consider the broader context.

With Ilya Sutskever no longer part of OpenAI and key members of the alignment team having left the company, there are growing concerns about the direction of OpenAI’s research and the safety of the products they are releasing to the public.

While I am deeply skeptical about the current trajectory, I am not an AI expert. My apprehensions stem from the rapid pace of development and the potential implications of these advanced technologies.

Previously Known as Q* (Q-Star)

Project Q* gained widespread attention in late 2023 amid the leadership drama involving Sam Altman and Ilya Sutskever, but the actual development work had been underway since early 2022.

Sam Altman and Ilya Sutskever

The key development timeline is:

  • Early 2022: OpenAI begins efforts to make their AI systems smarter at reasoning
  • Mid 2022: Ilya Sutskever, OpenAI's chief scientist at the time, kicks off what becomes Project Q*
  • Late 2022: The Q* team gets the model to solve simple math problems
  • Early 2023: Conflict arises between Q* researchers and CEO Sam Altman over the project

The rumored architecture of Project Q* combines large language models, reinforcement learning, and search algorithms. It reportedly integrates the deep learning techniques behind ChatGPT with human-programmed rules, potentially pairing Q-learning with A* search.
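
Nothing confirms how, or whether, A* figures into the model, but the algorithm itself is a textbook classic. For readers who haven't seen it, here is a minimal, runnable sketch (the toy graph is my own illustration, not anything from the leaks): A* repeatedly expands the node with the lowest estimated total cost f(n) = g(n) + h(n), where g is the cost paid so far and h is a heuristic estimate of the cost remaining.

```python
import heapq

# Textbook A* search over a weighted graph. This is a generic
# illustration of the algorithm the "*" in Q* is widely assumed to
# reference, not anything specific to OpenAI's system.

def a_star(start, goal, neighbors, heuristic):
    """neighbors(n) yields (next_node, step_cost); heuristic(n) estimates
    the remaining cost from n to the goal (must not overestimate)."""
    frontier = [(heuristic(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return None, float("inf")

# Toy usage: with a zero heuristic, A* reduces to Dijkstra's algorithm.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)], "D": []}
print(a_star("A", "D", lambda n: graph[n], lambda n: 0))
# (['A', 'B', 'C', 'D'], 3)
```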

Why is it called Q*?

There’s no official statement from OpenAI, but the term “Q*” could be rooted in DeepMind’s history with reinforcement learning. Initially, DeepMind used Q-learning to train a neural network to play Atari video games by learning through trial and error, optimizing a function called the Q function to estimate rewards from various actions.
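
As a refresher on that lineage, here is a minimal tabular Q-learning sketch on a toy five-state chain (the environment and hyperparameters are my own illustration, not anything from the leaked reports). The whole method rests on one update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a'), the observed reward plus the best value reachable from the next state.

```python
import random

# Tabular Q-learning on a toy chain: a generic refresher on the Q
# function described above, not OpenAI's actual training setup.
N_STATES = 5
ACTIONS = [0, 1]                      # 0 = step left, 1 = step right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Reward 1 only for reaching the rightmost state, which ends the episode."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for _ in range(500):                  # episodes of trial and error
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt

# The learned greedy policy should step right (action 1) in every state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])
```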

Building on this foundation, Q* likely represents an attempt to merge large language models with AlphaGo-style search techniques, potentially using reinforcement learning to enhance the model. The goal would be to develop a system where language models can refine their abilities by “playing against themselves” in complex reasoning tasks, pushing the boundaries of what these models can achieve.
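
Nobody outside OpenAI knows how such a combination would be implemented, but the general "model proposes, verifier scores" pattern behind this kind of self-improvement is easy to sketch. In the hypothetical snippet below, sample_chain stands in for a language model sampling a reasoning chain and score_chain for a learned verifier or reward model; none of it reflects OpenAI's actual method.

```python
# Hypothetical best-of-N search over reasoning chains, illustrating the
# generic pattern of pairing a generator with a verifier. `sample_chain`
# and `score_chain` are placeholders, not real APIs.

def best_of_n(question, sample_chain, score_chain, n=16):
    """Sample n candidate reasoning chains and keep the one the verifier
    scores highest; the winners can then be fed back as fine-tuning
    data, closing the self-improvement loop."""
    candidates = [sample_chain(question) for _ in range(n)]
    return max(candidates, key=score_chain)
```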

Final Thoughts

While we don’t know all the details yet, it’s clear they’re aiming for something big. I’m skeptical that Q*, or Project Strawberry, is the breakthrough that will lead to AGI. I don’t believe it poses a threat to humanity. However, it could be a significant step toward developing an AI with general reasoning capabilities.

It’s important to recognize that intelligence, whether human or artificial, exists on a spectrum.

Just as human reasoning capacity varies based on factors like IQ, AI systems also have different levels of capability depending on their design and training. For many industrial and specialized applications, current AI systems already demonstrate AGI-like abilities, outperforming most humans in specific tasks involving data analysis, pattern recognition, and logical reasoning.

However, humans still maintain an edge in areas like general reasoning, common sense, creativity, and emotional intelligence. The key to unlocking truly transformative potential lies not in pitting AI against human intelligence but in combining them.

