GPT-4o is a free but highly capable language model.
Earlier today, thousands of AI enthusiasts eagerly tuned in to OpenAI’s highly anticipated livestream event, where the company unveiled its latest groundbreaking advancements in ChatGPT. While speculation ran rampant about the possibility of a revolutionary search feature to challenge Google’s dominance or the reveal of the much-awaited GPT-5 model, the actual announcement took a slightly different direction.
They announced GPT-4o, a new language model that's smarter, cheaper, better at coding, multimodal, and mind-blowingly fast.
It was a good choice for OpenAI to demo the new features live at 1x speed instead of using a pre-recorded video (yes, I'm looking at you, Google).
First things first: the "o" in GPT-4o stands for "omni," reflecting its multimodal support for both inputs and outputs.
GPT-4o can process and generate text, audio, and images in real time. It represents a significant step towards more natural human-computer interaction, accepting any combination of text, audio, and image inputs and generating corresponding outputs.
Perhaps the most notable advancement in GPT-4o is its near real-time response as a voice assistant.
It can respond to audio inputs in as little as 232 milliseconds on average, which is comparable to human response times in conversation. It matches the performance of GPT-4 Turbo on English text and code while showing significant improvements in non-English languages, and it is notably faster and 50% cheaper in the API.
Here is the list of new features in GPT-4o.
1. Real-time responses
One of the coolest things is how fast it responds. When you chat with GPT-4o, it feels like talking to a real person. It can match your tone, crack jokes, and even sing in harmony.
This natural, speedy back-and-forth makes using the chatbot feel way more fun and engaging. But how did OpenAI pull this off?
Before GPT-4o, ChatGPT’s Voice Mode relied on a three-step process: audio was transcribed to text, then processed by GPT-3.5 or GPT-4, and finally converted back to audio. This led to slower response times (2.8 seconds for GPT-3.5 and 5.4 seconds for GPT-4) and loss of information like tone and background noise.
GPT-4o uses a single AI model trained to handle text, images, and audio all at once. This end-to-end processing allows GPT-4o to respond much faster and more naturally, picking up on nuances that previous models would miss.
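To make the latency difference concrete, here is a rough sketch of what the old three-step pipeline looks like when approximated with OpenAI's public API endpoints (Whisper for transcription, a chat completion, then text-to-speech). This is only an illustration of the concept, not ChatGPT's actual internal Voice Mode code; the file names and model choices are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: transcribe the user's audio to text
# (tone, emotion, and background sounds are lost at this stage)
with open("user_question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: run the transcribed text through the language model
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Step 3: convert the text answer back into speech
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("assistant_reply.mp3", "wb") as f:
    f.write(speech.read())
```

With GPT-4o, those three hops collapse into a single model call, which is where most of the latency savings, and the preserved tone and nuance, come from.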
2. Improved Reasoning
GPT-4o has achieved new heights in reasoning, setting a record score of 88.7% on the 0-shot CoT MMLU benchmark, which tests general knowledge. This was measured using OpenAI's new simple-evals library. GPT-4o also scored 87.2% on the traditional 5-shot no-CoT MMLU, another record.
However, other AI models like Meta's Llama 3 400B are still in training and could potentially outperform GPT-4o in the future.
GPT-4o also demonstrated significant advancements in both mathematical reasoning and visual understanding.
On the M3Exam benchmark, which evaluates performance on standardized test questions from various countries, often including diagrams and figures, GPT-4o outperformed GPT-4 across all languages tested.
In terms of pure vision understanding, GPT-4o achieved state-of-the-art results on several key benchmarks, including MMMU, MathVista, and ChartQA. Notably, these evaluations were conducted in a 0-shot setting, meaning GPT-4o was not specifically trained or fine-tuned on these tasks.
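If you want to try the vision side yourself, here is a minimal sketch of sending an image alongside a text prompt through the Chat Completions API. The image URL and prompt are placeholders, not anything taken from the benchmarks above.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts can be mixed in a single user message
                {"type": "text", "text": "What trend does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sales_chart.png"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```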
3. GPT-4o is free to use
GPT-4o is going to be free to use. This is huge because if the free version of ChatGPT with the GPT-3.5 model brought in 100 million users, the smarter model GPT-4o could potentially bring in 100 million more.
Users on the Free tier will be defaulted to GPT-4o with a limit on the number of messages they can send using GPT-4o, which will vary based on current usage and demand. When unavailable, Free tier users will be switched back to GPT-3.5. — OpenAI
Honestly, it's quite intriguing how OpenAI can offer this new and improved model for free without losing an enormous amount of money. The computing power required to run these language models is astonishing, and GPT-4o runs fast (around 100 tokens/sec).
Here are a few thoughts on why they are making it free:
For better context, here is how GPT-4o compares to GPT-4: it has the same high intelligence but is faster, cheaper, and has higher rate limits than GPT-4 Turbo.
GPT-4o currently has a context window of 128K tokens and a knowledge cutoff of October 2023.
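As a practical aside, here is a small sketch for checking whether a prompt fits within that 128K-token window using the tiktoken library; GPT-4o uses the o200k_base encoding (available in recent tiktoken releases), and the reserved output budget below is an arbitrary assumption.

```python
import tiktoken

CONTEXT_WINDOW = 128_000  # GPT-4o's context window, in tokens

# GPT-4o uses the o200k_base encoding (requires a recent tiktoken release)
encoding = tiktoken.get_encoding("o200k_base")

def fits_in_context(prompt: str, reserved_for_output: int = 4_000) -> bool:
    """Return True if the prompt leaves room for the reserved output tokens."""
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("Summarize the GPT-4o announcement in one paragraph."))
```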
Right now, I do not see the GPT-4o option in the free version of ChatGPT, but if you go to the OpenAI Playground, the new model is already accessible.
According to a tweet from Sam Altman, the new voice mode will be live in the coming weeks for ChatGPT Plus users.
There are currently two model IDs: gpt-4o and the dated snapshot gpt-4o-2024-05-13. The pricing for both is the same: $5 per 1M input tokens and $15 per 1M output tokens, half the price of GPT-4 Turbo.
Take note that access to GPT-4, GPT-4 Turbo, and GPT-4o models via the OpenAI API is only available after you have made a successful payment of $5 or more (usage tier 1).
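Once you are on usage tier 1, calling the model from the API is straightforward. Here is a minimal sketch using the official openai Python SDK; the prompt and parameters are just placeholders.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # or pin the dated snapshot: "gpt-4o-2024-05-13"
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does the 'o' in GPT-4o stand for?"},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```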
Overall, it was an impressive demo of GPT-4o, particularly because it's free to use and its voice responses are impressively fast.
So the question now is: will it attract more users? Almost certainly, yes. The new model is free to use, and the real-time voice responses are definitely worth checking out.
Is it worth the $20 upgrade, though? I can't say yet; I still need to do more hands-on testing to see if it's really better than Claude Opus. Besides, Google may drop some huge Gemini updates at Google I/O tomorrow that are more exciting than what OpenAI announced today.