After more than a month of delay, the Advanced Voice Mode feature in ChatGPT is now available to select users.
Two months after its announcement, OpenAI’s Advanced Voice Mode is rolling out to select ChatGPT Plus users. The feature was initially set to be released last month but was delayed for safety reasons.
Unfortunately, I am not one of the lucky users, so what I am about to share are interesting examples from people who have had real hands-on time with Advanced Voice Mode, along with my initial impressions of the feature.
Advanced Voice Mode on ChatGPT features more natural, real-time conversations that pick up on and respond with emotion and non-verbal cues.
This new voice feature is currently in a limited alpha phase and is only accessible to 10,000 users. If you are invited to the alpha, you will receive an email with instructions on how to use Advanced Voice Mode.
If you’re wondering when you’ll get access, OpenAI plans to roll Advanced Voice Mode out to all Plus users in the fall.
The setup is pretty basic. When you open the ChatGPT app, you’ll see a tooltip on the bottom-right inviting you to try Advanced Voice Mode.
To start a conversation, select the voice icon on the bottom-right of the screen. This brings up a large black circle that morphs into a cloud-like shape whenever the bot is speaking.
You can mute or unmute your microphone by selecting the microphone icon on the bottom-left of the screen. You can end the conversation by pressing the red icon on the bottom-right of the screen.
Note that you will need to provide the ChatGPT app with microphone permission to use this feature.
Usage of Advanced Voice Mode (audio inputs and outputs) is limited daily, and the exact limits can change. The ChatGPT app will notify you when you have 3 minutes of audio left.
Once you reach the limit, the conversation will end immediately, and you’ll be invited to switch to the standard voice mode.
Now let’s see some fun and interesting examples shared on X.
1. Live translation: In the video below, ChatGPT translates Japanese text into English. This example uses a combination of the vision and audio capabilities of GPT-4o (a rough API-level sketch of this translation scenario follows the list).
2. Talk in Japanese. Another example shows a user asking ChatGPT to tell a story in Japanese, specifically requesting it to be told in an excited tone.
3. ChatGPT rapping and beatboxing. We’ve already heard ChatGPT sing during the May 2024 announcement, but this user showed us that the bot can also rap and beatbox!
4. Make ominous sounds. Advanced Voice Mode can also create ominous sounds, perfect for PC game narration and indie horror films!
5. As a voice actor. In this example, a user asks ChatGPT to count really fast. Impressively, it pauses at some point to catch its breath, just like a real person would.
Here’s another clip in the same vein:
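If you want to tinker with the multimodal model behind these demos, here is a minimal sketch (assuming the official openai Python SDK and a placeholder image URL of my own) that asks GPT-4o, via the public Chat Completions API, to translate Japanese text in a photo into English, roughly the text-and-vision half of example 1. This is not Advanced Voice Mode itself; the real-time audio experience is currently only available inside the ChatGPT app for alpha users.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Translate the Japanese text in this photo into natural English.",
                },
                {
                    "type": "image_url",
                    # Placeholder URL -- point this at your own photo of Japanese text.
                    "image_url": {"url": "https://example.com/japanese-menu.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

If you don’t have a hosted image, you can also base64-encode a local photo into a data: URL and pass that instead of the placeholder link.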
Okay, that’s about it. What do you think of the new Advanced Voice Mode in ChatGPT? I think it looks and sounds great.
An OpenAI spokesperson stated that ChatGPT’s new mode will only use four preset voices created with voice actors.
“We’ve made it so that ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”
Will OpenAI use your audio for AI model training?
During the alpha phase, audio from conversations using Advanced Voice Mode will be used to train models if users have shared their audio. Users can opt out of audio training by disabling “Improve voice for everyone” in the Data Controls Settings.
If “Improve voice for everyone” is not visible in the settings, it means audio has not been shared, and it will not be used for training.
With Standard Voice Mode, if users choose to share their audio, it will be stored rather than deleted after transcription. OpenAI says it will take steps to reduce personal information in the audio used for training models, and its team may review shared audio.
Overall, I am really impressed with the hands-on videos of Advanced Voice Mode shared on X. It sounds eerily like a real person when it speaks.
I don’t get why some people are disappointed with GPT-4o-Voice. It seems to me like it delivered exactly what was promised. Honestly, I’m even pleasantly surprised.
The ability to have nearly instant conversations with an AI that can mimic emotions, speak different languages, and act as a tutor or soccer commentator (or anything else you need) is pretty amazing.
While I haven’t experienced Advanced Voice Mode firsthand, the shared examples and my initial impressions suggest that this is a significant step forward in making AI interactions more human-like and engaging.
It would be really interesting if Apple replaced Siri with this. What do you think?