Comparing the text rendering capability of top AI image generators.
The ability to render text in images is now supported by top AI image generators, including Midjourney V6, OpenAI’s Dall-E 3, and Amazon’s Titan G1.
Text rendering has been a long time coming because it is hard to implement. Text has sharp edges, while pictures are smooth blends. AI models don’t quite grasp the meaning of words or how they should look, including details like fonts and spacing. They also need tons of examples to learn from, and putting text in the right place is hard.
So, how good are the text capabilities of today’s best AI image generators?
In this article, I will compare the image results of Midjourney V6, Dall-E 3, and Amazon Titan G1 using a similar prompt.
Let’s get started.
A cup of strawberry yogurt with the word “Delicous” written on the side, sitting on a wooden tabletop. Next to the cup of yogurt is a plate with toast and a glass of orange juice.
Here, I intentionally misspelled the word “delicious” to see whether the AI would follow the prompt as written. Among the three tools, only Midjourney V6 was able to add the correct text to the image.
Winner: Midjourney V6
A promotional card for “Bean Bliss Coffee Shop” featuring a steaming cup of coffee on a rustic table
In this example, only Dall-E 3 correctly rendered the words “Bean Bliss Coffee Shop.” However, the image contained some unreadable text artifacts.
Winner: Dall-E 3
A sleek and modern logo for a tech startup named “Innovatech Solutions” The logo includes the company name in a futuristic font, with a stylized, abstract symbol representing innovation and technology.
Both the first and third images rendered the text accurately, but Titan G1’s logo was particularly impressive. Which one do you prefer?
Winner: Amazon Titan G1
A colorful birthday greeting card showing balloons and confetti in the background. The card reads “Happy Birthday Jim!” in large, cheerful letters, with a space below for a personalized message.
At first glance, Amazon Titan G1’s result looks the most fun and colorful. However, it contains some weird artifacts that aren’t supposed to be there. Also, Dall-E 3 misspelled the word “Jim” in this example.
Winner: Midjourney V6
A girl wearing a red t-shirt that is holding a card that says, “Hello there!”, smiling, blue background for promotional material.
In this example, only Dall-E 3 was able to correctly spell the phrase “Hello there!”, making it the clear winner.
Winner: Dall-E 3
From these examples, it’s evident that no single platform consistently outperforms the others in all scenarios. The key takeaway from this comparison is the importance of ongoing development and the diverse applications of these AI tools. Each platform’s unique approach to handling text in images underscores the complexity of the task and the varied ways in which AI can interpret and execute it.
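To make the tally behind that conclusion explicit, here is a minimal Python sketch that counts the round winners reported above. The list is copied by hand from the five "Winner" lines in this article; the variable names are my own and purely illustrative.

```python
from collections import Counter

# Round winners, copied by hand from the five comparisons in this article.
round_winners = [
    "Midjourney V6",    # yogurt cup with the misspelled word
    "Dall-E 3",         # "Bean Bliss Coffee Shop" promotional card
    "Amazon Titan G1",  # "Innovatech Solutions" logo
    "Midjourney V6",    # "Happy Birthday Jim!" greeting card
    "Dall-E 3",         # "Hello there!" card
]

# Count how many rounds each tool won.
tally = Counter(round_winners)

for tool, wins in tally.most_common():
    print(f"{tool}: {wins} of {len(round_winners)} rounds")
```

Midjourney V6 and Dall-E 3 each take two rounds and Amazon Titan G1 takes one, so no tool wins a majority of the five prompts.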
The evolution of AI in image generation, particularly with text rendering capabilities, is an ongoing journey marked by exciting potential and innovation. Soon, we can all ditch Photoshop and just ask the AI to render perfect text graphics for us.