Imagen 2 can generate AI images that are even better than those from Dall-E 3 or Midjourney.
After 15 months of total silence, Google has finally released an update to its AI image generator, Imagen, and boy, the results look good.
Imagen 2.0 was announced quietly during Google’s I/O conference in May 2023. Today, it’s finally here but only accessible to Google Cloud customers using Vertex AI.
Google Imagen is a text-to-image AI model that can create photorealistic images from a text description. Like other AI image generators such as Dall-E 3 or Midjourney, Imagen is based on a diffusion model, which is a type of neural network that can gradually refine images to match a given text prompt.
Imagen is trained on a massive dataset of text and images, which allows it to generate images that are both accurate and detailed.
A conditional diffusion model maps the text embedding into a 64×64 image. Imagen then uses text-conditional super-resolution diffusion models to upsample the image from 64×64 to 256×256, and then from 256×256 to 1024×1024.
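To make the cascade easier to picture, here is a minimal sketch in Python that only tracks the resolutions mentioned above. The `Stage` class, the stage names, and the helper functions are my own illustrative assumptions; nothing here is Google's code, and no actual diffusion is performed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One step of the cascade (names are illustrative, not Google's)."""
    name: str
    in_res: Optional[int]  # None for the base model, which starts from text only
    out_res: int


# Resolutions taken from the description above.
CASCADE: List[Stage] = [
    Stage("base text-to-image diffusion", None, 64),
    Stage("super-resolution diffusion #1", 64, 256),
    Stage("super-resolution diffusion #2", 256, 1024),
]


def check_cascade(stages: List[Stage]) -> None:
    """Verify each stage consumes exactly what the previous stage produced."""
    for prev, nxt in zip(stages, stages[1:]):
        if nxt.in_res != prev.out_res:
            raise ValueError(f"{nxt.name} expects {nxt.in_res}, got {prev.out_res}")


def upscale_factors(stages: List[Stage]) -> List[int]:
    """Linear upscaling factor of each super-resolution stage."""
    return [s.out_res // s.in_res for s in stages if s.in_res is not None]


check_cascade(CASCADE)
factors = upscale_factors(CASCADE)        # one factor per super-resolution stage
final_res = CASCADE[-1].out_res           # side length of the final image
pixel_growth = (final_res // CASCADE[0].out_res) ** 2  # pixel count vs. the base image
print(factors, final_res, pixel_growth)
```

Each super-resolution stage scales the side length by 4, so the final image has 256 times as many pixels as the 64×64 base image. That is presumably why the work is split into stages instead of generating 1024×1024 directly.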
If you want to dig deeper into how Imagen works, check out the whitepaper here.
These are the key improvements in Imagen 2.0:
Let’s unpack each of these features.
To generate higher-quality, more accurate images that better match prompts, Imagen 2’s training dataset had more detailed image captions. This helps Imagen 2 better grasp the relationship between images and words, improving its understanding of context and nuance.
Take this prompt as an example:
An image of: Consider the subtleness of the sea; how its most dreaded creatures glide under water, unapparent for the most part, and treacherously hidden beneath the loveliest tints of azure
The prompt is an excerpt from Moby-Dick by Herman Melville. Accordingly, Imagen 2 generated an abstract painting of a whale, while Dall-E 3 simply generated a random underwater scene.
Imagen 2 has improved a lot in an area where most AI image generators struggle: hands. Aside from the hands, facial symmetry and detail have also greatly improved.
To produce more visually appealing images, Imagen 2 was trained using an image aesthetics model that scored images based on qualities like lighting, framing, and sharpness that humans find more attractive. This scoring system allowed Imagen 2 to give greater weight to training images that align with human aesthetic preferences.
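Google has not published how this weighting works, so treat the following as a rough sketch of one plausible approach: turn aesthetic scores into sampling probabilities so that better-looking images are drawn more often during training. The file names, the scores, the `floor` parameter, and the `sampling_weights` helper are all hypothetical.

```python
import random
from typing import Dict


def sampling_weights(scores: Dict[str, float], floor: float = 0.05) -> Dict[str, float]:
    """Convert aesthetic scores in [0, 1] into probabilities that sum to 1.

    The floor keeps low-scoring images from being dropped entirely.
    """
    clipped = {name: max(score, floor) for name, score in scores.items()}
    total = sum(clipped.values())
    return {name: score / total for name, score in clipped.items()}


# Hypothetical scores, as if produced by an aesthetics model.
scores = {
    "sharp_well_lit.jpg": 0.9,
    "average.jpg": 0.5,
    "blurry_dark.jpg": 0.1,
}

weights = sampling_weights(scores)

# Draw a training batch; higher-scoring images are picked more often.
rng = random.Random(0)
batch = rng.choices(list(weights), weights=list(weights.values()), k=8)
print(weights)
print(batch)
```

With these made-up scores, the sharp, well-lit image is sampled about nine times as often as the blurry one, which is the "greater weight" idea in its simplest form.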
Imagen 2 also supports two image editing capabilities: inpainting and outpainting.
Here’s an example:
Imagen is currently accessible through Google Vertex AI, and only to selected users. Head over to the Google Cloud Console and search for Vertex AI. Under the Vision tab, you’ll see a dashboard that lets you generate images.
Here are more examples from Google DeepMind’s blog:
Prompt: A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile
That is one heck of a photorealistic image. Seriously, if this AI tool gets the ability to copy a face from a reference image, it could be the beginning of the end for professional photographers’ careers.
Prompt: The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off — and they are nearly always doing it.
This image also poses a danger to animal photographers’ jobs. Can you even tell that this is not a real image? I bet you can’t.
Another example that really caught my attention is how well it generates logos and brand names. While competitors like Dall-E 3 can also add legible text to an image, the quality of Imagen 2’s results is more impressive.
As you can see from these images, Dall-E 3 sometimes fails to spell words correctly, while Imagen added the text to the product perfectly, even at skewed angles. Brand designers and owners will surely be excited to get their hands on this technology.
Let’s talk about pricing.
On Google’s pricing page, image generation costs $0.020 per image. However, I could not verify whether this price applies to v1 or v2 of the AI model.
If anyone can confirm the pricing for each AI model, I’d appreciate it a lot.
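In the meantime, here is a quick back-of-the-envelope calculation using the listed rate. It assumes a flat $0.020 per generated image with no volume discounts or extra charges, which I have not confirmed; the function name and the sample numbers are just for illustration.

```python
from decimal import Decimal

# Listed rate from the pricing page; which model version it covers is unconfirmed.
PRICE_PER_IMAGE_USD = Decimal("0.020")


def generation_cost(num_images: int) -> Decimal:
    """Total cost in USD for generating num_images at the flat listed rate."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE_USD * num_images


cost_1000 = generation_cost(1000)                          # cost of 1,000 images
images_for_50 = int(Decimal("50") / PRICE_PER_IMAGE_USD)   # images a $50 budget buys
print(cost_1000, images_for_50)
```

At that rate, 1,000 images come to $20, and a $50 budget covers 2,500 images.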
Google is going all-in on AI. Imagen 2 images are impressive, even better than Dall-E 3 and Midjourney outputs. I can’t wait to get my hands on this tool.
As for security, the enhanced photorealism of Imagen 2 will surely raise eyebrows among policymakers. Google currently chooses not to talk about the dataset it used to train the AI model while relevant lawsuits are still working their way through the courts.
And just a final note to Google: it desperately needs to get its platform and docs in order. It is incredibly difficult to use any of its new AI tools and models.
What is your view on this new AI image generator? What concerns you most?
Software engineer, writer, solopreneur