December 20, 2023

Google Announces Imagen 2.0 AI Image Generator

Imagen 2 can generate AI images that are even better than those from Dall-E 3 or Midjourney.

by Jim Clyde Monge

After 15 months of total silence, Google has finally released an update to its AI image generator, Imagen, and boy, do the results look good.

Imagen 2.0 was announced quietly during Google’s I/O conference in May 2023. Today, it’s finally here, but it is only accessible to Google Cloud customers using Vertex AI.

What is Google Imagen?

Google Imagen is a text-to-image AI model that can create photorealistic images from a text description. Like other AI image generators such as Dall-E 3 or Midjourney, Imagen is based on a diffusion model, which is a type of neural network that can gradually refine images to match a given text prompt.

Imagen is trained on a massive dataset of text and images, which allows it to generate images that are both accurate and detailed.

Illustration of the Google Imagen image generation process

A conditional diffusion model maps the text embedding into a 64×64 image. Imagen then uses text-conditional super-resolution diffusion models to upsample the image from 64×64 to 256×256, and then from 256×256 to 1024×1024.
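The cascade described above can be traced with a small sketch. It does not run any diffusion model; it only follows the image resolution through the three stages. The resolutions come from the description above, while the `Stage` class, the stage names, and the helper functions are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    in_res: Optional[int]  # None for the base stage, which starts from the text embedding
    out_res: int


# Resolutions are taken from the article; the stage names are illustrative only.
CASCADE = [
    Stage("base text-conditional diffusion", None, 64),
    Stage("super-resolution diffusion 1", 64, 256),
    Stage("super-resolution diffusion 2", 256, 1024),
]


def trace_resolutions(stages: List[Stage]) -> List[int]:
    """Return the output resolution after each stage, checking that stages chain correctly."""
    resolutions: List[int] = []
    for stage in stages:
        if stage.in_res is not None:
            if not resolutions or resolutions[-1] != stage.in_res:
                raise ValueError(f"{stage.name} expects a {stage.in_res}px input")
        resolutions.append(stage.out_res)
    return resolutions


def upscale_factors(resolutions: List[int]) -> List[int]:
    """Per-side upscaling factor between consecutive stages."""
    return [b // a for a, b in zip(resolutions, resolutions[1:])]


resolutions = trace_resolutions(CASCADE)
print(resolutions)                   # [64, 256, 1024]
print(upscale_factors(resolutions))  # [4, 4]
```

Each super-resolution stage enlarges the image 4x per side, so the final image has 256 times as many pixels as the 64×64 base output.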

If you want to dig deeper into how Imagen works, check out the whitepaper here.

What’s new in Google Imagen 2?

These are the key improvements in Imagen 2.0:

Improved image-caption understanding
More realistic image generation
Fluid style conditioning
Advanced inpainting and outpainting

Let’s unpack each of these features.

Improved image-caption understanding

To generate higher-quality, more accurate images that better match prompts, Imagen 2 was trained on a dataset with more detailed image captions. This helps Imagen 2 better grasp the relationship between images and words, improving its understanding of context and nuance.

Take this prompt as an example:

An image of: Consider the subtleness of the sea; how its most dreaded creatures glide under water, unapparent for the most part, and treacherously hidden beneath the loveliest tints of azure

Imagen 2 vs Dall-E 3

The prompt is an excerpt from Moby-Dick by Herman Melville. Imagen 2 appears to pick up on that context and generated an abstract painting of a whale, while Dall-E 3 simply generated a generic underwater scene.

More realistic image generation

Imagen 2 has improved a lot in an area where most AI image generators struggle: hands. Aside from hands, facial symmetry and detail have also greatly improved.

To produce more visually appealing images, Imagen 2 was trained using an image aesthetics model that scored images based on qualities like lighting, framing, and sharpness that humans find more attractive. This scoring system allowed Imagen 2 to give greater weight to training images that align with human aesthetic preferences.
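To make the idea of "giving greater weight" concrete, here is a minimal sketch of score-weighted sampling. Google has not published how Imagen 2 actually uses its aesthetic scores, so everything here is an assumption: the softmax weighting, the `temperature` parameter, and the function names are all hypothetical and only illustrate one plausible way to favor higher-scoring images.

```python
import math
import random
from typing import List, Sequence


def aesthetic_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw aesthetic scores into sampling weights that sum to 1 (softmax).

    This weighting scheme is an assumption, not Google's documented method.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_training_batch(images: Sequence[str], scores: Sequence[float],
                          batch_size: int, seed: int = 0) -> List[str]:
    """Draw a batch (with replacement) in which higher-scored images are more likely."""
    rng = random.Random(seed)
    weights = aesthetic_weights(scores)
    return rng.choices(list(images), weights=weights, k=batch_size)


# Hypothetical image IDs and scores, purely for illustration.
images = ["blurry.jpg", "ok.jpg", "well_lit.jpg"]
scores = [1.0, 2.0, 3.0]
print(aesthetic_weights(scores))
print(sample_training_batch(images, scores, batch_size=5))
```

A lower `temperature` would concentrate sampling on the top-scoring images, while a higher one would bring the weights closer to uniform.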

Inpainting and outpainting

Imagen 2 also supports image editing capabilities — inpainting and outpainting.

Inpainting is a feature that allows you to edit a portion of your image by adding a mask and letting the AI auto-fill that masked portion.
Outpainting is a technique that extends the original image beyond its borders. This allows you to expand an image to various aspect ratios.
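The two editing modes differ mainly in what the model is asked to fill in. The sketch below builds the inputs only, with no model involved: a binary mask for inpainting, and an enlarged canvas size for outpainting to a target aspect ratio. The function names and the centering choice are assumptions made for illustration, not part of any Imagen API.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def rect_mask(width: int, height: int, box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Binary inpainting mask: 1 inside the box (to be regenerated), 0 elsewhere.

    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [
        [1 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def outpaint_canvas(width: int, height: int,
                    target_ratio: Fraction) -> Tuple[int, int, int, int]:
    """Smallest canvas of roughly target_ratio (width / height) that contains the image.

    Returns (new_width, new_height, offset_x, offset_y), where the offsets place
    the original image in the center and the rest is left for the model to fill.
    """
    if Fraction(width, height) < target_ratio:
        new_w, new_h = math.ceil(height * target_ratio), height  # widen
    else:
        new_w, new_h = width, math.ceil(width / target_ratio)    # heighten
    return new_w, new_h, (new_w - width) // 2, (new_h - height) // 2


for row in rect_mask(4, 3, (1, 1, 3, 2)):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]

print(outpaint_canvas(1024, 1024, Fraction(2, 1)))  # (2048, 1024, 512, 0)
```

In the outpainting case, a square 1024×1024 image extended to 2:1 gains 512 pixels of new content on each side.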

Here’s an example:

How to access Imagen 2

Imagen 2 is currently accessible through Google Vertex AI, which is limited to selected users. Head over to the Google Cloud Console and search for Vertex AI. Under the Vision tab, you’ll see the dashboard that lets you generate images.

Google Vertex AI image generation dashboard. Image by Jim Clyde Monge

More Example Images

Here are more examples from Google DeepMind’s blog:

Prompt: A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile

That is one heck of a photorealistic image. Seriously, if this AI tool gets the ability to copy a face from a reference image, it could be the beginning of the end of professional photographers’ careers.

Prompt: The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off — and they are nearly always doing it.

Google’s Imagen 2 example image

This image also poses a danger to animal photographers’ jobs. Can you even tell that this is not a real image? I bet you can’t.

Focus on Branding and Logos

Another example that really caught my attention is how well Imagen 2 generates logos and brand names. While competitors like Dall-E 3 are also capable of adding legible text to an image, the quality of Imagen 2's results is more impressive.

Imagen 2 vs Dall-E 3

As you can see from these images, Dall-E 3 sometimes fails to spell words correctly, while Imagen 2 rendered the text on the product perfectly, even at skewed angles. Brand designers and owners will surely be excited to get their hands on this technology.

Pricing

Let’s talk about pricing.

On Google’s pricing page, image generation costs $0.020 per image, but I cannot verify whether this pricing applies to v1 or v2 of the AI model.

If anyone can confirm the pricing for each AI model, I’d appreciate it a lot.
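For a rough sense of what that rate means at volume, here is a small estimate. It assumes the $0.020 per image figure quoted above applies to whichever model you use, which is unconfirmed; the function name and the flat per-image pricing model are my own assumptions.

```python
from decimal import Decimal

# Per-image price quoted in this article; it may not apply to Imagen 2.
PRICE_PER_IMAGE_USD = Decimal("0.020")


def estimate_cost(num_images: int,
                  price_per_image: Decimal = PRICE_PER_IMAGE_USD) -> Decimal:
    """Estimated cost in USD, rounded to cents, assuming flat per-image pricing."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return (price_per_image * num_images).quantize(Decimal("0.01"))


print(estimate_cost(1000))  # 20.00
```

At that rate, a thousand generated images would cost about $20.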

Final Thoughts

Google is going all-in on AI. Imagen 2 images are impressive, even better than Dall-E 3 and Midjourney outputs. I can’t wait to get my hands on this tool.

On the safety side, the enhanced photorealism of Imagen 2 will surely raise eyebrows among policymakers. Google currently chooses not to talk about the dataset it used to train the AI model while relevant lawsuits are still working their way through the courts.

And just a final note to Google: it desperately needs to get its platform and docs in order. It is incredibly difficult to use any of its new AI tools and models.

What is your view on this new AI image generator? What concerns you most?
