X's Grok-2 can now generate high-quality images comparable to Midjourney.
Flux, the recently released open-weight image model from Black Forest Labs, has been integrated into Elon Musk’s X platform via Grok. The integration arrives alongside the newly launched Grok-2 and Grok-2 mini models, which feature cutting-edge capabilities in chat, coding, and reasoning.
Flux.1 is a new state-of-the-art (SOTA) family of text-to-image models that sets a new standard in image detail, prompt accuracy, style variety, and scene complexity.
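It comes in three variants: Flux.1 [pro], the flagship model offered through an API; Flux.1 [dev], an open-weight model for non-commercial use; and Flux.1 [schnell], the fastest variant, released under the Apache 2.0 license.
All Flux.1 models feature a blend of multimodal and parallel diffusion transformer blocks, with 12 billion parameters. They improve on previous diffusion models by building on flow matching, a general and conceptually simple method for training generative models that includes diffusion as a special case.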
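To make flow matching a little more concrete, here is a minimal pure-Python sketch of the training objective and a sampler, assuming the common straight-line path between Gaussian noise and data. Black Forest Labs has not spelled out its exact recipe, so the helper names (`interpolate`, `target_velocity`, `flow_matching_loss`, `sample`), the time convention (noise at t = 0, data at t = 1), and the toy `oracle` velocity field are all illustrative assumptions. In Flux itself, the velocity would be predicted by the 12-billion-parameter transformer conditioned on the text prompt.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Time derivative of that straight path: constant and equal to x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(model, data, rng):
    """Mean squared error between the model's velocity and the path's velocity."""
    total = 0.0
    for x1 in data:
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
        t = rng.random()                        # t drawn uniformly from [0, 1)
        v_pred = model(interpolate(x0, x1, t), t)
        v_true = target_velocity(x0, x1)
        total += sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)
    return total / len(data)

def sample(model, x0, steps):
    """Generate by integrating dx/dt = model(x, t) from t=0 to t=1 with Euler steps."""
    x, h = list(x0), 1.0 / steps
    for k in range(steps):
        v = model(x, k * h)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical stand-in for a trained network: if the whole "dataset" is one
# point c, the ideal velocity field is (c - x) / (1 - t).
c = [2.0, -1.0]

def oracle(x, t):
    return [(ci - xi) / (1.0 - t) for ci, xi in zip(c, x)]

print(flow_matching_loss(oracle, [c] * 8, random.Random(0)))  # close to 0
print(sample(oracle, [0.3, 0.7], steps=4))                    # close to [2.0, -1.0]
```

Because the target path is a straight line, four Euler steps already land on the data point in this toy case. Straighter paths are one motivation for flow-matching models: they can, in principle, be sampled with fewer steps than classic diffusion.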
Moreover, the models achieve better performance and hardware efficiency through the use of rotary positional embeddings and parallel attention layers.
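Rotary positional embeddings (RoPE) encode a token's position by rotating pairs of its query and key features through position-dependent angles, so attention scores end up depending on relative rather than absolute position. The sketch below is a generic one-dimensional version with the conventional base of 10000. The function names and example vectors are hypothetical, and the variant used in Flux may differ (for example, by encoding two-dimensional image-patch positions).

```python
import math

def apply_rope(x, pos, base=10000.0):
    """Rotate each feature pair (x[i], x[i+1]) by the angle pos * base**(-i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower-index pairs rotate faster
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * cos_t - b * sin_t, a * sin_t + b * cos_t]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.5, -1.0, 2.0, 0.25]  # example query vector
k = [1.5, 0.75, -0.5, 1.0]  # example key vector

# Same relative offset (2) at different absolute positions gives the same score.
print(dot(apply_rope(q, 5), apply_rope(k, 3)))
print(dot(apply_rope(q, 12), apply_rope(k, 10)))
```

The two printed scores agree (up to floating-point rounding) because both query/key pairs sit two positions apart; only the offset matters. Rotations also preserve vector length, so RoPE adds no extra parameters and does not change the scale of the features.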
While it’s not explicitly stated which specific Flux model Grok uses, the quality of the images suggests that it’s likely either the Dev or the Pro model. The level of detail and the richness of the images are remarkable, which leads me to lean towards the Pro model.
Generating images on Grok is straightforward, but you’ll need to be a Premium or Premium+ user on the X platform. Once you’re logged in, simply navigate to the left-hand sidebar and click on the Grok button. From there, you can describe the image you want to create.
Here’s an example:
Prompt: Make an image of an attractive influencer presenting at a TED Talk
To give you an idea of how Grok stacks up against Midjourney, here’s a side-by-side comparison of images generated with the same prompt:
Which one do you prefer?
Personally, I find the image generated by Grok more appealing. While the Midjourney image has more texture and finer details, it also has an uncanny valley effect that makes it feel slightly artificial. On the other hand, the Grok image has a more natural look with softer tones and less saturation.
I also fed the same prompt into ChatGPT (using DALL-E 3), and this is what I got:
The result was decent, but it didn’t quite match the quality you get from either Midjourney or Grok.
Here’s another example:
Prompt: Polaroid photo with VSCO filter, 1990, gorgeous woman, night, flash photo, blonde, cute, young face, beautiful shadows, tropical plants, urban clothing, inside an apartment, DSLR, holding a sign written in ballpoint pen on a notebook saying “This photo was created for Generative AI Publication using Grok 2 Mini.”
It’s incredibly impressive. The image not only looks photorealistic but also nails the specific style and atmosphere described in the prompt. The text rendering is also really good, even though there was a small omission (‘was’ missing in the text).
Grok’s image-generation feature appears to have very few restrictions, allowing users to create almost any type of image. Check out some images of Donald Trump and Kamala Harris generated by X users:
Some users have pointed out that while Grok claims to have limitations — such as avoiding pornographic or excessively violent content — these rules appear to be inconsistently enforced. This leniency is in stark contrast to other major AI image generators, which often reject prompts involving real people or automatically add identifying watermarks to their images.
With so few effective restrictions on the types of images it can generate, Grok could easily be used as a tool for creating misinformation on X and other platforms.
Access to Grok’s image generation feature is limited to premium users, with a subscription fee of $8 per month.
This pricing is relatively competitive compared with other AI tools. For instance, ChatGPT Plus, which gives access to GPT-4o, costs $20 per month, while Midjourney’s Basic plan costs $10 per month.
That, of course, does not necessarily mean you’re getting more bang for your buck. ChatGPT’s GPT-4o seems to be well ahead of Grok as a language model, and Midjourney offers a more extensive set of image customization options.
Weigh what you’re getting against what you might be sacrificing in terms of functionality and versatility.
For developers, there’s more to look forward to. xAI plans to make both Grok-2 and Grok-2 mini available through an enterprise API later this month. In the company’s words: “Our upcoming API is built on a new bespoke tech stack that allows multi-region inference deployments for low-latency access across the world.”
Grok has been around for a while, but it hasn’t quite kept up with competitors like ChatGPT and Claude AI—until now. The integration of Flux has revitalized the platform, and I find this chatbot to be interesting once again.
With Flux pushing Stable Diffusion out of the spotlight, it’s going to be interesting to see how Stability AI responds. Will they release an improved SD3 model soon? Come on, Stability AI, don’t go gentle into that good night.
Moreover, Black Forest Labs has hinted at further developments on its website, including a new text-to-video model. If this too gets integrated into Grok, it could pose a serious threat to giants like OpenAI and Anthropic.
Software engineer, writer, solopreneur