Google’s native multimodal AI image generation in Gemini 2.0 Flash impresses with fast edits, style transfers

Mar 12, 2025 | Technology


Google’s latest open-source AI model, Gemma 3, isn’t the only big news from the Alphabet subsidiary today.

In fact, the spotlight may have been stolen by Gemini 2.0 Flash with native image generation, a new experimental model available free to users of Google AI Studio and to developers through Google’s Gemini API.

It marks the first time a major U.S. tech company has shipped multimodal image generation directly within a model to consumers. Most other AI image generation tools have been image-specific diffusion models hooked up to large language models (LLMs), so one model has to interpret the user’s text prompt and hand it to another model to produce the requested image. This was the case both for Google’s previous Gemini LLMs connected to its Imagen diffusion models and for OpenAI’s previous (and, as far as we know, current) setup, which connects ChatGPT and its various underlying LLMs to the DALL-E 3 diffusion model.

By contrast, Gemini 2.0 Flash can generate images natively within the same model that the user types text prompts into, which in theory allows for greater accuracy and more capabilities, and early indications are that this holds true.

Gemini 2.0 Flash, first unveiled in December 2024 but without the native image generation capability switched on for users, integrates multimodal input, reasoning, and natural language understanding to generate images alongside text.

The newly available experimental version, gemini-2.0-flash-exp, enables developers to create illustrations, refine images through conversation, and generate detailed visuals based on world knowledge.
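For developers, the experimental model is reached through the Gemini API. Below is a minimal sketch of a single text-and-image request, assuming the `google-genai` Python SDK and an API key from Google AI Studio. The helper names `split_parts` and `generate_illustration` are hypothetical, and the response shape should be checked against the SDK documentation. The key detail is `response_modalities`, which asks the model to return image parts alongside text.

```python
# Minimal sketch, assuming the google-genai SDK (pip install google-genai).
# split_parts and generate_illustration are hypothetical helper names.
from typing import Any, Iterable, List, Tuple

MODEL_ID = "gemini-2.0-flash-exp"  # experimental model named in the article


def split_parts(parts: Iterable[Any]) -> Tuple[List[str], List[bytes]]:
    """Separate text parts from inline image bytes in one response."""
    texts: List[str] = []
    images: List[bytes] = []
    for part in parts:
        if getattr(part, "text", None):
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            # Generated images arrive as inline data (raw bytes plus a MIME type).
            images.append(part.inline_data.data)
    return texts, images


def generate_illustration(prompt: str, api_key: str) -> Tuple[List[str], List[bytes]]:
    """Send one prompt and return (texts, images) from the first candidate."""
    # Imported lazily so split_parts works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        # Explicitly request image output in addition to text.
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    return split_parts(response.candidates[0].content.parts)
```

Calling `generate_illustration("A watercolor fox in a snowy forest, with a one-line caption", api_key)` would return the caption text and any image bytes, which could then be saved with `pathlib.Path("fox.png").write_bytes(images[0])`, assuming the image comes back as a PNG.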

How Gemini 2.0 Flash enhances AI-generated images

In a developer-facing blog post published earlier today, Google highlights several key capabilities of Gemini 2.0 Flash’s native image generation:

• Text and Image Storytelling: Developers can use Gemini 2.0 Flash to generate illustrated stories while maintaining consistency in characters and settings. The model also responds to feedback, allowing users to adjust the story or change the art style.

• Conversational Image Editing: The AI supports multi-turn editing, meaning users can iteratively refine an image by providing instructions through …
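The storytelling and multi-turn editing workflows above map naturally onto a chat session, where each new instruction is interpreted in light of earlier turns. This minimal sketch assumes the `google-genai` SDK’s chat interface (`client.chats.create` and `chat.send_message`) and assumes the session carries previously generated images forward as context; `edit_session` and `latest_image` are hypothetical helper names.

```python
# Minimal sketch of multi-turn image editing, assuming the google-genai SDK.
# edit_session and latest_image are hypothetical helper names.
from typing import Any, Iterable, List, Optional

MODEL_ID = "gemini-2.0-flash-exp"


def latest_image(parts: Iterable[Any]) -> Optional[bytes]:
    """Return the last inline image in a response, or None if there is none."""
    found: Optional[bytes] = None
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None:
            found = inline.data
    return found


def edit_session(instructions: List[str], api_key: str) -> List[Optional[bytes]]:
    """Send each instruction in turn and collect the image from every reply."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    # Assumption: the chat object keeps earlier turns (including generated
    # images) in its history, so "make it blue" refers to the prior image.
    chat = client.chats.create(
        model=MODEL_ID,
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    results: List[Optional[bytes]] = []
    for instruction in instructions:
        response = chat.send_message(instruction)
        results.append(latest_image(response.candidates[0].content.parts))
    return results
```

For example, `edit_session(["Draw a red bicycle leaning on a brick wall", "Make it blue", "Now render it as a pencil sketch"], api_key)` would return one image (or `None`) per turn. Keeping every intermediate result makes it easy to fall back to an earlier version if a later edit goes wrong.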
