Adobe Photoshop is among the most recognizable pieces of software ever created, used by more than 90% of the world’s creative professionals, according to Photutorial.
So the fact that a new open-source AI model — Qwen-Image-Edit, released yesterday by Chinese e-commerce giant Alibaba's Qwen Team of AI researchers — can now accomplish a huge number of Photoshop-like editing jobs with text inputs alone is a notable achievement.
Built on the 20-billion-parameter Qwen-Image foundation model released earlier this month, Qwen-Image-Edit extends the system’s unique strengths in text rendering to cover a wide spectrum of editing tasks, from subtle appearance changes to broader semantic transformations.
Simply upload a starting image — I tried one of myself from VentureBeat's last annual Transform conference in San Francisco — then type instructions describing what you want to change, and Qwen-Image-Edit will return a new image with those edits applied.
Input image example:
Photo credit: Michael O’Donnell Photography
Output image example with prompt: “Make the man wearing a tuxedo.”
The model is available now across several platforms, including Qwen Chat, Hugging Face, ModelScope, GitHub, and the Alibaba Cloud application programming interface (API), the latter of which allows any third-party developer or enterprise to integrate the new model into their own applications and workflows.
I created my examples above on Qwen Chat, the Qwen Team's rival to OpenAI's ChatGPT. Aspiring users should note that free generations are limited to about eight jobs (input/outputs) per 12-hour period before the quota resets; paying users get access to more.
With support for both English and Chinese inputs, and a dual focus on both semantic meaning and visual fidelity, Qwen-Image-Edit aims to lower barriers to professional-grade visual content creation.
And because the model is released as open source under the Apache 2.0 license, enterprises can freely download it and run it on their own hardware or virtual clouds/machines, potentially yielding huge cost savings compared to proprietary software like Photoshop.
As Junyang Lin, a Qwen Team researcher wrote on X, “it can remove a strand of hair, very delicate image modification.”
The team’s announcement echoes this sentiment, presenting Qwen-Image-Edit not as an entirely new system, but as a natural extension of Qwen-Image that applies its unique text rendering and dual-encoding approach directly to editing tasks.
Dual encodings allow edits that preserve the style and content of the original image
Qwen-Image-Edit builds on the foundation established by Qwen-Image, which was introduced earlier this year as a large-scale model specializing in both image generation and text rendering.
Qwen-Image’s technical report highlighted its ability to handle complex tasks like paragraph-level text rendering, Chinese and English characters, and multi-line layouts with accuracy.
The report also emphasized a dual-encoding mechanism, feeding images simultaneously into Qwen2.5-VL for semantic control and a variational autoencoder (VAE) for reconstructive detail. This approach allows edits that remain faithful to both the intent of the prompt and the look of the original image.
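The idea behind dual encoding can be illustrated with a toy sketch. The stand-in functions below are hypothetical simplifications, not the actual Qwen2.5-VL or VAE encoders: one produces coarse, pooled "meaning" features while the other keeps dense spatial detail, and an editing model would condition on both at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two encoders described in the Qwen-Image report.
# The real system uses Qwen2.5-VL (semantic features) and a VAE
# (reconstructive latents); these functions only mimic the granularity gap.

def semantic_encode(image):
    # Coarse, pooled features: "what is in the image" (here: channel means).
    return image.mean(axis=(0, 1))      # shape (3,)

def vae_encode(image):
    # Dense spatial latents: "how the image looks" (here: a 2x downsample).
    return image[::2, ::2, :]           # shape (H/2, W/2, 3)

image = rng.random((8, 8, 3))

sem = semantic_encode(image)   # steers edits toward the prompt's intent
lat = vae_encode(image)        # anchors edits to the original appearance

# A real denoiser would receive both signals simultaneously; here we just
# show that the two representations live at different levels of detail.
print(sem.shape, lat.shape)    # (3,) (4, 4, 3)
```

Conditioning on the pooled features alone would let the output drift visually, while conditioning on the spatial latents alone would make semantic changes hard; combining both is what lets edits follow the prompt without losing the look of the source image.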
Those same architectural choices underpin Qwen-Image-Edit. By leveraging dual encodings, the model can adjust at two levels: semant …