OpenAI has integrated image generation into ChatGPT, allowing users to create images directly in the chatbot with the GPT-4o model.
On March 26, OpenAI said that the initial release of Images in ChatGPT focuses mainly on image creation and is available to users on the Plus, Pro, Team, or free plans. ChatGPT could already create images, but only through the Dall-E model, which was limited in features and allowed only three free images per day.
On social networks, many people have tried it and expressed surprise at the new tool. "The real-looking images surprised me. If it weren't for the note that this is an AI image, I might not have recognized it. The quality is superior to the image creation tools I've experienced before," said Facebook account Hoang Vy. "In the near future, the images you see online may not be real," said account Cong Tam, while account The Ha commented: "Maybe graphic designers and photo editors will have to upgrade themselves to use AI or lose their jobs."
The Verge quoted OpenAI spokesperson Taya Christianson as saying that the free version will have limited features but will still outperform Dall-E.
"The new feature is a huge improvement over the previous model," said lead researcher Gabriel Goh, adding that his team used the multimodal platform GPT-4o, one of OpenAI's most powerful language models, for ChatGPT's image generation.
According to Goh, one notable improvement in ChatGPT's image generation using GPT-4o is called "binding" - a term that refers to how well an AI image generator maintains the correct association between attributes and objects. For example, given a prompt for a blue star and a red triangle, a model with poor binding might generate a red star and no triangle. Goh said that most image models struggle with this, often mixing up colors and shapes when a prompt contains several such requests at once.
“The new image generator with Binding can accurately associate attributes for 15-20 objects without confusion, which represents a significant improvement in accuracy and reliability,” Goh said.
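One way to make the idea of binding concrete is to compare the (attribute, object) pairs a prompt asks for with the pairs actually found in the output image. The Python sketch below is illustrative only: the function name `binding_accuracy` and the list of detected pairs are hypothetical, and how OpenAI measures binding is not described in the source.

```python
def binding_accuracy(requested, detected):
    """Return the fraction of requested (attribute, object) pairs
    that also appear among the detected pairs.

    Both arguments are lists of (attribute, object) tuples. The
    `detected` list is assumed to come from some external image
    analysis step, which is not shown here.
    """
    if not requested:
        return 1.0  # nothing was asked for, so nothing can be mis-bound
    detected_set = set(detected)
    hits = sum(1 for pair in requested if pair in detected_set)
    return hits / len(requested)


# Prompt from the article: a blue star and a red triangle.
requested = [("blue", "star"), ("red", "triangle")]

# A poorly bound output: the color of one object lands on the other,
# and the triangle is missing.
poor_output = [("red", "star")]

# A well-bound output: every attribute stays with its own object.
good_output = [("blue", "star"), ("red", "triangle")]

print(binding_accuracy(requested, poor_output))
print(binding_accuracy(requested, good_output))
```

In this toy scoring, the red star counts for nothing, because neither requested pair survives intact, while the correct image scores 1.0.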
The image generator in ChatGPT also improves the rendering of text in images, making the text more coherent and less distorted. This is also a significant challenge, according to Goh, because if the titles or text elements have errors, the entire image is unusable.
In addition, the new tool uses an autoregressive method, generating images sequentially from left to right and from top to bottom, much as text is written, instead of the diffusion technique used by most image generators. This technical difference is what makes Images in ChatGPT better at rendering and linking text in images.
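The left-to-right, top-to-bottom order described above can be sketched with a toy example in Python. This is a minimal sketch, not OpenAI's implementation: `toy_next_token` is a hypothetical stand-in for a learned model, and the grid of integer "tokens" only illustrates that each position is produced after, and conditioned on, everything generated before it.

```python
import random


def toy_next_token(context, vocab_size, rng):
    """Hypothetical stand-in for a trained model: choose the next
    token given all tokens generated so far."""
    if not context:
        return rng.randrange(vocab_size)
    # Usually repeat the previous token, so that the dependence on
    # earlier output is visible in the result.
    if rng.random() < 0.7:
        return context[-1]
    return rng.randrange(vocab_size)


def generate_raster(height, width, vocab_size=4, seed=0):
    """Fill a height x width grid one token at a time in raster order."""
    rng = random.Random(seed)
    tokens = []  # flat sequence, in the order tokens were generated
    grid = [[None] * width for _ in range(height)]
    for row in range(height):      # top to bottom
        for col in range(width):   # left to right within each row
            tok = toy_next_token(tokens, vocab_size, rng)
            tokens.append(tok)
            grid[row][col] = tok
    return grid, tokens


grid, tokens = generate_raster(height=3, width=4)
for row in grid:
    print(row)
```

A diffusion model, by contrast, starts from noise over the whole image and refines all positions together over many steps, rather than committing to one position at a time.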
“This is an iterative process that took months to perfect,” Goh emphasized. While not perfect, he added, ChatGPT’s image generation “has reached a point where the output quality is always usable.”
In a demo of the new feature, OpenAI showed several examples of how seamlessly ChatGPT can generate images, such as a scientific diagram of Newton's prism experiment with correctly labeled color components; a multi-panel comic with consistent characters and speech bubbles; or transparent backgrounds for stickers, logos, and restaurant menus.
However, compared to other models, Images in ChatGPT takes longer to generate images. This is a “worthwhile trade-off,” said Jackie Shannon, head of multimodal products at ChatGPT.
“We’re definitely improving latency, but the ability to generate images and the quality of the images it produces really makes up for the extra seconds of waiting,” Shannon wrote on the blog.
Regarding the risk of fake images, nude images, and similar content, Shannon said Images in ChatGPT has strong safeguards that block pornographic deepfakes and reject “fraudulent” requests, but did not elaborate. Generated images also carry C2PA-standard metadata marking them as AI-generated, which detection tools can read.
“Of course, no system is perfect, but we are constantly improving our protections,” Shannon added.

