
New diffusion model revolutionizes AI image generation, solving key issues
Kapil Kajal/Interesting Engineering
Generative artificial intelligence (AI) has historically struggled to produce consistent images, often misinterpreting details such as fingers and facial symmetry. Moreover, when prompted to generate images of different sizes and resolutions, these models can fail. Rice University computer scientists have developed a new method that uses pre-trained diffusion models to curb such issues.
These models are generative AI systems that learn by adding layer after layer of random noise to the images they are trained on, then generate new images by removing that noise.

ElasticDiffusion

Moayed Haji Ali, a doctoral student in computer science at Rice University, presented the new approach, called ElasticDiffusion, in a peer-reviewed paper at the 2024 Institute of Electrical and Electronics Engineers (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle.

“Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images,” Haji Ali said. “But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch … that’s where these models become problematic.”

If you instruct a model like Stable Diffusion to generate a non-square image, such as one with a 16:9 aspect ratio, the elements used to construct the resulting image may become repetitive.
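The add-noise-then-denoise process described above can be illustrated with a toy sketch. This is not the actual code behind Stable Diffusion or ElasticDiffusion; the function names and the noise schedule below are illustrative assumptions, showing only the standard closed-form forward step that diffusion training relies on:

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Closed-form forward process: blend the clean image with Gaussian noise.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    """
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Toy noise schedule: alpha_bar decays as more noise is layered on.
T = 100
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))  # stand-in for a training image
xt, eps = forward_diffuse(x0, t=T - 1, alpha_bar=alpha_bar, rng=rng)
# A denoiser network would be trained to predict `eps` from `xt`;
# generation then runs the process in reverse, removing the
# predicted noise step by step until a clean image emerges.
```

The key point for the article's discussion is that the model only ever learns to remove noise at the fixed resolution it was trained on, which is where the aspect-ratio limitation comes from.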
That repetition manifests as abnormal deformities in the image or its subjects, such as individuals with six fingers or a strangely elongated car. The way these models are trained also contributes to the problem. “If you train the model on only images that are a certain resolution, it can only generate images with that resolution,” said Vicente Ordóñez-Román, an associate professor of computer science who advised Haji Ali on his work alongside Guha Balakrishnan, assistant professor of electrical and computer engineering.

Overfitting

Ordóñez-Román explained that overfitting is a common problem in AI, where the model becomes too specialized in its training data. “You could solve that by training the model on a wider variety of images, but it’s expensive and requires massive amounts of computing power: hundreds, maybe even thousands, of graphics processing units,” Ordóñez-Román said.
According to Haji Ali, the digital noise used by diffusion models can be translated into a signal with two data types: local and global. The local signal contains detailed pixel-level information, such as the shape of an eye or the texture of a dog’s fur, while the global signal captures the image’s overall outline.

“One reason diffusion models struggle with non-square aspect ratios is that they usually package local and global information together,” said Haji Ali, who worked on synthesizing motion in AI-generated videos before joining Ordóñez-Román’s research group at Rice for his Ph.D. studies. “When the model tries to duplicate that data to account for the extra space in a non-square image, it results in visual imperfections.”

Different approach

The ElasticDiffusion method described in Haji Ali’s paper takes a different approach to generating images.
Instead of combining both signals, ElasticDiffusion separates the local and global signals into conditional and unconditional generation paths. It subtracts the conditional model from the unconditional one, resulting in a score encompassing the overall, global image information. The path carrying the local pixel-level detail is then applied to the image in quadrants, filling in the details one square at a time. Global information, such as the image aspect ratio and the content of the image (e.g., a dog, a person running, etc.), remains separate, so the AI does not confuse the signals and repeat data. The result is a cleaner image at any aspect ratio, without additional training. The one drawback of ElasticDiffusion relative to other diffusion models is time: currently, Haji Ali’s method takes up to 6-9 times as long to make an image.
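The separation described above is reminiscent of classifier-free guidance, where the difference between the conditional and unconditional noise predictions carries the prompt-driven (global) signal while the remaining prediction supplies generic detail. The sketch below is a loose illustration of that idea plus the quadrant-by-quadrant local update the article describes; every function name here is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

def combine_scores(eps_cond, eps_uncond, guidance_scale=7.5):
    """Treat (conditional - unconditional) as the global signal and
    the unconditional prediction as the base local signal, as in
    classifier-free guidance (illustrative, not the paper's code)."""
    global_signal = eps_cond - eps_uncond
    return eps_uncond + guidance_scale * global_signal

def apply_locally_in_quadrants(latent, local_denoise):
    """Apply a pixel-level (local) update one quadrant at a time,
    echoing how the article says detail is filled in square by square."""
    h, w = latent.shape
    out = np.empty_like(latent)
    for i in (0, h // 2):
        for j in (0, w // 2):
            tile = latent[i:i + h // 2, j:j + w // 2]
            out[i:i + h // 2, j:j + w // 2] = local_denoise(tile)
    return out

rng = np.random.default_rng(1)
latent = rng.normal(size=(8, 8))
eps_cond = rng.normal(size=(8, 8))    # noise prediction given the prompt
eps_uncond = rng.normal(size=(8, 8))  # noise prediction without the prompt
guided = combine_scores(eps_cond, eps_uncond)
detailed = apply_locally_in_quadrants(latent, lambda t: t * 0.9)  # dummy denoiser
```

Because the global signal is computed once for the whole image while local detail is filled in per quadrant, the two kinds of information never get duplicated across the extra space of a non-square canvas, which is the failure mode the article describes.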
The goal is to reduce that to the same inference time as other models like Stable Diffusion or DALL-E.
(Except for the headline, this story has not been edited by VoM News staff and is published from the syndicated feed)
