AI Transforms Text Into Realistic Building Designs

Japan Advanced Institute of Science and Technology

When working on projects, architects must quickly turn rough concepts into visual representations. Text-to-image models are promising here because they can generate high-quality designs from nothing more than a typed description. Some of these systems can also incorporate rough sketches or depth information, offering additional control over the results. However, these models often fail to follow the prompt accurately. For example, even a direct prompt such as "generate a 5-story building" might produce an image of a building with the wrong number of floors. The reason lies in the training datasets, which lack detailed annotations about building structure, making it difficult for artificial intelligence (AI) to understand precise spatial requirements, such as floor counts or the exact placement of windows and facade elements.

Researchers at the Japan Advanced Institute of Science and Technology (JAIST) have now addressed these problems with a retrieval-augmented generation system that combines text prompts with information retrieved from external architectural datasets, enabling the model to reference real architectural examples during generation. Such a system could lay the groundwork for AI-based architectural design tools that make the process easier and faster.

The work, published online in the journal Frontiers of Architectural Research on March 26, 2026, was carried out by a collaborative team led by Associate Professor Haoran Xie from JAIST, together with Associate Professor Ye Zhang from Tianjin University, China.

"Today, high-quality architectural visualization requires significant expertise and expensive software. With the help of this work, individual designers and smaller teams will be able to participate meaningfully in the design of their own built environments, expressing preferences and seeing realistic results without needing a large professional team," said Dr. Xie.

The team designed the framework to mirror real architectural practice. Architects typically begin with simple sketches that show the overall shape and layout of a building. Over time, these sketches are gradually refined with more detailed elements, such as windows, doors, and facade components. The new system follows this step-by-step process.

First, the system converts the text prompt into a simple structural sketch that captures the overall building form and ensures the correct number of floors. Next, it refines this sketch by adding detailed architectural elements using a database of real building components. Finally, the refined sketch is combined with the original text description to produce a realistic, high-quality building rendering that accurately reflects the designer's intent.
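The three stages described above can be illustrated with a toy sketch in Python. This is not the researchers' implementation: the real system uses generative models, while every name here (Sketch, COMPONENT_DB, parse_floor_count, build_structural_sketch, refine_with_components, render, generate_building) is hypothetical, the component list is invented for illustration, and simple word overlap stands in for the retrieval step.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sketch:
    """Toy stand-in for a building sketch: a floor count plus chosen components."""
    floors: int
    components: list = field(default_factory=list)


# Hypothetical, hand-written stand-in for a database of real building components.
COMPONENT_DB = {
    "window": ["grid window", "ribbon window", "arched window"],
    "entrance": ["central entrance", "corner entrance"],
}


def parse_floor_count(prompt: str) -> int:
    """Read an explicit floor count such as '5-story' from the prompt."""
    match = re.search(r"(\d+)[- ]stor(?:y|ey|ies)", prompt.lower())
    if match is None:
        raise ValueError("prompt does not state a floor count")
    return int(match.group(1))


def build_structural_sketch(prompt: str) -> Sketch:
    """Stage 1: fix the overall form, here reduced to the number of floors."""
    return Sketch(floors=parse_floor_count(prompt))


def refine_with_components(sketch: Sketch, prompt: str) -> Sketch:
    """Stage 2: for each component type, retrieve the entry that shares
    the most words with the prompt (ties go to the first entry)."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    chosen = [
        max(options, key=lambda option: len(words & set(option.split())))
        for options in COMPONENT_DB.values()
    ]
    return Sketch(floors=sketch.floors, components=chosen)


def render(sketch: Sketch, prompt: str) -> dict:
    """Stage 3: combine the refined sketch with the original prompt.
    A real system would condition an image generator on both inputs."""
    return {"prompt": prompt, "floors": sketch.floors, "components": sketch.components}


def generate_building(prompt: str) -> dict:
    """Run the three stages in order."""
    structural = build_structural_sketch(prompt)
    refined = refine_with_components(structural, prompt)
    return render(refined, prompt)


if __name__ == "__main__":
    print(generate_building(
        "generate a 5-story campus building with arched windows and a corner entrance"
    ))
```

The point of the sketch is the ordering: the floor count is fixed in the first stage and carried unchanged through the later ones, so adding detail cannot alter the overall structure.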

To evaluate the framework, the researchers tested it on campus building designs, where controlling the number of floors and the placement of windows and entrances is especially important.

They constructed three specialized datasets: a building box dataset containing 2,200 images, a component dataset with 4,000 images showing different window and entrance arrangements, and a sketch–rendering pair dataset with 1,600 examples linking detailed sketches, text prompts, and final campus building renderings.

In objective evaluations, the framework achieved 70.5% accuracy in vertical configuration and outperformed baseline diffusion models on several quality metrics measuring structural accuracy, visual realism, and alignment between generated images and text prompts.

The results were further supported by a subjective study involving 56 graduate students in architecture and design. Using a five-point Likert scale, where 1 indicated "very dissatisfied" and 5 indicated "very satisfied," participants gave the system average scores above 4 for image quality, alignment with prompts, and architectural detail accuracy.

Such a system could significantly improve early-stage architectural design workflows. "Designers can use it to quickly revise schemes in response to client feedback during meetings, dramatically shortening the design iteration cycle. Planners and developers can use the tool to visualize and compare dozens of design alternatives under shared constraints before any detailed modeling begins," explained Dr. Xie.

As AI continues to evolve, tools like this could make architectural visualization quicker, more accessible, and more reliable.
