How helpful would it be if music creators had a colleague to brainstorm initial ideas with, to turn to when stuck in the middle of a piece, and to rely on for practical help in exploring various musical directions? KAIST researchers have developed AI technology that acts like a fellow songwriter, helping creators make music.
KAIST (President Kwang-Hyung Lee) announced on May 7th that a research team led by Professor Sung-Ju Lee of the School of Electrical Engineering has developed Amuse, an AI-based music creation support system. The work won a Best Paper Award, given to only the top 1% of all papers, at the ACM Conference on Human Factors in Computing Systems (CHI), one of the world's most prestigious international academic conferences in the field of human-computer interaction, held in Yokohama, Japan from April 26 to May 1.
< (From left) Professor Chris Donahue of Carnegie Mellon University, Ph.D. Student Yewon Kim and Professor Sung-Ju Lee of the School of Electrical Engineering >
Amuse, developed by Professor Sung-Ju Lee's research team, is an AI-based system that converts various forms of inspiration, such as text, images, and audio, into harmonic structures (chord progressions) to support composition.
For example, if a user inputs a phrase like "memories of a warm summer beach", an image, or a sound clip, Amuse automatically generates and suggests chord progressions that match that inspiration.
Unlike existing generative AI tools, Amuse is distinguished by its respect for the user's creative flow: it naturally encourages creative exploration through an interactive approach that lets users flexibly integrate and modify the AI's suggestions.
The core technology of the Amuse system is a hybrid generation method that naturally connects two components: a large language model that generates chord progressions matching the inspiration the user expresses in a text prompt, and an AI model trained on real music data that filters out unnatural or awkward results (rejection sampling).
< Figure 1. Amuse system configuration. After extracting music keywords from the user's input, a large language model generates chord progressions that are refined through rejection sampling (left). Chord extraction from audio input is also possible (right). The bottom shows an example visualizing the structure of a generated chord progression. >
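To make the hybrid generation idea concrete, below is a minimal Python sketch of rejection sampling as described above. All names and details here are illustrative assumptions rather than the actual Amuse implementation: `llm_propose_progression` stands in for the large language model, `music_model_score` for the model trained on real music data, and the 0-to-1 naturalness scale and acceptance threshold are invented for the example.

```python
import random

def llm_propose_progression(keywords: list[str]) -> list[str]:
    """Stand-in for the LLM: propose a chord progression for the keywords."""
    palette = [["Cmaj7", "Am7", "Dm7", "G7"],
               ["Fmaj7", "Em7", "Dm7", "Cmaj7"],
               ["Am", "F", "C", "G"]]
    return random.choice(palette)

def music_model_score(progression: list[str]) -> float:
    """Stand-in for the music-data model: score naturalness in [0, 1]."""
    return random.random()

def generate_chords(keywords: list[str],
                    threshold: float = 0.7,
                    max_tries: int = 20) -> list[str]:
    """Rejection sampling: repeatedly ask the LLM for candidate
    progressions and discard any the music model rates as unnatural."""
    best, best_score = None, -1.0
    for _ in range(max_tries):
        candidate = llm_propose_progression(keywords)
        score = music_model_score(candidate)
        if score >= threshold:
            return candidate          # accepted: natural enough
        if score > best_score:        # remember the best rejected candidate
            best, best_score = candidate, score
    return best                       # fall back to the best candidate seen

print(generate_chords(["warm", "summer", "beach"]))
```

The key design point is the division of labor: the language model supplies breadth (many candidate progressions matching the inspiration), while the music-trained model supplies quality control, rejecting candidates until one sounds musically natural.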
The research team conducted a user study with practicing musicians and found that Amuse shows strong potential as a creative companion, a concept known as co-creative AI in which people and AI collaborate, rather than a generative AI that simply assembles a song on its own.
The paper, authored by Ph.D. student Yewon Kim and Professor Sung-Ju Lee of the KAIST School of Electrical Engineering together with Professor Chris Donahue of Carnegie Mellon University, demonstrated the potential of creator-centered AI system design to both academia and industry.
※ Paper title: Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations
DOI: https://doi.org/10.1145/3706598.3713818
※ Research demo video: https://youtu.be/udilkRSnftI?si=FNXccC9EjxHOCrm1
※ Research homepage: https://nmsl.kaist.ac.kr/projects/amuse/
Professor Sung-Ju Lee said, "Recent generative AI technology has raised concerns that it directly imitates copyrighted content, infringing on creators' rights, or that it generates results one-way regardless of the creator's intent. Aware of this trend, our team paid attention to what creators actually need and focused on designing an AI system centered on the creator."
He continued, "Amuse is an attempt to explore how creators can collaborate with AI while keeping the initiative in their own hands. We expect it to serve as a starting point for a more creator-friendly direction in the future development of music creation tools and generative AI systems."
This research was supported by the National Research Foundation of Korea with funding from the Korean government (Ministry of Science and ICT) (RS-2024-00337007).