A much faster, more efficient training method developed at the University of Waterloo could help put powerful artificial intelligence (AI) tools in the hands of many more people by reducing the cost and environmental impact of building them.
Large language models (LLMs) are advanced AI systems designed to understand and generate human language by learning patterns in how words and ideas are connected from massive amounts of text in books, articles and websites.
Teaching them to do that now requires months of work and huge quantities of computational power, specialized hardware and electricity, making the costs of development prohibitive to all but large corporations and organizations.
Researchers at Waterloo set out more than a year ago to make the technology cheaper, greener and therefore more accessible - a goal they refer to as 'democratization' - by combining and building on previous efforts to improve training.
The result is SubTrack++, a technique that speeds up pre-training of LLMs - the first and most costly, resource-intensive step in a multi-step process - by up to 50 per cent, while still exceeding state-of-the-art accuracy.
"These are extremely large models which consume a lot of energy, so an improvement of even five per cent translates into big gains," said Dr. Sirisha Rambhatla, a professor of management science and engineering at Waterloo. "Advances like these will help us all build our own LLMs in the long run."
The project also contributes fresh thinking on responsible and accessible AI to Waterloo's Global Futures initiative, which looks to advance innovations to address the world's most pressing challenges.
LLMs are neural networks made up of vast numerical matrices that learn, through trial and error on billions of examples, how to predict the next word in a sequence. Each time an LLM makes a mistake, it slightly adjusts its mathematical parameters to improve accuracy.
"In simple terms, training is like letting the model read an entire library and learn how people use language by recognizing patterns in words and ideas," said Sahar Rajabi, a PhD student who led the study.
SubTrack++ accelerates both pre-training, which establishes a basic foundation, and fine-tuning for a specific task, essentially by focusing on the most important parameters of a model to simplify the correction process.
Rambhatla, director of the Critical Machine Learning (ML) Lab at Waterloo, compares the highly technical approach to using a map of a mountain, rather than its actual physical features, to plot the quickest route to climb it.
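In very rough terms, methods in this family track the most important directions in the gradient, the signal that tells the model how to correct itself, and restrict each update to that much smaller subspace. A hypothetical simplification of that idea (not the published SubTrack++ algorithm) could look like this:

```python
# Rough sketch of the general idea behind gradient-subspace training methods.
# This is a hypothetical simplification, not the actual SubTrack++ algorithm.
import torch

def low_rank_update(weight: torch.Tensor, grad: torch.Tensor,
                    rank: int = 4, lr: float = 1e-3) -> torch.Tensor:
    """Apply an update using only the top-`rank` directions of the gradient."""
    # Find the most important directions of change via a truncated SVD.
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    projected = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
    # Update the weights along those few directions only, which keeps the
    # optimizer's bookkeeping (and memory use) much smaller.
    return weight - lr * projected

W = torch.randn(512, 512)   # one weight matrix of a toy model
G = torch.randn(512, 512)   # its gradient from a training step
W = low_rank_update(W, G)
```

Working in a smaller space of directions, like following the map rather than the mountain itself, is what lets the correction process run faster and with less memory.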
Researchers expect faster, cheaper training for LLMs - which are now capable of such functions as drafting emails and reports - to make it possible and affordable for ordinary people, not just large companies, to build and customize powerful tools of their own.
"By safely learning from personal preferences, LLMs could act as truly personal digital assistants that adapt to each person's style, goals and needs," Rajabi said. "Future models may become intelligent partners in human work and creativity."
Rambhatla's research on more efficient AI systems aligns with and contributes to broader research efforts underway at Waterloo. Dr. Juan Moreno-Cruz, a professor in the Faculty of Environment at Waterloo, recently published a research paper that found U.S. AI power usage has a minimal effect on global greenhouse gas emissions.
Researchers are scheduled to present a paper on the new training method, SubTrack++: Gradient Subspace Tracking for Scalable LLM Training, at the upcoming Conference on Neural Information Processing Systems (NeurIPS 2025) in Mexico City.
Featured image: Engineering master's student Nayeema Nonta (left), one of the three paper authors, and her supervisor, Dr. Sirisha Rambhatla, in a large server room with the computing power needed to develop their new LLM training technique. (University of Waterloo)