Chemists and computer scientists tapped AI to find new disinfectants to combat the growing threat of dangerous "superbugs."
The Journal of Chemical Information and Modeling published their computational-experimental framework for developing quaternary ammonium compounds, or QACs, to kill bacteria.
The method yielded 11 new QACs that show activity against antimicrobial-resistant bacteria.
"We believe this is the first example of using AI to generate molecules for disinfectants," says Bill Wuest, Emory professor of chemistry and a senior author of the study. "As an experimental chemist, I find it remarkable to see a machine help design new chemicals."
"We built an effective feedback loop between AI research, computational biochemistry and experimental chemistry," says Liang Zhao, Emory associate professor of computer science. "While we proved that our concept works to generate QACs, we also think that a broad range of scientific areas could benefit from it."
Additional senior authors, whose labs contributed equally to the research, are: Amarda Shehu, a computational biochemist at George Mason University in Virginia; and Kevin Minbiole, an experimental chemist at Villanova University in Pennsylvania.
An arms race with microbes
Check the label on any antimicrobial cleaning product — from homes to hospitals — and you will most likely find QACs listed among the ingredients. QACs are cheap, simple to make and generally effective. As a result, they have remained at the vanguard of sanitizing everything from kitchen counters to operating-room floors for more than a century.
But while QACs are relatively unchanged, bacteria keep evolving — sometimes in ways that allow them to survive the onslaught of these cleaning agents. It's a bit like an arms race, at the microscopic scale. And, lately, some microbes have started winning, becoming dangerous "superbugs" resistant to QACs.
Wuest and Minbiole, leading experts on the problem of rising antimicrobial resistance to disinfectants, are pioneering ways to modify QACs to ensure their potency. They tweak the structure of QAC molecules, synthesize these new designs and then test their power to kill pathogens.
It's a painstaking, time-consuming process.
Zhao wondered if AI could help speed things up. He develops machine learning and artificial intelligence methods to advance scientific discovery and medical diagnostics.
"The design of new molecules is traditionally done one at a time by humans in a chemistry lab," Zhao says. "But an AI model can give you thousands of new designs in one go."
Zhao "knocked on Bill's door."
After some back-and-forth, they decided to form a team, including the labs of Shehu and Minbiole, to develop a computational-experimental framework for QAC discovery.
The National Science Foundation funded the project.
Building a database
Most QACs consist of a nitrogen atom at the center of four carbon chains. In the simplest terms, the positively charged head of the nitrogen center is drawn to the negatively charged phosphates of the fatty acids encasing the cells of bacteria.
Once a QAC is anchored to a bacterial cell, the heads of the carbon chains act like spearpoints, stabbing into both the protective fatty membrane and the inner cellular membrane, causing the bacterium to disintegrate.
Improper use of cleaning agents may be one factor leading to bacteria evolving ways to evade this killing power of QACs, Wuest theorizes. And greater use of cleaning agents during the COVID-19 pandemic may have given hard-to-kill pathogens more opportunities to develop resistance, he adds.
Over the past decade, Wuest and Minbiole have built a library of hundreds of new QAC molecules that their labs have designed, synthesized and tested for toxicity against mammalian cells and antimicrobial activity against various pathogens.
"It's the strongest dataset for QACs available anywhere," Wuest says.
A key part of what makes the dataset so powerful, he adds, is its standardization. The chemists followed consistent procedures for testing and categorizing the QACs and assembled the results in a uniform way.
A graph problem
Developing an effective algorithm for the project "was tough," Zhao says. "We had to customize the model's architecture so it could be mathematically designed to follow particular chemical rules."
It was essentially a graph problem, he explains. The geometric structure of a molecule can be mapped onto a graph in which atoms are treated as nodes and chemical bonds as edges.
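To make the graph idea concrete, here is a minimal Python sketch, not the researchers' code. It encodes tetramethylammonium, about the simplest quaternary ammonium ion, as nodes and edges with hydrogens left out. The helper names `build_adjacency` and `quaternary_nitrogens` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Heavy-atom graph of tetramethylammonium: one nitrogen bonded to four carbons.
# Hydrogens are omitted to keep the example small.
atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "C"}  # nodes: index -> element
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]          # edges: pairs of bonded atoms


def build_adjacency(bonds):
    """Turn a list of bonds into a mapping from each atom to its neighbors."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def quaternary_nitrogens(atoms, bonds):
    """Return indices of nitrogen nodes bonded to exactly four carbon nodes."""
    adjacency = build_adjacency(bonds)
    return [
        index
        for index, element in atoms.items()
        if element == "N"
        and len(adjacency[index]) == 4
        and all(atoms[neighbor] == "C" for neighbor in adjacency[index])
    ]


print(quaternary_nitrogens(atoms, bonds))
```

Once a molecule is stored this way, a chemical rule such as "the nitrogen center carries four carbon chains" becomes a simple check on a node's neighbors.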
The researchers broke the problem into a hierarchical, two-step generation process: one step for the nitrogen center of a QAC and the other step for the multiple tails of the molecule. They could then assemble the two parts.
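The assembly step can be pictured with a toy Python sketch. It is an illustration under loud assumptions, not the team's model: the real system learns the center and the tails from data, while here the tails are hard-coded as plain linear carbon chains, and the function name `assemble_qac` is hypothetical.

```python
def assemble_qac(tail_lengths):
    """Join a nitrogen center to four linear carbon tails.

    Returns (atoms, bonds), where atoms is a list of element symbols and
    bonds is a list of (index, index) pairs. Hydrogens are omitted.
    """
    if len(tail_lengths) != 4 or any(length < 1 for length in tail_lengths):
        raise ValueError("expected four tails, each with at least one carbon")

    atoms = ["N"]  # step 1: the nitrogen center is node 0
    bonds = []
    for length in tail_lengths:  # step 2: attach one tail at a time
        previous = 0  # every tail starts at the nitrogen
        for _ in range(length):
            atoms.append("C")
            current = len(atoms) - 1
            bonds.append((previous, current))
            previous = current
    return atoms, bonds


# Three one-carbon tails and one twelve-carbon tail.
atoms, bonds = assemble_qac([1, 1, 1, 12])
print(len(atoms), len(bonds))
```

Splitting the job this way means the part every QAC shares, the nitrogen center, is built into the structure, while the tails are free to vary.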
They drew from 603 molecules in the QAC dataset to train the algorithm.
The model generated around 300 molecular structures. The computer scientists sent these results to the chemists for review. The team members decided to limit this review time to four hours, to ensure that the approach they were developing was practical.
The chemists applied systematic decision criteria to assess the AI-generated molecules, including QAC geometric conformance, the feasibility of synthesizing them, their structural novelty and their predicted antimicrobial activity.
Their expert analysis determined that nine percent of the generated molecules were good candidates for synthesis and testing. More than half of the compounds, or 65 percent, were not new or showed only incremental changes over existing compounds. Five percent were considered impractical for synthesis.
And 21 percent of the generated molecules were invalid — including 18 percent that were not QACs and three percent that were not compounds of any category.
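A few lines of Python confirm that the four review categories account for every generated molecule. The percentages come from the figures above. The molecule counts are only rough, because the article gives the pool size as "around 300," and the function name `approximate_counts` is hypothetical.

```python
# Percentages reported for the first workflow's expert review.
triage_percent = {
    "good candidates": 9,
    "not new or incremental": 65,
    "impractical to synthesize": 5,
    "invalid": 21,
}
# The invalid share splits into two sub-groups.
invalid_breakdown = {"not QACs": 18, "not compounds of any category": 3}


def approximate_counts(percentages, total):
    """Convert percentages into rounded molecule counts for a pool of `total`."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


# ASSUMPTION: 300 stands in for "around 300" generated structures.
counts = approximate_counts(triage_percent, 300)
print(sum(triage_percent.values()), sum(invalid_breakdown.values()))
print(counts)
```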
Improving the method
For a second experimental workflow, the researchers enhanced the process.
They curated the library of 603 QAC compounds, retaining only those showing activity against four dangerous bacterial strains: Staphylococcus aureus, Enterococcus faecalis, Escherichia coli and Pseudomonas aeruginosa.
That curation yielded 421 compounds, which they used to retrain the AI generative model.
The resulting 2,000 generated candidates underwent a structural validity check, automated through a computer program, to retain only chemically valid molecules. That filter reduced the pool to 800 candidates.
These 800 molecules were further filtered using a computational classifier that predicted their activity against each of the four bacterial strains. The classifier rated each molecule from one to four, depending on how many of the strains a molecule was predicted to have activity against.
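The rating scheme amounts to counting predicted hits. The Python sketch below assumes the classifier's output is already available as a true-or-false prediction per strain. The molecule names, the predictions and the helper names `strain_score` and `top_ranked` are all hypothetical, and the classifier itself is not reproduced here.

```python
STRAINS = ("S. aureus", "E. faecalis", "E. coli", "P. aeruginosa")


def strain_score(predictions):
    """Count how many of the four strains a molecule is predicted to be active against."""
    return sum(1 for strain in STRAINS if predictions.get(strain, False))


def top_ranked(candidates, limit):
    """Sort candidates by score, highest first, and keep at most `limit` names."""
    ranked = sorted(candidates, key=lambda item: strain_score(item[1]), reverse=True)
    return [name for name, _ in ranked[:limit]]


# Made-up predictions for three made-up molecules.
candidates = [
    ("mol_a", {"S. aureus": True, "E. faecalis": True, "E. coli": False, "P. aeruginosa": False}),
    ("mol_b", {"S. aureus": True, "E. faecalis": True, "E. coli": True, "P. aeruginosa": True}),
    ("mol_c", {"S. aureus": True, "E. faecalis": False, "E. coli": False, "P. aeruginosa": False}),
]

print(top_ranked(candidates, 2))
```

In this sketch a molecule predicted inactive against every strain would score zero, a case the one-to-four scale described above does not cover.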
This filtering process yielded 300 top-ranked compounds, which the laboratory chemists focused on assessing within a four-hour time limit.
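The narrowing of the candidate pool can be summarized as a simple funnel. This Python sketch uses only the counts reported for the second workflow. The stage labels and the function name `retention` are hypothetical.

```python
# Candidate counts at each stage of the second workflow.
stages = [
    ("generated", 2000),
    ("chemically valid", 800),
    ("top-ranked for review", 300),
]


def retention(stages):
    """Return (stage name, count, percent of the initial pool) for each stage."""
    initial = stages[0][1]
    return [(name, count, 100 * count / initial) for name, count in stages]


for name, count, percent in retention(stages):
    print(f"{name}: {count} ({percent:.1f}% of generated)")
```

By this arithmetic, 40 percent of the generated candidates passed the validity check and 15 percent reached the chemists.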
This second experimental workflow greatly improved the results. Invalid outputs decreased from 21 percent to zero while the compounds deemed worthy of synthesis increased from 9 percent to 38 percent.
Testing the results
The laboratory chemists selected 29 generated molecules from across both workflows to synthesize and then test for potency. That testing yielded 11 novel QACs with experimentally validated power to inhibit bacterial pathogens.
"One of these QACs especially stands out for having broad activity against all seven strains of bacteria that we used in the testing," Wuest says. "That's including gram-negative bacteria, which are the hardest to kill."
Gram-negative bacterial cells have two membranes, making it harder for a molecule to penetrate, he explains.
The work has already drawn interest from the private sector as a potential model to speed discovery of new and more effective disinfectants, Wuest says.
The framework also serves as a model for scientists from other disciplines for how to gather and standardize datasets for potential AI applications.
"Meanwhile, this research yielded a laundry list of lead compounds for us to study," Wuest says. "We are having undergraduates synthesize and test more of the generated compounds, which is good training for them and will likely lead to more discoveries."
The first author of the paper is Shiva Gaemi, a PhD student at George Mason University. Co-authors include Emory PhD students Bo Pan, from Zhao's lab; and Alice Wu and Elise Bezold, from the Wuest lab. Additional authors are: Amanda Consylman, Ashley Petersen, Gabe Chang, Alice Wu and Diana McDonough, all from Villanova University, and Mark Forman from Saint Joseph's University in Philadelphia.