Providing prior knowledge about the ancestry tree
"Most previous AI algorithms have a hard time analyzing biological data through an evolutionary lens, because they don't know what to look for and get confused by random patterns," says Axel Mosig. The team in Bochum has provided its AI with prior knowledge of the phylogenetic trees of the species being analyzed. This approach is based on classifying groups of four species into the presumably correct ancestry tree when training the AI. The tree contains information about close and distant relationships. "If all groups of four are correctly arranged, the entire ancestry tree can come into place like a puzzle," explains Luis Hack, who also worked on the study. "The AI can then look in the sequences to identify patterns that have evolved throughout this tree."
The kicker: this method works not only for genetic sequence data, but also for any other type of data, such as image data or structural patterns of biomolecules from various species. Having first established the approach for DNA sequence data in their current work, the bioinformaticians from RUB are already exploring its applicability to image data. "For example, you could reconstruct hypothetical images of evolutionary predecessor species," says Hack, explaining the method's potential for future projects.
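To make the idea of arranging groups of four more concrete, here is a minimal Python sketch of how such groups could be labeled from a known reference tree. It is not the Bochum team's code: the example tree, the species names, and the helper functions `clades` and `quartet_topology` are assumptions chosen purely for illustration. For any four species there are three possible ways to pair them up, and the reference tree determines which pairing is the correct one.

```python
from itertools import combinations

def clades(tree):
    """Return (leaves of this subtree, leaf sets of every clade inside it).

    A tree is either a species name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def quartet_topology(quartet, all_clades):
    """Return the pairing {pair, pair} that the tree induces on four species.

    A clade containing exactly two of the four species separates that pair
    from the other two. Returns None if no clade resolves the quartet.
    """
    q = frozenset(quartet)
    for clade in all_clades:
        inside = clade & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None

# Hypothetical reference tree, written as nested tuples (illustrative only).
tree = ((("human", "chimp"), "mouse"), ("chicken", "zebrafish"))

leaves, all_clades = clades(tree)

# One training label per group of four species.
labels = {
    quartet: quartet_topology(quartet, all_clades)
    for quartet in combinations(sorted(leaves), 4)
}

for quartet, pairing in labels.items():
    left, right = (sorted(pair) for pair in pairing)
    print(quartet, "->", left, "|", right)
```

In this sketch, each pairing plays the role of the "presumably correct" answer for one group of four. A model trained on such labels would see many small, consistent constraints that together pin down the whole tree, which is the puzzle picture Hack describes.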