Representing a molecule in a way that captures both its structure and function is central to tasks such as molecular property prediction, drug-drug interaction prediction, and drug-target interaction prediction. Graph neural networks (GNNs) have become a popular choice because they model molecules as graphs of atoms and bonds, allowing structure-aware learning. However, many existing GNN models operate as black boxes: although they may achieve good accuracy, it is often unclear which functional groups or substructures are responsible for a given prediction, which limits their usefulness for chemical analysis and rational drug design.
To address this limitation, the authors developed MolUNet++, a model that explicitly learns molecular representations at multiple structural scales. The framework integrates Molecular Edge Shrinkage Pooling (MESPool) with a Nested UNet architecture, enabling hierarchical extraction of molecular substructures while preserving global context. In addition, MolUNet++ introduces a substructure masking explainer, which quantitatively evaluates how different molecular fragments contribute to model outputs. This design allows the model to point to chemically meaningful substructures rather than relying solely on abstract node embeddings.
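The idea behind a masking-based explainer can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear model `toy_predict`, the choice to mask a fragment by zeroing its atom features, and the fragment names are all assumptions made for illustration. The sketch scores each fragment by how much the prediction changes when that fragment is hidden from the model.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def toy_predict(node_features: Sequence[Vector]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear score summed over atoms."""
    weights = [0.5, -1.0, 2.0]
    return sum(sum(w * x for w, x in zip(weights, feats)) for feats in node_features)


def mask_substructure(node_features: Sequence[Vector], atom_indices: Sequence[int]) -> List[Vector]:
    """Return a copy of the features with the selected atoms zeroed out (assumed masking scheme)."""
    masked = set(atom_indices)
    return [[0.0] * len(f) if i in masked else list(f) for i, f in enumerate(node_features)]


def substructure_contributions(
    predict: Callable[[Sequence[Vector]], float],
    node_features: Sequence[Vector],
    substructures: Dict[str, List[int]],
) -> Dict[str, float]:
    """Score each fragment as (prediction on full molecule) - (prediction with fragment masked)."""
    baseline = predict(node_features)
    return {
        name: baseline - predict(mask_substructure(node_features, indices))
        for name, indices in substructures.items()
    }


# Toy molecule: 4 atoms, 3 features per atom (illustrative values only).
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Hypothetical fragments, given as lists of atom indices.
fragments = {"fragment_a": [0, 1], "fragment_b": [2], "fragment_c": [3]}

scores = substructure_contributions(toy_predict, features, fragments)
most_influential = max(scores, key=scores.get)
```

In this toy setting, a positive score means that hiding the fragment lowers the prediction, so the fragment pushes the output upward. A real explainer would replace `toy_predict` with the trained network and would need a masking scheme suited to that architecture.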
MolUNet++ was evaluated on a range of public benchmarks covering molecular property prediction, drug-drug interaction (DDI) prediction, and drug-target interaction (DTI) prediction. Across these tasks, the model consistently improved performance, particularly under cold-start settings where unseen drugs or targets are involved. Beyond accuracy gains, MolUNet++ supports task-specific interpretability by using different query signals, such as molecular fingerprints, paired drug features, or protein sequence features, to guide substructure extraction. As a result, the model provides both reliable predictions and clearer structural explanations, offering a practical tool for interpretable molecular representation learning.
This work, entitled "MolUNet++: Adaptive-grained explicit substructure and interaction aware molecular representation learning", was published in Acta Physico-Chimica Sinica on November 4, 2025.