This study is led by Prof. Rui Liu (School of Mathematics, South China University of Technology), Prof. Pei Chen (School of Mathematics, South China University of Technology), and Dr. Jiayuan Zhong (School of Mathematics, Foshan University). Disease progression is inherently dynamic and prone to dramatic shifts over time, often triggered by subtle internal or external disturbances and leading to severe, irreversible consequences. Such a process, marked by abrupt critical shifts, can typically be classified into three phases: the normal stage, the pre-disease stage, and the disease stage (Fig. 1a). The normal stage reflects a relatively healthy condition, in which the system remains highly stable. The pre-disease stage marks the critical threshold preceding the appearance of disease symptoms; that is, once this critical point is reached, the patient often undergoes a catastrophic and irreversible transition that commonly results in deterioration. In contrast to the reversible normal stage, the irreversible deterioration of the disease stage poses a serious threat to the life and health of patients. Therefore, understanding the dynamics of disease progression and identifying the pre-disease stage play a key role in enabling early intervention and treatment. However, accurately detecting the pre-disease stage, or critical point, of complex diseases is difficult. Gene expression patterns and clinical phenotypes show only minor changes between the normal stage and the pre-disease stage. Additionally, data noise, patient heterogeneity, limited sample sizes, and model inaccuracies hinder the reliable identification of critical transitions.
Prof. Liu's team proposes a new, general method called the sample-perturbed Gaussian graphical model (sPGGM), based on optimal transport theory and Gaussian graphical models, to identify the critical point or pre-disease stage and to discover signaling molecules during disease progression from a sample-specific perspective. Specifically, to reduce irrelevant variables and better reflect actual biomolecular associations, sPGGM constructs candidate detection stages at the single-sample level using a Gaussian graphical model embedded with prior knowledge of the protein-protein interaction network. sPGGM then captures the distributional change between the baseline distribution (fitted from reference samples) and the perturbed distribution (fitted from mixed samples that combine a specific case sample with the reference group) through optimal transport (Fig. 1b), and uses the Wasserstein distance to quantify the relative differences between detection stages (Fig. 1c). To demonstrate the robustness and effectiveness of sPGGM, the team applied it to both simulated data (Fig. 2) and real-world disease datasets, including six cancer datasets from the TCGA database: colon adenocarcinoma (COAD), thyroid carcinoma (THCA), kidney clear cell carcinoma (KIRC), uterine corpus endometrial carcinoma (UCEC), kidney renal papillary cell carcinoma (KIRP), and liver hepatocellular carcinoma (LIHC) (Fig. 3). The results indicate that sPGGM handles real-world disease data effectively, accurately detects pre-disease stages across various disease categories, and identifies signaling molecules at critical points. Moreover, it captures the critical signals of complex diseases better than other existing single-sample detection approaches.
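To make the baseline-versus-perturbed comparison more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that both distributions are plain multivariate Gaussians fitted by the empirical mean and covariance, and it uses the standard closed-form 2-Wasserstein distance between two Gaussians. The protein-protein interaction prior and the graphical-model structure of sPGGM are omitted, and the function names (`fit_gaussian`, `gaussian_w2`, `perturbation_score`) and the `ridge` parameter are hypothetical choices for illustration.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh((mat + mat.T) / 2.0)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def fit_gaussian(samples, ridge=1e-6):
    """Fit mean and covariance to samples of shape (n_samples, n_genes).

    The small ridge term is an assumption added only to keep the
    covariance well conditioned when samples are few.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + ridge * np.eye(samples.shape[1])
    return mean, cov


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """Closed-form 2-Wasserstein distance between two Gaussians."""
    root_b = _sqrtm_psd(cov_b)
    cross = _sqrtm_psd(root_b @ cov_a @ root_b)
    w2_sq = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))


def perturbation_score(reference, case_sample):
    """Distance between the baseline fit and the fit after adding one case sample."""
    baseline = fit_gaussian(reference)
    mixed = np.vstack([reference, case_sample[None, :]])
    perturbed = fit_gaussian(mixed)
    return gaussian_w2(*baseline, *perturbed)
```

Under these assumptions, a case sample that resembles the reference group barely moves the fitted distribution and yields a small score, while a sample that deviates strongly yields a larger one. Comparing such scores across candidate stages is the intuition behind using the Wasserstein distance as an early-warning signal.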
In brief, sPGGM provides a new single-sample approach to identifying the pre-disease state and discovering signaling molecules that may lead to disease. It demonstrates effectiveness and robustness in both bulk and single-cell data analyses, offering a novel perspective for personalized disease prediction.