Systemic sclerosis (SSc) is a severe autoimmune disease with complex genetic causes. Some genetic contributors have been identified, but others remain unknown, which has impeded the development of targeted treatments. In a new study published in Annals of the Rheumatic Diseases, researchers at Baylor College of Medicine and collaborating institutions combined exome sequencing with evolutionary action-machine learning to identify protein changes and their associated mechanisms in SSc.
Previous genome-wide association studies (GWAS), which compare the frequency of common genetic variants between patients and controls, have shown that the strongest genetic contributors lie in the human leucocyte antigen (HLA) region on chromosome 6. In this study, researchers led by first author Dr. Shamika Ketkar performed a GWAS using exome sequencing data from 2,559 patients with SSc and 893 healthy controls from the Scleroderma Family Registry and DNA Repository at the University of Texas Health Science Center at Houston. They aimed to find novel genes and rare variants contributing to SSc risk.
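For readers curious about what a case-control association test looks like in its simplest form, the sketch below compares how often a variant is carried by patients versus controls. It is only an illustration, not the study's pipeline: the cohort sizes come from the article, but the carrier counts and the function name `case_control_test` are hypothetical. Real GWAS analyses typically use regression models that adjust for ancestry and other covariates, and apply a much stricter genome-wide significance threshold.

```python
import math


def case_control_test(case_carriers, n_cases, control_carriers, n_controls):
    """Return (odds_ratio, chi2, p_value) for a 2x2 carrier-by-status table.

    Uses the Pearson chi-square test with 1 degree of freedom and no
    continuity correction. All four cells must be non-zero.
    """
    a = case_carriers                  # cases carrying the variant
    b = n_cases - case_carriers        # cases without the variant
    c = control_carriers               # controls carrying the variant
    d = n_controls - control_carriers  # controls without the variant
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")

    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Cohort sizes reported in the article.
N_CASES, N_CONTROLS = 2559, 893

# HYPOTHETICAL carrier counts, chosen only to illustrate the calculation.
odds_ratio, chi2, p_value = case_control_test(300, N_CASES, 60, N_CONTROLS)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

An odds ratio above 1 means the variant is more common among patients than controls; the p-value indicates how unlikely that difference would be if the variant had no relationship to disease status.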
"What truly surprised and excited us was the discovery and replication of MICB, a gene located within the HLA region but acting independently of the classical HLA genes. MICB had not previously been implicated in systemic sclerosis, and its identification represents a novel genetic contributor and a potential therapeutic target," said Ketkar, assistant professor of molecular and human genetics at Baylor.
Collaborators in Spain replicated the results in previously published European GWAS data comprising nearly 10,000 cases, strengthening confidence in the findings. At Baylor, Dr. Olivier Lichtarge's lab used its evolutionary action-machine learning (EAML) framework to analyze the exome sequencing data and prioritize genes with high-impact variants predictive of SSc. The results once again pointed to MICB, as well as to other genes on chromosome 6 such as NOTCH4, and to rare missense variants in genes enriched in interferon signaling (a key pathway in the immune system), including IFI44L and IFIT5.
"With our machine learning framework, we are not only identifying whether a variant occurs frequently, but also, using evolutionary data across all species, we are weighing the likelihood the variant is functionally disruptive to the protein and eventually to the patient," said Lichtarge, Cullen Chair and professor of molecular and human genetics, biochemistry and molecular biology and pharmacology. "We previously used this method in diseases with much larger genome data sets, like Alzheimer's disease and heart disease, and in this study, we show that it can be effective in complex diseases with a smaller patient data set."
To understand the functional impact of the genetic variants identified in the study, researchers integrated publicly available single-cell RNA sequencing data from SSc skin biopsies to resolve cell type-specific expression patterns of risk genes. They also performed expression quantitative trait locus (eQTL) analysis using whole blood datasets to establish regulatory links between disease-associated variants and transcriptomic changes. MICB and NOTCH4 were found to be expressed in fibroblasts and endothelial cells, two cell types that play central roles in fibrosis and vasculopathy, key clinical features of SSc. These complementary analyses confirmed functional regulatory effects of identified risk genes.
"To solve complex diseases like SSc, we need to combine different approaches and machine learning to the analysis of large DNA, RNA and protein data sets to discover otherwise hidden targets for treatment," said corresponding author Dr. Brendan Lee, professor, chair and Robert and Janice McNair Endowed Chair of molecular and human genetics at Baylor.
Other authors who contributed to this work include Hongzheng Dai, Lindsay Burrage, David Murdock, Brian Dawson, Marialbert Acosta-Herrera, Martin Kerick, Javier Martin, Kevin Wilhelm, Jennifer Kay Asmussen, Regeneron Genetics Center, Shervin Assassi and Maureen D. Mayes. They are affiliated with one of the following institutions: Baylor College of Medicine, McGovern Medical School at UTHealth Houston, Institute of Parasitology and Biomedicine Lopez-Neyra and Regeneron Pharmaceuticals Inc.
This work was funded by the National Institute of Arthritis, Musculoskeletal and Skin Diseases of the National Institutes of Health, the University of Texas Health Science Center and the Department of Defense Congressionally Directed Medical Research Program. See the publication for a full list of funding.