
Researchers Develop a Model to Predict the Splicing Activity of RNA-binding Proteins

2018-11-08

Genome-wide studies estimate that at least 90% of human genes undergo some degree of alternative splicing, which is tightly regulated across tissues and developmental stages. Disruption of splicing regulation is therefore a common cause of human diseases such as cancer.

Alternative splicing is generally controlled by various trans-acting splicing factors that specifically bind cis-elements in pre-mRNA to promote or suppress splicing reactions. Typical splicing factors have a modular domain configuration, containing one or several RNA-binding domains that specifically recognize pre-mRNA targets and one or more functional domains that control splicing.

While the human genome contains hundreds of RNA-binding proteins with potential roles in regulating splicing, current knowledge of splicing factor activities is based mainly on studies of a few canonical splicing factors, such as the SR protein family and the hnRNP family. A deep understanding of the splicing regulatory activity of all RNA-binding proteins would give scientists a basis for further studying and synthesizing splicing factors with specific activities.


RNA-binding proteins regulate pre-mRNA alternative splicing
(Image by Dr. WANG Zefeng's Group)

On Nov 7th, 2018, Dr. WANG Zefeng’s group at the CAS-MPG Partner Institute of Computational Biology (PICB) published a research article entitled “Modeling and predicting the activities of trans-acting splicing factors with machine learning” in Cell Systems. In this study, the researchers developed a machine learning approach to classify and predict the activities of RNA-binding proteins (RBPs) and revealed the association between the sequence composition of RBPs and their activities in regulating splicing, enabling de novo engineering of artificial splicing factors.

It was previously known that many RNA-binding proteins contain a large number of low-complexity regions, and that some of these low-complexity fragments can affect splicing. Based on this observation, the researchers used an engineered splicing factor system to systematically survey the low-complexity regions of RNA-binding proteins for splicing regulatory activity (up to 12 representative low-complexity regions). They then used the survey results as a training dataset and applied a machine learning approach to learn the hidden rules by which protein sequences determine activity. This approach led to a predictive model for the splicing regulatory activity of peptides. With this framework, the scientists discovered new splicing factors with sequence features that had not been reported before. Based on these sequence features, they achieved the first de novo synthesis of artificial splicing factors with customized activity, with a very high success rate. These findings also pave the way for the development of gene therapy methods based on artificial splicing factors.
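
For readers who want a concrete picture of what learning activity from sequence composition can look like, the following is a minimal sketch only. It is not the model from the study: the nearest-centroid classifier, the amino-acid composition features, the function names, the neutral labels "class_a" and "class_b", and the toy peptide sequences are all assumptions invented for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition(seq):
    """Return the fraction of each standard amino acid in a peptide."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(examples):
    """Average the composition vectors of the peptides under each label."""
    sums, sizes = {}, Counter()
    for seq, label in examples:
        acc = sums.setdefault(label, [0.0] * len(AMINO_ACIDS))
        for i, value in enumerate(composition(seq)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, seq):
    """Assign the label whose centroid is closest in Euclidean distance."""
    vec = composition(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical toy data: invented sequences and labels, not from the study.
TOY_TRAINING = [
    ("GGGGSGGGYGGG", "class_a"),
    ("GGYGGGSGGGGG", "class_a"),
    ("RSRSRSRSPSRS", "class_b"),
    ("SRSRSPRSRSRS", "class_b"),
]

centroids = train_centroids(TOY_TRAINING)
print(predict(centroids, "GGGSGGGGGYGG"))
```

The sketch reduces each peptide to 20 composition fractions and compares a new peptide with the average profile of each labeled group. A real model would need experimentally measured activities as labels and a properly validated learning method.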

This work was mainly carried out by Dr. MAO Miaowei, from East China University of Science and Technology (ECUST) and now a postdoctoral fellow at PICB, and HU Yue, from PICB, under the guidance of Dr. WANG Zefeng (PICB). Professor YANG Yi (ECUST) and senior investigator LI Xiaoling from the National Institute of Environmental Health Sciences also participated in the work.

This work was supported by the National Natural Science Foundation of China, the Science and Technology Commission of Shanghai Municipality, and the China Scholarship Council, among others.

Media Contact:
WANG Jin (Ms.)
Shanghai Institute of Nutrition and Health,
Chinese Academy of Sciences
Email: sibssc@sibs.ac.cn