RPcontact: RNA-Protein Contact Prediction
Improved prediction of RNA-protein contacts using RNA and protein language models
Overview
RPcontact is a novel computational tool for accurately predicting RNA-protein contacts, addressing a fundamental challenge in understanding molecular biology processes such as transcription, splicing, and translation. Traditional methods are limited by the scarcity of RNA-protein complex structures and the constraints of experimental techniques. While recent deep learning approaches like AlphaFold 3 and RoseTTAFoldNA have made progress, they still rely heavily on homologous templates.
RPcontact overcomes these limitations by leveraging large language models specifically designed for RNA (ERNIE-RNA) and proteins (ESM-2). Trained exclusively on ribosomal RNA-protein complexes, RPcontact delivers robust and generalized performance, accurately predicting contacts in both dimeric and multimeric non-rRNA-protein complexes. Benchmark results show that RPcontact significantly outperforms binary contacts inferred from models like AlphaFold 3 and RoseTTAFoldNA, making it a valuable tool for structure and function prediction in RNA-protein research.
Quick Start
Requirements
Dependency | Recommended Version |
---|---|
Python | ≥ 3.8 |
PyTorch | 1.13.1 |
fair-esm | 1.0.2 |
Install dependencies (example):
pip install numpy pandas matplotlib biopython scikit-learn
pip install torch==1.13.1
pip install fair-esm==1.0.2
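After installing, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `check_versions` is hypothetical and not part of RPcontact. Note that an installed PyTorch build may report a version string with a build suffix, so compare by eye rather than by strict equality.

```python
from importlib import metadata

# Versions recommended in the Requirements table; keys are PyPI distribution names.
RECOMMENDED = {"torch": "1.13.1", "fair-esm": "1.0.2"}


def check_versions(recommended=RECOMMENDED):
    """Return {package: (installed_version_or_None, recommended_version)}."""
    report = {}
    for name, wanted in recommended.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions().items():
        status = "missing" if installed is None else installed
        print(f"{name}: installed={status}, recommended={wanted}")
```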
Script Overview
Script | Function | Example Command |
---|---|---|
predict.py | Single RNA-protein pair contact prediction | python predict.py |
predict_batch.py | Batch RNA-protein pairs contact prediction | python predict_batch.py |
evaluate.py | Evaluation and visualization | python evaluate.py |
app.py | Launch the web-based demo interface (requires gradio) | python app.py |
Data Preparation
- RNA/protein sequences: FASTA format
- Embedding features: pickle format
- For batch prediction: provide a CSV file specifying which RNA and protein sequences to pair
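The input files listed above can be produced with a few lines of standard-library Python. This is only a sketch: the sequences are toy examples, and the CSV header names (`rna_id`, `protein_id`) are an assumption, since the expected column layout is not specified here. Check `predict_batch.py` for the format it actually reads.

```python
import csv
from pathlib import Path


def write_fasta(path, records):
    """Write {sequence_id: sequence} as a FASTA file, 60 characters per line."""
    with open(path, "w") as handle:
        for seq_id, seq in records.items():
            handle.write(f">{seq_id}\n")
            for start in range(0, len(seq), 60):
                handle.write(seq[start:start + 60] + "\n")


def write_pairs_csv(path, pairs, header=("rna_id", "protein_id")):
    """Write (RNA ID, protein ID) rows; the header names are assumed, not documented."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(pairs)


if __name__ == "__main__":
    out = Path("example_inputs")
    out.mkdir(exist_ok=True)
    write_fasta(out / "rna.fasta", {"rna1": "GGGAAACUUCGGUUUCCC"})
    write_fasta(out / "protein.fasta", {"prot1": "MKTAYIAKQRQISFVKSHFSRQ"})
    write_pairs_csv(out / "pairs.csv", [("rna1", "prot1")])
```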
Typical Usage
Single pair prediction:
python predict.py --fasta your_sequence.fasta --out output_dir/
Batch prediction:
python predict_batch.py --rna_fasta rna.fasta --pro_fasta protein.fasta --csv pairs.csv --out output_dir/
Evaluation:
python evaluate.py --fasta your_sequence.fasta --out eval_dir/ --flabel true_labels.pickle
Common Parameters
Parameter | Description |
---|---|
--fasta | Input FASTA file (for single prediction) |
--rna_fasta | RNA FASTA file (for batch prediction) |
--pro_fasta | Protein FASTA file (for batch prediction) |
--csv | RNA-protein pairing info CSV (for batch prediction) |
--ffeat | Precomputed embedding feature file (pickle format) |
--fmodel | Pretrained model file path |
--out | Output directory |
--flabel | True label file (for evaluation) |
--device | Specify device (e.g., cpu or cuda:0) |
--draw | Whether to visualize results |
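When driving batch prediction from another Python program, the flags in the table can be assembled into an argument list instead of a hand-written shell string. The sketch below assumes that `predict_batch.py` accepts `--device` and `--fmodel` in addition to the flags shown under Typical Usage; the table lists them as common parameters but does not say which scripts take them. The function name `build_batch_command` is hypothetical.

```python
import shlex
import sys


def build_batch_command(rna_fasta, pro_fasta, csv_path, out_dir,
                        device=None, fmodel=None, python=sys.executable):
    """Assemble the argv list for predict_batch.py from the documented flags."""
    cmd = [python, "predict_batch.py",
           "--rna_fasta", rna_fasta,
           "--pro_fasta", pro_fasta,
           "--csv", csv_path,
           "--out", out_dir]
    # Optional flags are added only when given (assumed to be accepted here).
    if device is not None:
        cmd += ["--device", device]
    if fmodel is not None:
        cmd += ["--fmodel", fmodel]
    return cmd


if __name__ == "__main__":
    cmd = build_batch_command("rna.fasta", "protein.fasta", "pairs.csv",
                              "output_dir/", device="cpu")
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file paths contain spaces.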
Output Interpretation
- The prediction output is a contact probability matrix for each RNA-protein pair. Higher scores indicate a higher probability of interaction.
- The evaluation script provides accuracy and other metrics, as well as visualization.
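To turn a contact probability matrix into a short list of likely contacts, one option is to keep the highest-scoring cells above a cutoff. The sketch below is illustrative only: it assumes rows index RNA positions and columns index protein positions, uses 0-based indices, and picks 0.5 as an arbitrary threshold. None of these are stated by RPcontact, and the on-disk format of the prediction output is not covered here. The function accepts nested lists and should also work with a 2D NumPy array.

```python
def top_contacts(prob, k=5, threshold=0.5):
    """Return up to k (rna_index, protein_index, score) tuples with
    score >= threshold, sorted from highest to lowest score."""
    scored = [
        (i, j, float(p))
        for i, row in enumerate(prob)
        for j, p in enumerate(row)
        if p >= threshold
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3 x 4 matrix: rows = RNA positions, columns = protein positions (assumed).
    toy = [
        [0.05, 0.10, 0.92, 0.30],
        [0.61, 0.02, 0.15, 0.08],
        [0.20, 0.77, 0.04, 0.49],
    ]
    for i, j, score in top_contacts(toy, k=3):
        print(f"RNA position {i} - protein position {j}: {score:.2f}")
```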
Contact & Citation
Questions or suggestions? Contact:
- Jiuhong Jiang
- Email: jiangjh2023@shanghaitech.edu.cn
If you find this project helpful, please cite our manuscript:
- Jiang, J., Zhang, X., Zhan, J., Miao, Z., & Zhou, Y. (2025). RPcontact: Improved prediction of RNA-protein contacts using RNA and protein language models. bioRxiv, 2025-06.
Make RNA-protein contact prediction easier and more accurate!