
RPcontact: RNA-Protein Contact Prediction

Improved prediction of RNA-protein contacts using RNA and protein language models



Overview

RPcontact is a novel computational tool for accurately predicting RNA-protein contacts, addressing a fundamental challenge in understanding molecular biology processes such as transcription, splicing, and translation. Traditional methods are limited by the scarcity of RNA-protein complex structures and the constraints of experimental techniques. While recent deep learning approaches like AlphaFold 3 and RoseTTAFoldNA have made progress, they still rely heavily on homologous templates.

RPcontact overcomes these limitations by leveraging large language models specifically designed for RNA (ERNIE-RNA) and proteins (ESM-2). Trained exclusively on ribosomal RNA-protein complexes, RPcontact delivers robust and generalized performance, accurately predicting contacts in both dimeric and multimeric non-rRNA-protein complexes. Benchmark results show that RPcontact significantly outperforms binary contacts inferred from models like AlphaFold 3 and RoseTTAFoldNA, making it a valuable tool for structure and function prediction in RNA-protein research.


Quick Start

Requirements

| Dependency | Recommended Version |
|------------|---------------------|
| Python     | ≥ 3.8               |
| PyTorch    | 1.13.1              |
| fair-esm   | 1.0.2               |

Install dependencies (example):

pip install numpy pandas matplotlib biopython scikit-learn
pip install torch==1.13.1
pip install fair-esm==1.0.2
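
After installing, it can help to confirm that the environment matches the recommended versions. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of RPcontact, and it assumes the distribution names `torch` and `fair-esm` used in the pip commands above.

```python
import sys
from importlib import metadata

# Recommended versions taken from the requirements listed in this README.
RECOMMENDED = {"torch": "1.13.1", "fair-esm": "1.0.2"}


def check_environment(recommended=RECOMMENDED, min_python=(3, 8)):
    """Return a list of human-readable warnings; empty if everything matches.

    Hypothetical helper for illustration, not shipped with RPcontact.
    """
    warnings = []
    if sys.version_info < min_python:
        warnings.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for package, wanted in recommended.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            warnings.append(f"{package} is not installed (recommended {wanted})")
            continue
        # Ignore local build tags such as "+cu117" when comparing versions.
        if installed.split("+")[0] != wanted:
            warnings.append(f"{package} {installed} found, {wanted} recommended")
    return warnings


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the recommended versions."]:
        print(line)
```

A mismatch is reported as a warning rather than an error, since the versions above are recommendations and other versions may also work.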

Script Overview

| Script           | Function                                              | Example Command         |
|------------------|-------------------------------------------------------|-------------------------|
| predict.py       | Contact prediction for a single RNA-protein pair      | python predict.py       |
| predict_batch.py | Contact prediction for a batch of RNA-protein pairs   | python predict_batch.py |
| evaluate.py      | Evaluation and visualization                          | python evaluate.py      |
| app.py           | Launch the web-based demo interface (requires gradio) | python app.py           |

Data Preparation

  • RNA/protein sequences: FASTA format
  • Embedding features: pickle format
  • For batch prediction: provide a CSV file for pairing info
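
As an illustration of preparing the pairing file, the sketch below reads sequence IDs from two FASTA files and writes every RNA x protein combination to a CSV. The column names `rna_id` and `protein_id` are assumptions made for this example; this README does not specify the CSV layout, so check `predict_batch.py` for the header it actually expects. The sequences in the demo are toy placeholders.

```python
import csv
import io


def read_fasta_ids(handle):
    """Collect sequence IDs (first token after '>') from a FASTA file handle."""
    return [line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()]


def write_pairs(rna_ids, protein_ids, handle):
    """Write every RNA x protein combination as one CSV row."""
    writer = csv.writer(handle, lineterminator="\n")
    # Assumed header: the real column names must match what predict_batch.py reads.
    writer.writerow(["rna_id", "protein_id"])
    for rna_id in rna_ids:
        for protein_id in protein_ids:
            writer.writerow([rna_id, protein_id])


if __name__ == "__main__":
    # Toy in-memory FASTA content; replace with open("rna.fasta") and so on.
    rna = io.StringIO(">rna1 example\nGGGAAACUCC\n")
    protein = io.StringIO(">prot1 example\nMKTAYIAKQR\n>prot2\nMSEQNNTEMT\n")
    out = io.StringIO()
    write_pairs(read_fasta_ids(rna), read_fasta_ids(protein), out)
    print(out.getvalue())
```

Pairing every RNA with every protein is only one possible choice; in practice the CSV would list just the pairs of interest.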

Typical Usage

Single pair prediction:

python predict.py --fasta your_sequence.fasta --out output_dir/

Batch prediction:

python predict_batch.py --rna_fasta rna.fasta --pro_fasta protein.fasta --csv pairs.csv --out output_dir/

Evaluation:

python evaluate.py --fasta your_sequence.fasta --out eval_dir/ --flabel true_labels.pickle
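
When the scripts are driven from another Python program, the same command line can be assembled as an argument list and passed to `subprocess`. This is a sketch only: the function names `build_batch_command` and `run_batch` are hypothetical, and it uses just the flags documented in this README (`--rna_fasta`, `--pro_fasta`, `--csv`, `--out`, `--device`). It assumes `predict_batch.py` is in the current working directory.

```python
import subprocess
import sys


def build_batch_command(rna_fasta, pro_fasta, csv_path, out_dir, device=None):
    """Assemble the predict_batch.py command as a list of arguments."""
    cmd = [
        sys.executable, "predict_batch.py",
        "--rna_fasta", rna_fasta,
        "--pro_fasta", pro_fasta,
        "--csv", csv_path,
        "--out", out_dir,
    ]
    if device is not None:
        # For example "cpu" or "cuda:0", as listed under Common Parameters.
        cmd += ["--device", device]
    return cmd


def run_batch(*args, **kwargs):
    """Run the batch prediction; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_batch_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute.
    print(" ".join(build_batch_command(
        "rna.fasta", "protein.fasta", "pairs.csv", "output_dir/", device="cpu")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.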

Common Parameters

| Parameter   | Description                                        |
|-------------|----------------------------------------------------|
| --fasta     | Input FASTA file (for single prediction)           |
| --rna_fasta | RNA FASTA file (for batch prediction)              |
| --pro_fasta | Protein FASTA file (for batch prediction)          |
| --csv       | RNA-protein pairing info CSV (for batch prediction) |
| --ffeat     | Precomputed embedding feature file (pickle format) |
| --fmodel    | Pretrained model file path                         |
| --out       | Output directory                                   |
| --flabel    | True label file (for evaluation)                   |
| --device    | Specify device (e.g., cpu or cuda:0)               |
| --draw      | Whether to visualize results                       |

Output Interpretation

  • The prediction output is a contact probability matrix for each RNA-protein pair. Higher scores indicate a higher probability of interaction.
  • The evaluation script provides accuracy and other metrics, as well as visualization.
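
To make the interpretation concrete, here is a small sketch of how a contact probability matrix might be turned into a ranked contact list and compared with true labels. It is not the logic of `evaluate.py`: the function names are hypothetical, the 0.5 threshold is an arbitrary illustrative choice, and the layout (rows as RNA nucleotides, columns as protein residues) is an assumption about the matrix orientation. The numbers in the demo are made up.

```python
def contacts_above(prob_matrix, threshold=0.5):
    """Return (rna_index, protein_index, probability) tuples, highest score first."""
    hits = [
        (i, j, p)
        for i, row in enumerate(prob_matrix)
        for j, p in enumerate(row)
        if p >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def precision_recall(prob_matrix, true_matrix, threshold=0.5):
    """Compare thresholded predictions with a binary matrix of true contacts."""
    predicted = {(i, j) for i, j, _ in contacts_above(prob_matrix, threshold)}
    actual = {
        (i, j)
        for i, row in enumerate(true_matrix)
        for j, value in enumerate(row)
        if value
    }
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy 2 x 3 example: 2 RNA nucleotides, 3 protein residues.
    probs = [[0.9, 0.2, 0.1], [0.4, 0.7, 0.6]]
    truth = [[1, 0, 0], [0, 1, 0]]
    print(contacts_above(probs))
    print(precision_recall(probs, truth))
```

Lowering the threshold generally trades precision for recall, so the cutoff is worth tuning for the application at hand.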

Contact & Citation

Questions or suggestions? Contact:

If you find this project helpful, please cite our manuscript:

  • Jiang, J., Zhang, X., Zhan, J., Miao, Z., & Zhou, Y. (2025). RPcontact: Improved prediction of RNA-protein contacts using RNA and protein language models. bioRxiv, 2025-06.

Make RNA-protein contact prediction easier and more accurate!