<p align="center">
  <img src="https://raw.githubusercontent.com/JulseJiang/RPcontact/main/example/logo.png" alt="RPcontact Logo" width="120"/>
</p>

# RPcontact: RNA-Protein Contact Prediction

**Improved prediction of RNA-protein contacts using RNA and protein language models**

[Paper](https://www.biorxiv.org/content/10.1101/2025.06.02.657171v1.full)
[Code](https://github.com/rpcontact)
[Demo](https://julse-rpcontact.hf.space/)


---

## Overview

RPcontact is a novel computational tool for accurately predicting RNA-protein contacts, addressing a fundamental challenge in understanding molecular biology processes such as transcription, splicing, and translation. Traditional methods are limited by the scarcity of RNA-protein complex structures and the constraints of experimental techniques. While recent deep learning approaches like AlphaFold 3 and RoseTTAFoldNA have made progress, they still rely heavily on homologous templates.

RPcontact overcomes these limitations by leveraging large language models specifically designed for RNA ([ERNIE-RNA](https://github.com/Bruce-ywj/ERNIE-RNA)) and proteins ([ESM-2](https://github.com/facebookresearch/esm)). Trained exclusively on ribosomal RNA-protein complexes, RPcontact delivers robust and generalized performance, accurately predicting contacts in both dimeric and multimeric non-rRNA-protein complexes. Benchmark results show that RPcontact significantly outperforms binary contacts inferred from models like AlphaFold 3 and RoseTTAFoldNA, making it a valuable tool for structure and function prediction in RNA-protein research.

---

## Quick Start

### Requirements

| Dependency  | Recommended Version |
|-------------|--------------------|
| Python      | ≥ 3.8              |
| PyTorch     | 1.13.1             |
| fair-esm    | 1.0.2              |

Install dependencies (example):
```bash
pip install numpy pandas matplotlib biopython scikit-learn
pip install torch==1.13.1
pip install fair-esm==1.0.2
```

---

### Script Overview

| Script           | Function                                              | Example Command           |
|------------------|-------------------------------------------------------|---------------------------|
| predict.py       | Contact prediction for a single RNA-protein pair      | `python predict.py`       |
| predict_batch.py | Contact prediction for a batch of RNA-protein pairs   | `python predict_batch.py` |
| evaluate.py      | Evaluation and visualization                          | `python evaluate.py`      |
| app.py           | Launch the web-based demo interface (requires gradio) | `python app.py`           |

---

### Data Preparation

- RNA/protein sequences: FASTA format
- Embedding features: pickle format
- For batch prediction: provide a CSV file for pairing info
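
As an illustration of the batch inputs listed above, the following sketch builds a pairing CSV from the record IDs in an RNA FASTA and a protein FASTA. The column names `rna_id` and `protein_id`, and the helper names, are hypothetical; the layout that `predict_batch.py` actually expects is not documented here, so check it against the example data in the repository.

```python
import csv
import io


def read_fasta_ids(text):
    """Return the record IDs (first word after '>') found in FASTA text."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def write_pairs_csv(handle, pairs, rna_ids, protein_ids):
    """Write pairing rows, rejecting IDs missing from either FASTA file."""
    rna_ids, protein_ids = set(rna_ids), set(protein_ids)
    writer = csv.writer(handle)
    # Hypothetical column names, chosen for illustration only.
    writer.writerow(["rna_id", "protein_id"])
    for rna_id, protein_id in pairs:
        if rna_id not in rna_ids:
            raise ValueError(f"unknown RNA ID: {rna_id}")
        if protein_id not in protein_ids:
            raise ValueError(f"unknown protein ID: {protein_id}")
        writer.writerow([rna_id, protein_id])


# Toy sequences standing in for rna.fasta and protein.fasta.
rna_fasta = ">rna1 example\nGGCUAGCUA\n>rna2\nAUGCUAGC\n"
protein_fasta = ">protA\nMKTAYIAK\n"

buffer = io.StringIO()
write_pairs_csv(
    buffer,
    [("rna1", "protA"), ("rna2", "protA")],
    read_fasta_ids(rna_fasta),
    read_fasta_ids(protein_fasta),
)
pairs_csv_text = buffer.getvalue()
```

Validating IDs before writing the CSV catches mismatches between the pairing file and the FASTA headers before a long batch run starts.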

---

### Typical Usage

**Single pair prediction:**
```bash
python predict.py --fasta your_sequence.fasta --out output_dir/
```

**Batch prediction:**
```bash
python predict_batch.py --rna_fasta rna.fasta --pro_fasta protein.fasta --csv pairs.csv --out output_dir/
```

**Evaluation:**
```bash
python evaluate.py --fasta your_sequence.fasta --out eval_dir/ --flabel true_labels.pickle
```

---

### Common Parameters

| Parameter     | Description                                             |
|---------------|--------------------------------------------------------|
| --fasta       | Input FASTA file (for single prediction)               |
| --rna_fasta   | RNA FASTA file (for batch prediction)                  |
| --pro_fasta   | Protein FASTA file (for batch prediction)              |
| --csv         | RNA-protein pairing info CSV (for batch prediction)    |
| --ffeat       | Precomputed embedding feature file (pickle format)     |
| --fmodel      | Pretrained model file path                             |
| --out         | Output directory                                       |
| --flabel      | True label file (for evaluation)                       |
| --device      | Specify device (e.g., cpu or cuda:0)                   |
| --draw        | Whether to visualize results                           |

---

## Output Interpretation

- The prediction output is a contact probability matrix for each RNA-protein pair. Higher scores indicate a higher probability of interaction.
- The evaluation script provides accuracy and other metrics, as well as visualization.
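
One way to read a contact probability matrix is to list its highest-scoring cells. The sketch below assumes rows index RNA nucleotides and columns index protein residues, and uses a cutoff of 0.5; both the orientation and the cutoff are illustrative assumptions, not values taken from RPcontact. The function name `top_contacts` is hypothetical.

```python
def top_contacts(prob_matrix, threshold=0.5, top_k=5):
    """Return up to top_k (rna_index, protein_index, score) tuples.

    Only cells with score >= threshold are kept, sorted from the
    highest score to the lowest. Accepts any 2D row-iterable of numbers.
    """
    hits = []
    for i, row in enumerate(prob_matrix):
        for j, score in enumerate(row):
            if score >= threshold:
                hits.append((i, j, float(score)))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits[:top_k]


# Toy 2 x 3 matrix: 2 RNA positions (rows) by 3 protein positions (columns).
example = [
    [0.05, 0.80, 0.10],
    [0.60, 0.20, 0.95],
]
contacts = top_contacts(example, threshold=0.5)
```

Indices here are zero-based; add 1 to each when comparing against sequence positions numbered from 1.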

---

## Contact & Citation

Questions or suggestions? Contact:

- Jiuhong Jiang  
- Email: jiangjh2023@shanghaitech.edu.cn

If you find this project helpful, please cite our manuscript:

- Jiang, J., Zhang, X., Zhan, J., Miao, Z., & Zhou, Y. (2025). RPcontact: Improved prediction of RNA-protein contacts using RNA and protein language models. bioRxiv, 2025-06.

---

<p align="center"><em>Make RNA-protein contact prediction easier and more accurate!</em></p>