---
tags:
- deep-learning
- vision
- VQA
- Transformer
- CNN
license: mit
datasets:
- LSVQ
- LIVE-VQC
- KoNViD-1k
- YouTube-UGC
- CVD2014
- FineVD
- LIVE-YT-Gaming
- KVQ
model-index:
- name: CAMP-VQA
results: []
pipeline_tag: visual-question-answering
---
# CAMP-VQA
Official Code for the following paper:
**X. Wang, A. Katsenou, J. Shen and D. Bull**. [CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video](https://arxiv.org/abs/2511.07290)

---
## Performance
We evaluated the proposed model, CAMP-VQA, on eight mainstream UGC benchmark datasets, under two experimental settings:
1. Training and testing on each target dataset (intra-dataset experiments).
2. Pre-training on LSVQ followed by fine-tuning on each target dataset (denoted w/ fine-tune), to assess the model's transferability and adaptation capabilities.
### **Performance comparison of CAMP-VQA:**
Spearman’s Rank Correlation Coefficient (SRCC)
| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|-------------------------|-------------|--------|-----------|----------|-------------|-----------|------------|----------------|----------------|-----------|
| CAMP-VQA | None | 0.933 | 0.927 | 0.922 | 0.901 | 0.920 | 0.908 | 0.919 | 0.903 | 0.956 |
| **CAMP-VQA (w/ fine-tune)** | LSVQ | **0.966** | **0.930** | **0.934** | **0.912** | **0.920** | **0.908** | **0.924** | **0.905** | **0.967** |
Pearson’s Linear Correlation Coefficient (PLCC)
| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|-------------------------|-------------|-------|-------|----------|-------------|-----------|------------|-----------|-----|-----------|
| CAMP-VQA | None | 0.944 | 0.936 | 0.940 | 0.920 | 0.933 | 0.920 | 0.923 | 0.922 | 0.958 |
| **CAMP-VQA (w/ fine-tune)** | LSVQ | **0.964** | **0.944** | **0.946** | **0.928** | **0.933** | **0.920** | **0.933** | **0.942** | **0.967** |
### **Cross-dataset evaluation when trained on LSVQ**
| Model | Correlation Metrics | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | FineVD | LIVE-YT-Gaming | KVQ |
|----------|---------------------|---------|-------|----------|------------|--------|-------|-------|
| CAMP-VQA | SRCC | 0.907 | 0.926 | 0.919 | 0.880 | 0.865 | 0.864 | 0.811 |
| CAMP-VQA | PLCC | 0.933 | 0.932 | 0.937 | 0.898 | 0.890 | 0.884 | 0.810 |
More reported results can be found in **[correlation_result.ipynb](https://github.com/xinyiW915/CAMP-VQA/blob/main/src/correlation_result.ipynb)**.
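For reference, SRCC and PLCC can be computed from predicted scores and ground-truth MOS with SciPy. A minimal illustration with made-up numbers (this is not the repository's evaluation code):
```shell
python -c "
from scipy.stats import spearmanr, pearsonr
pred = [3.1, 2.5, 4.0, 1.2]  # hypothetical predicted quality scores
mos  = [3.0, 2.7, 4.2, 1.0]  # hypothetical ground-truth MOS values
print('SRCC:', spearmanr(pred, mos)[0])
print('PLCC:', pearsonr(pred, mos)[0])
"
```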
## Proposed Model
The goal of the proposed framework is to evaluate visual quality without relying on the uncompressed version of a video. As outlined in the framework figure of the paper, it comprises three components: SVE, TME, and SEE.
## Usage
### 📌 Install Requirements
The repository is built with **Python 3.10** and can be set up with the following commands:
```shell
git clone https://github.com/xinyiW915/CAMP-VQA.git
cd CAMP-VQA
conda create -n campvqa python=3.10 -y
conda activate campvqa
pip install -r requirements.txt
```
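Optionally, verify that PyTorch and CUDA are visible in the new environment (this assumes PyTorch is among the pinned requirements):
```shell
# Quick sanity check: print the torch version and CUDA availability
python -c "import torch; print(torch.__version__, 'CUDA available:', torch.cuda.is_available())"
```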
### 📥 Download UGC Datasets
The corresponding UGC video datasets can be downloaded from the following sources:
[CVD2014](https://qualinet.github.io/databases/video/cvd2014_video_database/), [KoNViD-1k](https://database.mmsp-kn.de/konvid-1k-database.html), [LIVE-VQC](https://live.ece.utexas.edu/research/LIVEVQC/), [YouTube-UGC](https://media.withyoutube.com/), [LSVQ](https://github.com/baidut/PatchVQ), [FineVD](https://huggingface.co/datasets/IntMeGroup/FineVD), [LIVE-YT-Gaming](https://live.ece.utexas.edu/research/LIVE-YT-Gaming/index.html/), [KVQ](https://lixinustc.github.io/projects/KVQ/)
The metadata for the NR-VQA UGC datasets is available under [`./metadata`](./metadata).
Once downloaded, place the datasets in a storage location of your choice, and update `videos_dir` in the [`load_dataset`](./src/main_camp-vqa.py) function of `main_camp-vqa.py` accordingly.
### 🎬 Test Demo
Run the pre-trained models to evaluate the quality of a single video.
The model weights provided in [`./model/fine_tune`](./model/fine_tune) (CAMP-VQA w/ fine-tune) and [`./model/best_model`](./model/best_model/) (CAMP-VQA) are the best-performing checkpoints from training.
To evaluate the quality of a specific video, run the following command:
```shell
python camp-vqa_demo.py \
    -device <device> \
    -intra_cross_experiment <experiment_type> \
    -is_finetune \
    -save_model_path <path_to_saved_weights> \
    -train_data_name <training_dataset> \
    -test_data_name <test_dataset> \
    -test_video_path <path_to_video>
```
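For example, a hypothetical invocation evaluating one video with the LSVQ-pretrained, fine-tuned model (the video path is a placeholder; adjust the flag values to your setup):
```shell
python camp-vqa_demo.py \
    -device cuda \
    -is_finetune \
    -train_data_name lsvq_train \
    -test_data_name konvid_1k \
    -test_video_path ./demo/video.mp4  # hypothetical path
```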
Or simply try the default demo video by running:
```shell
python camp-vqa_demo.py
```
## Training
Steps to train CAMP-VQA from scratch on different datasets.
See detailed prompt settings in [prompts.json](./src/config/prompts.json).
### Extract Features
Run the following command to extract features from videos:
```shell
python main_camp-vqa.py -database konvid_1k -num_workers 4 -feature_save_path ../features/
```
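To process several datasets in one go, a simple loop works. Dataset identifiers other than `konvid_1k` are assumed here; match them to the names used in [`./metadata`](./metadata):
```shell
# Hypothetical batch extraction; the -database values below are assumed names
for db in konvid_1k live_vqc youtube_ugc cvd2014; do
    python main_camp-vqa.py -database "$db" -num_workers 4 -feature_save_path ../features/
done
```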
### Train Prediction Model
Train the quality prediction model on the extracted features:
```shell
python model_regression.py -data_name konvid_1k -feature_path ../features/camp-vqa/ -save_path ../model/
```
For **LSVQ**, train the model using:
```shell
python model_regression_lsvq.py -data_name lsvq_train -feature_path ../features/camp-vqa/ -save_path ../model/
```
### Fine-Tuning a Trained Model
To fine-tune a pre-trained model on a new dataset:
1. Turn on the `-is_finetune` flag.
2. Set [`-train_data_name`](./src/model_finetune.py) to the dataset the model was pre-trained on.
3. Set [`-test_data_name`](./src/model_finetune.py) to the dataset you want to fine-tune on.
4. Make sure [`-feature_path`](./src/model_finetune.py) points to your feature save path.
```shell
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq -is_finetune
```
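Putting the steps together, a hypothetical end-to-end sequence for adapting the LSVQ-pretrained model to KoNViD-1k (flag values follow the conventions above) might look like:
```shell
# 1. Extract KoNViD-1k features (see "Extract Features" above)
python main_camp-vqa.py -database konvid_1k -num_workers 4 -feature_save_path ../features/
# 2. Fine-tune the LSVQ-trained model on KoNViD-1k
python model_finetune.py -train_data_name lsvq_train -test_data_name konvid_1k -is_finetune
```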
### Cross-dataset Evaluation on Trained Models
To reproduce results where a model is trained on one dataset and tested directly on others (no fine-tuning), omit the `-is_finetune` flag:
```shell
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq
```
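To sweep all seven cross-dataset targets from the table above (dataset identifiers other than `konvid_1k` and `kvq` are assumed; adjust them to the names in your metadata):
```shell
# Hypothetical sweep over the cross-dataset evaluation targets
for target in cvd2014 konvid_1k live_vqc youtube_ugc finevd live_yt_gaming kvq; do
    python model_finetune.py -train_data_name lsvq_train -test_data_name "$target"
done
```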
## Ablation Study
We explored the impact of a key component on CAMP-VQA performance: the semantic embeddings.
The table below reports an ablation of the individual semantic embeddings on the **KoNViD-1k** and **FineVD** datasets: **image**, **quality**, **artifact**, and **content** embeddings (*ē*<sub>img</sub>, *ē*<sub>qlt</sub>, *ē*<sub>art</sub>, content embs).
| *ē*<sub>img</sub> | *ē*<sub>qlt</sub> | *ē*<sub>art</sub> | content embs | **KoNViD-1k** (SRCC) | **KoNViD-1k** (PLCC) | **FineVD** (SRCC) | **FineVD** (PLCC) |
|--------|--------------|-----------------|----------------|-----------------------|----------------------|--------------------|-------------------|
| ✓ | | | | 0.778 | 0.804 | 0.804 | 0.817 |
| | ✓ | | | 0.631 | 0.792 | 0.816 | 0.869 |
| | | ✓ | | 0.735 | 0.763 | 0.812 | 0.840 |
| | | | ✓ | 0.409 | 0.451 | 0.401 | 0.409 |
| ✓ | ✓ | | | 0.830 | 0.871 | 0.899 | 0.911 |
| ✓ | ✓ | ✓ | | **0.903** | **0.922** | **0.901** | **0.919** |
| ✓ | ✓ | ✓ | ✓ | 0.892 | 0.919 | 0.896 | 0.917 |
### On Semantic Embeddings (image, quality, artifact, content)
Run the semantic-embedding ablation with:
```shell
python main_semantic_embs_ablation.py -database konvid_1k -feat_name semantic_embs -feature_save_path ../features/semantic_embs/
```
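The ablation table above covers both KoNViD-1k and FineVD, so the extraction can be looped over the two datasets (the `finevd` identifier is an assumption; match it to your metadata):
```shell
# Hypothetical run over both ablation datasets
for db in konvid_1k finevd; do
    python main_semantic_embs_ablation.py -database "$db" -feat_name semantic_embs -feature_save_path ../features/semantic_embs/
done
```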
## Acknowledgment
This work was funded by the UKRI MyWorld Strength in Places Programme (SIPF00006/1) as part of my PhD study.
## Citation
If you find this paper and the repository useful, please cite our papers 😊:
```bibtex
@inproceedings{wang2025camp,
  title={CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video},
  author={Wang, Xinyi and Katsenou, Angeliki and Shen, Junxiao and Bull, David},
  booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)},
  year={2025},
}
@inproceedings{wang2025diva,
  title={DIVA-VQA: Detecting Inter-Frame Variations in UGC Video Quality},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  booktitle={IEEE International Conference on Image Processing (ICIP 2025)},
  year={2025},
}
@misc{wang2024relax,
  title={ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  year={2024},
  eprint={2407.11496},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2407.11496},
}
```
## Contact
Xinyi WANG, `xinyi.wang@bristol.ac.uk`