CAMP-VQA
Official Code for the following paper:
X. Wang, A. Katsenou, J. Shen and D. Bull. CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video
Performance
We evaluated the proposed model, CAMP-VQA, on eight mainstream UGC benchmark datasets. The experiments included:
- Training and testing on each target dataset, referred to as intra-dataset experiments.
- Pre-training the model on LSVQ, followed by fine-tuning on each target dataset (denoted as w/ fine-tune), to assess the model's transferability and adaptation capabilities.
Performance comparison of CAMP-VQA:
Spearman's Rank Correlation Coefficient (SRCC)
| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|---|---|---|---|---|---|---|---|---|---|---|
| CAMP-VQA | None | 0.933 | 0.927 | 0.922 | 0.901 | 0.920 | 0.908 | 0.919 | 0.903 | 0.956 |
| CAMP-VQA (w/ fine-tune) | LSVQ | 0.966 | 0.930 | 0.934 | 0.912 | 0.920 | 0.908 | 0.924 | 0.905 | 0.967 |
Pearson's Linear Correlation Coefficient (PLCC)
| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|---|---|---|---|---|---|---|---|---|---|---|
| CAMP-VQA | None | 0.944 | 0.936 | 0.940 | 0.920 | 0.933 | 0.920 | 0.923 | 0.922 | 0.958 |
| CAMP-VQA (w/ fine-tune) | LSVQ | 0.964 | 0.944 | 0.946 | 0.928 | 0.933 | 0.920 | 0.933 | 0.942 | 0.967 |
Cross-dataset evaluation when trained on LSVQ
| Model | Correlation Metrics | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | FineVD | LIVE-YT-Gaming | KVQ |
|---|---|---|---|---|---|---|---|---|
| CAMP-VQA | SRCC | 0.907 | 0.926 | 0.919 | 0.880 | 0.865 | 0.864 | 0.811 |
| CAMP-VQA | PLCC | 0.933 | 0.932 | 0.937 | 0.898 | 0.890 | 0.884 | 0.810 |
Further results are reported in `correlation_result.ipynb`.
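For reference, both metrics can be computed with `scipy` (a minimal sketch; the notebook may include additional steps, such as a nonlinear score mapping before PLCC):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Toy predicted scores vs. ground-truth MOS for a held-out test set.
pred = np.array([3.1, 2.4, 4.0, 3.6, 1.9])
mos = np.array([3.0, 2.5, 4.2, 3.5, 2.0])

srcc, _ = spearmanr(pred, mos)  # rank (monotonic) correlation
plcc, _ = pearsonr(pred, mos)   # linear correlation
print(f"SRCC: {srcc:.3f}, PLCC: {plcc:.3f}")
```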
Proposed Model
The goal of the proposed framework is to evaluate visual quality without relying on the uncompressed version of a video. As outlined in the framework figure of the paper, it comprises three components: SVE, TME, and SEE.
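As a rough illustration only (not the repository's actual architecture, which is detailed in the paper), the general pattern is three feature branches whose outputs are fused and regressed to a single quality score:

```python
import torch
import torch.nn as nn

class FusionRegressor(nn.Module):
    """Illustrative fusion head: concatenate per-branch features and
    regress a scalar quality score. All dimensions are placeholders."""
    def __init__(self, dims=(512, 512, 512), hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(sum(dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, f_sve, f_tme, f_see):
        # Fuse the three branch outputs, then map to a scalar score.
        fused = torch.cat([f_sve, f_tme, f_see], dim=-1)
        return self.mlp(fused).squeeze(-1)
```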
Usage
Install Requirements
The repository is built with Python 3.10 and can be installed via the following commands:
```bash
git clone https://github.com/xinyiW915/CAMP-VQA.git
cd CAMP-VQA
conda create -n campvqa python=3.10 -y
conda activate campvqa
pip install -r requirements.txt
```
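After installation, you can optionally check that PyTorch (assumed here to be among the pinned requirements) detects your GPU:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```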
Download UGC Datasets
The corresponding UGC video datasets can be downloaded from the following sources:
CVD2014, KoNViD-1k, LIVE-VQC, YouTube-UGC, LSVQ, FineVD, LIVE-YT-Gaming, KVQ
The metadata for the NR-VQA UGC dataset is available under ./metadata.
Once downloaded, place the datasets in a storage location of your choice, and update `videos_dir` in the `load_dataset` function inside `main_camp-vqa.py` accordingly.
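The update might look like the following (a hypothetical sketch; match the keys and signature to the actual `load_dataset` in `main_camp-vqa.py`):

```python
# Hypothetical sketch: map each dataset name to its local video directory.
VIDEO_DIRS = {
    "konvid_1k": "/data/ugc/KoNViD_1k/videos/",
    "youtube_ugc": "/data/ugc/YouTube_UGC/videos/",
}

def load_dataset(database):
    # Point videos_dir at wherever you stored the downloaded videos.
    videos_dir = VIDEO_DIRS[database]
    ...
```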
Test Demo
Run the pre-trained models to evaluate the quality of a single video.
The model weights provided in `./model/fine_tune` (for CAMP-VQA w/ fine-tune) and `./model/best_model` (for CAMP-VQA) contain the best-performing weights saved from training.
To evaluate the quality of a specific video, run the following command:
```bash
python camp-vqa_demo.py \
  -device <DEVICE> \
  -intra_cross_experiment <intra/cross> \
  -is_finetune <True/False> \
  -save_model_path <MODEL_PATH> \
  -train_data_name <TRAIN_DATA_NAME> \
  -test_data_name <TEST_DATA_NAME> \
  -test_video_path <DEMO_TEST_VIDEO>
```
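For example, to score a single video with the model trained on LSVQ (the flag values below are illustrative, and the video path is a placeholder):

```bash
python camp-vqa_demo.py \
  -device cuda \
  -intra_cross_experiment intra \
  -is_finetune False \
  -save_model_path ./model/best_model \
  -train_data_name lsvq_train \
  -test_data_name konvid_1k \
  -test_video_path ./demo_video.mp4
```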
Or simply try the default demo video by running:
```bash
python camp-vqa_demo.py
```
Training
Steps to train CAMP-VQA from scratch on different datasets.
See the detailed prompt settings in `prompts.json`.
Extract Features
Run the following command to extract features from videos:
```bash
python main_camp-vqa.py -database konvid_1k -num_workers 4 -feature_save_path ../features/
```
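Once extraction finishes, you can sanity-check the saved features. This assumes they are written as NumPy arrays under the `-feature_save_path` directory; adjust the filename to whatever the script actually produces:

```python
import numpy as np

# Hypothetical filename; check the extractor's output naming convention.
feats = np.load("../features/camp-vqa/konvid_1k_features.npy")
print(feats.shape)  # expected: (num_videos, feature_dim)
```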
Train Prediction Model
Train our model using extracted features:
```bash
python model_regression.py -data_name konvid_1k -feature_path ../features/camp-vqa/ -save_path ../model/
```
For LSVQ, train the model using:
```bash
python model_regression_lsvq.py -data_name lsvq_train -feature_path ../features/camp-vqa/ -save_path ../model/
```
Fine-Tuning a Trained Model
To fine-tune a pre-trained model on a new dataset:
- Turn on the `-is_finetune` flag.
- Set `-train_data_name` to the dataset used for pre-training.
- Set `-test_data_name` to the dataset you want to fine-tune on.
- Make sure `-feature_path` points correctly to your save path.
```bash
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq -is_finetune
```
Cross-dataset Evaluation on Trained Models
To reproduce the cross-dataset results, where the model is trained on one dataset and tested on the others, run:
```bash
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq
```
Ablation Study
We explored the impact of a key component of CAMP-VQA on performance: the semantic embeddings.
The table below shows the effect of the individual semantic embedding components on the KoNViD-1k and FineVD datasets, covering the image, quality, artifact, and content embeddings (Δimg, Δqlt, Δart, content embs).
| Δimg | Δqlt | Δart | content embs | KoNViD-1k (SRCC) | KoNViD-1k (PLCC) | FineVD (SRCC) | FineVD (PLCC) |
|---|---|---|---|---|---|---|---|
| ✓ | | | | 0.778 | 0.804 | 0.804 | 0.817 |
| | ✓ | | | 0.631 | 0.792 | 0.816 | 0.869 |
| | | ✓ | | 0.735 | 0.763 | 0.812 | 0.840 |
| | | | ✓ | 0.409 | 0.451 | 0.401 | 0.409 |
| ✓ | ✓ | | | 0.830 | 0.871 | 0.899 | 0.911 |
| ✓ | ✓ | ✓ | | 0.903 | 0.922 | 0.901 | 0.919 |
| ✓ | ✓ | ✓ | ✓ | 0.892 | 0.919 | 0.896 | 0.917 |
To run the ablation on the semantic embeddings (image, quality, artifact, and content):
```bash
python main_semantic_embs_ablation.py -database konvid_1k -feat_name semantic_embs -feature_save_path ../features/semantic_embs/
```
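Conceptually, each row of the table above corresponds to fusing only a subset of the four semantic embeddings before regression. A minimal sketch (names and dimensions are placeholders, not the repository's API):

```python
import torch

def fuse_embeddings(embs: dict, use: tuple) -> torch.Tensor:
    """Concatenate only the selected semantic embeddings,
    e.g. embs = {"img": ..., "qlt": ..., "art": ..., "content": ...}."""
    return torch.cat([embs[k] for k in use], dim=-1)

# Example: the image + quality + artifact setting from the ablation table.
embs = {k: torch.randn(1, 512) for k in ("img", "qlt", "art", "content")}
fused = fuse_embeddings(embs, ("img", "qlt", "art"))
print(fused.shape)  # torch.Size([1, 1536])
```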
Acknowledgment
This work was funded by the UKRI MyWorld Strength in Places Programme (SIPF00006/1) as part of my PhD study.
Citation
If you find this paper and the repository useful, please cite our papers:
```bibtex
@inproceedings{wang2025camp,
  title={CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video},
  author={Wang, Xinyi and Katsenou, Angeliki and Shen, Junxiao and Bull, David},
  booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2026)},
  year={2025},
}

@inproceedings{wang2025diva,
  title={DIVA-VQA: Detecting Inter-Frame Variations in UGC Video Quality},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  booktitle={IEEE International Conference on Image Processing (ICIP 2025)},
  year={2025},
}

@misc{wang2024relax,
  title={ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  year={2024},
  eprint={2407.11496},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2407.11496},
}
```
Contact:
Xinyi WANG, xinyi.wang@bristol.ac.uk