---
tags:
- deep-learning
- vision
- VQA
- Transformer
- CNN
license: mit
datasets:
- LSVQ
- LIVE-VQC
- KoNViD-1k
- YouTube-UGC
- CVD2014
- FineVD
- LIVE-YT-Gaming
- KVQ
model-index:
- name: CAMP-VQA
  results: []
pipeline_tag: visual-question-answering
---

# CAMP-VQA

Official code for the following paper:

**X. Wang, A. Katsenou, J. Shen and D. Bull**. [CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video](https://arxiv.org/abs/2511.07290)

---

## Performance

We evaluated the proposed model, CAMP-VQA, on the seven mainstream UGC benchmark datasets. The experiments covered two settings:

1. Intra-dataset experiments: training and testing were performed on each target dataset.
2. Fine-tuning experiments (denoted as w/ fine-tune): the model was pre-trained on LSVQ and then fine-tuned on the target datasets, to assess its transferability and adaptation capabilities.

### **Performance comparison of CAMP-VQA:**

Spearman’s Rank Correlation Coefficient (SRCC)

| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|-------|---------------------|---------|-----------|----------|-------------|-----------|------------|--------|----------------|-----|
| CAMP-VQA | None | 0.933 | 0.927 | 0.922 | 0.901 | 0.920 | 0.908 | 0.919 | 0.903 | 0.956 |
| **CAMP-VQA (w/ fine-tune)** | LSVQ | **0.966** | **0.930** | **0.934** | **0.912** | **0.920** | **0.908** | **0.924** | **0.905** | **0.967** |

Pearson’s Linear Correlation Coefficient (PLCC)

| Model | Extra Training Data | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | LSVQ_test | LSVQ_1080p | FineVD | LIVE-YT-Gaming | KVQ |
|-------|---------------------|---------|-----------|----------|-------------|-----------|------------|--------|----------------|-----|
| CAMP-VQA | None | 0.944 | 0.936 | 0.940 | 0.920 | 0.933 | 0.920 | 0.923 | 0.922 | 0.958 |
| **CAMP-VQA (w/ fine-tune)** | LSVQ | **0.964** | **0.944** | **0.946** | **0.928** | **0.933** | **0.920** | **0.933** | **0.942** | **0.967** |

### **Cross-dataset evaluation when trained on LSVQ**

| Model | Correlation Metrics | CVD2014 | KoNViD-1k | LIVE-VQC | YouTube-UGC | FineVD | LIVE-YT-Gaming | KVQ |
|-------|---------------------|---------|-----------|----------|-------------|--------|----------------|-----|
| CAMP-VQA | SRCC | 0.907 | 0.926 | 0.919 | 0.880 | 0.865 | 0.864 | 0.811 |
| CAMP-VQA | PLCC | 0.933 | 0.932 | 0.937 | 0.898 | 0.890 | 0.884 | 0.810 |

More reported results can be found in **[correlation_result.ipynb](https://github.com/xinyiW915/CAMP-VQA/blob/main/src/correlation_result.ipynb)**.

## Proposed Model

The goal of the proposed framework is to evaluate visual quality without relying on the uncompressed version of a video. The framework, outlined in the figure below, comprises three components: SVE, TME and SEE.

*Figure: proposed CAMP-VQA framework.*

## Usage

### 📌 Install Requirements

The repository is built with **Python 3.10** and can be installed via the following commands:

```shell
git clone https://github.com/xinyiW915/CAMP-VQA.git
cd CAMP-VQA
conda create -n campvqa python=3.10 -y
conda activate campvqa
pip install -r requirements.txt
```

### 📥 Download UGC Datasets

The corresponding UGC video datasets can be downloaded from the following sources:
[CVD2014](https://qualinet.github.io/databases/video/cvd2014_video_database/), [KoNViD-1k](https://database.mmsp-kn.de/konvid-1k-database.html), [LIVE-VQC](https://live.ece.utexas.edu/research/LIVEVQC/), [YouTube-UGC](https://media.withyoutube.com/), [LSVQ](https://github.com/baidut/PatchVQ), [FineVD](https://huggingface.co/datasets/IntMeGroup/FineVD), [LIVE-YT-Gaming](https://live.ece.utexas.edu/research/LIVE-YT-Gaming/index.html/), [KVQ](https://lixinustc.github.io/projects/KVQ/)

The metadata for the NR-VQA UGC datasets is available under [`./metadata`](./metadata).

Once downloaded, place the datasets in any storage location of your choice.
Ensure that the `videos_dir` in the [`load_dataset`](./src/main_camp-vqa.py) function inside `main_camp-vqa.py` is updated accordingly.

### 🎬 Test Demo

Run the pre-trained models to evaluate the quality of a single video.

The model weights in [`./model/fine_tune`](./model/fine_tune) (CAMP-VQA w/ fine-tune) and [`./model/best_model`](./model/best_model/) (CAMP-VQA) are the best-performing weights saved during training.

To evaluate the quality of a specific video, run the following command:

```shell
python camp-vqa_demo.py -device -intra_cross_experiment -is_finetune -save_model_path -train_data_name -test_data_name -test_video_path
```

Or simply try the default demo video by running:

```shell
python camp-vqa_demo.py
```

## Training

Steps to train CAMP-VQA from scratch on different datasets. See the detailed prompt settings in [prompts.json](./src/config/prompts.json).

### Extract Features

Run the following command to extract features from videos:

```shell
python main_camp-vqa.py -database konvid_1k -num_workers 4 -feature_save_path ../features/
```

### Train Prediction Model

Train the model using the extracted features:

```shell
python model_regression.py -data_name konvid_1k -feature_path ../features/camp-vqa/ -save_path ../model/
```

For **LSVQ**, train the model using:

```shell
python model_regression_lsvq.py -data_name lsvq_train -feature_path ../features/camp-vqa/ -save_path ../model/
```

### Fine-Tuning on Trained Model

To fine-tune a pre-trained model on a new dataset:

1. Turn on the `-is_finetune` flag.
2. Set [`-train_data_name`](./src/model_finetune.py) to the dataset used for training.
3. Set [`-test_data_name`](./src/model_finetune.py) to the dataset you want to fine-tune on.
4. Make sure [`-feature_path`](./src/model_finetune.py) points to the path where your extracted features are saved.
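
The steps above amount to composing a single command line. As a minimal sketch, the Python helper below assembles that command from the four settings. The helper name `build_finetune_command` is hypothetical and not part of the repository, and it assumes that `-is_finetune` is a bare on/off switch and that `-feature_path` takes a directory, as the commands in this README suggest.

```python
import shlex


def build_finetune_command(train_data_name, test_data_name,
                           finetune=True, feature_path=None):
    """Assemble the argument list for model_finetune.py (illustrative only).

    Assumption: -is_finetune is a bare switch; leaving it out runs the
    script on the target dataset without fine-tuning.
    """
    cmd = [
        "python", "model_finetune.py",
        "-train_data_name", train_data_name,  # step 2: dataset used for training
        "-test_data_name", test_data_name,    # step 3: dataset to fine-tune on
    ]
    if feature_path is not None:
        cmd += ["-feature_path", feature_path]  # step 4: extracted features
    if finetune:
        cmd.append("-is_finetune")              # step 1: turn on fine-tuning
    return cmd


# Print the command as a shell-ready string.
print(shlex.join(build_finetune_command("lsvq_train", "kvq")))
```

The returned list can be passed to `subprocess.run` from the `src` directory; with the default arguments it corresponds to the shell command that follows.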
```shell
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq -is_finetune
```

### Cross-dataset Evaluation on Trained Models

To test a model trained on one dataset on a different dataset, run the same script without the `-is_finetune` flag:

```shell
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq
```

## Ablation Study

We explored the impact of a key component, the semantic embeddings, on CAMP-VQA performance.

The table below reports the effect of the individual semantic embeddings on the **KoNViD-1k** and **FineVD** datasets, covering the **image**, **quality**, **artifact**, and **content** embeddings (*ē<sub>img</sub>*, *ē<sub>qlt</sub>*, *ē<sub>art</sub>*, *contentembs*).

| ē<sub>img</sub> | ē<sub>qlt</sub> | ē<sub>art</sub> | contentembs | **KoNViD-1k** (SRCC) | **KoNViD-1k** (PLCC) | **FineVD** (SRCC) | **FineVD** (PLCC) |
|------|------|------|-------------|----------------------|----------------------|-------------------|-------------------|
| ✓ |   |   |   | 0.778 | 0.804 | 0.804 | 0.817 |
|   | ✓ |   |   | 0.631 | 0.792 | 0.816 | 0.869 |
|   |   | ✓ |   | 0.735 | 0.763 | 0.812 | 0.840 |
|   |   |   | ✓ | 0.409 | 0.451 | 0.401 | 0.409 |
| ✓ | ✓ |   |   | 0.830 | 0.871 | 0.899 | 0.911 |
| ✓ | ✓ | ✓ |   | **0.903** | **0.922** | **0.901** | **0.919** |
| ✓ | ✓ | ✓ | ✓ | 0.892 | 0.919 | 0.896 | 0.917 |

### On Semantic Embeddings (e.g., image, quality, artifact, content):

```shell
python main_semantic_embs_ablation.py -database konvid_1k -feat_name semantic_embs -feature_save_path ../features/semantic_embs/
```

## Acknowledgment

This work was funded by the UKRI MyWorld Strength in Places Programme (SIPF00006/1) as part of my PhD study.

## Citation

If you find this paper and the repository useful, please cite our work 😊:

```bibtex
@inproceedings{wang2025camp,
  title={CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video},
  author={Wang, Xinyi and Katsenou, Angeliki and Shen, Junxiao and Bull, David},
  booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2026)},
  year={2025},
}

@inproceedings{wang2025diva,
  title={DIVA-VQA: Detecting Inter-Frame Variations in UGC Video Quality},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  booktitle={IEEE International Conference on Image Processing (ICIP 2025)},
  year={2025},
}

@article{wang2024relax,
  title={ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment},
  author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
  year={2024},
  eprint={2407.11496},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2407.11496},
}
```

## Contact:

Xinyi WANG, `xinyi.wang@bristol.ac.uk`