CAMP-VQA

Official Code for the following paper:

X. Wang, A. Katsenou, J.Shen and D. Bull. CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video


Performance

We evaluated the proposed model, CAMP-VQA, on the seven main-stream UGC benchmark datasets. The experimental testing included:

  1. Training and testing were performed on each target dataset, referred to as intra-dataset experiments.
  2. Pre-training the model on LSVQ, followed by fine-tuning on the target datasets (denoted as w/ fine-tune), aimed at assessing the model’s transferability and adaptation capabilities.

Performance comparison of CAMP-VQA:

Spearman’s Rank Correlation Coefficient (SRCC)

Model Extra Training Data CVD2014 KoNViD-1k LIVE-VQC YouTube-UGC LSVQ_test LSVQ_1080p FineVD LIVE-YT-Gaming KVQ
CAMP-VQA None 0.933 0.927 0.922 0.901 0.920 0.908 0.919 0.903 0.956
CAMP-VQA (w/ fine-tune) LSVQ 0.966 0.930 0.934 0.912 0.920 0.908 0.924 0.905 0.967

Pearson’s Linear Correlation Coefficient (PLCC)

Model Extra Training Data CVD2014 KoNViD-1k LIVE-VQC YouTube-UGC LSVQ_test LSVQ_1080p FineVD LIVE-YT-Gaming KVQ
CAMP-VQA None 0.944 0.936 0.940 0.920 0.933 0.920 0.923 0.922 0.958
CAMP-VQA (w/ fine-tune) LSVQ 0.964 0.944 0.946 0.928 0.933 0.920 0.933 0.942 0.967

Cross-dataset evaluation when trained on LSVQ

Model Correlation Metrics CVD2014 KoNViD-1k LIVE-VQC YouTube-UGC FineVD LIVE-YT-Gaming KVQ
CAMP-VQA SRCC 0.907 0.926 0.919 0.880 0.865 0.864 0.811
CAMP-VQA PLCC 0.933 0.932 0.937 0.898 0.890 0.884 0.810

More reported results can be found in correlation_result.ipynb.

Proposed Model

The goal of the proposed framework is to evaluate visual quality without reliance on the uncompressed version of a video. This framework, as outlined in Fig, comprises three components: SVE, TME and SEE.

proposed_CAMP-VQA_framework

Usage

πŸ“Œ Install Requirement

The repository is built with Python 3.10 and can be installed via the following commands:

git clone https://github.com/xinyiW915/CAMP-VQA.git
cd CAMP-VQA
conda create -n campvqa python=3.10 -y
conda activate campvqa
pip install -r requirements.txt  

πŸ“₯ Download UGC Datasets

The corresponding UGC video datasets can be downloaded from the following sources:
CVD2014, KoNViD-1k, LIVE-VQC, YouTube-UGC, LSVQ, FineVD, LIVE-YT-Gaming, KVQ

The metadata for the NR-VQA UGC dataset is available under ./metadata.

Once downloaded, place the datasets in any other storage location of your choice. Ensure that the videos_dir in the load_dataset function inside main_camp-vqa.py is updated accordingly.

🎬 Test Demo

Run the pre-trained models to evaluate the quality of a single video.

The model weights provided in ./model/fine_tune for CAMP-VQA (w/ fine-tune) and ./model/best_model for CAMP-VQA, contain the best-performing saved weights from training. To evaluate the quality of a specific video, run the following command:

python camp-vqa_demo.py 
    -device <DEVICE> 
    -intra_cross_experiment <intra/cross>
    -is_finetune <True/False> 
    -save_model_path <MODEL_PATH> 
    -train_data_name <TRAIN_DATA_NAME> 
    -test_data_name <TEST_DATA_NAME>
    -test_video_path <DEMO_TEST_VIDEO>

Or simply try the default demo video by running:

python camp-vqa_demo.py 

Training

Steps to train CAMP-VQA from scratch on different datasets.

See detailed prompt settings in prompts.json.

Extract Features

Run the following command to extract features from videos:

python main_camp-vqa.py -database konvid_1k -num_workers 4 -feature_save_path ../features/

Train Prediction Model

Train our model using extracted features:

python model_regression.py -data_name konvid_1k -feature_path ../features/camp-vqa/ -save_path ../model/

For LSVQ, train the model using:

python model_regression_lsvq.py -data_name lsvq_train -feature_path ../features/camp-vqa/ -save_path ../model/

Fine-Tuning on Trained Model

To fine-tune a pre-trained model on a new dataset:

  1. Turn on the -is_finetune flag.
  2. Set -train_data_name to the dataset used for training.
  3. Set -test_data_name to the dataset you want to fine-tune on.
  4. Make sure -feature_path points correctly to your save path.
python model_finetune.py -train_data_name lsvq_train -test_data_name kvq -is_finetune

Cross-dataset Evaluation on Trained Models

Results where models are trained on one dataset and tested on other datasets.

python model_finetune.py -train_data_name lsvq_train -test_data_name kvq

Ablation Study

We explored the impact of key component on CAMP-VQA performance: semantic embeddings.

Ablation study on the effect of different component semantic embeddings on KoNViD-1k and FineVD datasetsβ€” including image, quality, artifact, and content embeddings (Δ“img, Δ“qlt, Δ“art, contentembs).

Δ“img Δ“qlt Δ“art contentembs KoNViD-1k (SRCC) KoNViD-1k (PLCC) FineVD (SRCC) FineVD (PLCC)
βœ“ 0.778 0.804 0.804 0.817
βœ“ 0.631 0.792 0.816 0.869
βœ“ 0.735 0.763 0.812 0.840
βœ“ 0.409 0.451 0.401 0.409
βœ“ βœ“ 0.830 0.871 0.899 0.911
βœ“ βœ“ βœ“ 0.903 0.922 0.901 0.919
βœ“ βœ“ βœ“ βœ“ 0.892 0.919 0.896 0.917

On Semantic Embeddings (e.g., image, quality, artifact, content):

python main_semantic_embs_ablation.py -database konvid_1k -feat_name semantic_embs -feature_save_path ../features/semantic_embs/

Acknowledgment

This work was funded by the UKRI MyWorld Strength in Places Programme (SIPF00006/1) as part of my PhD study.

Citation

If you find this paper and the repo useful, please cite our paper 😊:

@article{wang2025camp,
      title={CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video},
      author={Wang, Xinyi and Katsenou, Angeliki, Shen, Junxiao and Bull, David},
      booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2026)}, 
      year={2025},
}
@article{wang2025diva,
      title={DIVA-VQA: Detecting Inter-Frame Variations in UGC Video Quality},
      author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
      booktitle={IEEE International Conference on Image Processing (ICIP 2025)}, 
      year={2025},
}
@article{wang2024relax,
      title={ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment},
      author={Wang, Xinyi and Katsenou, Angeliki and Bull, David},
      year={2024},
      eprint={2407.11496},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2407.11496}, 
}

Contact:

Xinyi WANG, xinyi.wang@bristol.ac.uk

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Space using xinyiW915/CAMP-VQA 1

Collection including xinyiW915/CAMP-VQA