license: other
tags:
  - text-classification
  - multi-label-classification
  - propaganda-detection

Propaganda Detector Ensemble (Inference Bundle)

This repository provides an inference bundle for multi-label classification of propaganda techniques. The bundle includes model artifacts and configuration for an ensemble composed of:

  • Gemma: google/gemma-2-2b-it (main) and an 8-bit inference variant
  • ModernBERT: answerdotai/ModernBERT-base (binary / auxiliary classifier component)
  • Classical ML: LinearSVC + TF-IDF, optionally with calibration
  • Post-processing artifacts: label list, per-class thresholds, and ensemble metadata
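As an illustration of how the post-processing artifacts could be applied, the sketch below combines per-label probabilities from several components and then applies per-class thresholds. It is a minimal sketch under stated assumptions, not the bundle's actual logic: the weighted average, the label names, the threshold values, and the function names `average_scores` and `apply_thresholds` are all hypothetical. The real label list, thresholds, and combination rule are defined by the artifacts shipped in the bundle.

```python
from typing import Dict, List

# Hypothetical label list and per-class thresholds (assumption); the real
# values come from the bundle's post-processing artifacts.
LABELS: List[str] = ["Loaded_Language", "Doubt", "Bandwagon,Reductio_ad_Hitlerum"]
THRESHOLDS: Dict[str, float] = {
    "Loaded_Language": 0.5,
    "Doubt": 0.25,
    "Bandwagon,Reductio_ad_Hitlerum": 0.6,
}


def average_scores(
    component_scores: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Weighted average of per-label probabilities across components.

    Weighted averaging is an assumed combination rule, used here only
    for illustration.
    """
    if not weights or len(component_scores) != len(weights):
        raise ValueError("need one weight per component")
    total = sum(weights)
    return {
        label: sum(w * s[label] for w, s in zip(weights, component_scores)) / total
        for label in LABELS
    }


def apply_thresholds(
    scores: Dict[str, float], thresholds: Dict[str, float], default: float = 0.5
) -> List[str]:
    """Return every label whose score reaches its own threshold (multi-label)."""
    return [
        label for label in LABELS if scores[label] >= thresholds.get(label, default)
    ]


# Made-up scores from two components, e.g. an LLM head and a calibrated SVM.
llm_scores = {"Loaded_Language": 0.9, "Doubt": 0.2, "Bandwagon,Reductio_ad_Hitlerum": 0.5}
svm_scores = {"Loaded_Language": 0.7, "Doubt": 0.4, "Bandwagon,Reductio_ad_Hitlerum": 0.1}

combined = average_scores([llm_scores, svm_scores], weights=[0.5, 0.5])
predicted = apply_thresholds(combined, THRESHOLDS)
print(predicted)
```

Per-class thresholds let rare techniques use a lower cutoff than frequent ones, which is why a label list and a threshold table are typically stored next to the model weights.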

Intended Use

  • Inference via an API service (Modal), a browser extension backend, batch scoring, and experimentation.
  • The repository is intended for prediction/inference. Training code and datasets are not included.

Licensing & Attribution (IMPORTANT)

This repository uses mixed licensing/terms because it bundles multiple upstream components.

Gemma components (Gemma-2-2B-IT and derivatives)

Gemma-based weights and any derivatives (including fine-tuned and/or quantized variants) are subject to the Gemma Terms of Use: https://ai.google.dev/gemma/terms

ModernBERT component

answerdotai/ModernBERT-base is released under the Apache License 2.0. If ModernBERT weights or derivatives are included in this repository, their use and distribution are subject to Apache-2.0: https://www.apache.org/licenses/LICENSE-2.0

Dataset attribution (SemEval-2020 Task 11 / PTC)

This work uses the Propaganda Techniques Corpus (PTC) from SemEval 2020 Task 11.

The dataset was modified for this project during preprocessing and label setup. In particular:

  • The original span-level annotations were transformed into a classification-ready format (instances derived from annotated fragments with document context).
  • Underrepresented techniques were merged into super-classes as in the task setup (e.g., “Bandwagon” + “Reductio ad Hitlerum”, and “Whataboutism” + “Straw Men” + “Red Herring”).
  • The technique “Obfuscation, Intentional Vagueness, Confusion” was excluded due to very low frequency (as described in the PTC documentation).
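The preprocessing steps listed above can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: the `Span` and `Instance` types, the character-window notion of "document context", the `window` parameter, and the exact merged label strings are assumptions introduced for the example.

```python
from typing import List, NamedTuple, Optional

# Assumed label strings for the merged super-classes described above.
BANDWAGON_GROUP = "Bandwagon,Reductio_ad_Hitlerum"
WHATABOUTISM_GROUP = "Whataboutism,Straw_Men,Red_Herring"

MERGED = {
    "Bandwagon": BANDWAGON_GROUP,
    "Reductio ad Hitlerum": BANDWAGON_GROUP,
    "Whataboutism": WHATABOUTISM_GROUP,
    "Straw Men": WHATABOUTISM_GROUP,
    "Red Herring": WHATABOUTISM_GROUP,
}
EXCLUDED = {"Obfuscation, Intentional Vagueness, Confusion"}


class Span(NamedTuple):
    start: int       # character offset where the annotated fragment begins
    end: int         # character offset one past the fragment's last character
    technique: str   # original technique name


class Instance(NamedTuple):
    fragment: str    # the annotated text itself
    context: str     # fragment plus surrounding characters (assumed context form)
    label: str       # technique after merging


def normalize_label(technique: str) -> Optional[str]:
    """Map a technique to its super-class, or None if it is excluded."""
    if technique in EXCLUDED:
        return None
    return MERGED.get(technique, technique)


def spans_to_instances(
    document: str, spans: List[Span], window: int = 50
) -> List[Instance]:
    """Turn span-level annotations into classification instances."""
    instances = []
    for span in spans:
        label = normalize_label(span.technique)
        if label is None:
            continue  # skip the excluded low-frequency technique
        left = max(0, span.start - window)
        right = min(len(document), span.end + window)
        instances.append(
            Instance(document[span.start:span.end], document[left:right], label)
        )
    return instances


doc = "Everyone is doing it, so should you. What about their record?"
annotations = [
    Span(0, 20, "Bandwagon"),
    Span(37, 61, "Whataboutism"),
    Span(0, 8, "Obfuscation, Intentional Vagueness, Confusion"),
]
examples = spans_to_instances(doc, annotations, window=10)
for example in examples:
    print(example.label, "|", example.fragment)
```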

The original dataset is not redistributed in this repository. Any modifications are the responsibility of the authors of this repository.

Please cite the following paper when using the PTC corpus:

Da San Martino, G., Yu, S., Barrón-Cedeño, A., Petrov, R., & Nakov, P. (2019). Fine-Grained Analysis of Propaganda in News Articles (EMNLP-IJCNLP 2019).

@InProceedings{EMNLP19DaSanMartino,
  author = {Da San Martino, Giovanni and
            Yu, Seunghak and
            Barr\'{o}n-Cede\~no, Alberto and
            Petrov, Rostislav and
            Nakov, Preslav},
  title = {Fine-Grained Analysis of Propaganda in News Articles},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and
               9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019},
  year = {2019}
}

Summary note

If any terms conflict, the most restrictive applicable terms for a given component apply. This repository does not grant any additional rights beyond those stated in the upstream licenses/terms.

Limitations

  • Predictions can be sensitive to domain shift, language, and text length.
  • The ensemble may produce false positives and false negatives; use it as decision support, not as the sole authority.

Contact / Notes

If you use this repository, please ensure compliance with the upstream licenses/terms and provide appropriate dataset attribution.
