---
language:
- en
- cs
license: cc-by-4.0
metrics:
- bleurt
- bleu
- bertscore
---

# AlignScoreCS

AlignScoreCS is a multi-task, multilingual model for **assessing factual consistency in context-claim pairs** across various Natural Language Understanding (NLU) tasks, including **Summarization**, **Question Answering (QA)**, **Semantic Textual Similarity (STS)**, **Paraphrase**, **Fact Verification (FV)**, and **Natural Language Inference (NLI)**.

It is fine-tuned on a large multi-task dataset of 7 million documents covering these NLU tasks in both **Czech** and **English**. Thanks to its multilingual pre-training, it can potentially be used in **other languages** as well. The model can handle tasks framed as regression, binary classification, or ternary classification; for evaluation, however, we recommend using the AlignScore function.

This work is inspired by its English counterpart, [AlignScore: Evaluating Factual Consistency with a Unified Alignment Function](https://arxiv.org/abs/2305.16739). Unlike the original, we trained on homogeneous rather than heterogeneous batches and used three distinct architectures that share a single encoder. This setup allows each architecture to be used independently with its own classification head.
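
The batching code itself is not part of this card. The following is a minimal sketch, assuming that "homogeneous" means every batch contains samples from a single task (so that each batch is routed through one classification head). The function name `homogeneous_batches` and the `(task, sample)` input format are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterable, Iterator, List, Tuple


def homogeneous_batches(
    samples: Iterable[Tuple[Hashable, object]],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[Hashable, List[object]]]:
    """Yield (task, batch) pairs in which every sample comes from one task.

    The (task, sample) input format is a hypothetical illustration,
    not the data format used to train AlignScoreCS.
    """
    rng = random.Random(seed)

    # Group samples by task so that no batch ever mixes tasks.
    by_task = defaultdict(list)
    for task, sample in samples:
        by_task[task].append(sample)

    # Shuffle within each task, then cut it into batches of at most batch_size.
    batches = []
    for task, items in by_task.items():
        rng.shuffle(items)
        for start in range(0, len(items), batch_size):
            batches.append((task, items[start:start + batch_size]))

    # Shuffle the order of the batches so that tasks alternate during training.
    rng.shuffle(batches)
    yield from batches


data = [("nli", f"nli-{i}") for i in range(5)] + [("sts", f"sts-{i}") for i in range(3)]
for task, batch in homogeneous_batches(data, batch_size=2):
    print(task, batch)
```

Shuffling the order of whole batches, rather than individual samples, keeps every batch single-task while still interleaving tasks across training steps.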

## Evaluation

As in the AlignScore paper, we use its AlignScore function, which splits the context into chunks of roughly 350 tokens and the claim into sentences. Each context chunk is evaluated against each claim sentence, and the results are aggregated into a single consistency score.
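
A minimal pure-Python sketch of this chunk-and-aggregate procedure is shown below. It assumes the aggregation described in the AlignScore paper (for each claim sentence, take the maximum score over all context chunks, then average these maxima over the claim sentences), approximates tokens by whitespace-separated words, and uses a naive regex sentence splitter. The names `align_score`, `chunk_context`, `split_sentences`, and the `pair_score` callback are hypothetical and are not the AlignScoreCS API; `word_overlap` is a toy stand-in for the model's chunk-sentence score.

```python
import re
from statistics import mean
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_context(context: str, max_words: int = 350) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_words words.

    Words approximate model tokens here; a single sentence longer than
    max_words still becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for sentence in split_sentences(context):
        n_words = len(sentence.split())
        if current and length + n_words > max_words:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(sentence)
        length += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


def align_score(
    context: str,
    claim: str,
    pair_score: Callable[[str, str], float],
    max_words: int = 350,
) -> float:
    """Average, over claim sentences, of the best score across context chunks."""
    chunks = chunk_context(context, max_words)
    sentences = split_sentences(claim)
    if not chunks or not sentences:
        raise ValueError("context and claim must both be non-empty")
    return mean(max(pair_score(chunk, sent) for chunk in chunks) for sent in sentences)


def word_overlap(chunk: str, sentence: str) -> float:
    """Toy stand-in for the model: share of claim words that occur in the chunk."""
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    words = re.findall(r"\w+", sentence.lower())
    return sum(w in chunk_words for w in words) / len(words) if words else 0.0


context = "Prague is the capital of Czechia. It lies on the Vltava river."
claim = "Prague is the capital. Prague lies on the Danube."
print(align_score(context, claim, word_overlap, max_words=8))
```

Taking the maximum over chunks means a claim sentence only needs support from one part of a long context, while averaging over sentences lets every poorly supported sentence pull the overall score down.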

AlignScoreCS is built on three XLM-RoBERTa architectures that share one encoder. Following the original AlignScore paper, we initialized the shared encoder from the [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) checkpoint and added three linear layers on top of it: one for regression, one for binary classification, and one for ternary classification.
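
To make the "one encoder, three heads" layout concrete, here is a framework-free sketch in plain Python. It is not the AlignScoreCS implementation, which uses the xlm-roberta-large encoder. `toy_encoder` is a hypothetical stand-in for the encoder's pooled representation, the heads are randomly initialized rather than trained, and both the routing from the training table's task types to heads (`TASK_TO_HEAD`) and the choice of output index 0 as the "consistent" class are assumptions for illustration only.

```python
import math
import random
from typing import Dict, List

# Assumed routing of the training table's "Training Task" values to the three heads.
TASK_TO_HEAD = {"3-way": "ternary", "2-way": "binary", "reg": "regression"}


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_encoder(context: str, claim: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for the shared XLM-RoBERTa encoder."""
    features = [0.0] * dim
    for token in f"{context} </s> {claim}".lower().split():
        features[sum(map(ord, token)) % dim] += 1.0
    return features


class LinearHead:
    """y = W x + b, a pure-Python analogue of a linear classification head."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class SharedEncoderModel:
    def __init__(self, dim: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.heads: Dict[str, LinearHead] = {
            "regression": LinearHead(dim, 1, rng),
            "binary": LinearHead(dim, 2, rng),
            "ternary": LinearHead(dim, 3, rng),
        }

    def score(self, context: str, claim: str, head: str) -> float:
        # The encoder is shared: the same features feed whichever head is used.
        features = toy_encoder(context, claim, self.dim)
        logits = self.heads[head](features)
        if head == "regression":
            return logits[0]  # raw regression output
        # Assumption: index 0 is the "consistent" (e.g. entailment) class.
        return softmax(logits)[0]


model = SharedEncoderModel()
print(model.score("Prague is in Czechia.", "Prague is Czech.", "ternary"))
```

With untrained heads the printed probability is meaningless; the point is that all three heads read the same encoder output, so any one of them can be used on its own.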

## Usage

```python
# Assumes you copied the helper file attached in the repository's "Files and versions"
# tab next to your script; the import below expects it to be named AlignScoreCS.py.
import torch

from AlignScoreCS import AlignScoreCS

alignScoreCS = AlignScoreCS.from_pretrained("krotima1/AlignScoreCS")

# Move the model to a GPU, if one is available, to speed up scoring.
if torch.cuda.is_available():
    alignScoreCS = alignScoreCS.to("cuda")

print(alignScoreCS.score(context="This is context", claim="This is claim"))
```

## Results

## Training datasets

The following table lists the datasets used to train the model. We translated these English datasets into Czech using SeamlessM4T.

| NLP Task | Dataset | Training Task | Context (words) | Claim (words) | Samples (Cs) | Samples (En) |
|-----------------------|-----------------|-------|-----|-----|------|------|
| NLI | SNLI | 3-way | 10 | 13 | 500k | 550k |
| | MultiNLI | 3-way | 16 | 20 | 393k | 393k |
| | Adversarial NLI | 3-way | 48 | 54 | 163k | 163k |
| | DocNLI | 2-way | 97 | 285 | 200k | 942k |
| Fact Verification | NLI-style FEVER | 3-way | 48 | 50 | 208k | 208k |
| | Vitamin C | 3-way | 23 | 25 | 371k | 371k |
| Paraphrase | QQP | 2-way | 9 | 11 | 162k | 364k |
| | PAWS | 2-way | - | 18 | - | 707k |
| | PAWS labeled | 2-way | 18 | - | 49k | - |
| | PAWS unlabeled | 2-way | 18 | - | 487k | - |
| STS | SICK | reg | - | 10 | - | 4k |
| | STS Benchmark | reg | - | 10 | - | 6k |
| | Free-N1 | reg | 18 | - | 20k | - |
| QA | SQuAD v2 | 2-way | 105 | 119 | 130k | 130k |
| | RACE | 2-way | 266 | 273 | 200k | 351k |
| Information Retrieval | MS MARCO | 2-way | 49 | 56 | 200k | 5M |
| Summarization | WikiHow | 2-way | 434 | 508 | 157k | 157k |
| | SumAug | 2-way | - | - | - | - |