Llama 3.1 8B for Historical Newspaper Argument Mining
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct that has undergone two-stage training for argument mining (argumentative unit extraction and enthymeme reconstruction) in historical newspapers.
Training Pipeline
Stage 1: Supervised Fine-Tuning with LoRA
Initial fine-tuning using LoRA/PEFT on meta-llama/Meta-Llama-3.1-8B-Instruct
Stage 2: GRPO Post-Training
Further optimization of the intermediate checkpoint oberbics/llama-3.1-newspaper-arguments-your_name-optimized_full_V2 using TRL with Group Relative Policy Optimization (GRPO), a reinforcement learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
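To make the "group relative" part of GRPO concrete, here is a minimal sketch of how rewards for several completions of the same prompt can be turned into advantages by normalizing within the group. It is an illustration of the idea only, not the TRL implementation: the function name, the epsilon value, and the choice of population standard deviation are assumptions, and the reward values are made up.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion is scored relative to its siblings, so no separate
    value model is needed. eps (an assumed value) avoids division by zero
    when all completions receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical completions sampled for one article, already scored.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.2f}")
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean are pushed down.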
Model Details
Model Description
This model extracts argumentative units from historical newspaper texts across multiple languages (Italian, German, French, and English), providing structured XML output suitable for digital humanities research and historical discourse analysis. The two-stage training process combines supervised learning for argument structure with reinforcement learning to improve quality and eliminate duplicate extractions.
Key Information:
- Developed by: oberbics
- Model type: Causal Language Model (Fine-tuned with LoRA + GRPO)
- Language(s) (NLP): Italian, German, French, English
- License: Llama 3.1 Community License
- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Intermediate model: oberbics/llama-3.1-newspaper-arguments-your_name-optimized_full_V2
Intended Uses
Primary Use Cases
- Extracting argumentative units from (historical) newspaper articles
- Digital humanities research on historical argumentation patterns
- Large-scale corpus analysis of multilingual newspaper archives
- Enthymeme reconstruction (implicit argument mining)
Limitations
- Optimized for historical newspaper texts from the early 20th century
- May require human verification for complex argumentative structures
- Performance may vary on texts significantly different from training data (1908 newspapers)
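Because extractions are meant to be verbatim, part of the human verification can be prioritized automatically. The sketch below flags extractions whose text cannot be found (almost) literally in the source article. The function names and the 0.9 threshold are hypothetical choices for illustration and are not part of the model or its training.

```python
import re
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def verbatim_coverage(extraction, article):
    """Share of the extraction covered by its longest common block with the article."""
    ext, art = normalize(extraction), normalize(article)
    if not ext:
        return 0.0
    if ext in art:
        return 1.0
    matcher = SequenceMatcher(None, ext, art, autojunk=False)
    match = matcher.find_longest_match(0, len(ext), 0, len(art))
    return match.size / len(ext)

def needs_review(extraction, article, threshold=0.9):
    """Flag extractions that look paraphrased rather than copied (threshold is an assumption)."""
    return verbatim_coverage(extraction, article) < threshold
```

Flagged items are candidates for manual checking; a low score does not prove the extraction is wrong, since OCR noise or hyphenation can also break a literal match.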
Training and Evaluation Data
The model was trained on a custom dataset of historical newspaper texts from Italian, German, French, and English sources, primarily from 1908, with argumentative annotations.
Training Procedure
Stage 1: Supervised Fine-Tuning (LoRA/PEFT)
Training Hyperparameters
- learning_rate: 3e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- lr_scheduler_warmup_steps: 50
- num_epochs: 3
- mixed_precision_training: Native AMP
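The batch-size and scheduler settings above can be read as simple arithmetic. The sketch below reproduces the effective batch size and a linear-warmup plus cosine-decay learning rate curve in plain Python. It assumes a single device, assumes that the 50 warmup steps take effect (in the Hugging Face Trainer a positive warmup step count normally takes precedence over the ratio), and uses a total of 138 optimizer steps, which is an illustrative value and not a reported one.

```python
import math

def lr_at_step(step, base_lr=3e-5, warmup_steps=50, total_steps=138):
    """Linear warmup from 0 to base_lr, then cosine decay to 0.

    total_steps=138 is an assumed value for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# train_batch_size * gradient_accumulation_steps (single device assumed)
effective_batch_size = 1 * 8

for step in (0, 25, 50, 100, 138):
    print(step, f"{lr_at_step(step):.2e}")
```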
Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.5443 | 1.0879 | 50 | 2.6414 |
| 1.1074 | 2.1758 | 100 | 2.6980 |
Final Evaluation Loss: 2.6980
Stage 2: GRPO Post-Training
This model was further trained using Group Relative Policy Optimization (GRPO), a reinforcement learning method that optimizes the model using group-based rewards to:
- Improve argument extraction quality
- Eliminate duplicate extractions
- Enhance confidence calibration
- Maintain multilingual performance
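The reward functions actually used in this stage are not documented here. Purely as an illustration of how a goal such as "eliminate duplicate extractions" can be expressed as a reward, the following hypothetical function scores each completion by the share of its `<argument>` entries that are unique. It follows the general shape TRL expects for custom reward functions (a list of completions in, a list of floats out) and assumes completions arrive as plain strings.

```python
import re

ARGUMENT_RE = re.compile(r"<argument>(.*?)</argument>", re.DOTALL)

def unique_argument_reward(completions, **kwargs):
    """Hypothetical reward: fraction of extracted arguments that are unique.

    1.0 means no duplicates; 0.0 is given when no <argument> tag can be parsed.
    """
    rewards = []
    for completion in completions:
        # Normalize case and whitespace so trivial variations count as duplicates.
        args = [" ".join(a.split()).lower() for a in ARGUMENT_RE.findall(completion)]
        if not args:
            rewards.append(0.0)
            continue
        rewards.append(len(set(args)) / len(args))
    return rewards
```

In practice a reward like this would be combined with others (format validity, verbatim overlap with the article), since on its own it would also reward a single low-quality extraction.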
Training Configuration:
| Parameter | Value |
|---|---|
| LoRA adapters | ~1-2% parameters updated |
| Learning rate | 3e-05 |
| Epochs | 3 |
| Optimizer | 8-bit + AMP |
| Schedule | Cosine + warmup |
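As a rough sanity check on the "~1-2% parameters updated" figure, the arithmetic below counts LoRA parameters for the Llama 3.1 8B layer shapes. The LoRA rank and target modules used for this model are not stated here, so the ranks tried and the choice to adapt all seven linear projections are assumptions; only the architecture dimensions come from the published base model configuration.

```python
# Llama 3.1 8B architecture constants (from the published model config).
HIDDEN, INTERMEDIATE, LAYERS = 4096, 14336, 32
KV_DIM = 1024  # 8 key/value heads x 128 head dim (grouped-query attention)
VOCAB = 128256

# (input dim, output dim) of each linear projection in one decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def base_param_count():
    """Parameters of the base model: decoder layers, embeddings, LM head, norms."""
    per_layer = sum(i * o for i, o in PROJECTIONS.values()) + 2 * HIDDEN
    return LAYERS * per_layer + 2 * VOCAB * HIDDEN + HIDDEN

def lora_param_count(rank, targets):
    """LoRA adds two matrices per target: (in x rank) and (rank x out)."""
    return LAYERS * sum(rank * (PROJECTIONS[t][0] + PROJECTIONS[t][1]) for t in targets)

# Ranks are assumed values; all seven projections are adapted in this sketch.
for rank in (16, 32, 64):
    share = lora_param_count(rank, PROJECTIONS) / base_param_count()
    print(f"rank {rank}: {share:.2%} of base parameters")
```

Under these assumptions, ranks of roughly 32 to 64 on all projections land close to the 1-2% range quoted in the table; adapting fewer modules would need a higher rank to reach the same share.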
Usage Example
Using Transformers (Recommended for Argument Mining)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load model
model = AutoModelForCausalLM.from_pretrained(
    "oberbics/llama-3.1-8B-newspaper_argument_mining",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("oberbics/llama-3.1-8B-newspaper_argument_mining")
tokenizer.pad_token = tokenizer.eos_token
# System prompt for argument extraction
SYSTEM_PROMPT = '''You are an expert at analyzing historical texts and you hate to summarize
OUTPUT FORMAT - EXACTLY these 4 XML tags and NOTHING else:
<argument>Original argument text OR "NA"</argument>
<claim>Core claim (implication) in one sentence OR "NA"</claim>
<explanation>Why this is an argument OR "NA"</explanation>
<confidence>0-1</confidence>
EXAMPLE WITH STRONG ARGUMENT:
<argument>Il giornale L'Italia moderna economica e finanziaria nel numero di oggi propone che non si facciano sottoscrizioni, le quali per quanto larghe sarebbero sempre impari ai bisogni, ma che il Parlamento stabilisca pochi centesimi addizionali per ogni lira su tutte le imposte e tasse (esclusi soltanto i dazi doganali la cui misura è vincolata da trattati di commercio).</argument>
<claim>Private subscriptions are inadequate for earthquake relief; parliamentary taxation would be more effective.</claim>
<explanation>The newspaper explicitly argues against private subscriptions as insufficient and proposes a specific alternative solution through parliamentary taxation, making a clear comparative argument about funding mechanisms.</explanation>
<confidence>0.95</confidence>
EXAMPLE WITHOUT ARGUMENT:
<argument>NA</argument>
<claim>NA</claim>
<explanation>NA</explanation>
<confidence>0.9</confidence>
RULES:
- CRITICAL: NEVER REPEAT ARGUMENTS - Each argument must be COMPLETELY UNIQUE
- Only output arguments that appear verbatim (or nearly verbatim) in the text
- NO SUMMARY; ONLY EXACT EXTRACTION FROM THE TEXT
- Extract only original text without changes or use NA when you did not find an argument
- If no argument exists, use NA for ALL fields
- More than one argument possible for one article'''
# Example article
article = """Your historical newspaper text here"""
# Prepare messages
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Extract argumentative units from historical text in their original form, no summaries.\n{article}"}
]
# Generate
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts its reply
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=800,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
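The response printed above follows the four-tag format from the system prompt, possibly repeated once per argument. For corpus-scale work it is usually convenient to turn it into records. The following is a minimal sketch of such a post-processing step; the function name, the `min_confidence` parameter, and the duplicate filter are illustrative additions and not part of the model's API. A regular expression is used instead of an XML parser because the output has no single root element and may contain unescaped characters.

```python
import re

BLOCK_RE = re.compile(
    r"<argument>(?P<argument>.*?)</argument>\s*"
    r"<claim>(?P<claim>.*?)</claim>\s*"
    r"<explanation>(?P<explanation>.*?)</explanation>\s*"
    r"<confidence>(?P<confidence>.*?)</confidence>",
    re.DOTALL,
)

def parse_arguments(response, min_confidence=0.0):
    """Turn four-tag blocks into dicts, skipping NA blocks and repeated arguments."""
    results, seen = [], set()
    for match in BLOCK_RE.finditer(response):
        fields = {k: v.strip() for k, v in match.groupdict().items()}
        if fields["argument"] == "NA":
            continue  # the model reported that no argument was found
        try:
            confidence = float(fields["confidence"])
        except ValueError:
            continue  # malformed confidence value
        # Compare arguments case- and whitespace-insensitively.
        key = " ".join(fields["argument"].split()).lower()
        if key in seen or confidence < min_confidence:
            continue
        seen.add(key)
        fields["confidence"] = confidence
        results.append(fields)
    return results
```

Calling `parse_arguments(response, min_confidence=0.5)` would return a list of dictionaries with the keys `argument`, `claim`, `explanation`, and `confidence`; the 0.5 cutoff is only an example value.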
Framework Versions
Stage 1 (Fine-tuning)
- PEFT: 0.17.1
- Transformers: 4.57.1
- PyTorch: 2.9.0+cu128
- Datasets: 4.3.0
- Tokenizers: 0.22.1
Stage 2 (GRPO)
- TRL: 0.25.0.dev0
- Transformers: 4.57.1
- PyTorch: 2.4.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
Citations
Cite GRPO as:
@article{shao2024deepseekmath,
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
year = 2024,
eprint = {arXiv:2402.03300},
}
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
Cite the base Llama 3.1 model as:
@article{llama3,
title={The Llama 3 Herd of Models},
author={AI@Meta},
year={2024},
journal={arXiv preprint arXiv:2407.21783}
}
License
This model inherits the Llama 3.1 Community License. See LICENSE for details.
Model Card Contact
For questions or issues, please open an issue on the model repository.