Llama 3.1 8B for Historical Newspaper Argument Mining

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct, trained in two stages for argument mining in historical newspapers. It covers two tasks: argumentative unit extraction and enthymeme reconstruction.

Training Pipeline

Stage 1: Supervised Fine-Tuning with LoRA

Initial fine-tuning using LoRA/PEFT on meta-llama/Meta-Llama-3.1-8B-Instruct

Stage 2: GRPO Post-Training

Further optimization of the Stage 1 model (oberbics/llama-3.1-newspaper-arguments-your_name-optimized_full_V2) using TRL with Group Relative Policy Optimization (GRPO), a reinforcement learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
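
The core idea of GRPO can be illustrated with a small sketch: several completions are sampled for the same prompt, each is scored by a reward function, and every score is normalized against the mean and standard deviation of its own group. The function name, the `eps` value, and the example rewards below are illustrative assumptions, not values taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Sketch of the group-relative advantage from the GRPO paper:
    A_i = (r_i - mean(r)) / std(r). `eps` is a hypothetical guard
    against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one article, scored by some reward function.
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
print(advantages)
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below average are discouraged. No separate value model is needed, which is the main practical difference from PPO-style training.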

Model Details

Model Description

This model extracts argumentative units from historical newspaper texts across multiple languages (Italian, German, French, and English), providing structured XML output suitable for digital humanities research and historical discourse analysis. The two-stage training process combines supervised learning for argument structure with reinforcement learning to improve quality and eliminate duplicate extractions.

Key Information:

Developed by: oberbics
Model type: Causal Language Model (Fine-tuned with LoRA + GRPO)
Language(s) (NLP): Italian, German, French, English
License: Llama 3.1 Community License
Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
Intermediate model: oberbics/llama-3.1-newspaper-arguments-your_name-optimized_full_V2

Intended Uses

Primary Use Cases

Extracting argumentative units from (historical) newspaper articles
Digital humanities research on historical argumentation patterns
Large-scale corpus analysis of multilingual newspaper archives
Enthymeme reconstruction (implicit argument mining)

Limitations

Optimized for historical newspaper texts from the early 20th century
May require human verification for complex argumentative structures
Performance may vary on texts significantly different from training data (1908 newspapers)

Training and Evaluation Data

The model was trained on a custom dataset of historical newspaper texts from Italian, German, French, and English sources, primarily from 1908, with argumentative annotations.

Training Procedure

Stage 1: Supervised Fine-Tuning (LoRA/PEFT)

Training Hyperparameters

learning_rate: 3e-05
train_batch_size: 1
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
lr_scheduler_warmup_steps: 50
num_epochs: 3
mixed_precision_training: Native AMP
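
For readers who want to reproduce a similar setup, the hyperparameters above map onto Hugging Face `TrainingArguments` keyword names roughly as sketched below. This is a reconstruction from the listed values, not the original training script. The single-device assumption is inferred from the reported total batch size, and the mixed-precision flag is left out because the card does not say whether `bf16` or `fp16` was used.

```python
# Keyword arguments mirroring the Stage 1 hyperparameters listed above.
# The keys are standard TrainingArguments parameter names.
stage1_args = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "optim": "paged_adamw_8bit",  # requires the bitsandbytes package
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "warmup_steps": 50,
    "num_train_epochs": 3,
}

# Assumption: one GPU, which is consistent with total_train_batch_size = 8.
num_devices = 1
effective_batch_size = (
    stage1_args["per_device_train_batch_size"]
    * stage1_args["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)  # 8
```

The dictionary could be passed as `TrainingArguments(output_dir="stage1-out", **stage1_args)`, where the output directory name is hypothetical.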

Training Results

Training Loss	Epoch	Step	Validation Loss
1.5443	1.0879	50	2.6414
1.1074	2.1758	100	2.6980

Final Evaluation Loss: 2.6980
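
The logged step and epoch values allow a rough estimate of the schedule length. The sketch below divides each logged step by its fractional epoch to approximate the number of optimizer steps per epoch, then compares the warmup implied by the 0.05 ratio with the explicit 50 warmup steps. The result is an approximation, since the last accumulation step of an epoch may be partial.

```python
import math

# (step, epoch) pairs from the training results table.
logged = [(50, 1.0879), (100, 2.1758)]
num_epochs = 3
warmup_ratio = 0.05
warmup_steps = 50

# Optimizer steps per epoch, estimated from each logged row and averaged.
estimates = [step / epoch for step, epoch in logged]
approx_steps_per_epoch = round(sum(estimates) / len(estimates))

total_steps = approx_steps_per_epoch * num_epochs
ratio_warmup = math.ceil(warmup_ratio * total_steps)

print(approx_steps_per_epoch, total_steps, ratio_warmup)  # 46 138 7
```

As far as the Transformers 4.x documentation describes, a positive `warmup_steps` takes precedence over `warmup_ratio`. Under that reading, about 50 of roughly 138 optimizer steps were spent in warmup rather than the 7 the ratio alone would give.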

Stage 2: GRPO Post-Training

This model was further trained using Group Relative Policy Optimization (GRPO), a reinforcement learning method that optimizes the model using group-based rewards to:

Improve argument extraction quality
Eliminate duplicate extractions
Enhance confidence calibration
Maintain multilingual performance
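
The reward functions actually used for this model are not documented in this card. As an illustration only, the sketch below shows what a reward targeting well-formed output and duplicate elimination could look like, written against the `reward_func(completions, **kwargs)` convention that TRL's `GRPOTrainer` accepts. The function name, the penalty weight of 0.5, and the normalization used to detect duplicates are all hypothetical.

```python
import re

ARGUMENT_RE = re.compile(r"<argument>(.*?)</argument>", re.DOTALL)
REQUIRED_TAGS = ("argument", "claim", "explanation", "confidence")


def _text(completion):
    # TRL passes plain strings for standard prompts and a list of
    # message dicts for conversational prompts.
    if isinstance(completion, str):
        return completion
    return completion[0]["content"]


def format_and_uniqueness_reward(completions, **kwargs):
    """Hypothetical reward: +1 for well-formed output, -0.5 per repeated argument."""
    rewards = []
    for completion in completions:
        text = _text(completion)
        # Well-formed here means every required tag opens the same number of times.
        counts = [len(re.findall(f"<{tag}>", text)) for tag in REQUIRED_TAGS]
        well_formed = counts[0] > 0 and len(set(counts)) == 1
        # Normalize whitespace and case so trivially different repeats still match.
        arguments = [" ".join(a.split()).lower() for a in ARGUMENT_RE.findall(text)]
        duplicates = len(arguments) - len(set(arguments))
        rewards.append((1.0 if well_formed else 0.0) - 0.5 * duplicates)
    return rewards
```

A reward of this shape addresses only the format and duplication goals. Extraction quality and confidence calibration would need a comparison against reference annotations, which is not sketched here.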

Training Configuration:

Parameter	Value
LoRA adapters	~1-2% parameters updated
Learning rate	3e-05
Epochs	3
Optimizer	8-bit + AMP
Schedule	Cosine + warmup

Usage Example

Using Transformers (Recommended for Argument Mining)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "oberbics/llama-3.1-8B-newspaper_argument_mining",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("oberbics/llama-3.1-8B-newspaper_argument_mining")
tokenizer.pad_token = tokenizer.eos_token

# System prompt for argument extraction
SYSTEM_PROMPT = '''You are an expert at analyzing historical texts and you hate to summarize

OUTPUT FORMAT - EXACTLY these 4 XML tags and NOTHING else:
<argument>Original argument text OR "NA"</argument>
<claim>Core claim (implication) in one sentence OR "NA"</claim>
<explanation>Why this is an argument OR "NA"</explanation>
<confidence>0-1</confidence>

EXAMPLE WITH STRONG ARGUMENT:
<argument>Il giornale L'Italia moderna economica e finanziaria nel numero di oggi propone che non si facciano sottoscrizioni, le quali per quanto larghe sarebbero sempre impari ai bisogni, ma che il Parlamento stabilisca pochi centesimi addizionali per ogni lira su tutte le imposte e tasse (esclusi soltanto i dazi doganali la cui misura è vincolata da trattati di commercio).</argument>
<claim>Private subscriptions are inadequate for earthquake relief; parliamentary taxation would be more effective.</claim>
<explanation>The newspaper explicitly argues against private subscriptions as insufficient and proposes a specific alternative solution through parliamentary taxation, making a clear comparative argument about funding mechanisms.</explanation>
<confidence>0.95</confidence>

EXAMPLE WITHOUT ARGUMENT:
<argument>NA</argument>
<claim>NA</claim>
<explanation>NA</explanation>
<confidence>0.9</confidence>

RULES:
- CRITICAL: NEVER REPEAT ARGUMENTS - Each argument must be COMPLETELY UNIQUE
- Only output arguments that appear verbatim (or nearly verbatim) in the text
- NO SUMMARY; ONLY EXACT EXTRACTION FROM THE TEXT
- Extract only original text without changes or use NA when you did not find an argument
- If no argument exists, use NA for ALL fields
- More than one argument possible for one article'''

# Example article
article = """Your historical newspaper text here"""

# Prepare messages
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Extract argumentative units from historical text in their original form, no summaries.\n{article}"}
]

# Generate
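# add_generation_prompt=True appends the assistant header so the model answers instead of continuing the user turn
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)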
outputs = model.generate(
    inputs,
    max_new_tokens=800,
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
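
The response is a flat sequence of the four XML tags with no root element, and the extracted text may contain characters such as `&` that a strict XML parser would reject. A regular expression is therefore a pragmatic way to turn it into records. The following is a minimal sketch, assuming the model follows the output format in the system prompt; `parse_response` and the sample string are illustrative and not part of the released code.

```python
import re

# One match per <argument>/<claim>/<explanation>/<confidence> group.
BLOCK_RE = re.compile(
    r"<argument>(?P<argument>.*?)</argument>\s*"
    r"<claim>(?P<claim>.*?)</claim>\s*"
    r"<explanation>(?P<explanation>.*?)</explanation>\s*"
    r"<confidence>(?P<confidence>.*?)</confidence>",
    re.DOTALL,
)


def parse_response(response):
    """Return a list of dicts, skipping NA blocks and repeated arguments."""
    records, seen = [], set()
    for match in BLOCK_RE.finditer(response):
        record = {key: value.strip() for key, value in match.groupdict().items()}
        if record["argument"] == "NA":
            continue
        # Whitespace- and case-insensitive key for duplicate detection.
        dedup_key = " ".join(record["argument"].split()).lower()
        if dedup_key in seen:
            continue
        seen.add(dedup_key)
        try:
            record["confidence"] = float(record["confidence"])
        except ValueError:
            record["confidence"] = None  # leave unparseable scores for manual review
        records.append(record)
    return records


# Hypothetical model output: one argument followed by an NA block.
sample = (
    "<argument>Le sottoscrizioni sarebbero sempre impari ai bisogni.</argument>\n"
    "<claim>Private subscriptions are inadequate.</claim>\n"
    "<explanation>States insufficiency of one funding mechanism.</explanation>\n"
    "<confidence>0.95</confidence>\n\n"
    "<argument>NA</argument>\n<claim>NA</claim>\n"
    "<explanation>NA</explanation>\n<confidence>0.9</confidence>"
)
records = parse_response(sample)
print(records)
```

Records with a low or missing confidence value can then be routed to human verification, in line with the limitations noted above.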

Framework Versions

Stage 1 (Fine-tuning)

PEFT: 0.17.1
Transformers: 4.57.1
PyTorch: 2.9.0+cu128
Datasets: 4.3.0
Tokenizers: 0.22.1

Stage 2 (GRPO)

TRL: 0.25.0.dev0
Transformers: 4.57.1
PyTorch: 2.4.0
Datasets: 4.3.0
Tokenizers: 0.22.1

Citations

Cite GRPO as:

@article{shao2024deepseekmath,
    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year         = 2024,
    eprint       = {arXiv:2402.03300},
}

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Cite the base Llama 3.1 model as:

@article{llama3,
  title={The Llama 3 Herd of Models},
  author={AI@Meta},
  year={2024},
  journal={arXiv preprint arXiv:2407.21783}
}

License

This model inherits the Llama 3.1 Community License. See LICENSE for details.

Model Card Contact

For questions or issues, please open an issue on the model repository.
