legal-passive-to-active-llama-7b

A specialized LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Llama-2-7b-Chat. This model simplifies complex legal language while maintaining semantic accuracy and legal precision.

Model Description

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of Llama-2-7b-Chat-hf, optimized for passive-to-active voice transformation in legal documents. It was trained on a curated dataset of 319 legal sentences drawn from authoritative sources, including UN documents, the GDPR, Australia's Fair Work Act, and insurance regulations, so that it learns legal syntax, passive constructions, and voice transformation patterns.

Key Features

Legal Text Simplification: Converts passive voice to active voice in legal documents
Domain-Specific: Fine-tuned on authentic legal text from multiple jurisdictions
Efficient Training: Uses QLoRA for memory-efficient fine-tuning
Semantic Preservation: Maintains legal meaning while simplifying sentence structure
Accessibility: Makes legal documents more readable and accessible

Model Details

Developed by: Rafi Al Attrach
Model type: LoRA fine-tuned Llama-2
Language(s): English
License: Apache 2.0
Finetuned from: meta-llama/Llama-2-7b-chat-hf
Training method: QLoRA (4-bit quantization + LoRA)
Research Focus: Legal text simplification and accessibility (2024)

Technical Specifications

Base Model: Llama-2-7b-Chat-hf
LoRA Rank: 64
Training Samples: 319 legal sentences
Data Sources: UN legal documents, GDPR, Fair Work Act, Insurance regulations
Evaluation: BERTScore metrics and human evaluation
Performance: ~6% improvement over base model in human evaluation

Uses

Direct Use

This model is designed for:

Legal document simplification: Converting passive legal text to active voice
Accessibility improvement: Making legal documents more readable
Legal writing assistance: Helping legal professionals write clearer documents
Educational purposes: Teaching legal language transformation
Document processing: Batch processing of legal texts

Example Use Cases

# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."

# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."

How to Get Started

Installation

pip install transformers torch peft accelerate bitsandbytes

Loading the Model

GPU Usage (Recommended)

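from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization (requires a CUDA GPU and bitsandbytes)
base_model = "meta-llama/Llama-2-7b-chat-hf"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map="auto"
)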

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

CPU Usage (Alternative)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model (CPU compatible)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

Usage Example

def transform_passive_to_active(passive_sentence, max_new_tokens=512):
    # Build the instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.

Input: Transform the following legal sentence from passive to active voice.

Legal Sentence: """

    prompt = instruction + passive_sentence
    # Move the input tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # caps generated tokens only, not prompt + output
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
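
For the batch-processing use case, a minimal sketch might wrap any single-sentence transform function in a loop. The helper name transform_batch and the stand-in echo_transform are illustrative assumptions, not part of this model's API; the stand-in exists only so the snippet runs without loading the model. In practice you would pass transform_passive_to_active instead.

```python
from typing import Callable, Iterable, List


def transform_batch(
    sentences: Iterable[str],
    transform: Callable[[str], str],
) -> List[str]:
    """Apply a single-sentence transform to each non-empty sentence, in order."""
    results = []
    for sentence in sentences:
        cleaned = sentence.strip()
        if not cleaned:
            # Skip blank entries rather than sending empty prompts to the model
            continue
        results.append(transform(cleaned))
    return results


# Hypothetical stand-in transform so this sketch runs without the model
def echo_transform(sentence: str) -> str:
    return sentence


batch_output = transform_batch(
    ["The agreement shall be signed by the authorized representatives.", "  "],
    echo_transform,
)
print(batch_output)
```

Passing the transform as a callable keeps the loop independent of how the model is loaded (GPU or CPU) and makes it easy to test.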

Training Details

Training Data

Dataset Size: 319 legal sentences
Source Documents:
- United Nations legal documents
- General Data Protection Regulation (GDPR)
- Fair Work Act (Australia)
- Insurance Council of Australia regulations
Data Split: 85% training, 15% testing (15% of the training portion is held out for validation)
Domain: Legal text across multiple jurisdictions
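
The split described above can be sketched in plain Python. This is an illustration under stated assumptions: the function name split_dataset, the rounding rule, and the seed are hypothetical, and the resulting counts are not necessarily the exact counts used in training.

```python
import random


def split_dataset(items, test_fraction=0.15, val_fraction=0.15, seed=0):
    """Split into train/validation/test; validation is carved out of the training portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # First hold out the test set from the full dataset
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # Then hold out the validation set from what is left
    n_val = round(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


# Placeholder integer IDs standing in for the 319 sentences
train, val, test = split_dataset(range(319))
print(len(train), len(val), len(test))
```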

Training Procedure

Method: QLoRA (4-bit quantization + LoRA)
LoRA Configuration: Rank 64, Alpha 16
Library: unsloth (2.2x faster, 43% less VRAM)
Hardware: Tesla T4 GPU (Google Colab)
Validation Loss: Trended downward during training, suggesting the model generalized rather than overfit
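
To give a feel for what rank 64 and alpha 16 imply, here is a rough arithmetic sketch. It assumes the standard LoRA scaling factor alpha / rank and a single 4096 x 4096 projection matrix (4096 is the hidden size of Llama-2-7b). Which layers were actually adapted is not stated in this card, so the per-matrix figures are illustrative only.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


rank, alpha = 64, 16
hidden = 4096  # hidden size of Llama-2-7b

scaling = lora_scaling(rank, alpha)
adapter_params = lora_param_count(hidden, hidden, rank)
full_params = hidden * hidden

print(f"scaling = {scaling}")
print(f"adapter parameters per matrix = {adapter_params:,}")
print(f"fraction of the full matrix = {adapter_params / full_params:.4%}")
```

Under these assumptions the adapter for one such matrix holds about 3% as many parameters as the frozen matrix itself, which is where the memory savings of LoRA come from.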

Evaluation Metrics

BERTScore: Semantic similarity evaluation
Human Evaluation: Binary correctness assessment by legal evaluators
Performance Improvement: ~6% increase over base Llama-2 model

Performance

The model was evaluated using both automatic metrics (BERTScore - Precision, Recall, F1) and human evaluation:

BERTScore F1: High semantic similarity preservation
Human Evaluation: ~6% improvement over base model
Strengths: Good transformation of standard passive constructions
Challenges: Complex sentences with nuanced word placement (e.g., "only")

Limitations and Bias

Known Limitations

Word Position Sensitivity: Struggles with sentences where word position significantly alters meaning
Dataset Size: Limited to 319 training samples
Non-Determinism: LLM outputs may vary between runs
Domain Coverage: Primarily trained on English common law and EU legal documents
'By' Constructions: Occasionally mishandles sentences in which 'by' marks the agent of the passive verb (the phrase that should become the subject)
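
The run-to-run variation noted above comes largely from sampling (the usage example sets do_sample=True with temperature=0.7). One common way to get more repeatable output is greedy decoding. The sketch below only defines the generation settings; do_sample and max_new_tokens are standard generate() arguments, while the dictionary name and the 128-token budget are assumptions.

```python
# Generation settings for more repeatable output: with do_sample=False,
# generate() picks the most likely token at each step (greedy decoding),
# so no temperature setting is needed.
DETERMINISTIC_GENERATION_KWARGS = {
    "do_sample": False,
    "max_new_tokens": 128,  # assumed budget for a single rewritten sentence
}

# Intended use, once the model and tokenizer are loaded:
# outputs = model.generate(**inputs, **DETERMINISTIC_GENERATION_KWARGS)
print(DETERMINISTIC_GENERATION_KWARGS)
```

Greedy decoding may trade some fluency for repeatability, and small numerical differences across hardware can still occur.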

Recommendations

Validate transformed sentences for legal accuracy before use
Use human review for critical legal documents
Consider context and jurisdiction when applying transformations
Test with domain-specific legal texts for best results
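
As a first automated pass before human review, a simple check can flag rewrites that drop modal verbs or numbers from the source sentence. This is a heuristic sketch, not a legal-accuracy test: the function name find_missing_terms and the term list are hypothetical, the "bad" sentence is a made-up faulty rewrite, and the check looks only at presence, not at word position.

```python
import re

# Terms whose loss would likely alter legal meaning (illustrative list)
MODAL_TERMS = ("shall", "must", "may", "only")


def find_missing_terms(source: str, rewritten: str) -> list:
    """Return modal terms and numbers present in the source but absent from the rewrite."""
    source_words = set(re.findall(r"[a-z]+|\d+", source.lower()))
    rewritten_words = set(re.findall(r"[a-z]+|\d+", rewritten.lower()))
    tracked = {w for w in source_words if w in MODAL_TERMS or w.isdigit()}
    return sorted(tracked - rewritten_words)


source = "The contract shall be executed by both parties within 30 days."
good = "Both parties shall execute the contract within 30 days."
bad = "Both parties execute the contract within 60 days."  # made-up faulty rewrite

print(find_missing_terms(source, good))  # nothing missing
print(find_missing_terms(source, bad))   # flags the dropped modal and changed number
```

An empty result does not mean the rewrite is correct; it only means none of the tracked terms were lost.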

Citation

If you use this model in your research, please cite:

@misc{legal-passive-active-llama2,
  title={legal-passive-to-active-llama2-7b: A LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b}
}

Related Models

Base Model: meta-llama/Llama-2-7b-chat-hf
Enhanced Version: rafiaa/legal-passive-to-active-mistral-7b (Recommended - better performance)

Model Card Contact

Author: Rafi Al Attrach
Model Repository: https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b
Issues: Please report issues through the HuggingFace model page

Acknowledgments

Research Project: Legal text simplification and accessibility research (2024)
Training Data: Public legal documents and regulations
Base Model: Meta's Llama-2-7b-Chat-hf

This model is part of a research project on legal text simplification and accessibility, focusing on passive-to-active voice transformation in legal documents.
