legal-passive-to-active-llama-7b

A specialized LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Llama-2-7b-Chat. This model simplifies complex legal language while maintaining semantic accuracy and legal precision.

Model Description

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of Llama-2-7b-Chat-hf, optimized for passive-to-active voice transformation in legal documents. It was trained on a curated dataset of 319 legal sentences drawn from UN documents, the GDPR, the Australian Fair Work Act, and insurance regulations, so that it learns legal syntax, passive constructions, and voice transformation patterns.

Key Features

  • Legal Text Simplification: Converts passive voice to active voice in legal documents
  • Domain-Specific: Fine-tuned on authentic legal text from multiple jurisdictions
  • Efficient Training: Uses QLoRA for memory-efficient fine-tuning
  • Semantic Preservation: Maintains legal meaning while simplifying sentence structure
  • Accessibility: Makes legal documents more readable and accessible

Model Details

  • Developed by: Rafi Al Attrach
  • Model type: LoRA fine-tuned Llama-2
  • Language(s): English
  • License: Apache 2.0
  • Finetuned from: meta-llama/Llama-2-7b-chat-hf
  • Training method: QLoRA (4-bit quantization + LoRA)
  • Research Focus: Legal text simplification and accessibility (2024)

Technical Specifications

  • Base Model: Llama-2-7b-Chat-hf
  • LoRA Rank: 64
  • Training Samples: 319 legal sentences
  • Data Sources: UN legal documents, GDPR, Fair Work Act, Insurance regulations
  • Evaluation: BERTScore metrics and human evaluation
  • Performance: ~6% improvement over base model in human evaluation

Uses

Direct Use

This model is designed for:

  • Legal document simplification: Converting passive legal text to active voice
  • Accessibility improvement: Making legal documents more readable
  • Legal writing assistance: Helping legal professionals write clearer documents
  • Educational purposes: Teaching legal language transformation
  • Document processing: Batch processing of legal texts

Example Use Cases

# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."

How to Get Started

Installation

pip install transformers torch peft accelerate bitsandbytes

Loading the Model

GPU Usage (Recommended)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model with 4-bit quantization (requires a CUDA GPU and bitsandbytes)
base_model = "meta-llama/Llama-2-7b-chat-hf"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

CPU Usage (Alternative)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model (CPU compatible)
base_model = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-llama-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

Usage Example

def transform_passive_to_active(passive_sentence, max_length=512):
    # Create instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.

Input: Transform the following legal sentence from passive to active voice.

Legal Sentence: """

    prompt = instruction + passive_sentence
    # Move the tokenized prompt to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # total length: prompt plus generated tokens
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, not the echoed prompt
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
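For batch processing of legal texts, a thin wrapper around a single-sentence transform is usually enough. The sketch below is an illustration, not part of the released code: `transform_batch` is a hypothetical helper, and it accepts any callable that maps one sentence to one sentence, so a function such as `transform_passive_to_active` can be passed in. A stand-in callable is used here so the sketch runs without loading the model.

```python
from typing import Callable, Iterable, List, Tuple


def transform_batch(
    sentences: Iterable[str],
    transform: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Apply `transform` to each non-empty sentence and pair input with output."""
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines instead of sending them to the model
        results.append((sentence, transform(sentence)))
    return results


# Stand-in transform (assumption): replace with the real model call in practice
pairs = transform_batch(
    ["The agreement shall be signed by the authorized representatives.", ""],
    transform=lambda s: s.upper(),
)
for original, rewritten in pairs:
    print(original, "->", rewritten)
```

Keeping the original sentence next to each output makes later human review simpler.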

Training Details

Training Data

  • Dataset Size: 319 legal sentences
  • Source Documents:
    • United Nations legal documents
    • General Data Protection Regulation (GDPR)
    • Fair Work Act (Australia)
    • Insurance Council of Australia regulations
  • Data Split: 85% training, 15% testing; 15% of the training portion is held out for validation
  • Domain: Legal text across multiple jurisdictions
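The split proportions above can be turned into concrete counts. The following is a minimal sketch, assuming a shuffled split with a fixed seed and rounding to the nearest integer; the card does not state how the split was actually performed, so `split_dataset`, the seed, and the rounding rule are all assumptions.

```python
import random


def split_dataset(items, test_frac=0.15, val_frac=0.15, seed=42):
    """Shuffle, hold out test_frac for testing, then val_frac of the rest for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = round(len(items) * test_frac)
    test_set, remainder = items[:n_test], items[n_test:]
    n_val = round(len(remainder) * val_frac)
    val_set, train_set = remainder[:n_val], remainder[n_val:]
    return train_set, val_set, test_set


# 319 placeholder indices stand in for the 319 legal sentences
train_set, val_set, test_set = split_dataset(range(319))
print(len(train_set), len(val_set), len(test_set))  # 230 41 48 under this rounding rule
```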

Training Procedure

  • Method: QLoRA (4-bit quantization + LoRA)
  • LoRA Configuration: Rank 64, Alpha 16
  • Library: unsloth (2.2x faster, 43% less VRAM)
  • Hardware: Tesla T4 GPU (Google Colab)
  • Validation Loss: Trended downward during training, suggesting the model generalizes beyond the training set
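To put the LoRA configuration in perspective, the arithmetic below works out the scaling factor and the adapter size for a single weight matrix. It is a sketch under stated assumptions: standard LoRA scaling (alpha divided by rank), and a square 4096 x 4096 projection, which matches the Llama-2-7B hidden size. Which modules were actually adapted is not stated in this card.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA (alpha / rank)."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted weight: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 16  # values from the configuration above
HIDDEN = 4096         # Llama-2-7B hidden size; square projection is an assumption

adapter_params = lora_param_count(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN

print(lora_scaling(RANK, ALPHA))     # 0.25
print(adapter_params)                # 524288
print(adapter_params / full_params)  # 0.03125
```

Under these assumptions each adapted matrix trains about 3% as many parameters as the frozen weight it modifies.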

Evaluation Metrics

  • BERTScore: Semantic similarity evaluation
  • Human Evaluation: Binary correctness assessment by legal evaluators
  • Performance Improvement: ~6% increase over base Llama-2 model
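For readers unfamiliar with BERTScore, the sketch below illustrates its core idea: each token vector in one sentence is greedily matched to its most similar vector in the other, and the averages give precision, recall, and F1. This is a simplified illustration with made-up 2-D vectors. The real metric uses contextual embeddings from a pretrained model and optional IDF weighting, neither of which is reproduced here, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy cosine matching of token vectors."""
    # Precision: how well each candidate token is covered by the reference
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: how well each reference token is covered by the candidate
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy vectors (assumption): stand-ins for contextual token embeddings
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p, r, f1 = greedy_match_scores(candidate, reference)
print(round(p, 4), round(r, 4), round(f1, 4))
```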

Performance

The model was evaluated using both automatic metrics (BERTScore - Precision, Recall, F1) and human evaluation:

  • BERTScore F1: High semantic similarity preservation
  • Human Evaluation: ~6% improvement over base model
  • Strengths: Good transformation of standard passive constructions
  • Challenges: Complex sentences with nuanced word placement (e.g., "only")

Limitations and Bias

Known Limitations

  • Word Position Sensitivity: Struggles with sentences where word position significantly alters meaning
  • Dataset Size: Limited to 319 training samples
  • Non-Determinism: LLM outputs may vary between runs
  • Domain Coverage: Primarily trained on English common law and EU legal documents
  • 'By' Constructions: Occasionally mishandles sentences in which 'by' introduces the agent (the party that becomes the subject in active voice)
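Much of the run-to-run variation comes from sampling during generation. One common mitigation, sketched here as an assumption rather than a setting this model was evaluated with, is greedy decoding. The helper name and the 128-token budget are hypothetical; `do_sample`, `num_beams`, and `max_new_tokens` are standard `generate` arguments in transformers.

```python
def greedy_generation_kwargs(max_new_tokens: int = 128) -> dict:
    """Keyword arguments for model.generate that disable sampling."""
    return {
        "do_sample": False,  # always pick the most likely next token
        "num_beams": 1,      # plain greedy search, no beam search
        "max_new_tokens": max_new_tokens,  # budget for generated tokens only
    }


# Intended use (not executed here): model.generate(**inputs, **greedy_generation_kwargs())
print(greedy_generation_kwargs())
```

Greedy decoding reduces variation but may not remove it entirely, since small numerical differences across hardware can still change outputs.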

Recommendations

  • Validate transformed sentences for legal accuracy before use
  • Use human review for critical legal documents
  • Consider context and jurisdiction when applying transformations
  • Test with domain-specific legal texts for best results
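As a first pass before human review, simple automated checks can flag outputs that deserve a closer look. The sketch below is a heuristic illustration only, not a legal validation method: `review_flags`, the word list, and the passive-voice pattern are all assumptions, and the pattern will miss irregular participles.

```python
import re

# Words whose presence or position can change legal meaning (illustrative, not exhaustive)
SENSITIVE_WORDS = {"shall", "may", "must", "only", "not", "unless"}


def review_flags(original: str, transformed: str) -> list:
    """Return reasons why a transformed sentence should be reviewed manually."""
    flags = []
    orig_tokens = re.findall(r"[a-z0-9]+", original.lower())
    new_tokens = set(re.findall(r"[a-z0-9]+", transformed.lower()))

    # Sensitive words and numbers in the original should survive the rewrite
    for token in dict.fromkeys(orig_tokens):  # de-duplicate, keep order
        if (token in SENSITIVE_WORDS or token.isdigit()) and token not in new_tokens:
            flags.append(f"missing '{token}'")

    # The meaning of 'only' depends on where it sits in the sentence
    if "only" in orig_tokens:
        flags.append("contains 'only': check word placement")

    # Crude check for a leftover passive: form of 'be' + '-ed' word, later followed by 'by'
    if re.search(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b.*\bby\b", transformed.lower()):
        flags.append("possible residual passive construction")
    return flags


print(review_flags(
    "The contract shall be executed by both parties within 30 days.",
    "Both parties shall execute the contract.",
))
```

An empty list means none of these heuristics fired; it does not mean the transformation is legally accurate.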

Citation

If you use this model in your research, please cite:

@misc{legal-passive-active-llama2,
  title={legal-passive-to-active-llama2-7b: A LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b}
}

Model Card Contact

  • Author: Rafi Al Attrach
  • Model Repository: https://huggingface.co/rafiaa/legal-passive-to-active-llama-7b
  • Issues: Please report issues through the HuggingFace model page

Acknowledgments

  • Research Project: Legal text simplification and accessibility research (2024)
  • Training Data: Public legal documents and regulations
  • Base Model: Meta's Llama-2-7b-Chat-hf

This model is part of a research project on legal text simplification and accessibility, focusing on passive-to-active voice transformation in legal documents.
