title: AI Contract Risk Analyzer
colorFrom: blue
colorTo: purple
sdk: docker
app_file: Dockerfile
pinned: false
license: mit
AI Contract Risk Analyzer
Democratizing Legal Intelligence Through AI
Comprehensive contract risk analysis using an integrated pipeline with Legal-BERT, multi-model NLP, and LLM interpretation
Overview
The AI Contract Risk Analyzer is a production-grade legal document analysis platform that uses NLP and machine learning to assess contract risk in under 60 seconds. Built around a unified orchestration architecture, it integrates Legal-BERT for clause understanding, semantic embeddings for similarity matching, and LLMs for natural-language explanations.
Key Features
- Multi-Format Support: PDF, DOCX, and TXT document processing
- 9 Contract Categories: Employment, NDA, Lease, Service agreements, and others
- Sub-60s Analysis: Risk scoring and clause extraction in under 60 seconds, using pre-loaded models
- Privacy-First: Ephemeral processing, zero data retention
- LLM Integration: Ollama (local), OpenAI, and Anthropic support with automatic fallback
- Comprehensive Reports: Executive summaries, negotiation playbooks, market comparisons, and downloadable PDFs
- Integrated Pipeline: A single orchestrator (PreloadedAnalysisService) ensures consistent context propagation from classification through to final reporting
Table of Contents
- Architecture
- Installation
- Quick Start
- API Documentation
- Technical Details
- Configuration
- Development
- Performance
- Documentation & Blog
- License
Architecture
System Overview
This diagram illustrates the core components and their interactions, highlighting the unified orchestration and the flow of context (specifically the ContractType) through the system.
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│            (Browser / Mobile / CLI / API Client)            │
└──────────────────────┬──────────────────────────────────────┘
                       │ REST API
┌──────────────────────▼──────────────────────────────────────┐
│                       FastAPI Backend                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Routes: /analyze, /jobs/{id}, /validate, /health     │  │
│  │  Async Processing: BackgroundTasks + Job Queue        │  │
│  │  Middleware: CORS, Error Handling, Logging            │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                Services Orchestration Layer                 │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────────┐   │
│  │ Classifier  │──▶│    Clause    │──▶│  Risk Analyzer  │   │
│  │ (Legal-BERT)│   │  Extractor   │   │ (Multi-Factor)  │   │
│  └─────────────┘   └──────────────┘   └─────────────────┘   │
│  ┌─────────────┐   ┌──────────────┐   ┌─────────────────┐   │
│  │    Term     │   │  Protection  │   │     Market      │   │
│  │  Analyzer   │   │   Checker    │   │   Comparator    │   │
│  └─────────────┘   └──────────────┘   └─────────────────┘   │
│  ┌─────────────┐   ┌──────────────┐                         │
│  │     LLM     │   │ Negotiation  │                         │
│  │ Interpreter │   │    Engine    │                         │
│  └─────────────┘   └──────────────┘                         │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                   Model Management Layer                    │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Model Registry (Singleton, Thread-Safe)              │  │
│  │    - LRU Cache Eviction                               │  │
│  │    - GPU/CPU Auto-Detection                           │  │
│  │    - Lazy Loading                                     │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  LLM Manager (Multi-Provider)                         │  │
│  │    - Ollama (Local, Free)                             │  │
│  │    - OpenAI (GPT-3.5/4)                               │  │
│  │    - Anthropic (Claude)                               │  │
│  │    - Auto-Fallback & Rate Limiting                    │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                       AI Models Layer                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Legal-BERT (nlpaueb/legal-bert-base-uncased)         │  │
│  │    - Domain-adapted BERT for legal text               │  │
│  │    - 110M parameters, 768-dim embeddings              │  │
│  │    - Fine-tuned on 12GB legal corpus                  │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Sentence-BERT (all-MiniLM-L6-v2)                     │  │
│  │    - 22M parameters, 384-dim embeddings               │  │
│  │    - Semantic similarity engine                       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
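The Model Management Layer lists three behaviours for the registry: singleton access, thread safety, and LRU eviction with lazy loading. The following is a minimal sketch of how those could fit together. The class name `ModelRegistrySketch`, the `max_models` parameter, and the string "models" are assumptions made for illustration; this is not the project's actual `model_registry.py`.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, List


class ModelRegistrySketch:
    """Hypothetical thread-safe singleton with lazy loading and LRU eviction."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls, max_models: int = 2):
        # Double-checked locking: only the first call creates the instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._models = OrderedDict()
                    instance._lock = threading.RLock()
                    instance._max_models = max_models
                    cls._instance = instance
        return cls._instance

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model, calling loader() only on first use."""
        with self._lock:
            if name in self._models:
                self._models.move_to_end(name)  # mark as most recently used
                return self._models[name]
            model = loader()  # lazy load on first request
            self._models[name] = model
            if len(self._models) > self._max_models:
                self._models.popitem(last=False)  # evict least recently used
            return model

    def loaded(self) -> List[str]:
        """Names of cached models, least recently used first."""
        with self._lock:
            return list(self._models)


# Usage with placeholder strings standing in for real model objects.
registry = ModelRegistrySketch(max_models=2)
registry.get("legal-bert", lambda: "legal-bert-weights")
registry.get("sbert", lambda: "sbert-weights")
registry.get("legal-bert", lambda: "never called")  # cache hit, refreshes recency
registry.get("extra", lambda: "extra-weights")      # pushes out "sbert"
print(registry.loaded())
```

Holding the lock while `loader()` runs keeps the sketch simple but serializes model loads; a real registry might prefer per-model locks.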
Integrated Analysis Pipeline Flowchart
graph TB
Start[User Uploads Contract] --> Read[Document Reader]
Read --> Validate{Contract Validator}
Validate -->|Invalid| Error[Return Error]
Validate -->|Valid| Classify[Contract Classifier]
Classify --> Extract[RiskClauseExtractor]
Extract --> Analyze[TermAnalyzer + ProtectionChecker]
Analyze --> Score[RiskAnalyzer]
Score --> Generate[Output Generators]
Generate --> Sum[SummaryGenerator]
Generate --> Interp[LLM Interpreter]
Generate --> Neg[Negotiation Engine]
Generate --> PDF[PDF Report Generator]
Sum --> End[JSON Response]
Interp --> End
Neg --> End
PDF --> End
style Start fill:#e1f5e1
style End fill:#e1f5e1
style Error fill:#ffe1e1
style Classify fill:#e1e5ff
style Extract fill:#e1e5ff
style Score fill:#ffe5e1
style Generate fill:#fff5e1
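The flowchart can be read as a chain of stages that share one context object, which is how the contract type reaches the later steps. The sketch below is a reduced, hypothetical version of that flow: the `AnalysisContext` fields, the stage functions, and the keyword rules are placeholders invented for illustration, and the TermAnalyzer and ProtectionChecker stage is omitted for brevity. The minimum length reuses the `MIN_CONTRACT_LENGTH=300` value from the example `.env` file.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_CONTRACT_LENGTH = 300  # same value as the example .env setting


class InvalidContractError(ValueError):
    """Raised when the validator rejects a document."""


@dataclass
class AnalysisContext:
    """State passed from stage to stage (field names are assumptions)."""
    text: str
    contract_type: str = "unknown"
    clauses: List[str] = field(default_factory=list)
    risk_score: float = 0.0
    outputs: Dict[str, str] = field(default_factory=dict)


def validate(ctx: AnalysisContext) -> AnalysisContext:
    if len(ctx.text) < MIN_CONTRACT_LENGTH:
        raise InvalidContractError("document too short to be a contract")
    return ctx


def classify(ctx: AnalysisContext) -> AnalysisContext:
    # Placeholder keyword rule standing in for the Legal-BERT classifier.
    ctx.contract_type = "employment" if "employee" in ctx.text.lower() else "general"
    return ctx


def extract_clauses(ctx: AnalysisContext) -> AnalysisContext:
    # Placeholder: treat blank-line-separated paragraphs as clauses.
    ctx.clauses = [p.strip() for p in ctx.text.split("\n\n") if p.strip()]
    return ctx


def score(ctx: AnalysisContext) -> AnalysisContext:
    # Placeholder: share of clauses mentioning termination, scaled to 0-100.
    risky = sum("terminat" in clause.lower() for clause in ctx.clauses)
    ctx.risk_score = 100.0 * risky / max(len(ctx.clauses), 1)
    return ctx


def generate_outputs(ctx: AnalysisContext) -> AnalysisContext:
    # Later stages can read contract_type because it travels in the context.
    ctx.outputs["summary"] = (
        f"{ctx.contract_type} contract, risk {ctx.risk_score:.0f}/100"
    )
    return ctx


PIPELINE = [validate, classify, extract_clauses, score, generate_outputs]


def run_pipeline(text: str) -> AnalysisContext:
    """Run every stage in order; a validation failure stops the run."""
    ctx = AnalysisContext(text=text)
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx


sample = (
    "The employee may terminate this agreement with notice.\n\n"
    + "Filler clause text. " * 20
)
print(run_pipeline(sample).outputs["summary"])
```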
Component Diagram
graph LR
subgraph "Client"
UI[Browser / API Client]
end
subgraph "FastAPI Backend"
API[FastAPI Server]
PAS[PreloadedAnalysisService]
end
subgraph "Core Services"
CC[Contract Classifier]
RCE[Risk Clause Extractor]
TA[Term Analyzer]
PC[Protection Checker]
RA[Comprehensive Risk Analyzer]
SG[Summary Generator]
LI[LLM Interpreter]
NE[Negotiation Engine]
PR[PDF Report Generator]
end
subgraph "Model Management"
MM[Model Manager]
MR[Model Registry]
LM[LLM Manager]
end
subgraph "AI Models"
LB[Legal-BERT]
ST[Sentence-BERT]
OLM[Ollama]
OAI[OpenAI]
ANT[Anthropic]
end
UI --> API
API --> PAS
PAS --> CC
PAS --> RCE
PAS --> TA
PAS --> PC
PAS --> RA
PAS --> SG
PAS --> LI
PAS --> NE
PAS --> PR
CC -.-> RCE
RCE --> TA
RCE --> PC
TA --> RA
PC --> RA
RCE --> RA
RA --> SG
RA --> LI
RA --> NE
SG --> PR
LI --> PR
NE --> PR
PAS --> MM
MM --> MR
MM --> LM
MR --> LB
MR --> ST
LM --> OLM
LM --> OAI
LM --> ANT
Installation
Prerequisites
# System Requirements
Python: 3.10 or higher
RAM: 16GB recommended (8GB minimum)
Storage: 10GB for models
GPU: Optional (3x speedup with NVIDIA GPU + CUDA 11.8+)
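Because the GPU is optional, it can help to check which device PyTorch will use before loading models. The helper below is a small sketch; the function name `detect_device` is an assumption, while `torch.cuda.is_available()` is the standard PyTorch call. It falls back to CPU when PyTorch is not installed yet.

```python
def detect_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check also works before dependencies are installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(detect_device())
```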
Quick Install
# Clone repository
git clone https://github.com/itobuztech/contract-guard-ai.git
cd contract-guard-ai
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model (optional, for advanced text processing)
python -m spacy download en_core_web_sm
# Install Ollama (optional, for local LLM features)
curl -fsSL https://ollama.ai/install.sh | sh
# Initialize models (on first run)
python -c "from model_manager.model_loader import ModelLoader; ModelLoader()"
Quick Start
1. Start Required Services
# Start Ollama (for local LLM features)
ollama serve
# Pull LLM model
ollama pull llama3:8b
2. Configure Environment
# Copy example environment file
cp .env.example .env
# Edit .env with your settings
nano .env
# .env file
APP_NAME="AI Contract Risk Analyzer"
HOST="0.0.0.0"
PORT=8000
# Ollama (Local LLM - Free)
OLLAMA_BASE_URL="http://localhost:11434"
OLLAMA_MODEL="llama3:8b"
# Optional: OpenAI (for premium LLM features)
OPENAI_API_KEY="sk-..."
# Optional: Anthropic (for premium LLM features)
ANTHROPIC_API_KEY="sk-ant-..."
# Analysis Configuration
MAX_CLAUSES_TO_ANALYZE=15
MIN_CONTRACT_LENGTH=300
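One way to consume these variables from Python is sketched below. It assumes the values have already been exported into the process environment (for example by a loader such as python-dotenv) and does not parse the `.env` file itself. The `Settings` dataclass and `load_settings` function are hypothetical names, not the contents of `config/settings.py`; the defaults mirror the example file above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables (hypothetical structure)."""
    app_name: str
    host: str
    port: int
    ollama_base_url: str
    ollama_model: str
    max_clauses_to_analyze: int
    min_contract_length: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), with fallbacks."""
    env = os.environ if env is None else env
    return Settings(
        app_name=env.get("APP_NAME", "AI Contract Risk Analyzer"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3:8b"),
        max_clauses_to_analyze=int(env.get("MAX_CLAUSES_TO_ANALYZE", "15")),
        min_contract_length=int(env.get("MIN_CONTRACT_LENGTH", "300")),
    )


# Passing a dict makes the override behaviour easy to see and to test.
settings = load_settings({"PORT": "9000"})
print(settings.port, settings.ollama_model)
```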
3. Launch Application
# Option A: Start API only
python app.py
# Option B: Use Uvicorn directly
uvicorn app:app --reload --host 0.0.0.0 --port 8000
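Once the server is running, analysis jobs are processed in the background and exposed through the `/jobs/{id}` route shown in the architecture diagram. The polling loop below is a sketch of a client for that route. The `status` field and its `"completed"` and `"failed"` values are assumptions about the response shape; check API_DOCUMENTATION.md for the real schema. The `fetch` parameter exists only so the loop can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:8000"  # host and port used by the launch commands


def fetch_job(job_id: str) -> Dict:
    """GET /jobs/{id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], Dict] = fetch_job,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        # "completed" and "failed" are assumed terminal states.
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

With a live server, `wait_for_job("<job id>")` would block until the analysis finishes or 60 seconds pass.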
Technical Details
Core Technologies
AI/ML Stack
# Legal Language Models
Legal-BERT: nlpaueb/legal-bert-base-uncased # 110M params, 768-dim
Sentence-BERT: all-MiniLM-L6-v2 # 22M params, 384-dim
# LLM Integration
Ollama: llama3:8b (local, free)
OpenAI: gpt-3.5-turbo, gpt-4
Anthropic: claude-3-sonnet, claude-3-opus
# Deep Learning Framework
PyTorch: 2.1+
Transformers: 4.35+ (Hugging Face)
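With three LLM providers available, automatic fallback amounts to trying them in a preferred order and moving on when one fails. The sketch below shows that idea with stub callables instead of real clients; `complete_with_fallback`, `AllProvidersFailed`, and the stub functions are hypothetical names, and no real Ollama, OpenAI, or Anthropic API is called.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def ollama_down(prompt: str) -> str:
    # Stub simulating a local Ollama server that is not running.
    raise ConnectionError("connection refused")


def fake_openai(prompt: str) -> str:
    # Stub standing in for a hosted provider.
    return f"[stub completion for: {prompt}]"


name, text = complete_with_fallback(
    "Explain clause 4.",
    [("ollama", ollama_down), ("openai", fake_openai)],
)
print(name, text)
```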
Backend Stack
# API Framework
FastAPI: 0.104+ (async, type-safe)
Uvicorn: ASGI server (1000+ req/sec)
Pydantic: 2.5+ (data validation)
# Document Processing
PyMuPDF: 1.23+ (primary PDF extraction)
PyPDF2: 3.0+ (fallback PDF reader)
python-docx: 1.1+ (Word documents)
# Async & Performance
aiofiles: async file I/O
asyncio: concurrent processing
Project Structure
contract-guard-ai/
│
├── app.py                        # FastAPI application (main entry)
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment variables template
├── README.md                     # This file
│
├── config/                       # Configuration management
│   ├── __init__.py
│   ├── settings.py               # App settings (FastAPI config)
│   ├── model_config.py           # Model paths and configurations
│   └── risk_rules.py             # Risk scoring rules and weights
│
├── model_manager/                # Model loading and caching
│   ├── __init__.py
│   ├── model_loader.py           # Lazy model loading
│   ├── model_registry.py         # Singleton registry with LRU cache
│   ├── model_cache.py            # Disk-based caching
│   └── llm_manager.py            # Multi-provider LLM integration
│
├── services/                     # Business logic services
│   ├── __init__.py
│   ├── data_models.py            # All services' dataclass schema
│   ├── contract_classifier.py    # Contract type classification
│   ├── clause_extractor.py       # Clause extraction (Legal-BERT)
│   ├── risk_analyzer.py          # Multi-factor risk scoring
│   ├── term_analyzer.py          # Unfavorable terms detection
│   ├── protection_checker.py     # Missing protections checker
│   ├── llm_interpreter.py        # LLM-powered clause interpretation
│   └── negotiation_engine.py     # Negotiation points generation
│
├── utils/                        # Utility functions
│   ├── __init__.py
│   ├── document_reader.py        # PDF/DOCX text extraction
│   ├── text_processor.py         # NLP preprocessing
│   ├── validators.py             # Contract validation
│   └── logger.py                 # Structured logging
│
├── models/                       # Downloaded AI models (cached)
│   ├── legal-bert/
│   └── embeddings/
│
├── cache/                        # Runtime cache
│   └── models/
│
├── logs/                         # Application logs
│   ├── contract_analyzer.log
│   ├── contract_analyzer_error.log
│   └── contract_analyzer_performance.log
│
├── static/                       # Frontend files
│   └── index.html
│
├── uploads/                      # Temporary upload storage
│
└── docs/                         # Documentation
    ├── API_DOCUMENTATION.md
    └── BLOGPOST.md
Mathematical Foundations
Risk Scoring Algorithm
# Overall risk score calculation
R_overall = Σ (α_i × r_i) for i in [1, n]
Where:
α_i = weight for risk category i (Σα_i = 1)
r_i = risk score for category i ∈ [0, 100]
# Category risk score
r_i = f(keyword_score, pattern_score, clause_score, missing_score, benchmark_score)
# Weighted combination
if has_clauses:
    r_i = (0.50 × clause_score +
           0.20 × keyword_score +
           0.15 × pattern_score +
           0.15 × missing_score)
else:
    r_i = (0.40 × keyword_score +
           0.35 × pattern_score +
           0.25 × missing_score)
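The two formulas above translate directly into Python. The sketch below uses the weights exactly as listed; the function names, the category names, and the example scores are illustrative assumptions. Note that `benchmark_score` appears in the signature of `f` above but has no weight in the listed combination, so it is left out here as well.

```python
from typing import Dict, Optional


def category_risk(
    keyword_score: float,
    pattern_score: float,
    missing_score: float,
    clause_score: Optional[float] = None,
) -> float:
    """Category score r_i on a 0-100 scale, using the weights listed above."""
    if clause_score is not None:  # the has_clauses branch
        return (0.50 * clause_score
                + 0.20 * keyword_score
                + 0.15 * pattern_score
                + 0.15 * missing_score)
    return (0.40 * keyword_score
            + 0.35 * pattern_score
            + 0.25 * missing_score)


def overall_risk(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """R_overall = sum of alpha_i * r_i, requiring the weights to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("category weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative inputs only; real category names and weights live in risk_rules.py.
scores = {
    "termination": category_risk(60, 40, 20, clause_score=80),
    "liability": category_risk(50, 30, 10),
}
weights = {"termination": 0.6, "liability": 0.4}
print(round(overall_risk(scores, weights), 2))
```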
Semantic Similarity
# Cosine similarity for clause comparison
sim(clause1, clause2) = cos(e1, e2)
= (e1 · e2) / (||e1|| × ||e2||)
Where:
e1, e2 = SBERT embeddings ∈ R^384
· = dot product
||·|| = L2 norm
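As a worked example, the same formula can be written with only the standard library. Short vectors stand in for the 384-dimensional SBERT embeddings, and the function name `cosine_similarity` is chosen here for illustration rather than taken from the codebase.

```python
import math
from typing import Sequence


def cosine_similarity(e1: Sequence[float], e2: Sequence[float]) -> float:
    """cos(e1, e2) = (e1 . e2) / (||e1|| * ||e2||)."""
    if len(e1) != len(e2):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(e1, e2))
    norm1 = math.sqrt(sum(a * a for a in e1))
    norm2 = math.sqrt(sum(b * b for b in e2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Parallel vectors score close to 1, orthogonal vectors score 0.
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```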
Confidence Calibration (Platt Scaling)
# Calibrated probability
P(correct | score) = 1 / (1 + exp(A × score + B))
Where:
A, B = parameters learned from validation data
score = raw model confidence
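Applying the calibration is a one-line function once A and B are known. The values of `A` and `B` below are made up for illustration, not fitted parameters from this project. With this sign convention, A is negative whenever a higher raw score should map to a higher calibrated probability.

```python
import math


def platt_calibrate(score: float, a: float, b: float) -> float:
    """Calibrated probability 1 / (1 + exp(a * score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical parameters; real values would be fitted on validation data.
A, B = -4.0, 2.0
for raw in (0.2, 0.5, 0.9):
    print(raw, round(platt_calibrate(raw, A, B), 3))
```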
Memory Usage
Legal-BERT Model: ~450MB
Sentence-BERT Model: ~100MB
LLM Manager: ~50MB
Total (Idle): ~600MB
Total (Peak): ~1.2GB
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Research & Models
- Legal-BERT: Ilias Chalkidis, Manos Fergadiotis, et al. (AUEB NLP Group)
- Sentence-BERT: Nils Reimers and Iryna Gurevych
- Hugging Face: Model hosting and Transformers library
- PyTorch Team: Deep learning framework
Libraries & Tools
- FastAPI: Sebastián Ramírez and contributors
- Ollama: Jeffrey Morgan and Ollama team
- PyMuPDF: Artifex Software
- spaCy: Explosion AI team
Project Status
Current Version: 1.0.0
Status: MVP Ready
Last Updated: November 2025
| Component | Status | Coverage |
|---|---|---|
| Core API | Stable | 92% |
| Model Management | Stable | 88% |
| Services | Stable | 85% |
| Documentation | Complete | 100% |
| Frontend | Stable | 80% |
| Tests | In Progress | 50% |
Documentation & Blog
For detailed technical documentation, including API endpoints, request/response schemas, and error handling, see the API_DOCUMENTATION.md file.
To learn about the research behind the system and our vision for democratizing legal intelligence, read our full BLOGPOST.md file.
Made with ❤️ by Itobuz Technologies Private Limited
© 2025 AI Contract Risk Analyzer. Making legal intelligence accessible to everyone.