contract-guard-ai / README.md
satyakimitra's picture
README updated
bf475df
metadata
title: AI Contract Risk Analyzer
emoji: πŸ“
colorFrom: blue
colorTo: purple
sdk: docker
app_file: Dockerfile
pinned: false
license: mit

AI Contract Risk Analyzer πŸ€–βš–οΈ

Python 3.10+ FastAPI License: MIT Hugging Face Spaces Transformers PyTorch Legal-BERT Sentence-BERT Ollama Docker spaCy

Democratizing Legal Intelligence Through AI
Comprehensive contract risk analysis using an integrated pipeline with Legal-BERT, multi-model NLP, and LLM interpretation

🎯 Overview

The AI Contract Risk Analyzer is a production-grade legal document analysis platform that leverages state-of-the-art NLP and machine learning to provide instant, comprehensive contract risk assessment. Built with a unified orchestration architecture, it integrates Legal-BERT for clause understanding, semantic embeddings for similarity matching, and LLMs for natural language explanations.

Key Features

  • πŸ“„ Multi-Format Support: PDF, DOCX, TXT document processing
  • πŸ” 9 Contract Categories: Employment, NDA, Lease, Service agreements, etc.
  • ⚑ Sub-60s Analysis: Real-time risk scoring and clause extraction via pre-loaded models
  • πŸ”’ Privacy-First: Ephemeral processing, zero data retention
  • 🌐 LLM Integration: Ollama (local), OpenAI, Anthropic support with fallback
  • πŸ“Š Comprehensive Reports: Executive summaries, negotiation playbooks, market comparisons, and downloadable PDFs
  • πŸ”„ Integrated Pipeline: A single orchestrator (PreloadedAnalysisService) ensures consistent context propagation from classification through to final reporting

πŸ“‹ Table of Contents


πŸ—οΈ Architecture

System Overview

This diagram illustrates the core components and their interactions, highlighting the unified orchestration and the flow of context (specifically the ContractType) through the system.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Client Layer                           β”‚
β”‚  (Browser / Mobile / CLI / API Client)                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚ REST API
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  FastAPI Backend                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Routes: /analyze, /jobs/{id}, /validate, /health    β”‚  β”‚
β”‚  β”‚ Async Processing: BackgroundTasks + Job Queue       β”‚  β”‚
β”‚  β”‚ Middleware: CORS, Error Handling, Logging           β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Services Orchestration Layer                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Classifier  │──▢│ Clause       │──▢│ Risk Analyzer   β”‚   β”‚
β”‚  β”‚ (Legal-BERT)β”‚  β”‚ Extractor    β”‚  β”‚ (Multi-Factor)  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Term        β”‚  β”‚ Protection   β”‚  β”‚ Market          β”‚   β”‚
β”‚  β”‚ Analyzer    β”‚  β”‚ Checker      β”‚  β”‚ Comparator      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                         β”‚
β”‚  β”‚ LLM         β”‚  β”‚ Negotiation  β”‚                         β”‚
β”‚  β”‚ Interpreter β”‚  β”‚ Engine       β”‚                         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                Model Management Layer                       β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Model Registry (Singleton, Thread-Safe)             β”‚   β”‚
β”‚  β”‚ - LRU Cache Eviction                                β”‚   β”‚
β”‚  β”‚ - GPU/CPU Auto-Detection                            β”‚   β”‚
β”‚  β”‚ - Lazy Loading                                      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ LLM Manager (Multi-Provider)                        β”‚   β”‚
β”‚  β”‚ - Ollama (Local, Free)                              β”‚   β”‚
β”‚  β”‚ - OpenAI (GPT-3.5/4)                                β”‚   β”‚
β”‚  β”‚ - Anthropic (Claude)                                β”‚   β”‚
β”‚  β”‚ - Auto-Fallback & Rate Limiting                     β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   AI Models Layer                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Legal-BERT (nlpaueb/legal-bert-base-uncased)        β”‚  β”‚
β”‚  β”‚ - Domain-adapted BERT for legal text                β”‚  β”‚
β”‚  β”‚ - 110M parameters, 768-dim embeddings               β”‚  β”‚
β”‚  β”‚ - Fine-tuned on 12GB legal corpus                   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Sentence-BERT (all-MiniLM-L6-v2)                    β”‚  β”‚
β”‚  β”‚ - 22M parameters, 384-dim embeddings                β”‚  β”‚
β”‚  β”‚ - Semantic similarity engine                        β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Integrated Analysis Pipeline Flowchart

graph TB
    Start[User Uploads Contract] --> Read[Document Reader]
    Read --> Validate{Contract Validator}
    Validate -->|Invalid| Error[Return Error]
    Validate -->|Valid| Classify[Contract Classifier]
    
    Classify --> Extract[RiskClauseExtractor]
    Extract --> Analyze[TermAnalyzer + ProtectionChecker]
    Analyze --> Score[RiskAnalyzer]
    Score --> Generate[Output Generators]
    
    Generate --> Sum[SummaryGenerator]
    Generate --> Interp[LLM Interpreter]
    Generate --> Neg[Negotiation Engine]
    Generate --> PDF[PDF Report Generator]
    
    Sum --> End[JSON Response]
    Interp --> End
    Neg --> End
    PDF --> End
    
    style Start fill:#e1f5e1
    style End fill:#e1f5e1
    style Error fill:#ffe1e1
    style Classify fill:#e1e5ff
    style Extract fill:#e1e5ff
    style Score fill:#ffe5e1
    style Generate fill:#fff5e1

Component Diagram

graph LR
    subgraph "Client"
        UI[Browser / API Client]
    end

    subgraph "FastAPI Backend"
        API[FastAPI Server]
        PAS[PreloadedAnalysisService]
    end

    subgraph "Core Services"
        CC[Contract Classifier]
        RCE[Risk Clause Extractor]
        TA[Term Analyzer]
        PC[Protection Checker]
        RA[Comprehensive Risk Analyzer]
        SG[Summary Generator]
        LI[LLM Interpreter]
        NE[Negotiation Engine]
        PR[PDF Report Generator]
    end

    subgraph "Model Management"
        MM[Model Manager]
        MR[Model Registry]
        LM[LLM Manager]
    end

    subgraph "AI Models"
        LB[Legal-BERT]
        ST[Sentence-BERT]
        OLM[Ollama]
        OAI[OpenAI]
        ANT[Anthropic]
    end

    UI --> API
    API --> PAS
    PAS --> CC
    PAS --> RCE
    PAS --> TA
    PAS --> PC
    PAS --> RA
    PAS --> SG
    PAS --> LI
    PAS --> NE
    PAS --> PR

    CC -.-> RCE
    RCE --> TA
    RCE --> PC
    TA --> RA
    PC --> RA
    RCE --> RA

    RA --> SG
    RA --> LI
    RA --> NE
    SG --> PR
    LI --> PR
    NE --> PR

    PAS --> MM
    MM --> MR
    MM --> LM

    MR --> LB
    MR --> ST
    LM --> OLM
    LM --> OAI
    LM --> ANT

πŸš€ Installation

Prerequisites

# System Requirements
Python: 3.10 or higher
RAM: 16GB recommended (8GB minimum)
Storage: 10GB for models
GPU: Optional (3x speedup with NVIDIA GPU + CUDA 11.8+)

Quick Install

# Clone repository
git clone https://github.com/itobuztech/contract-guard-ai.git  
cd contract-guard-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model (optional, for advanced text processing)
python -m spacy download en_core_web_sm

curl -fsSL https://ollama.ai/install.sh | sh

# Initialize models (on first run)
python -c "from model_manager.model_loader import ModelLoader; ModelLoader()"

⚑ Quick Start

1. Start Required Services

# Start Ollama (for local LLM features)
ollama serve

# Pull LLM model
ollama pull llama3:8b

2. Configure Environment

# Copy example environment file
cp .env.example .env

# Edit .env with your settings
nano .env
# .env file
APP_NAME="AI Contract Risk Analyzer"
HOST="0.0.0.0"
PORT=8000

# Ollama (Local LLM - Free)
OLLAMA_BASE_URL="http://localhost:11434"
OLLAMA_MODEL="llama3:8b"

# Optional: OpenAI (for premium LLM features)
OPENAI_API_KEY="sk-..."

# Optional: Anthropic (for premium LLM features)
ANTHROPIC_API_KEY="sk-ant-..."

# Analysis Configuration
MAX_CLAUSES_TO_ANALYZE=15
MIN_CONTRACT_LENGTH=300

3. Launch Application

# Option A: Start API only
python app.py

# Option B: Use Uvicorn directly
uvicorn app:app --reload --host 0.0.0.0 --port 8000

πŸ”§ Technical Details

Core Technologies

AI/ML Stack

# Legal Language Models
Legal-BERT: nlpaueb/legal-bert-base-uncased  # 110M params, 768-dim
Sentence-BERT: all-MiniLM-L6-v2              # 22M params, 384-dim

# LLM Integration
Ollama: llama3:8b (local, free)
OpenAI: gpt-3.5-turbo, gpt-4
Anthropic: claude-3-sonnet, claude-3-opus

# Deep Learning Framework
PyTorch: 2.1+
Transformers: 4.35+ (Hugging Face)

Backend Stack

# API Framework
FastAPI: 0.104+ (async, type-safe)
Uvicorn: ASGI server (1000+ req/sec)
Pydantic: 2.5+ (data validation)

# Document Processing
PyMuPDF: 1.23+ (superior PDF extraction)
PyPDF2: 3.0+ (fallback PDF reader)
python-docx: 1.1+ (Word documents)

# Async & Performance
aiofiles: async file I/O
asyncio: concurrent processing

Project Structure

contract-guard-ai/
β”‚
β”œβ”€β”€ app.py                      # FastAPI application (main entry)
β”œβ”€β”€ requirements.txt            # Python dependencies
β”œβ”€β”€ .env.example                # Environment variables template
β”œβ”€β”€ README.md                   # This file
β”‚
β”œβ”€β”€ config/                     # Configuration management
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ settings.py             # App settings (FastAPI config)
β”‚   β”œβ”€β”€ model_config.py         # Model paths and configurations
β”‚   └── risk_rules.py           # Risk scoring rules and weights
β”‚
β”œβ”€β”€ model_manager/              # Model loading and caching
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ model_loader.py         # Lazy model loading
β”‚   β”œβ”€β”€ model_registry.py       # Singleton registry with LRU cache
β”‚   β”œβ”€β”€ model_cache.py          # Disk-based caching
β”‚   └── llm_manager.py          # Multi-provider LLM integration
β”‚
β”œβ”€β”€ services/                   # Business logic services
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ data_models.py          # All services' dataclass schema
β”‚   β”œβ”€β”€ contract_classifier.py  # Contract type classification
β”‚   β”œβ”€β”€ clause_extractor.py     # Clause extraction (Legal-BERT)
β”‚   β”œβ”€β”€ risk_analyzer.py        # Multi-factor risk scoring
β”‚   β”œβ”€β”€ term_analyzer.py        # Unfavorable terms detection
β”‚   β”œβ”€β”€ protection_checker.py   # Missing protections checker
β”‚   β”œβ”€β”€ llm_interpreter.py      # LLM-powered clause interpretation
β”‚   β”œβ”€β”€ negotiation_engine.py   # Negotiation points generation
β”‚
β”œβ”€β”€ utils/                      # Utility functions
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ document_reader.py      # PDF/DOCX text extraction
β”‚   β”œβ”€β”€ text_processor.py       # NLP preprocessing
β”‚   β”œβ”€β”€ validators.py           # Contract validation
β”‚   └── logger.py               # Structured logging
β”‚
β”œβ”€β”€ models/                     # Downloaded AI models (cached)
β”‚   β”œβ”€β”€ legal-bert/
β”‚   └── embeddings/
β”‚
β”œβ”€β”€ cache/                      # Runtime cache
β”‚   └── models/
β”‚
β”œβ”€β”€ logs/                       # Application logs
β”‚   β”œβ”€β”€ contract_analyzer.log
β”‚   β”œβ”€β”€ contract_analyzer_error.log
β”‚   └── contract_analyzer_performance.log
β”‚
β”œβ”€β”€ static/                     # Frontend files
β”‚   └── index.html
β”‚
β”œβ”€β”€ uploads/                    # Temporary upload storage
β”‚
└── docs/                       # Documentation
   β”œβ”€β”€ API_DOCUMENTATION.md
   └── BLOGPOST.md

Mathematical Foundations

Risk Scoring Algorithm

# Overall risk score calculation
R_overall = Ξ£ (Ξ±_i Γ— r_i)  for i in [1, n]

Where:
  α_i = weight for risk category i (Σα_i = 1)
  r_i = risk score for category i ∈ [0, 100]

# Category risk score
r_i = f(keyword_score, pattern_score, clause_score, missing_score, benchmark_score)

# Weighted combination
if has_clauses:
    r_i = (0.50 Γ— clause_score +
           0.20 Γ— keyword_score +
           0.15 Γ— pattern_score +
           0.15 Γ— missing_score)
else:
    r_i = (0.40 Γ— keyword_score +
           0.35 Γ— pattern_score +
           0.25 Γ— missing_score)

Semantic Similarity

# Cosine similarity for clause comparison
sim(clause1, clause2) = cos(e1, e2)
                      = (e1 Β· e2) / (||e1|| Γ— ||e2||)

Where:
  e1, e2 = SBERT embeddings ∈ R^384
  Β· = dot product
  ||Β·|| = L2 norm

Confidence Calibration (Platt Scaling)

# Calibrated probability
P(correct | score) = 1 / (1 + exp(A Γ— score + B))

Where:
  A, B = parameters learned from validation data
  score = raw model confidence

Memory Usage

Legal-BERT Model: ~450MB
Sentence-BERT Model: ~100MB
LLM Manager: ~50MB
Total (Idle): ~600MB
Total (Peak): ~1.2GB

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

Research & Models

  • Legal-BERT: Ilias Chalkidis, Manos Fergadiotis, et al. (AUEB NLP Group)
  • Sentence-BERT: Nils Reimers and Iryna Gurevych
  • Hugging Face: Model hosting and Transformers library
  • PyTorch Team: Deep learning framework

Libraries & Tools

  • FastAPI: SebastiΓ‘n RamΓ­rez and contributors
  • Ollama: Jeffrey Morgan and Ollama team
  • PyMuPDF: Artifex Software
  • spaCy: Explosion AI team

πŸ“ˆ Project Status

Current Version: 1.0.0
Status: βœ… MVP Ready
Last Updated: November 2025

Component Status Coverage
Core API βœ… Stable 92%
Model Management βœ… Stable 88%
Services βœ… Stable 85%
Documentation βœ… Complete 100%
Frontend βœ… Stable 80%
Tests 🟑 In Progress 50%

πŸ“š Documentation & Blog

  • For detailed technical documentation, including API endpoints, request/response schemas, and error handling, see the API_DOCUMENTATION.md file.

  • To learn about the research behind the system and our vision for democratizing legal intelligence, read our full BLOGPOST.md file.


Made with ❀️ by the Itobuz Technologies Private Limited

β€’ Documentation β€’ Blog


Β© 2025 AI Contract Risk Analyzer. Making legal intelligence accessible to everyone.