---
title: Drift Detector
emoji: π
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- agent-demo-track
---
This project was built by Saransh Halwai (HF username: [Sars6](https://huggingface.co/Sars6)), Harsh Bhati (HF username: [HarshBhati](https://huggingface.co/HarshBhati)), and Anurag Prasad (HF username: [LegendXInfinity](https://huggingface.co/LegendXInfinity)).
GitHub repo: [Drift Detector](https://github.com/saranshhalwai/drift-detector)
# Drift Detector
Drift Detector is an MCP server that detects drift in LLM performance over time using MCP's **sampling** functionality.
This implementation is intended as a **proof of concept** and is **NOT intended** for production use without significant changes.
## The Idea
The drift detector is a server that can be connected to any LLM client that supports the MCP sampling functionality.
It allows you to monitor the performance of your LLM models over time, detecting any drift in their behavior.
This is particularly useful for applications where the model's performance may change due to various factors, such as changes in the data distribution, model updates, or other external influences.
## How to run
To run the Drift Detector, you need to have Python installed on your machine. Follow these steps:
1. Clone the repository:
```bash
git clone https://github.com/saranshhalwai/drift-detector
cd drift-detector
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Start the server:
```bash
gradio app.py
```
4. Open your web browser and navigate to `http://localhost:7860` to access the Drift Detector interface.
## Interface
The interface consists of the following components:
- **Model Selection** - A panel allowing you to:
  - Select models from a dropdown list
  - Search for models by name or description
  - Create new models with custom system prompts
  - Enhance prompts with AI assistance
- **Model Operations** - A tabbed interface with:
  - **Chatbot** - Interact with the selected model through a conversational interface
  - **Drift Analysis** - Analyze and visualize model drift over time, including:
    - Calculate new drift scores for the selected model
    - View historical drift data in JSON format
    - Visualize drift trends through interactive charts
The drift detection functionality allows you to track changes in model performance over time, which is essential for monitoring and maintaining model quality.
## Under the Hood
Our GitHub repo consists of two main components:
- **Drift Detector Server**
  A low-level MCP server that detects drift in the LLM performance of the connected client.
- **Target Client**
  A client implemented using the fast-agent library, which connects to the Drift Detector server and demonstrates its functionality.
The Gradio interface in [app.py](app.py) is an example dashboard that lets users interact with the Drift Detector server and visualize drift data.
### Database Integration
The system uses SQLite (by default) to store:
- Model information (name, capabilities, creation date)
- Drift history (date and score for each drift calculation)
- Diagnostic data (baseline and current questions/answers)
This enables persistent tracking of model performance over time, allowing for:
- Historical trend analysis
- Comparison between different models
- Early detection of performance degradation
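The three kinds of stored data listed above could be laid out roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names, and the helpers `open_db`, `record_drift`, and `drift_history` are hypothetical and are not taken from the project's code.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# chosen to mirror the three kinds of data this README lists.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id           INTEGER PRIMARY KEY,
    name         TEXT UNIQUE NOT NULL,
    capabilities TEXT,
    created_at   TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS drift_history (
    id         INTEGER PRIMARY KEY,
    model_id   INTEGER NOT NULL REFERENCES models(id),
    checked_at TEXT NOT NULL,
    score      REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS diagnostics (
    id       INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES models(id),
    kind     TEXT NOT NULL CHECK (kind IN ('baseline', 'current')),
    question TEXT NOT NULL,
    answer   TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_drift(conn, model_name, checked_at, score):
    """Append one drift measurement for an already registered model."""
    row = conn.execute(
        "SELECT id FROM models WHERE name = ?", (model_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown model: {model_name}")
    conn.execute(
        "INSERT INTO drift_history (model_id, checked_at, score) VALUES (?, ?, ?)",
        (row[0], checked_at, score),
    )
    conn.commit()


def drift_history(conn, model_name):
    """Return (checked_at, score) rows for a model, oldest first."""
    return conn.execute(
        "SELECT h.checked_at, h.score FROM drift_history h "
        "JOIN models m ON m.id = h.model_id "
        "WHERE m.name = ? ORDER BY h.checked_at",
        (model_name,),
    ).fetchall()
```

Keeping drift scores in their own table, keyed by model, is what makes trend queries and cross-model comparisons a single `SELECT`.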
### Drift Detector Server
The Drift Detector server is implemented using the MCP Python SDK.
It exposes the following tools:
1. **run_initial_diagnostics**
   - **Purpose**: Establishes a baseline for model behavior using adaptive sampling techniques
   - **Parameters**:
     - `model`: The name of the model to run diagnostics on
     - `model_capabilities`: Full description of the model's capabilities and special features
   - **Sampling Process**:
     - First generates a tailored questionnaire based on model-specific capabilities
     - Collects responses by sampling the target model with controlled parameters (temperature=0.7)
     - Each question is processed individually to ensure proper context isolation
     - Baseline samples are stored as paired question-answer JSON records for future comparison
   - **Output**: Confirmation message indicating successful baseline creation
2. **check_drift**
   - **Purpose**: Measures potential drift by comparative sampling against the baseline
   - **Parameters**:
     - `model`: The name of the model to check for drift
   - **Sampling Process**:
     - Retrieves the original questions from the baseline
     - Re-samples the model with identical questions using the same sampling parameters
     - Maintains consistent context conditions to ensure fair comparison
     - Uses differential analysis to compare semantic and functional differences between sample sets
   - **Drift Evaluation**:
     - Calculates a numerical drift score based on answer divergence
     - Provides threshold-based alerts when drift exceeds acceptable limits (score > 50)
     - Stores the latest sample responses for audit and trend analysis
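To make the drift evaluation concrete, here is a minimal sketch of one way a 0-100 drift score could be derived from answer divergence. It swaps in character-level similarity from the standard library's `difflib.SequenceMatcher` as a stand-in for the semantic comparison described above, so it only illustrates the scoring and threshold logic, not the project's actual method. The function names are hypothetical; the threshold of 50 is the one quoted in this README.

```python
from difflib import SequenceMatcher

DRIFT_THRESHOLD = 50  # alert threshold quoted in this README (score > 50)


def answer_divergence(baseline: str, current: str) -> float:
    """Return 0.0 for identical text and 1.0 for text with nothing in common.

    Character-level similarity is an assumption made for this sketch; it is
    not a semantic comparison.
    """
    return 1.0 - SequenceMatcher(None, baseline, current).ratio()


def drift_score(baseline_answers, current_answers) -> float:
    """Average per-question divergence, scaled to the 0-100 range."""
    if len(baseline_answers) != len(current_answers):
        raise ValueError("baseline and current answers must be paired")
    if not baseline_answers:
        return 0.0
    total = sum(
        answer_divergence(b, c)
        for b, c in zip(baseline_answers, current_answers)
    )
    return 100.0 * total / len(baseline_answers)


def has_drifted(score: float) -> bool:
    """Apply the threshold rule: alert only when the score exceeds 50."""
    return score > DRIFT_THRESHOLD
```

Averaging per question keeps the score comparable across questionnaires of different lengths.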
## Flow
The intended flow is as follows:
1. When the client contacts the server for the first time, it will run the `run_initial_diagnostics` tool.
2. The server will generate a tailored questionnaire based on the model's capabilities.
3. This questionnaire will be used to collect responses from the model, establishing a baseline for future comparisons.
4. Once the baseline is established, the server will store the paired question-answer JSON records in the database.
5. The client can then use the `check_drift` tool to measure potential drift in the model's performance.
6. The server will retrieve the original questions from the baseline and re-sample the model with identical questions.
7. The server will maintain consistent context conditions to ensure fair comparison.
8. If significant drift is detected (score > 50), the server will provide an alert and store the latest sample responses for audit and trend analysis.
9. The client can visualize the drift data through the Gradio interface, allowing users to track changes in model performance over time.
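The flow above can be sketched end to end with an in-memory stand-in for the server. Everything here is an assumption made for illustration: the `DriftSession` class, the `sample` callable that stands in for an MCP sampling request, and the `exact_mismatch_rate` scorer. The questionnaire is passed in directly, whereas the real tool generates it from the model's capabilities.

```python
from typing import Callable, Dict, List


def exact_mismatch_rate(baseline: List[str], current: List[str]) -> float:
    """Toy scorer: percentage of answers that changed at all (an assumption)."""
    changed = sum(b != c for b, c in zip(baseline, current))
    return 100.0 * changed / len(baseline)


class DriftSession:
    """In-memory stand-in for the server's baseline store (hypothetical)."""

    def __init__(
        self,
        sample: Callable[[str], str],
        score: Callable[[List[str], List[str]], float] = exact_mismatch_rate,
        threshold: float = 50.0,
    ):
        self.sample = sample        # stands in for one sampling request
        self.score = score
        self.threshold = threshold
        self.baselines: Dict[str, List[dict]] = {}

    def run_initial_diagnostics(self, model: str, questions: List[str]) -> str:
        # Steps 1-4: ask each question on its own and store the Q/A pairs.
        self.baselines[model] = [
            {"question": q, "answer": self.sample(q)} for q in questions
        ]
        return f"Baseline stored for {model}"

    def check_drift(self, model: str) -> dict:
        # Steps 5-8: re-ask the stored questions and compare the answers.
        baseline = self.baselines[model]
        current = [self.sample(rec["question"]) for rec in baseline]
        score = self.score([rec["answer"] for rec in baseline], current)
        return {"score": score, "alert": score > self.threshold}
```

A session built with `DriftSession(sample=my_llm_call)` would call `run_initial_diagnostics` once and `check_drift` on every later visit, where `my_llm_call` is whatever function sends one question to the model.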
## Drift History Visualization
The system provides comprehensive visualization of drift history:
1. **Historical Data**: Drift history is fetched from the database rather than generated from mock data
2. **Interactive Charts**: Drift scores are plotted over time to identify trends
3. **Threshold Indicators**: Visual indicators show when drift exceeds acceptable limits
4. **Data Conversion**: Drift scores are normalized to percentages (0-100) for consistent display
5. **Error Handling**: Robust error handling for missing or malformed data
This real-time visualization allows users to:
- Identify gradual performance degradation
- Spot sudden changes in model behavior
- Make informed decisions about model retraining or replacement
- Compare drift patterns across different deployment environments
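The data conversion and error handling steps could look something like the sketch below. The record keys `date` and `drift_score`, the helper names, and the rule that values between 0 and 1 are treated as fractions are all assumptions for illustration; the README does not specify how raw scores are stored or scaled.

```python
import math


def to_percent(raw):
    """Convert a raw score to a 0-100 percentage, or None if unusable."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(value):
        return None
    if 0.0 <= value <= 1.0:
        # Assumption: values in [0, 1] are fractions and get scaled up.
        value *= 100.0
    # Clamp anything outside the displayable range.
    return max(0.0, min(100.0, value))


def clean_history(records):
    """Return sorted (date, percent) points, skipping malformed records."""
    points = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        date = rec.get("date")
        percent = to_percent(rec.get("drift_score"))
        if not date or percent is None:
            continue
        points.append((str(date), percent))
    return sorted(points)
```

Dropping bad records instead of raising keeps one malformed row from blanking the whole chart.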
## Future Improvements
Potential enhancements for the Drift Detector include:
1. A full MCP server hosted in the cloud
2. Authentication and authorization for secure access
3. Support for multiple database backends (PostgreSQL, MySQL)
4. Enhanced analytics and reporting features
5. Integration with CI/CD pipelines for automated monitoring
6. Advanced drift detection algorithms with explainability
7. Multi-metric drift analysis (beyond a single drift score)
8. User role-based access control for enterprise environments
# Demo Video