# 🔄 Enhanced Memory System: STM + LTM + Hybrid Context Retrieval
## Overview
The Medical Chatbot now implements an **advanced memory system** with **Short-Term Memory (STM)** and **Long-Term Memory (LTM)** that intelligently manages conversation context and semantic knowledge while preserving conversational continuity. This system goes beyond simple RAG to provide contextually aware responses that remember and build upon previous interactions.
## 🏗️ Architecture
### Memory Hierarchy
```
User Query → Enhanced Memory System → Intelligent Context Selection → LLM Response
                                  ↓
        ┌───────────────────┬───────────────────┬───────────────────┐
        │   STM (5 items)   │  LTM (60 items)   │    RAG Search     │
        │ (Recent Summaries)│  (Semantic Store) │  (Knowledge Base) │
        └───────────────────┴───────────────────┴───────────────────┘
                                  ↓
                  Gemini Flash Lite Contextual Analysis
                                  ↓
                Summarized Context + Semantic Knowledge
```
### Memory Types
#### 1. **Short-Term Memory (STM)**
- **Capacity:** 5 recent conversation summaries
- **Content:** Chunked and summarized LLM responses with enriched topics
- **Features:** Semantic deduplication, intelligent merging, topic enrichment
- **Purpose:** Maintain conversational continuity and immediate context
#### 2. **Long-Term Memory (LTM)**
- **Capacity:** 60 semantic chunks (~20 conversational rounds)
- **Content:** FAISS-indexed medical knowledge chunks
- **Features:** Semantic similarity search, usage tracking, smart eviction
- **Purpose:** Provide deep medical knowledge and historical context
#### 3. **RAG Knowledge Base**
- **Content:** External medical knowledge and guidelines
- **Features:** Real-time retrieval, semantic matching
- **Purpose:** Supplement with current medical information
## 🔧 Key Components
### 1. Enhanced Memory Manager (`memory.py`)
#### STM Management
```python
def get_recent_chat_history(self, user_id: str, num_turns: int = 5) -> List[Dict]:
"""
Get the most recent STM summaries (not raw Q/A).
Returns: [{"user": "", "bot": "Topic: ...\n<summary>", "timestamp": time}, ...]
"""
```
**STM Features:**
- **Capacity:** 5 recent conversation summaries
- **Content:** Chunked and summarized LLM responses with enriched topics
- **Deduplication:** Semantic similarity-based merging (≥0.92 identical, ≥0.75 merge)
- **Topic Enrichment:** Uses user question context to generate detailed topics
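The dedupe/merge decision can be sketched as follows. This is a minimal stand-in, not the actual `_upsert_stm` implementation: the real system embeds summaries with a sentence encoder, whereas the toy `embed` here uses bag-of-words vectors purely so the example runs on its own.

```python
from collections import Counter
import math

STM_CAPACITY = 5
IDENTICAL_THRESHOLD = 0.92   # near-duplicate: keep only the newer summary
MERGE_THRESHOLD = 0.75       # similar: fuse the two summaries

def embed(text):
    # Toy bag-of-words vector; the real system uses sentence embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def upsert_stm(stm, summary):
    """Insert a summary into STM, deduplicating against existing entries."""
    vec = embed(summary)
    for i, (old, old_vec) in enumerate(stm):
        sim = cosine(vec, old_vec)
        if sim >= IDENTICAL_THRESHOLD:
            stm[i] = (summary, vec)              # replace older with newer
            return stm
        if sim >= MERGE_THRESHOLD:
            merged = f"{old} {summary}"          # merge similar content
            stm[i] = (merged, embed(merged))
            return stm
    stm.append((summary, vec))                   # genuinely new topic
    return stm[-STM_CAPACITY:]                   # drop oldest beyond capacity
```

Re-submitting an identical summary replaces the old entry rather than duplicating it, while an unrelated summary appends a new slot until the 5-item cap forces eviction of the oldest.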
#### LTM Management
```python
def get_relevant_chunks(self, user_id: str, query: str, top_k: int = 3, min_sim: float = 0.30) -> List[str]:
"""Return texts of chunks whose cosine similarity ≥ min_sim."""
```
**LTM Features:**
- **Capacity:** 60 semantic chunks (~20 conversational rounds)
- **Indexing:** FAISS-based semantic search
- **Smart Eviction:** Usage-based decay and recency scoring
- **Merging:** Intelligent deduplication and content fusion
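As an illustration of what the FAISS lookup computes, here is a brute-force equivalent in plain NumPy. `get_relevant_chunks_bruteforce` is a hypothetical helper for exposition only; the production index uses FAISS precisely to avoid this linear scan.

```python
import numpy as np

def get_relevant_chunks_bruteforce(chunks, chunk_vecs, query_vec,
                                   top_k=3, min_sim=0.30):
    """Rank stored chunk vectors by cosine similarity to the query and
    keep up to top_k whose similarity clears min_sim."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = m @ q                       # cosine, since both sides are unit-norm
    order = np.argsort(-sims)          # best-first
    return [chunks[i] for i in order[:top_k] if sims[i] >= min_sim]
```

Chunks below the 0.30 floor are filtered out even when fewer than `top_k` results remain, which is why an off-topic query can legitimately return nothing from LTM.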
#### Enhanced Chunking
```python
def chunk_response(self, response: str, lang: str, question: str = "") -> List[Dict]:
"""
Enhanced chunking with question context for richer topics.
Returns: [{"tag": "detailed_topic", "text": "summary"}, ...]
"""
```
**Chunking Features:**
- **Question Context:** Incorporates user's latest question for topic generation
- **Rich Topics:** Detailed topics (10-20 words) capturing context, condition, and action
- **Medical Focus:** Excludes disclaimers, includes exact medication names/doses
- **Semantic Grouping:** Groups by medical topic, symptom, assessment, plan, or instruction
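Because the chunk list comes back from an LLM, a tolerant parser is useful around it. One plausible shape (matching the "store raw response as fallback chunk" recovery listed under Error Handling) is sketched below; `parse_chunk_output` is a hypothetical helper, not the module's actual API.

```python
import json

def parse_chunk_output(raw, fallback_text):
    """Parse the model's chunk list; on any failure, store the raw
    response as a single fallback chunk."""
    try:
        chunks = json.loads(raw)
        if not isinstance(chunks, list):
            raise ValueError("expected a JSON list of chunks")
        return [{"tag": c.get("tag", "general"), "text": c["text"]}
                for c in chunks]
    except Exception:
        return [{"tag": "general", "text": fallback_text}]
```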
### 2. Intelligent Context Retrieval
#### Contextual Summarization
```python
def get_contextual_chunks(self, user_id: str, current_query: str, lang: str = "EN") -> str:
"""
Creates a single, coherent summary from STM + LTM + RAG.
Returns: A single summary string for the main LLM.
"""
```
**Features:**
- **Unified Summary:** Combines STM (5 turns) + LTM (semantic) + RAG (knowledge)
- **Gemini Analysis:** Uses Gemini Flash Lite for intelligent context selection
- **Conversational Flow:** Maintains continuity while providing medical relevance
- **Fallback Strategy:** Graceful degradation if analysis fails
## 🚀 How It Works
### Step 1: Enhanced Memory Processing
```python
# Process new exchange through STM and LTM
chunks = memory.chunk_response(response, lang, question=query)
for chunk in chunks:
    memory._upsert_stm(user_id, chunk, lang)   # STM with dedupe/merge
memory._upsert_ltm(user_id, chunks, lang)      # LTM with semantic storage
```
### Step 2: Context Retrieval
```python
# Get STM summaries (5 recent turns)
recent_history = memory.get_recent_chat_history(user_id, num_turns=5)
# Get LTM semantic chunks
rag_chunks = memory.get_relevant_chunks(user_id, current_query, top_k=3)
# Get external RAG knowledge
external_rag = retrieve_medical_info(current_query)
```
### Step 3: Intelligent Context Summarization
The system sends all context sources to Gemini Flash Lite for unified summarization:
```
You are a medical assistant creating a concise summary of conversation context for continuity.
Current user query: "{current_query}"
Available context information:
Recent conversation history:
{recent_history}
Semantically relevant historical medical information:
{rag_chunks}
Task: Create a brief, coherent summary that captures the key points from the conversation history and relevant medical information that are important for understanding the current query.
Guidelines:
1. Focus on medical symptoms, diagnoses, treatments, or recommendations mentioned
2. Include any patient concerns or questions that are still relevant
3. Highlight any follow-up needs or pending clarifications
4. Keep the summary concise but comprehensive enough for context
5. Maintain conversational flow and continuity
Output: Provide a single, well-structured summary paragraph that can be used as context for the main LLM to provide a coherent response.
```
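A sketch of how the template above might be filled in code. The helper name and the `"user"`/`"bot"` keys are assumptions (the keys follow the return shape shown for `get_recent_chat_history`); the actual prompt assembly may differ.

```python
def build_summary_prompt(current_query, recent_history, rag_chunks):
    """Fill the contextual-summarization template with the three sources."""
    history_text = "\n".join(
        f"User: {t['user']}\nBot: {t['bot']}" for t in recent_history
    )
    chunk_text = "\n".join(f"- {c}" for c in rag_chunks)
    return (
        "You are a medical assistant creating a concise summary of "
        "conversation context for continuity.\n"
        f'Current user query: "{current_query}"\n'
        "Available context information:\n"
        "Recent conversation history:\n"
        f"{history_text}\n"
        "Semantically relevant historical medical information:\n"
        f"{chunk_text}\n"
        "Task: Create a brief, coherent summary that captures the key "
        "points relevant to the current query."
    )
```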
### Step 4: Unified Context Integration
The single, coherent summary is integrated into the main LLM prompt, providing:
- **Conversational continuity** (from STM summaries)
- **Medical knowledge** (from LTM semantic chunks)
- **Current information** (from external RAG)
- **Unified narrative** (single summary instead of multiple chunks)
## 📊 Benefits
### 1. **Advanced Memory Management**
- **STM:** Maintains 5 recent conversation summaries with intelligent deduplication
- **LTM:** Stores 60 semantic chunks (~20 rounds) with FAISS indexing
- **Smart Merging:** Combines similar content while preserving unique details
- **Topic Enrichment:** Detailed topics using user question context
### 2. **Intelligent Context Summarization**
- **Unified Summary:** Single coherent narrative instead of multiple chunks
- **Gemini Analysis:** AI-powered context selection and summarization
- **Medical Focus:** Prioritizes symptoms, diagnoses, treatments, and recommendations
- **Conversational Flow:** Maintains natural dialogue continuity
### 3. **Enhanced Chunking & Topics**
- **Question Context:** Incorporates user's latest question for richer topics
- **Detailed Topics:** 10-20 word descriptions capturing context, condition, and action
- **Medical Precision:** Includes exact medication names, doses, and clinical instructions
- **Semantic Grouping:** Organizes by medical topic, symptom, assessment, plan, or instruction
### 4. **Robust Fallback Strategy**
- **Primary:** Gemini Flash Lite contextual summarization
- **Secondary:** LTM semantic search with usage-based scoring
- **Tertiary:** STM recent summaries
- **Final:** External RAG knowledge base
### 5. **Performance & Scalability**
- **Efficient Storage:** Semantic deduplication reduces memory footprint
- **Fast Retrieval:** FAISS indexing for sub-millisecond LTM search
- **Smart Eviction:** Usage-based decay and recency scoring
- **Minimal Latency:** Optimized for real-time medical consultations
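The "usage-based decay and recency scoring" eviction can be sketched as below. The `eviction_score`/`evict_if_full` helpers and the chunk field names (`use_count`, `last_used`) are assumptions for illustration; the real scoring formula may differ.

```python
import time

def eviction_score(chunk, now=None, half_life=3600.0):
    """Higher score = more worth keeping. use_count rewards frequently
    retrieved chunks; the exponential term decays with time since last use."""
    now = time.time() if now is None else now
    age = now - chunk["last_used"]
    recency = 0.5 ** (age / half_life)       # halves every half_life seconds
    return chunk["use_count"] * recency

def evict_if_full(chunks, max_chunks=60):
    """When over capacity, keep only the max_chunks highest-scoring chunks."""
    if len(chunks) <= max_chunks:
        return chunks
    return sorted(chunks, key=eviction_score, reverse=True)[:max_chunks]
```

Under this scheme a chunk retrieved five times an hour ago can outlive a chunk stored more recently but never used again.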
## 🧪 Example Scenarios
### Scenario 1: STM Deduplication & Merging
```
User: "I have chest pain"
Bot: "This could be angina. Symptoms include pressure, tightness, and shortness of breath."
User: "What about chest pain with shortness of breath?"
Bot: "Chest pain with shortness of breath is concerning for angina or heart attack..."
User: "Tell me more about the symptoms"
Bot: "Angina symptoms include chest pressure, tightness, shortness of breath, and may radiate to arms..."
```
**Result:** STM merges similar responses, creating a comprehensive summary: "Patient has chest pain symptoms consistent with angina, including pressure, tightness, shortness of breath, and potential radiation to arms. This represents a concerning cardiac presentation requiring immediate evaluation."
### Scenario 2: LTM Semantic Retrieval
```
User: "What medications should I avoid with my condition?"
Bot: "Based on your previous discussion about hypertension and the medications mentioned..."
```
**Result:** LTM retrieves relevant medical information about hypertension medications and contraindications from previous conversations, even if not in recent STM.
### Scenario 3: Enhanced Topic Generation
```
User: "I'm having trouble sleeping"
Bot: "Topic: Sleep disturbance evaluation and management for adult patient with insomnia symptoms"
```
**Result:** The topic incorporates the user's question context to create a detailed, medical-specific description instead of just "Sleep problems."
### Scenario 4: Unified Context Summarization
```
User: "Can you repeat the treatment plan?"
Bot: "Based on our conversation about your hypertension and sleep issues, your treatment plan includes..."
```
**Result:** The system creates a unified summary combining STM (recent sleep discussion), LTM (hypertension history), and RAG (current treatment guidelines) into a single coherent narrative.
## ⚙️ Configuration
### Environment Variables
```bash
FlashAPI=your_gemini_api_key # For both main LLM and contextual analysis
```
### Enhanced Memory Settings
```python
memory = MemoryManager(
    max_users=1000,        # Maximum users in memory
    history_per_user=5,    # STM capacity (5 recent summaries)
    max_chunks=60          # LTM capacity (~20 conversational rounds)
)
```
### Memory Parameters
```python
# STM retrieval (5 recent turns)
recent_history = memory.get_recent_chat_history(user_id, num_turns=5)
# LTM semantic search
rag_chunks = memory.get_relevant_chunks(user_id, query, top_k=3, min_sim=0.30)
# Unified context summarization
contextual_summary = memory.get_contextual_chunks(user_id, current_query, lang)
```
### Similarity Thresholds
```python
# STM deduplication thresholds
IDENTICAL_THRESHOLD = 0.92 # Replace older with newer
MERGE_THRESHOLD = 0.75 # Merge similar content
# LTM semantic search
MIN_SIMILARITY = 0.30 # Minimum similarity for retrieval
TOP_K = 3 # Number of chunks to retrieve
```
## 🔍 Monitoring & Debugging
### Enhanced Logging
The system provides comprehensive logging for all memory operations:
```python
# STM operations
logger.info(f"[Contextual] Retrieved {len(recent_history)} recent history items")
logger.info(f"[Contextual] Retrieved {len(rag_chunks)} RAG chunks")
# Chunking operations
logger.info(f"[Memory] 📦 Gemini summarized chunk output: {output}")
logger.warning(f"[Memory] ❌ Gemini chunking failed: {e}")
# Contextual summarization
logger.info(f"[Contextual] Gemini created summary: {summary[:100]}...")
logger.warning(f"[Contextual] Gemini summarization failed: {e}")
```
### Performance Metrics
- **STM Operations:** Deduplication rate, merge frequency, topic enrichment quality
- **LTM Operations:** FAISS search latency, semantic similarity scores, eviction patterns
- **Context Summarization:** Gemini response time, summary quality, fallback usage
- **Memory Usage:** Storage efficiency, retrieval hit rates, cache performance
## 🚨 Error Handling
### Enhanced Fallback Strategy
1. **Primary:** Gemini Flash Lite contextual summarization
2. **Secondary:** LTM semantic search with usage-based scoring
3. **Tertiary:** STM recent summaries
4. **Final:** External RAG knowledge base
5. **Emergency:** No context (minimal response)
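The tiered degradation above can be expressed as a simple chain. `get_context_with_fallback` is a hypothetical sketch of the pattern, not the codebase's function: each tier may raise (API failure) or return an empty string (no context), and either outcome falls through to the next tier.

```python
def get_context_with_fallback(user_id, query, sources):
    """Walk an ordered list of (name, fetch) context tiers.
    Returns (tier_name, context); ("none", "") if every tier fails."""
    for name, fetch in sources:
        try:
            context = fetch(user_id, query)
            if context:
                return name, context
        except Exception:
            continue  # this tier failed; degrade to the next one
    return "none", ""  # emergency: answer with no context at all
```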
### Error Scenarios & Recovery
- **Gemini API failure** → Fall back to LTM semantic search
- **LTM corruption** → Rebuild FAISS index from remaining chunks
- **STM corruption** → Reset to empty STM, continue with LTM
- **Memory corruption** → Reset user session, clear all memory
- **Chunking failure** → Store raw response as fallback chunk
## 🔮 Future Enhancements
### 1. **Persistent Memory Storage**
- **Database Integration:** Store LTM in PostgreSQL/SQLite with FAISS index persistence
- **Session Recovery:** Resume conversations after system restarts
- **Memory Export:** Allow users to export their conversation history
- **Cross-device Sync:** Synchronize memory across different devices
### 2. **Advanced Memory Features**
- **Fact Store:** Dedicated storage for critical medical facts (allergies, chronic conditions, medications)
- **Memory Compression:** Summarize older STM entries into LTM when STM overflows
- **Contextual Tags:** Add metadata tags (encounter type, modality, urgency) to bias retrieval
- **Memory Analytics:** Track memory usage patterns and optimize storage strategies
### 3. **Intelligent Memory Management**
- **Adaptive Thresholds:** Dynamically adjust similarity thresholds based on conversation context
- **Memory Prioritization:** Protect critical medical information from eviction
- **Usage-based Retention:** Keep frequently accessed information longer
- **Semantic Clustering:** Group related memories for better organization
### 4. **Enhanced Medical Context**
- **Clinical Decision Support:** Integrate with medical guidelines and protocols
- **Risk Assessment:** Track and alert on potential medical risks across conversations
- **Medication Reconciliation:** Maintain accurate medication lists across sessions
- **Follow-up Scheduling:** Track recommended follow-ups and reminders
### 5. **Multi-modal Memory**
- **Image Memory:** Store and retrieve medical images with descriptions
- **Voice Memory:** Convert voice interactions to text for memory storage
- **Document Memory:** Process and store medical documents and reports
- **Temporal Memory:** Track changes in symptoms and conditions over time
## 📝 Testing
### Memory System Testing
```bash
cd Medical-Chatbot
python test_memory_system.py
```
### Test Scenarios
1. **STM Deduplication Test:** Verify similar responses are merged correctly
2. **LTM Semantic Search Test:** Test FAISS retrieval with various queries
3. **Context Summarization Test:** Validate unified summary generation
4. **Topic Enrichment Test:** Check detailed topic generation with question context
5. **Memory Capacity Test:** Verify STM (5 items) and LTM (60 items) limits
6. **Fallback Strategy Test:** Test system behavior when Gemini API fails
### Expected Behaviors
- **STM:** Similar responses merge, unique details preserved
- **LTM:** Semantic search returns relevant chunks with usage tracking
- **Topics:** Detailed, medical-specific descriptions (10-20 words)
- **Summaries:** Coherent narratives combining STM + LTM + RAG
- **Performance:** Sub-second retrieval times for all operations
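The capacity expectation can be checked with a minimal test that uses `collections.deque` as a stand-in for the real STM/LTM stores (the actual `MemoryManager` enforces these caps with merge and eviction logic rather than a plain deque):

```python
from collections import deque

def test_capacity_limits():
    # STM keeps only the 5 most recent summaries.
    stm = deque(maxlen=5)
    for i in range(10):
        stm.append(f"summary {i}")
    assert len(stm) == 5
    assert stm[0] == "summary 5"   # the oldest five were evicted

    # LTM keeps at most 60 chunks.
    ltm = deque(maxlen=60)
    for i in range(100):
        ltm.append(f"chunk {i}")
    assert len(ltm) == 60
```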
## 🎯 Summary
The enhanced memory system transforms the Medical Chatbot into a sophisticated, memory-aware medical assistant that:
✅ **Maintains Short-Term Memory (STM)** with 5 recent conversation summaries and intelligent deduplication
✅ **Provides Long-Term Memory (LTM)** with 60 semantic chunks and FAISS-based retrieval
✅ **Generates Enhanced Topics** using question context for detailed, medical-specific descriptions
✅ **Creates Unified Summaries** combining STM + LTM + RAG into coherent narratives
✅ **Implements Smart Merging** that preserves unique details while eliminating redundancy
✅ **Ensures Conversational Continuity** across extended medical consultations
✅ **Optimizes Performance** with sub-second retrieval and efficient memory management
This advanced memory system addresses the limitations of simple RAG systems by providing:
- **Intelligent context management** that remembers and builds upon previous interactions
- **Medical precision** with detailed topics and exact clinical information
- **Scalable architecture** that can handle extended conversations without performance degradation
- **Robust fallback strategies** ensuring system reliability in all scenarios
The result is a medical chatbot that truly understands conversation context, remembers patient history, and provides increasingly relevant and personalized medical guidance over time.