| ================================================================================ | |
| 🧪 ADVANCED LIMITS TESTING: qwen25-deposium-1024d | |
| ================================================================================ | |
| 🔄 Loading model... | |
| ✅ Model loaded! | |
| ================================================================================ | |
| 🌍 PART 1: Cross-Lingual Instruction-Awareness | |
| ================================================================================ | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 1.1: FR Question → EN Documents | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Can the model understand FR 'Explique' → EN 'explanation tutorial'? | |
| 📝 Query: "Explique comment fonctionnent les réseaux de neurones" | |
| 📄 Documents: | |
| 1. ⚪ [0.741] Comment installer TensorFlow sur Ubuntu | |
| 2. ❌ [0.674] Neural networks explanation tutorial and comprehensive guide | |
| 3. ⚪ [0.671] Neural network architecture overview and history | |
| ❌ FAIL: Cross-lingual instruction matching | |
| Score difference: -0.067 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 1.2: EN Question → FR Documents | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Can the model understand EN 'Find articles' → FR 'Articles ... publications'? | |
| 📝 Query: "Find articles about climate change" | |
| 📄 Documents: | |
| 1. ⚪ [0.950] Climate change scientific research overview | |
| 2. ❌ [0.737] Articles sur le changement climatique et publications scientifiques | |
| 3. ⚪ [0.646] Le changement climatique est un problème majeur | |
| ❌ FAIL: Cross-lingual instruction matching | |
| Score difference: -0.213 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 1.3: FR Question → Multilingual Documents | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| FR 'Résume' → EN 'summary' (mixed FR/EN/ES/DE results) | |
| 📝 Query: "Résume les avantages de l'apprentissage profond" | |
| 📄 Documents: | |
| 1. ⚪ [0.932] L'apprentissage profond est une technique d'IA | |
| 2. ⚪ [0.881] Resumen de las ventajas del aprendizaje profundo | |
| 3. ⚪ [0.838] Zusammenfassung der Vorteile des Deep Learning | |
| 4. ❌ [0.534] Deep learning advantages summary: fast, accurate, scalable | |
| ❌ FAIL: Multilingual instruction matching | |
| Score difference: -0.398 | |
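The "Score difference" values reported for these tests are consistent with subtracting the highest score in the list from the score of the expected document, so a pass shows 0.000 and a failure shows the gap to the top-ranked document. A minimal Python sketch of that check follows. It assumes this is how the script computes the value; the function name `score_difference` is hypothetical and not taken from the script.

```python
def score_difference(scores, expected_index):
    """Return the expected document's score minus the best score in the list.

    0.0 means the expected document is ranked first (or tied for first);
    a negative value is the gap to the top-ranked document.
    This is an assumed reconstruction, not the script's actual code.
    """
    return scores[expected_index] - max(scores)


# Scores copied from Test 1.1; the expected document is the second entry.
test_1_1 = [0.741, 0.674, 0.671]
diff = score_difference(test_1_1, expected_index=1)
passed = diff == 0.0
print(f"Score difference: {diff:.3f} | passed: {passed}")
```

Recomputing from the rounded scores printed in the log can differ from the logged value in the last digit, since the script presumably works on unrounded similarities.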
| ================================================================================ | |
| 🤔 PART 2: Difficult and Ambiguous Cases | |
| ================================================================================ | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 2.1: Negative Instructions | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Does the model understand 'Avoid' correctly? | |
| 📝 Query: "Avoid using neural networks for this task" | |
| 📄 Documents: | |
| 1. ✅ [0.969] Alternative methods to neural networks: decision trees, random forests | |
| 2. ⚪ [0.969] When not to use machine learning algorithms | |
| 3. ⚪ [0.958] Neural network implementation guide and tutorial | |
| ✅ PASS: Negative instruction understanding | |
| Score difference: 0.000 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 2.2: Ambiguous Instructions | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| 'Train the model' - Does it default to ML context? | |
| 📝 Query: "Train the model" | |
| 📄 Documents: | |
| 1. ⚪ [0.918] Train scheduling and railway timetables | |
| 2. ⚪ [0.917] Employee training program for new hires | |
| 3. ❌ [0.905] Machine learning model training procedures and optimization | |
| ❌ FAIL: Ambiguity resolution (ML context) | |
| Score difference: -0.014 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 2.3: Multiple Instructions | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Multiple intents: Find + Compare + Summarize | |
| 📝 Query: "Find, compare and summarize articles about quantum computing" | |
| 📄 Documents: | |
| 1. ✅ [0.977] Quantum computing articles comparison summary: top papers analyzed | |
| 2. ⚪ [0.966] Quantum computing summary and overview | |
| 3. ⚪ [0.962] Quantum computing research articles and publications | |
| 4. ⚪ [0.704] GPT-3 vs GPT-4 comparison summary | |
| ✅ PASS: Multiple intentions handling | |
| Score difference: 0.000 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 2.4: Formal vs Informal Nuances | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Formal query → Formal doc: 0.969 | |
| Formal query → Informal doc: 0.962 | |
| Informal query → Formal doc: 0.883 | |
| Informal query → Informal doc: 0.937 | |
| ✅ PASS: Formality awareness | |
| ================================================================================ | |
| ⚠️ PART 3: Edge Cases and Failure Modes | |
| ================================================================================ | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 3.1: Spelling Errors | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Query with typos: 'Explan', 'nural', 'netwrks', 'wrk' | |
| 📝 Query: "Explan how nural netwrks wrk" | |
| 📄 Documents: | |
| 1. ⚪ [0.601] How to install neural network frameworks | |
| 2. ❌ [0.577] Neural networks explanation tutorial and comprehensive guide | |
| 3. ⚪ [0.565] Neural network architecture technical specifications | |
| ❌ FAIL: Typo robustness | |
| Score difference: -0.023 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 3.2: Very Long and Complex Query | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Very long query (71 words) with multiple intents | |
| 📝 Query: "I need to find comprehensive research articles and academic papers that provide | |
| a detailed explanation and thorough comparison of different neural network | |
| architectures, specifically comparing convolutional neural networks, recurrent | |
| neural networks, and transformer-based models, with a focus on their practical | |
| applications in natural language processing, computer vision, and time series | |
| prediction tasks, including performance benchmarks and computational efficiency | |
| analysis." | |
| 📄 Documents: | |
| 1. ⚪ [0.963] Deep learning frameworks installation guide | |
| 2. ⚪ [0.958] Neural networks overview and basic introduction | |
| 3. ❌ [0.898] Neural network architectures comparison: CNN, RNN, Transformers for NLP, vision, time series | |
| ❌ FAIL: Long query handling | |
| Score difference: -0.065 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 3.3: Contradictory Instructions | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Contradictory: 'in detail' vs 'keep it brief' | |
| 📝 Query: "Explain in detail but keep it brief" | |
| 📄 Documents: | |
| 1. ⚪ [0.952] Quick overview and brief summary of the topic | |
| 2. ⚪ [0.941] Comprehensive detailed explanation with examples | |
| 3. ❌ [0.924] Medium-length explanation with key points | |
| ❌ FAIL: Contradiction handling (balanced) | |
| Score difference: -0.029 | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Test 3.4: Non-Latin Scripts | |
| ──────────────────────────────────────────────────────────────────────────────── | |
| Arabic query → English documents | |
| 📝 Query: "اشرح كيف تعمل الشبكات العصبية" | |
| 📄 Documents: | |
| 1. ⚪ [0.961] شبكات عصبية معمارية عامة | |
| 2. ❌ [-0.445] Neural networks explanation tutorial comprehensive guide | |
| 3. ⚪ [-0.474] Neural network training procedures | |
| Russian query → English documents | |
| 📝 Query: "Объясни, как работают нейронные сети" | |
| 📄 Documents: | |
| 1. ⚪ [0.982] Нейронные сети архитектура обзор | |
| 2. ❌ [-0.234] Neural networks explanation tutorial comprehensive guide | |
| 3. ⚪ [-0.242] Neural network training procedures | |
| Chinese query → English documents | |
| 📝 Query: "解释神经网络如何工作" | |
| 📄 Documents: | |
| 1. ⚪ [0.973] 神经网络架构概述 | |
| 2. ⚪ [-0.629] Neural network training procedures | |
| 3. ❌ [-0.642] Neural networks explanation tutorial comprehensive guide | |
| ❌ FAIL: Non-Latin script support | |
| Arabic: ❌ | Russian: ❌ | Chinese: ❌ | |
| ================================================================================ | |
| 📊 PART 4: Performance Degradation Analysis | |
| ================================================================================ | |
| Progressive difficulty test: | |
| 🔴 1. Simple EN instruction | |
| Score: 0.934 | Margin: -0.010 | |
| 🔴 2. Cross-lingual FR→EN | |
| Score: 0.590 | Margin: -0.002 | |
| 🔴 3. Cross-lingual with typos | |
| Score: 0.578 | Margin: 0.011 | |
| 🔴 4. Long cross-lingual query | |
| Score: 0.569 | Margin: 0.024 | |
| 📉 Performance Degradation: | |
| Cross-lingual FR→EN: -0.343 (36.8% drop) | |
| Cross-lingual with typos: -0.356 (38.1% drop) | |
| Long cross-lingual query: -0.365 (39.0% drop) | |
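The degradation figures above read as drops relative to the "Simple EN instruction" baseline score. A short Python sketch of that arithmetic, using the rounded scores printed in the log; the helper name `relative_drop` is hypothetical and the formula is an assumption inferred from the numbers:

```python
def relative_drop(baseline, score):
    """Return (absolute change, percentage drop) of score relative to baseline."""
    delta = score - baseline
    percent = -delta / baseline * 100
    return delta, percent


baseline = 0.934  # "Simple EN instruction" score from the log
cases = [
    ("Cross-lingual FR→EN", 0.590),
    ("Cross-lingual with typos", 0.578),
    ("Long cross-lingual query", 0.569),
]

for label, score in cases:
    delta, percent = relative_drop(baseline, score)
    print(f"{label}: {delta:+.3f} ({percent:.1f}% drop)")
```

Because the inputs here are already rounded to three decimals, the recomputed values can differ from the logged ones in the last digit.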
| ================================================================================ | |
| 📈 FINAL SUMMARY: Limits and Capabilities | |
| ================================================================================ | |
| ╔══════════════════════════════════════════════════════════════════════════════╗ | |
| ║ TEST RESULTS SUMMARY ║ | |
| ╚══════════════════════════════════════════════════════════════════════════════╝ | |
| ✅ STRENGTHS (What Works Well): | |
| 🤔 Difficult Cases: 75% pass rate | |
| • Negative instructions: ✅ | |
| • Ambiguity resolution: ❌ | |
| • Multiple intentions: ✅ | |
| • Formality matching: ✅ | |
| ⚠️ LIMITATIONS (Where It Struggles): | |
| 🌍 Cross-Lingual Instruction-Awareness: 0% pass rate | |
| • FR→EN: ❌ | |
| • EN→FR: ❌ | |
| • Multilingual: ❌ | |
| ⚠️ Edge Cases: 0% pass rate | |
| • Spelling errors: ❌ | |
| • Very long queries: ❌ | |
| • Contradictions: ❌ | |
| • Non-Latin scripts: ❌ | |
| 📉 Performance Degradation: | |
| • Cross-lingual FR→EN: -36.8% from baseline | |
| • Cross-lingual with typos: -38.1% from baseline | |
| • Long cross-lingual query: -39.0% from baseline | |
| 🎯 RECOMMENDATIONS FOR HUGGINGFACE DOCUMENTATION: | |
| 1. ✅ HIGHLIGHT: Handles difficult cases well (75%) | |
| 2. ⚠️ WARN: Cross-lingual instruction-awareness failed every test (0%) | |
| 3. ⚠️ WARN: Edge cases failed every test (0%) | |
| 4. ⚠️ WARN: Target-document score drops 37-39% from the English baseline on cross-lingual queries | |
| 5. ⚠️ WARN: Non-Latin script queries failed for Arabic, Russian, and Chinese | |
| 💡 HONEST ASSESSMENT: | |
| In these tests the model handled negative, multi-intent, and formality-sensitive | |
| English queries, but shows limitations with: | |
| - Cross-lingual instruction matching (FR→EN, EN→FR, multilingual) | |
| - Non-Latin scripts (Arabic, Chinese, Russian) | |
| - Very long, ambiguous, or contradictory queries | |
| - Spelling errors | |
| Best use: English-only instruction-aware search and RAG systems | |
| Not ideal: Cross-lingual retrieval, non-Latin languages, highly noisy input | |
| 💾 Saving detailed results to test_results.json... | |
| Traceback (most recent call last): | |
| File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 576, in <module> | |
| main() | |
| File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 570, in main | |
| json.dump(output, f, indent=2, ensure_ascii=False) | |
| File "/usr/lib/python3.10/json/__init__.py", line 179, in dump | |
| for chunk in iterable: | |
| File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode | |
| yield from _iterencode_dict(o, _current_indent_level) | |
| File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict | |
| yield from chunks | |
| File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list | |
| yield from chunks | |
| File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict | |
| yield from chunks | |
| File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode | |
| o = _default(o) | |
| File "/usr/lib/python3.10/json/encoder.py", line 179, in default | |
| raise TypeError(f'Object of type {o.__class__.__name__} ' | |
| TypeError: Object of type bool is not JSON serializable | |
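Python's built-in `bool` is JSON serializable, so the `TypeError` above most likely comes from a NumPy boolean stored in the results (for example a pass flag produced by comparing NumPy floats); recent NumPy versions report that type's name as `bool`. The sketch below shows one possible fix under that assumption: pass a `default` fallback to `json.dump` that converts NumPy-style values through their `tolist()` method. The names `to_builtin` and `StandInBool` are hypothetical; the stand-in class exists only so the example runs without NumPy installed.

```python
import json


def to_builtin(obj):
    """Fallback for json.dump/json.dumps: convert NumPy-style values to built-ins.

    NumPy scalars and arrays expose tolist(), which returns plain Python
    objects (bool, int, float, list). Anything else is rejected as usual.
    """
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class StandInBool:
    """Hypothetical stand-in for a NumPy boolean (assumption for this sketch)."""

    def __init__(self, value):
        self._value = bool(value)

    def tolist(self):
        return self._value


# Same nesting as the traceback suggests: dict -> list -> dict -> value.
output = {"tests": [{"name": "Typo robustness", "passed": StandInBool(False)}]}

text = json.dumps(output, indent=2, ensure_ascii=False, default=to_builtin)
print(text)
```

An alternative is to wrap each flag in `bool(...)` at the point where it is computed, which keeps the saved structure free of NumPy types in the first place.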