================================================================================
🧪 ADVANCED LIMITS TESTING: qwen25-deposium-1024d
================================================================================

🔄 Loading model...
✅ Model loaded!

================================================================================
🌍 PART 1: Cross-Lingual Instruction-Awareness
================================================================================

────────────────────────────────────────────────────────────────────────────────
Test 1.1: FR Query → EN Documents
────────────────────────────────────────────────────────────────────────────────
Can the model understand FR 'Explique' → EN 'explanation tutorial'?

📝 Query: "Explique comment fonctionnent les réseaux de neurones"

📄 Documents:
   1. ⚪ [0.741] Comment installer TensorFlow sur Ubuntu
   2. ❌ [0.674] Neural networks explanation tutorial and comprehensive guide
   3. ⚪ [0.671] Neural network architecture overview and history

❌ FAIL: Cross-lingual instruction matching
   Score difference: -0.067

────────────────────────────────────────────────────────────────────────────────
Test 1.2: EN Query → FR Documents
────────────────────────────────────────────────────────────────────────────────
Can the model understand EN 'Find articles' → FR 'Articles ... publications'?

📝 Query: "Find articles about climate change"

📄 Documents:
   1. ⚪ [0.950] Climate change scientific research overview
   2. ❌ [0.737] Articles sur le changement climatique et publications scientifiques
   3. ⚪ [0.646] Le changement climatique est un problème majeur

❌ FAIL: Cross-lingual instruction matching
   Score difference: -0.213

────────────────────────────────────────────────────────────────────────────────
Test 1.3: FR Query → Multilingual Documents
────────────────────────────────────────────────────────────────────────────────
FR 'Résume' → EN 'summary' (mixed FR/EN/ES/DE results)

📝 Query: "Résume les avantages de l'apprentissage profond"

📄 Documents:
   1. ⚪ [0.932] L'apprentissage profond est une technique d'IA
   2. ⚪ [0.881] Resumen de las ventajas del aprendizaje profundo
   3. ⚪ [0.838] Zusammenfassung der Vorteile des Deep Learning
   4. ❌ [0.534] Deep learning advantages summary: fast, accurate, scalable

❌ FAIL: Multilingual instruction matching
   Score difference: -0.398

================================================================================
🤔 PART 2: Difficult and Ambiguous Cases
================================================================================

────────────────────────────────────────────────────────────────────────────────
Test 2.1: Negative Instructions
────────────────────────────────────────────────────────────────────────────────
Does the model understand 'Avoid' correctly?

📝 Query: "Avoid using neural networks for this task"

📄 Documents:
   1. ✅ [0.969] Alternative methods to neural networks: decision trees, random forests
   2. ⚪ [0.969] When not to use machine learning algorithms
   3. ⚪ [0.958] Neural network implementation guide and tutorial

✅ PASS: Negative instruction understanding
   Score difference: 0.000

────────────────────────────────────────────────────────────────────────────────
Test 2.2: Ambiguous Instructions
────────────────────────────────────────────────────────────────────────────────
'Train the model': does it default to the ML context?

📝 Query: "Train the model"

📄 Documents:
   1. ⚪ [0.918] Train scheduling and railway timetables
   2. ⚪ [0.917] Employee training program for new hires
   3. ❌ [0.905] Machine learning model training procedures and optimization

❌ FAIL: Ambiguity resolution (ML context)
   Score difference: -0.014

────────────────────────────────────────────────────────────────────────────────
Test 2.3: Multiple Instructions
────────────────────────────────────────────────────────────────────────────────
Multiple intents: Find + Compare + Summarize

📝 Query: "Find, compare and summarize articles about quantum computing"

📄 Documents:
   1. ✅ [0.977] Quantum computing articles comparison summary: top papers analyzed
   2. ⚪ [0.966] Quantum computing summary and overview
   3. ⚪ [0.962] Quantum computing research articles and publications
   4. ⚪ [0.704] GPT-3 vs GPT-4 comparison summary

✅ PASS: Multiple intentions handling
   Score difference: 0.000

────────────────────────────────────────────────────────────────────────────────
Test 2.4: Formal vs Informal Nuances
────────────────────────────────────────────────────────────────────────────────
Formal query   → Formal doc:   0.969
Formal query   → Informal doc: 0.962
Informal query → Formal doc:   0.883
Informal query → Informal doc: 0.937

✅ PASS: Formality awareness

================================================================================
⚠️ PART 3: Edge Cases and Failure Modes
================================================================================

────────────────────────────────────────────────────────────────────────────────
Test 3.1: Spelling Errors
────────────────────────────────────────────────────────────────────────────────
Query with typos: 'Explan', 'nural', 'netwrks', 'wrk'

📝 Query: "Explan how nural netwrks wrk"

📄 Documents:
   1. ⚪ [0.601] How to install neural network frameworks
   2. ❌ [0.577] Neural networks explanation tutorial and comprehensive guide
   3. ⚪ [0.565] Neural network architecture technical specifications

❌ FAIL: Typo robustness
   Score difference: -0.023

────────────────────────────────────────────────────────────────────────────────
Test 3.2: Very Long and Complex Query
────────────────────────────────────────────────────────────────────────────────
Very long query (59 words) with multiple intents

📝 Query: "I need to find comprehensive research articles and academic papers that provide a detailed explanation and thorough comparison of different neural network architectures, specifically comparing convolutional neural networks, recurrent neural networks, and transformer-based models, with a focus on their practical applications in natural language processing, computer vision, and time series prediction tasks, including performance benchmarks and computational efficiency analysis."

📄 Documents:
   1. ⚪ [0.963] Deep learning frameworks installation guide
   2. ⚪ [0.958] Neural networks overview and basic introduction
   3. ❌ [0.898] Neural network architectures comparison: CNN, RNN, Transformers for NLP, vision, time series

❌ FAIL: Long query handling
   Score difference: -0.065

────────────────────────────────────────────────────────────────────────────────
Test 3.3: Contradictory Instructions
────────────────────────────────────────────────────────────────────────────────
Contradictory: 'in detail' vs 'keep it brief'

📝 Query: "Explain in detail but keep it brief"

📄 Documents:
   1. ⚪ [0.952] Quick overview and brief summary of the topic
   2. ⚪ [0.941] Comprehensive detailed explanation with examples
   3. ❌ [0.924] Medium-length explanation with key points

❌ FAIL: Contradiction handling (balanced)
   Score difference: -0.029

────────────────────────────────────────────────────────────────────────────────
Test 3.4: Non-Latin Scripts
────────────────────────────────────────────────────────────────────────────────
Arabic query → English documents

📝 Query: "اشرح كيف تعمل الشبكات العصبية"

📄 Documents:
   1. ⚪ [0.961] شبكات عصبية معمارية عامة
   2. ❌ [-0.445] Neural networks explanation tutorial comprehensive guide
   3. ⚪ [-0.474] Neural network training procedures

Russian query → English documents

📝 Query: "Объясни, как работают нейронные сети"

📄 Documents:
   1. ⚪ [0.982] Нейронные сети архитектура обзор
   2. ❌ [-0.234] Neural networks explanation tutorial comprehensive guide
   3. ⚪ [-0.242] Neural network training procedures

Chinese query → English documents

📝 Query: "解释神经网络如何工作"

📄 Documents:
   1. ⚪ [0.973] 神经网络架构概述
   2. ⚪ [-0.629] Neural network training procedures
   3. ❌ [-0.642] Neural networks explanation tutorial comprehensive guide

❌ FAIL: Non-Latin script support
   Arabic: ❌ | Russian: ❌ | Chinese: ❌

================================================================================
📊 PART 4: Performance Degradation Analysis
================================================================================

Progressive difficulty test:

🔴 1. Simple EN instruction
      Score: 0.934 | Margin: -0.010
🔴 2. Cross-lingual FR→EN
      Score: 0.590 | Margin: -0.002
🔴 3. Cross-lingual with typos
      Score: 0.578 | Margin: 0.011
🔴 4. Long cross-lingual query
      Score: 0.569 | Margin: 0.024

📉 Performance Degradation:
   Cross-lingual FR→EN: -0.343 (36.8% drop)
   Cross-lingual with typos: -0.356 (38.1% drop)
   Long cross-lingual query: -0.365 (39.0% drop)

================================================================================
📈 FINAL SUMMARY: Limits and Capabilities
================================================================================

╔══════════════════════════════════════════════════════════════════════════════╗
║                             TEST RESULTS SUMMARY                             ║
╚══════════════════════════════════════════════════════════════════════════════╝

✅ STRENGTHS (What Works Well):

  🤔 Difficult Cases: 75% pass rate
     • Negative instructions: ✅
     • Ambiguity resolution: ❌
     • Multiple intentions: ✅
     • Formality matching: ✅

⚠️ LIMITATIONS (Where It Struggles):

  🌍 Cross-Lingual Instruction-Awareness: 0% pass rate
     • FR→EN: ❌
     • EN→FR: ❌
     • Multilingual: ❌

  ⚠️ Edge Cases: 0% pass rate
     • Spelling errors: ❌
     • Very long queries: ❌
     • Contradictions: ❌
     • Non-Latin scripts: ❌

  📉 Performance Degradation:
     • Cross-lingual FR→EN: -36.8% from baseline
     • Cross-lingual with typos: -38.1% from baseline
     • Long cross-lingual query: -39.0% from baseline

🎯 RECOMMENDATIONS FOR HUGGING FACE DOCUMENTATION:
  1. ⚠️ WARN: Weak cross-lingual instruction-awareness (0%)
  2. ✅ HIGHLIGHT: Handles difficult cases well (75%)
  3. ⚠️ WARN: Poor edge case performance (0%)
  4. ⚠️ WARN: Performance degrades with complexity (up to 39.0% below baseline)
  5. ⚠️ WARN: Non-Latin scripts failed in every tested language (Arabic, Russian, Chinese)

💡 HONEST ASSESSMENT:
  This model handles difficult English instructions reasonably well (3 of 4 tests passed) but shows limitations with:
  - Cross-lingual instruction matching, even among European languages (EN/FR/ES/DE: 0 of 3 tests passed)
  - Non-Latin scripts (Arabic, Chinese, Russian)
  - Very complex or contradictory queries
  - Spelling errors (the intended document still ranked 2nd, 0.023 below the top result)

  Best use: English instruction-aware search and RAG systems
  Not ideal: Cross-lingual or non-Latin retrieval, highly noisy input

💾 Saving detailed results to test_results.json...
Traceback (most recent call last):
  File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 576, in <module>
    main()
  File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 570, in main
    json.dump(output, f, indent=2, ensure_ascii=False)
  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bool is not JSON serializable
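The similarity scores in this log range from -1 to 1, and several are negative in Test 3.4. That is consistent with cosine similarity between query and document embeddings. The sketch below shows the ranking step on precomputed vectors. `rank_by_cosine` is a hypothetical helper. How qwen25-deposium-1024d produces its embeddings (its loading and encoding API) is deliberately left out, because the script's source is not part of this log.

```python
import numpy as np


def rank_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray) -> list[tuple[int, float]]:
    """Rank documents by cosine similarity to the query, highest first.

    query_vec has shape (dim,); doc_vecs has shape (n_docs, dim).
    Returns (document index, score) pairs using native Python types.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document with the query
    order = np.argsort(-scores, kind="stable")  # descending; ties keep input order
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real model embeddings.
    query = np.array([1.0, 0.0])
    docs = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
    print(rank_by_cosine(query, docs))  # [(1, 1.0), (0, 0.0), (2, -1.0)]
```

Returning native `int` and `float` values instead of NumPy scalars also keeps the ranking output directly JSON-serializable.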
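In every ranking test, the logged "Score difference" equals the expected document's score minus the highest score in its list, up to rounding in the last digit. For example, Test 1.1 gives 0.674 - 0.741 = -0.067. Every PASS shows a difference of 0.000, so a test appears to pass only when the expected document ranks first, with ties counting as a pass (Test 2.1). The sketch below reproduces that rule. It is inferred from the printed numbers, not copied from advanced_limits_testing.py, and `score_difference` and `passes` are hypothetical names.

```python
def score_difference(scores: list[float], expected_index: int) -> float:
    """Expected document's score minus the best score in the list.

    0.0 means the expected document is ranked first; a negative value is its
    gap to the top-ranked document. (Rule inferred from the logged numbers.)
    """
    return round(scores[expected_index] - max(scores), 3)


def passes(scores: list[float], expected_index: int) -> bool:
    # A tie with the top score counts as a pass, as in Test 2.1 (0.969 vs 0.969).
    return score_difference(scores, expected_index) >= 0.0


if __name__ == "__main__":
    # Scores copied from the log; expected_index is 0-based.
    print(score_difference([0.741, 0.674, 0.671], 1))  # Test 1.1: -0.067
    print(passes([0.977, 0.966, 0.962, 0.704], 0))     # Test 2.3: True
    print(passes([0.918, 0.917, 0.905], 2))            # Test 2.2: False
```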
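The Part 4 degradation figures compare each harder variant with the "Simple EN instruction" baseline. The absolute change is score - baseline, and the percentage drop is (baseline - score) / baseline * 100. Below is a sketch that uses the rounded scores printed in the log (`degradation` is a hypothetical helper). The script presumably works from unrounded scores, so recomputing from the printed values can differ in the last digit. Here 0.934 - 0.590 gives -0.344 versus the logged -0.343, and the last row gives 39.1% versus the logged 39.0%.

```python
def degradation(baseline: float, score: float) -> tuple[float, float]:
    """Return (absolute change, percentage drop) of score relative to baseline."""
    change = score - baseline
    drop_pct = (baseline - score) / baseline * 100
    return round(change, 3), round(drop_pct, 1)


if __name__ == "__main__":
    baseline = 0.934  # "Simple EN instruction" score from Part 4
    variants = [
        ("Cross-lingual FR→EN", 0.590),
        ("Cross-lingual with typos", 0.578),
        ("Long cross-lingual query", 0.569),
    ]
    for label, score in variants:
        change, drop = degradation(baseline, score)
        print(f"{label}: {change:+.3f} ({drop:.1f}% drop)")
```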
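The run crashes while saving test_results.json because `json.dump` only encodes built-in Python types. A `bool` that is rejected this way is most likely a NumPy boolean scalar (recent NumPy releases report its class name as `bool`), for instance the result of comparing NumPy similarity scores to decide PASS/FAIL. A minimal fix, assuming `output` holds such NumPy values, is to pass a `default=` hook that converts them to native types. The `to_builtin` helper and the sample `output` dict below are hypothetical, since the script's actual results structure is not shown in this log.

```python
import json

import numpy as np


def to_builtin(obj):
    """`default=` hook for json.dump/json.dumps: convert NumPy values to built-ins."""
    if isinstance(obj, np.generic):  # NumPy scalars: bool_, float32, int64, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    # Anything else is still unsupported; raise the same kind of error json would.
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Hypothetical results entry containing NumPy values (scores from Test 1.1).
output = {
    "test": "1.1",
    "passed": np.bool_(False),
    "score_difference": np.float32(-0.067),
    "scores": np.array([0.741, 0.674, 0.671]),
}

if __name__ == "__main__":
    # In the script, the equivalent change would be adding default=to_builtin
    # to the existing json.dump(output, f, indent=2, ensure_ascii=False) call.
    print(json.dumps(output, indent=2, ensure_ascii=False, default=to_builtin))
```

Converting values where they are created also avoids the error, for example with `bool(...)` around score comparisons or `float(...)` around scores.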