================================================================================
🧪 ADVANCED LIMITS TESTING: qwen25-deposium-1024d
================================================================================
🔄 Loading model...
✅ Model loaded!
================================================================================
🌍 PART 1: Cross-Lingual Instruction-Awareness
================================================================================
────────────────────────────────────────────────────────────────────────────────
Test 1.1: FR Question → EN Documents
────────────────────────────────────────────────────────────────────────────────
Can the model understand FR 'Explique' → EN 'explanation tutorial'?
📝 Query: "Explique comment fonctionnent les réseaux de neurones"
📄 Documents:
1. ⚪ [0.741] Comment installer TensorFlow sur Ubuntu
2. ❌ [0.674] Neural networks explanation tutorial and comprehensive guide
3. ⚪ [0.671] Neural network architecture overview and history
❌ FAIL: Cross-lingual instruction matching
Score difference: -0.067
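The "Score difference" values in this log are consistent with the expected document's score minus the top-ranked score, so a passing test reports 0.000. A minimal sketch of that reading follows; `score_difference` is a hypothetical helper, not a function taken from the test script, and the formula itself is an assumption inferred from the printed numbers.

```python
def score_difference(scores, expected_index):
    """Expected document's score minus the best score in the list.

    Assumption: this mirrors how the log's "Score difference" is derived.
    A result of 0.0 means the expected document is ranked first (or tied).
    """
    return scores[expected_index] - max(scores)


# Scores from Test 1.1, in the order printed; the expected document is item 2.
test_1_1 = [0.741, 0.674, 0.671]
diff = score_difference(test_1_1, expected_index=1)
print(f"{diff:+.3f}")  # -0.067
```

Differences recomputed from the printed three-decimal scores can be off by 0.001 from the logged value in other tests, presumably because the script subtracts unrounded scores.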
────────────────────────────────────────────────────────────────────────────────
Test 1.2: EN Question → FR Documents
────────────────────────────────────────────────────────────────────────────────
Can the model understand EN 'Find articles' → FR 'Articles ... publications'?
📝 Query: "Find articles about climate change"
📄 Documents:
1. ⚪ [0.950] Climate change scientific research overview
2. ❌ [0.737] Articles sur le changement climatique et publications scientifiques
3. ⚪ [0.646] Le changement climatique est un problème majeur
❌ FAIL: Cross-lingual instruction matching
Score difference: -0.213
────────────────────────────────────────────────────────────────────────────────
Test 1.3: FR Question → Multilingual Documents
────────────────────────────────────────────────────────────────────────────────
FR 'Résume' → EN 'summary' (mixed FR/EN/ES/DE results)
📝 Query: "Résume les avantages de l'apprentissage profond"
📄 Documents:
1. ⚪ [0.932] L'apprentissage profond est une technique d'IA
2. ⚪ [0.881] Resumen de las ventajas del aprendizaje profundo
3. ⚪ [0.838] Zusammenfassung der Vorteile des Deep Learning
4. ❌ [0.534] Deep learning advantages summary: fast, accurate, scalable
❌ FAIL: Multilingual instruction matching
Score difference: -0.398
================================================================================
🤔 PART 2: Difficult and Ambiguous Cases
================================================================================
────────────────────────────────────────────────────────────────────────────────
Test 2.1: Negative Instructions
────────────────────────────────────────────────────────────────────────────────
Does the model understand 'Avoid' correctly?
📝 Query: "Avoid using neural networks for this task"
📄 Documents:
1. ✅ [0.969] Alternative methods to neural networks: decision trees, random forests
2. ⚪ [0.969] When not to use machine learning algorithms
3. ⚪ [0.958] Neural network implementation guide and tutorial
✅ PASS: Negative instruction understanding
Score difference: 0.000
────────────────────────────────────────────────────────────────────────────────
Test 2.2: Ambiguous Instructions
────────────────────────────────────────────────────────────────────────────────
'Train the model' - Does it default to ML context?
📝 Query: "Train the model"
📄 Documents:
1. ⚪ [0.918] Train scheduling and railway timetables
2. ⚪ [0.917] Employee training program for new hires
3. ❌ [0.905] Machine learning model training procedures and optimization
❌ FAIL: Ambiguity resolution (ML context)
Score difference: -0.014
────────────────────────────────────────────────────────────────────────────────
Test 2.3: Multiple Instructions
────────────────────────────────────────────────────────────────────────────────
Multiple intents: Find + Compare + Summarize
📝 Query: "Find, compare and summarize articles about quantum computing"
📄 Documents:
1. ✅ [0.977] Quantum computing articles comparison summary: top papers analyzed
2. ⚪ [0.966] Quantum computing summary and overview
3. ⚪ [0.962] Quantum computing research articles and publications
4. ⚪ [0.704] GPT-3 vs GPT-4 comparison summary
✅ PASS: Multiple intentions handling
Score difference: 0.000
────────────────────────────────────────────────────────────────────────────────
Test 2.4: Formal vs Informal Nuances
────────────────────────────────────────────────────────────────────────────────
Formal query → Formal doc: 0.969
Formal query → Informal doc: 0.962
Informal query → Formal doc: 0.883
Informal query → Informal doc: 0.937
✅ PASS: Formality awareness
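The log does not state the pass criterion for Test 2.4. One plausible reading is that each query should score higher against the document in its own register than against the other one. The sketch below encodes that guess; `formality_aware` and the dictionary layout are hypothetical.

```python
def formality_aware(scores):
    """Return True if each query register prefers the matching document register.

    `scores` maps (query_register, doc_register) to a similarity score.
    Assumption: this criterion is a guess; the log does not state the real one.
    """
    return (
        scores[("formal", "formal")] > scores[("formal", "informal")]
        and scores[("informal", "informal")] > scores[("informal", "formal")]
    )


# Values printed for Test 2.4.
test_2_4 = {
    ("formal", "formal"): 0.969,
    ("formal", "informal"): 0.962,
    ("informal", "formal"): 0.883,
    ("informal", "informal"): 0.937,
}
print(formality_aware(test_2_4))  # True
```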
================================================================================
⚠️ PART 3: Edge Cases and Failure Modes
================================================================================
────────────────────────────────────────────────────────────────────────────────
Test 3.1: Spelling Errors
────────────────────────────────────────────────────────────────────────────────
Query with typos: 'Explan', 'nural', 'netwrks', 'wrk'
📝 Query: "Explan how nural netwrks wrk"
📄 Documents:
1. ⚪ [0.601] How to install neural network frameworks
2. ❌ [0.577] Neural networks explanation tutorial and comprehensive guide
3. ⚪ [0.565] Neural network architecture technical specifications
❌ FAIL: Typo robustness
Score difference: -0.023
────────────────────────────────────────────────────────────────────────────────
Test 3.2: Very Long and Complex Query
────────────────────────────────────────────────────────────────────────────────
Very long query (71 words) with multiple intents
📝 Query: "I need to find comprehensive research articles and academic papers that provide
a detailed explanation and thorough comparison of different neural network
architectures, specifically comparing convolutional neural networks, recurrent
neural networks, and transformer-based models, with a focus on their practical
applications in natural language processing, computer vision, and time series
prediction tasks, including performance benchmarks and computational efficiency
analysis."
📄 Documents:
1. ⚪ [0.963] Deep learning frameworks installation guide
2. ⚪ [0.958] Neural networks overview and basic introduction
3. ❌ [0.898] Neural network architectures comparison: CNN, RNN, Transformers for NLP, vision, time series
❌ FAIL: Long query handling
Score difference: -0.065
────────────────────────────────────────────────────────────────────────────────
Test 3.3: Contradictory Instructions
────────────────────────────────────────────────────────────────────────────────
Contradictory: 'in detail' vs 'keep it brief'
📝 Query: "Explain in detail but keep it brief"
📄 Documents:
1. ⚪ [0.952] Quick overview and brief summary of the topic
2. ⚪ [0.941] Comprehensive detailed explanation with examples
3. ❌ [0.924] Medium-length explanation with key points
❌ FAIL: Contradiction handling (balanced)
Score difference: -0.029
────────────────────────────────────────────────────────────────────────────────
Test 3.4: Non-Latin Scripts
────────────────────────────────────────────────────────────────────────────────
Arabic query → English documents
📝 Query: "اشرح كيف تعمل الشبكات العصبية"
📄 Documents:
1. ⚪ [0.961] شبكات عصبية معمارية عامة
2. ❌ [-0.445] Neural networks explanation tutorial comprehensive guide
3. ⚪ [-0.474] Neural network training procedures
Russian query → English documents
📝 Query: "Объясни, как работают нейронные сети"
📄 Documents:
1. ⚪ [0.982] Нейронные сети архитектура обзор
2. ❌ [-0.234] Neural networks explanation tutorial comprehensive guide
3. ⚪ [-0.242] Neural network training procedures
Chinese query → English documents
📝 Query: "解释神经网络如何工作"
📄 Documents:
1. ⚪ [0.973] 神经网络架构概述
2. ⚪ [-0.629] Neural network training procedures
3. ❌ [-0.642] Neural networks explanation tutorial comprehensive guide
❌ FAIL: Non-Latin script support
Arabic: ❌ | Russian: ❌ | Chinese: ❌
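The negative values above suggest the bracketed scores are cosine similarities, which range from -1 to 1; that is an inference from the numbers, not something the log states. A minimal pure-Python sketch of the measure, using toy vectors rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-d vectors standing in for real embeddings (illustrative values only).
query = [1.0, 2.0, 0.5]
same_direction = [2.0, 4.0, 1.0]
opposite = [-1.0, -2.0, -0.5]

print(round(cosine_similarity(query, same_direction), 3))  # 1.0
print(round(cosine_similarity(query, opposite), 3))        # -1.0
```

Under this reading, a score near -0.6 for an English document means its embedding points largely away from the non-Latin query's embedding, not merely that the two are unrelated.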
================================================================================
📊 PART 4: Performance Degradation Analysis
================================================================================
Progressive difficulty test:
🔴 1. Simple EN instruction
Score: 0.934 | Margin: -0.010
🔴 2. Cross-lingual FR→EN
Score: 0.590 | Margin: -0.002
🔴 3. Cross-lingual with typos
Score: 0.578 | Margin: 0.011
🔴 4. Long cross-lingual query
Score: 0.569 | Margin: 0.024
📉 Performance Degradation:
Cross-lingual FR→EN: -0.343 (36.8% drop)
Cross-lingual with typos: -0.356 (38.1% drop)
Long cross-lingual query: -0.365 (39.0% drop)
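The degradation figures above appear to be each score's change from the "Simple EN instruction" baseline, expressed both as an absolute difference and as a percentage of the baseline. A short sketch under that assumption; `degradation` is a hypothetical helper.

```python
def degradation(baseline, score):
    """Return (absolute change, percent drop) relative to the baseline score."""
    change = score - baseline
    percent_drop = -change / baseline * 100
    return change, percent_drop


baseline = 0.934  # "Simple EN instruction" score from the log
for label, score in [
    ("Cross-lingual FR→EN", 0.590),
    ("Cross-lingual with typos", 0.578),
    ("Long cross-lingual query", 0.569),
]:
    change, drop = degradation(baseline, score)
    print(f"{label}: {change:+.3f} ({drop:.1f}% drop)")
```

Fed the rounded scores printed in the log, this reproduces the logged figures to within 0.001 and 0.1 percentage points; the small gaps are presumably because the script computes from unrounded scores.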
================================================================================
📈 FINAL SUMMARY: Limits and Capabilities
================================================================================
╔══════════════════════════════════════════════════════════════════════════════╗
║ TEST RESULTS SUMMARY ║
╚══════════════════════════════════════════════════════════════════════════════╝
✅ STRENGTHS (What Works Well):
🤔 Difficult Cases: 75% pass rate
• Negative instructions: ✅
• Ambiguity resolution: ❌
• Multiple intentions: ✅
• Formality matching: ✅
⚠️ LIMITATIONS (Where It Struggles):
🌍 Cross-Lingual Instruction-Awareness: 0% pass rate
• FR→EN: ❌
• EN→FR: ❌
• Multilingual: ❌
⚠️ Edge Cases: 0% pass rate
• Spelling errors: ❌
• Very long queries: ❌
• Contradictions: ❌
• Non-Latin scripts: ❌
📉 Performance Degradation:
• Cross-lingual FR→EN: -36.8% from baseline
• Cross-lingual with typos: -38.1% from baseline
• Long cross-lingual query: -39.0% from baseline
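The per-category pass rates in this summary are the share of tests in each group that passed. A minimal sketch of that arithmetic; `pass_rate` and the dictionary of outcomes are hypothetical names, with the outcomes copied from the summary.

```python
def pass_rate(results):
    """Percentage of tests in `results` (name -> bool) that passed."""
    return 100 * sum(results.values()) / len(results)


# Outcomes as reported in the summary for the "Difficult Cases" group.
difficult_cases = {
    "Negative instructions": True,
    "Ambiguity resolution": False,
    "Multiple intentions": True,
    "Formality matching": True,
}
print(f"{pass_rate(difficult_cases):.0f}% pass rate")  # 75% pass rate
```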
🎯 RECOMMENDATIONS FOR HUGGINGFACE DOCUMENTATION:
1. ✅ HIGHLIGHT: Handles difficult cases reasonably well (75%)
2. ⚠️ WARN: Cross-lingual instruction-awareness failed every test (0%)
3. ⚠️ WARN: Edge cases failed every test (0%)
4. ⚠️ WARN: Scores drop 37-39% from the simple EN baseline on cross-lingual queries
5. ⚠️ WARN: Non-Latin scripts (Arabic, Russian, Chinese) all failed; English documents scored negative
💡 HONEST ASSESSMENT:
In these tests the model handles negative, multi-intent and formality-sensitive
English queries, but shows limitations with:
- Cross-lingual instruction matching (FR→EN, EN→FR, multilingual)
- Non-Latin scripts (Arabic, Chinese, Russian)
- Very long or contradictory queries
- Spelling errors (the margin is small, -0.023)
Best use: English instruction-aware search and RAG over English documents
Not ideal: Cross-lingual retrieval, non-Latin languages, highly noisy input
💾 Saving detailed results to test_results.json...
Traceback (most recent call last):
File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 576, in <module>
main()
File "/home/nico/code_source/tss/deposium_embeddings-turbov2/huggingface_publication/examples/advanced_limits_testing.py", line 570, in main
json.dump(output, f, indent=2, ensure_ascii=False)
File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
for chunk in iterable:
File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bool is not JSON serializable
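Python's built-in `bool` is JSON serializable, so this error most likely refers to a NumPy boolean (recent NumPy releases name that scalar type `bool`), for example the result of comparing two NumPy floats. That is an assumption, since the script itself is not shown here. One possible workaround is a `default` hook that unwraps such scalars through their `.item()` method. The sketch below uses a stand-in class so it runs without NumPy; `to_builtin` and `StandInBool` are hypothetical names.

```python
import json


def to_builtin(obj):
    """`default` hook for json.dump: unwrap scalar objects exposing `.item()`.

    NumPy scalars (including NumPy booleans) provide `.item()`, which returns
    the equivalent built-in Python value.
    """
    item = getattr(obj, "item", None)
    if callable(item):
        return item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class StandInBool:
    """Stand-in for a NumPy boolean so the sketch runs without NumPy installed."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return bool(self._value)


output = {"tests": [{"name": "FR→EN", "passed": StandInBool(False)}]}
print(json.dumps(output, indent=2, ensure_ascii=False, default=to_builtin))
```

In the script, the same hook could be passed to the existing `json.dump(output, f, indent=2, ensure_ascii=False)` call as `default=to_builtin`; alternatively, each pass/fail flag could be wrapped in `bool(...)` when the results are collected.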