# kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5
This new model series integrates updated datasets, base models, and fine-tuning methodologies. Built on Qwen3, it includes 8B and 1.7B parameter variants.
Key updates focus on daily conversations, creative generation, basic mathematics, and code generation. Leveraging Qwen3's architecture, the model also supports reasoning mode switching.
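As a rough illustration, the mode switch can presumably be driven the same way as on stock Qwen3, via the `enable_thinking` argument of `apply_chat_template` (an assumption; check the tokenizer config shipped with this repository):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kxdw2580/DeepSeek-R1-0528-Qwen3-8B-catgirl-v2.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Hi there!"}]

# Qwen3-style switch: True keeps the <think>...</think> reasoning phase,
# False asks for a direct answer.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```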
🔍 Fine-tuning records are available on SwanLab.
## Evaluation
Due to the model's unique characteristics, we used human evaluation for daily conversations, and DeepSeek-R1 scoring (with reference answers provided in advance) for the other domains, to ensure both character consistency and response validity.
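For illustration, the automated part of such a pipeline might look like the sketch below, which assumes DeepSeek's OpenAI-compatible API and a hypothetical pass/fail rubric; the card does not publish the exact judging prompt.

```python
from openai import OpenAI

# Assumption: DeepSeek's OpenAI-compatible endpoint; "deepseek-reasoner" serves R1.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

def score_answer(question: str, reference: str, answer: str) -> str:
    """Hypothetical judge call: R1 grades a model answer against a reference."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        "Reply with PASS or FAIL, judging correctness and persona consistency."
    )
    resp = client.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```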
**Key Improvements** (vs. internal test models "0501" and "0531-test-all"):
- Stronger detail-awareness in casual dialogue
- More coherent storytelling in creative tasks
- Deeper reasoning during thinking mode
- Better persona adherence in long-form conversations without explicit prompts
- Significant gains in math/code domains (internal 20-question benchmark):
| Model | Math (Single Attempt) | Code (Single Attempt) |
|---|---|---|
| Internal Test Model-0501 | 10% | 0% |
| DeepSeek-R1-0528-Qwen3-8B-Catgirl-0531-test-all | 30% | 20% |
| DeepSeek-R1-0528-Qwen3-8B-Catgirl-v2.5 | 70% | 60% |
## Usage Guidelines
**Recommended Parameters:**
- `temperature`: 0.7 (reasoning mode) / 0.6 (standard mode)
- `top_p`: 0.95
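As an illustration, the values plug into `generate` like this (continuing the loading sketch above; adapt to your serving stack):

```python
# Sampling settings from this card; pick the temperature for the active mode.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,  # 0.7 in reasoning mode, 0.6 in standard mode
    top_p=0.95,
    max_new_tokens=2048,
)
```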
**Critical Notes:**
- Avoid using the model's reasoning chains as conversation context (see the sketch after this list)
- The model inherits the base model's tendency toward lengthy reasoning in some cases; allow generation to complete even if intermediate steps seem unusual
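A minimal helper for the first note, assuming replies carry the base model's `<think>...</think>` tags (an assumption; verify against your chat template):

```python
import re

def strip_reasoning(reply: str) -> str:
    """Drop the <think>...</think> block so the reasoning chain
    never re-enters the conversation context."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

# Illustrative reply; only the visible answer goes back into history.
reply = "<think>Nya... the user greeted me.</think>Nya~ Hello!"
print(strip_reasoning(reply))  # -> "Nya~ Hello!"
```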
**English Mode:**
Because the dataset did not include English data, add the following system prompt to obtain English responses:
You are a catgirl. Please speak English.
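For example, placed as the first message of the conversation (the user turn is illustrative):

```python
messages = [
    {"role": "system", "content": "You are a catgirl. Please speak English."},
    {"role": "user", "content": "Good morning!"},  # illustrative user turn
]
```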
## Acknowledgments
Special thanks to:
- LLaMA-Factory (fine-tuning framework)
- Qwen Team (base model provider)
- DeepSeek Team (DeepSeek-R1 evaluation support)