# ZYH-LLM-Qwen2.5-14B-V5
The fifth-generation ZYH-LLM-Qwen2.5-14B model has been officially released!
It merges high-performance instruction, code, and reasoning models built on the Qwen2.5-14B base.
Recently, many high-performance reasoning models have emerged, such as:
- deepcogito/cogito-v1-preview-qwen-14B
- FractalAIResearch/Fathom-R1-14B
- agentica-org/DeepCoder-14B-Preview
- PKU-DS-LAB/FairyR1-14B-Preview
- qihoo360/Light-R1-14B-DS
These lay a good foundation for further improving model performance.
## First stage

### Step 1: Create a code model
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
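Each YAML block in this card is a standalone mergekit configuration. As a minimal sketch (the filename is hypothetical, and this assumes mergekit is installed), a single config like the one above can be executed with the `mergekit-yaml` CLI, writing the merged weights to a directory named after the config's `name:` field so that later stages can reference it by that path:

```shell
# Assumed prerequisite, not part of the original recipe:
pip install mergekit

# Save the della config above as coder-della.yml (hypothetical filename),
# then run the merge; --cuda performs the merge arithmetic on a GPU if available.
mergekit-yaml coder-della.yml ./Qwen2.5-Coder-14B-della --cuda
```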
### Step 2: Create five different instruction models
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V2
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-EVA
```
### Step 3: Use the arcee_fusion merge method to incorporate cogito-v1-preview-qwen-14B into each of the five instruction models
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Base-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V2
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V2-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-V6
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-V6-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Nova
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-Nova-cogito
```
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-EVA
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-EVA-cogito
```
## Second stage

### Step 1: Create three instruction models with a bias towards reasoning
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-Coder-14B-della
  - model: agentica-org/DeepCoder-14B-Preview
  - model: PKU-DS-LAB/FairyR1-14B-Preview
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Coder
```
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-V6
```
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: FractalAIResearch/Fathom-R1-14B
  - model: qihoo360/Light-R1-14B-DS
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-Nova
```
### Step 2: Create a pure instruction model to restore the generality of the final model
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-Base-cogito
models:
  - model: Qwen2.5-14B-V2-cogito
  - model: Qwen2.5-14B-V6-cogito
  - model: Qwen2.5-14B-Nova-cogito
  - model: Qwen2.5-14B-EVA-cogito
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-cogito-mst-it
```
## Third stage

### Step 1: Create a base model with a 1-million-token context
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Base-1M
```
### Step 2: Use the arcee_fusion merge method to incorporate cogito-v1-preview-qwen-14B into the 1M-context base model
```yaml
models:
  - model: cogito-v1-preview-qwen-14B
merge_method: arcee_fusion
base_model: Qwen2.5-14B-della-Base-1M
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
out_dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-cogito-Base-1M
```
## Final stage
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-cogito-Base-1M
models:
  - model: Qwen2.5-14B-cogito-mst-Coder
  - model: Qwen2.5-14B-cogito-mst-V6
  - model: Qwen2.5-14B-cogito-mst-Nova
  - model: Qwen2.5-14B-cogito-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V5
```
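The stages above form a dependency chain: each later config refers to earlier merges by their `name:` values, so the merges must be run in order, with each output directory named to match. A sketch of that ordering with the `mergekit-yaml` CLI (all filenames are hypothetical; one file per YAML block in this card):

```shell
# First stage: code model, five della instruction models, five arcee_fusion merges.
mergekit-yaml Qwen2.5-Coder-14B-della.yml ./Qwen2.5-Coder-14B-della --cuda
mergekit-yaml Qwen2.5-14B-della-Base.yml  ./Qwen2.5-14B-della-Base  --cuda
# ...remaining first-stage della and arcee_fusion configs, same pattern...

# Second stage: model_stock merges that read the first-stage outputs
# listed in their models: sections.
mergekit-yaml Qwen2.5-14B-cogito-mst-Coder.yml ./Qwen2.5-14B-cogito-mst-Coder --cuda
# ...mst-V6, mst-Nova, mst-it...

# Third stage: 1M-context base, then its della and arcee_fusion merges.
mergekit-yaml Qwen2.5-14B-1M.yml ./Qwen2.5-14B-1M --cuda
# ...

# Final stage: model_stock over the second-stage outputs on the 1M-context base.
mergekit-yaml ZYH-LLM-Qwen2.5-14B-V5.yml ./ZYH-LLM-Qwen2.5-14B-V5 --cuda
```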