---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- merge
- sft
- dpo
- qwen3
- math
- code
- mcqa
- mnlp-m3
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---

# MNLP M3 Merged Model (SFT + DPO)
This model merges two fine-tuned variants of Qwen/Qwen3-0.6B-Base:

- **SFT Component**: `mgatti/MNLP_M3_mcqa_model`, contributing multiple-choice QA capabilities
- **DPO Component**: `albertfares/MNLP_M3_dpo_model`, contributing preference-aligned responses
## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **SFT Model**: fine-tuned for multiple-choice QA
- **DPO Model**: optimized with Direct Preference Optimization
- **Merge Strategy**: weight-space merging of the SFT and DPO checkpoints (see the sketch below)
- **Combined Capabilities**: MCQA + preference alignment
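
This card does not document the exact merge recipe. As an illustration only, the sketch below shows one common weight-space approach: linear interpolation of the two checkpoints' parameters. The 0.5 mixing ratio and the reuse of the SFT tokenizer are assumptions, not the procedure actually used to produce this model.

```python
# Illustrative sketch of linear weight interpolation ("model soup" style merging).
# Assumption: both checkpoints share the Qwen3-0.6B-Base architecture, so their
# state dicts have identical keys and shapes.
from transformers import AutoModelForCausalLM, AutoTokenizer

sft = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")
dpo = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model")

alpha = 0.5  # assumed mixing ratio; in practice tuned on a validation set
dpo_state = dpo.state_dict()

merged_state = {}
for name, param in sft.state_dict().items():
    if param.dtype.is_floating_point:
        # Interpolate floating-point weights between the two checkpoints.
        merged_state[name] = alpha * param + (1.0 - alpha) * dpo_state[name]
    else:
        # Keep integer buffers (e.g., index tensors) unchanged.
        merged_state[name] = param

sft.load_state_dict(merged_state)
sft.save_pretrained("merged_mnlp_m3_sft_dpo")
# Assumes the SFT tokenizer is appropriate for the merged model (both derive
# from the same base), so the Usage snippet below can load from one path.
AutoTokenizer.from_pretrained("mgatti/MNLP_M3_mcqa_model").save_pretrained("merged_mnlp_m3_sft_dpo")
```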

## Capabilities

✅ **Multiple-Choice Question Answering** (from SFT component)

✅ **Preference-Aligned Generation** (from DPO component)

✅ **Math and Code Generation** (from MNLP M3 training)

✅ **Reasoning Tasks** (combined strengths)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merged_mnlp_m3_sft_dpo")
tokenizer = AutoTokenizer.from_pretrained("merged_mnlp_m3_sft_dpo")

# For MCQA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For general generation (enable sampling so temperature takes effect)
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
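
Free-form generation can be brittle for strictly formatted multiple-choice questions. A common alternative, shown below as a minimal sketch reusing `model` and `tokenizer` from the snippet above, is to score each option by the model's per-token loss and pick the lowest. This is an assumed recipe for illustration, not necessarily how this model was evaluated.

```python
# Illustrative MCQA scoring: choose the option whose full "question + answer"
# text is most likely (lowest mean cross-entropy) under the model.
import torch

question = "Which of the following is correct?"
options = {"A": "2+2=5", "B": "2+2=4", "C": "2+2=3"}

def sequence_loss(text: str) -> float:
    # Mean cross-entropy per token of the whole sequence.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

best = min(options, key=lambda k: sequence_loss(f"{question} Answer: {options[k]}"))
print(f"Predicted answer: {best}) {options[best]}")
```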

## Training Data

- **SFT**: multiple-choice QA dataset
- **DPO**: `albertfares/MNLP_M3_dpo_dataset`, a preference dataset covering math, code, and reasoning

This merged model should excel at both structured QA tasks and open-ended, preference-aligned generation.