albertfares/DPO_MCQA_model

	---
	license: apache-2.0
	base_model: Qwen/Qwen3-0.6B-Base
	tags:
	- merge
	- sft
	- dpo
	- qwen3
	- math
	- code
	- mcqa
	- mnlp-m3
	datasets:
	- albertfares/MNLP_M3_dpo_dataset
	language:
	- en
	pipeline_tag: text-generation
	---

	# MNLP M3 Merged Model (SFT + DPO)

	This model merges the weights of two fine-tuned checkpoints that share the same base model:
	- SFT Component: `mgatti/MNLP_M3_mcqa_model` - Multiple-choice QA capabilities
	- DPO Component: `albertfares/MNLP_M3_dpo_model` - Preference-aligned responses

	## Model Details

	- Base Model: Qwen/Qwen3-0.6B-Base
	- SFT Model: Multiple-choice QA fine-tuned model
	- DPO Model: Direct preference optimized model
	- Merge Strategy: Weight-space merging of the SFT and DPO checkpoints
	- Combined Capabilities: MCQA + preference alignment
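
The card does not say which merge method was used. As a rough illustration of what weight-space merging can look like, here is a minimal sketch of linear interpolation between two checkpoints with identical parameter names. The function name `merge_state_dicts` and the mixing coefficient `alpha` are hypothetical and are not taken from the original training code.

```python
from typing import Any, Dict


def merge_state_dicts(
    sft_weights: Dict[str, Any],
    dpo_weights: Dict[str, Any],
    alpha: float = 0.5,
) -> Dict[str, Any]:
    """Linearly interpolate two sets of weights (assumed method, not the card's).

    Values can be anything that supports scalar multiplication and addition,
    for example floats, NumPy arrays, or floating-point torch tensors.
    alpha=1.0 returns the SFT weights, alpha=0.0 returns the DPO weights.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if sft_weights.keys() != dpo_weights.keys():
        raise KeyError("both checkpoints must have the same parameter names")
    return {
        name: alpha * sft_weights[name] + (1.0 - alpha) * dpo_weights[name]
        for name in sft_weights
    }


# Toy example with scalar "parameters" standing in for tensors.
merged = merge_state_dicts({"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}, alpha=0.5)
print(merged)
```

With real models, the two dictionaries would typically come from `model.state_dict()` on each checkpoint, and the result could be loaded back with `load_state_dict`. This only makes sense when both checkpoints were fine-tuned from the same base model, as is the case here.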

	## Capabilities

	- ✅ Multiple-Choice Question Answering (from SFT component)
	- ✅ Preference-Aligned Generation (from DPO component)
	- ✅ Math and Code Generation (from MNLP M3 training)
	- ✅ Reasoning Tasks (combined strengths)

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Hub ID of this repository; a local path to the merged checkpoint also works
	model_id = "albertfares/DPO_MCQA_model"
	model = AutoModelForCausalLM.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# For MCQA
	prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
	inputs = tokenizer(prompt, return_tensors="pt")
	# max_new_tokens limits only the generated continuation, not prompt + output
	outputs = model.generate(**inputs, max_new_tokens=200)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))

	# For general generation
	prompt = "Explain the concept of recursion in programming"
	inputs = tokenizer(prompt, return_tensors="pt")
	# temperature only takes effect when sampling is enabled
	outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
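
For multiple-choice use, it can help to build the prompt programmatically and to pull the chosen letter out of the generated text. The sketch below shows one way to do that. The prompt layout (lettered options followed by `Answer:`) is an assumption, since the card does not document the template used during SFT, and the helper names `format_mcqa_prompt` and `extract_choice` are hypothetical.

```python
import re
from string import ascii_uppercase
from typing import List, Optional


def format_mcqa_prompt(question: str, options: List[str]) -> str:
    """Lay out a question with lettered options (assumed template)."""
    lines = [question]
    for letter, option in zip(ascii_uppercase, options):
        lines.append(f"{letter}) {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(generated: str, num_options: int) -> Optional[str]:
    """Return the first standalone option letter in the text, or None."""
    valid = ascii_uppercase[:num_options]
    match = re.search(rf"\b([{valid}])\b", generated)
    return match.group(1) if match else None


prompt = format_mcqa_prompt(
    "Which of the following is correct?",
    ["2+2=5", "2+2=4", "2+2=3"],
)
print(prompt)
print(extract_choice(" B) 2+2=4", num_options=3))
```

Because `model.generate` returns the prompt tokens followed by the continuation, `extract_choice` should be applied to the continuation only, for example by decoding `outputs[0][inputs["input_ids"].shape[1]:]`. The regular expression is deliberately simple: a continuation that starts with the English article "A" would be read as option A.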

	## Training Data

	- SFT: Multiple-choice QA dataset
	- DPO: MNLP M3 preference dataset with math, code, and reasoning
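
As background for the DPO component, the standard DPO objective for a single preference pair can be sketched as follows. It takes the summed log-probabilities of the chosen and rejected responses under the trained policy and under a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative assumptions; the card does not report the hyperparameters used in training.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the card
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # Implicit rewards: scaled log-ratio of the policy to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) = log(1 + exp(-m)), written to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# The loss drops once the policy favors the chosen response more than the reference does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In practice these log-probabilities are computed per batch with tensors, and libraries such as TRL provide a trainer for this; the scalar version above is only meant to make the objective concrete.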

	This merged model is intended for both structured multiple-choice QA and open-ended, preference-aligned generation.