---
license: apache-2.0
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- dpo
- peft
- llama
- preference-learning
model-index:
- name: llama3-dpo-llm-judge
results: []
---
# Llama-3.2-1B DPO LLM Judge
This model is a LoRA adapter for [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), fine-tuned with Direct Preference Optimization (DPO) on preference pairs labeled by an LLM judge.
## Model Details
- **Base Model**: meta-llama/Llama-3.2-1B-Instruct
- **Training Method**: Direct Preference Optimization (DPO)
- **Preference Source**: LLM Judge
- **LoRA Configuration** (see the sketch after this list):
- r: 8
- alpha: 16
- target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
- **Training Steps**: 250
- **Learning Rate**: 0.0002
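The LoRA setup above corresponds to a `peft` configuration along these lines. This is a sketch reconstructed from the card: `task_type` is an assumption and any values not listed above (e.g. dropout) are left at their defaults.

```python
from peft import LoraConfig

# LoRA hyperparameters from the card; task_type is an assumption for a causal LM.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```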
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the DPO-trained LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base_model, "pyamy/llama3-dpo-llm-judge")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
```
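Continuing from the snippet above, the adapter can be prompted like any instruct model. The prompt and decoding settings below are illustrative:

```python
messages = [{"role": "user", "content": "Give three tips for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Greedy decoding, kept short for illustration; tune max_new_tokens as needed.
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```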
## Training Details
- Dataset: 50 instructions from LIMA
- Responses per instruction: 5
- Preference judgment: LLM Judge
- Training framework: TRL `DPOTrainer` (see the sketch below)
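A rough sketch of the training setup with TRL's `DPOTrainer` is shown below. It is not the exact training script: the example preference pair, `beta`, and `output_dir` are assumptions, and depending on your TRL version the tokenizer is passed as `tokenizer=` rather than `processing_class=`.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs chosen/rejected by the LLM judge; this single row is dummy data.
# The real dataset covers 50 LIMA instructions with 5 sampled responses each.
train_dataset = Dataset.from_list([
    {
        "prompt": "Explain what DPO does in one sentence.",
        "chosen": "DPO fine-tunes a model directly on preference pairs, without a separate reward model.",
        "rejected": "DPO is a database protocol.",
    }
])

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="llama3-dpo-llm-judge",  # assumed output path
    max_steps=250,
    learning_rate=2e-4,
    beta=0.1,  # assumed DPO temperature; not stated on the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions: tokenizer=tokenizer
    peft_config=peft_config,
)
trainer.train()
```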
## Performance
See evaluation results in the repository for detailed performance metrics.