# Qwen2.5-3B-Korean-QLoRA (PEFT Adapter)

## Model Description

Qwen2.5-3B-Korean-QLoRA is a LoRA adapter created by fine-tuning Qwen/Qwen2.5-3B-Instruct on Korean data.

This repository provides only the PEFT adapter, so the base model is required to use it.

If you need a merged model, see MyeongHo0621/Qwen2.5-3B-Korean.
## Key Features

- Korean Optimization: trained on 200,000 high-quality Korean conversation samples
- Lightweight: the adapter is only ~479MB (vs. ~6GB for the full base model)
- Research Friendly: well suited to fine-tuning research and experiments
- Fast Loading: quick to load and swap as a LoRA adapter
- Apache 2.0: free for commercial use
## Quick Start

### Installation

```bash
pip install torch transformers peft
```
### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# 2. Apply the LoRA adapter (repo root = final model)
model = PeftModel.from_pretrained(
    base_model,
    "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA"
)
# Or use the final folder: subfolder="final"

# 3. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# 4. Inference
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "한국의 수도는 어디인가요?"}  # "What is the capital of Korea?"
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Repository Structure

```
MyeongHo0621/Qwen2.5-3B-Korean-QLoRA/
├── (root)                          # final trained model (step 4689)
│   ├── adapter_model.safetensors   # LoRA weights (~479MB)
│   ├── adapter_config.json         # LoRA configuration
│   ├── tokenizer.json              # tokenizer
│   └── ...
└── final/                          # saved copy of the model (backup)
    ├── adapter_model.safetensors
    └── ...
```
## Training Details

### Dataset

- Source: MyeongHo0621/smol-koreantalk
- Samples: 200,000 high-quality Korean conversational pairs
- Domain: general conversation, instruction following, knowledge Q&A

### Training Configuration
| Hyperparameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning Rate | 2e-4 |
| Batch Size | 32 (per device) |
| Gradient Accumulation | 4 (effective: 128) |
| Warmup Ratio | 0.1 |
| Epochs | 3 |
| Total Steps | 4689 |
| Max Length | 2048 |
| Quantization | 4-bit NF4 (training) |
## Use Cases

### Recommended

- Fine-tuning research and experiments
- Comparative analysis of LoRA adapters
- Memory-efficient inference
- Fast model switching (swapping between multiple LoRA adapters)
- Education and learning

### Alternatives

- Production serving: MyeongHo0621/Qwen2.5-3B-Korean (merged model) is recommended
- Ollama/Llama.cpp: MyeongHo0621/Qwen2.5-3B-Korean (includes GGUF files)
## Merging the Adapter

To merge the adapter into the base model:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA")

# Merge the adapter weights into the base model
merged_model = model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("./qwen25-3b-korean-merged")
```

If you just need the already-merged model, use MyeongHo0621/Qwen2.5-3B-Korean.
## Performance
| Model | Size | Load Time | Memory (Inference) | Use Case |
|---|---|---|---|---|
| LoRA Adapter | ~479MB | ~5s | ~4-6GB | Research, Experiments |
| Merged Model | ~6GB | ~10s | ~4-6GB | Production, vLLM |
| GGUF Q4_K_M | ~2GB | ~3s | ~2-3GB | Local, Ollama |
## Related Repositories

### Merged Model (Production)

- MyeongHo0621/Qwen2.5-3B-Korean
  - Merged model (ready to use as-is)
  - GGUF files (Ollama, Llama.cpp)
  - vLLM, SGLang, and Transformers support

### Dataset

- MyeongHo0621/smol-koreantalk
  - High-quality Korean conversation data
## Citation

```bibtex
@misc{qwen25-korean-qlora-2025,
  author       = {MyeongHo Shin},
  title        = {Qwen2.5-3B-Korean-QLoRA: Korean LoRA Adapter for Qwen2.5-3B},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA}},
}
```
## Acknowledgments

- Base Model: Qwen2.5-3B-Instruct by Alibaba Cloud
- Dataset: smol-koreantalk
- Tools: Unsloth, PEFT, Transformers

## Contact

- Author: MyeongHo Shin
- HuggingFace: @MyeongHo0621
## License

Apache 2.0: free to use commercially, modify, and redistribute.
## Tips

### Faster Inference

```python
# Save memory by loading the base model in 4-bit
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)
```
### Multiple LoRA Adapters

```python
# Quickly switch between adapters
model.unload()  # detach the current adapter, restoring the base modules
model = PeftModel.from_pretrained(base_model, "another-lora-adapter")
```
### Training Your Own Adapter

To continue fine-tuning on top of this adapter:

```python
from peft import get_peft_model, LoraConfig

# Add fresh LoRA layers
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, peft_config)
# ... training code ...
```