Qwen2.5-3B-Korean-QLoRA (PEFT Adapter)

Model Description

Qwen2.5-3B-Korean-QLoRA is a LoRA adapter for Qwen/Qwen2.5-3B-Instruct, fine-tuned on Korean data.

이 λ¦¬ν¬μ§€ν† λ¦¬λŠ” PEFT μ–΄λŒ‘ν„°λ§Œ μ œκ³΅ν•˜λ©°, μ‚¬μš© μ‹œ 베이슀 λͺ¨λΈμ΄ ν•„μš”ν•©λ‹ˆλ‹€.

Merged λͺ¨λΈμ΄ ν•„μš”ν•˜μ‹  경우: MyeongHo0621/Qwen2.5-3B-Korean

🎯 Key Features

  • πŸ‡°πŸ‡· Korean Optimization: 200,000개 κ³ ν’ˆμ§ˆ ν•œκ΅­μ–΄ λŒ€ν™” λ°μ΄ν„°λ‘œ ν•™μŠ΅
  • πŸ’Ύ Lightweight: μ–΄λŒ‘ν„°λ§Œ ~479MB (베이슀 λͺ¨λΈ 6GB λŒ€λΉ„)
  • πŸ”¬ Research Friendly: νŒŒμΈνŠœλ‹ 연ꡬ 및 μ‹€ν—˜μ— 적합
  • πŸš€ Fast Loading: LoRA μ–΄λŒ‘ν„°λ‘œ λΉ λ₯Έ λ‘œλ”© 및 μ „ν™˜
  • βš–οΈ Apache 2.0: 상업적 μ‚¬μš© κ°€λŠ₯

πŸš€ Quick Start

Installation

pip install torch transformers peft

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 1. Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# 2. Apply the LoRA adapter (the repository root holds the final model)
model = PeftModel.from_pretrained(
    base_model,
    "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA"
)
# λ˜λŠ” final 폴더 μ‚¬μš©: subfolder="final"

# 3. ν† ν¬λ‚˜μ΄μ € λ‘œλ”©
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# 4. Run inference
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "ν•œκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?"}  # "What is the capital of Korea?"
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
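For reference, `apply_chat_template` renders the message list into Qwen's ChatML-style prompt. The sketch below is a hand-written illustration of that layout, not the tokenizer's actual template; always use `tokenizer.apply_chat_template` in real code.

```python
# Illustrative sketch of the ChatML-style layout used by Qwen2.5 chat models.
# The authoritative string always comes from tokenizer.apply_chat_template.
def render_chatml(messages, add_generation_prompt=True):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to start its reply
    return "".join(parts)

demo = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "Hello"},
]
print(render_chatml(demo))
```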

πŸ“¦ Repository Structure

MyeongHo0621/Qwen2.5-3B-Korean-QLoRA/
β”œβ”€β”€ (root)                        # Final trained model (step 4689)
β”‚   β”œβ”€β”€ adapter_model.safetensors # LoRA weights (~479MB)
β”‚   β”œβ”€β”€ adapter_config.json       # LoRA configuration
β”‚   β”œβ”€β”€ tokenizer.json            # Tokenizer
β”‚   └── ...
└── final/                        # Saved copy of the model (backup)
    β”œβ”€β”€ adapter_model.safetensors
    └── ...

πŸ”§ Training Details

Dataset

  • Source: MyeongHo0621/smol-koreantalk
  • Samples: 200,000 high-quality Korean conversational pairs
  • Domain: General conversation, instruction following, knowledge Q&A

Training Configuration

| Hyperparameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Method | QLoRA (4-bit NF4) |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning Rate | 2e-4 |
| Batch Size | 32 (per device) |
| Gradient Accumulation | 4 (effective batch size: 128) |
| Warmup Ratio | 0.1 |
| Epochs | 3 |
| Total Steps | 4689 |
| Max Length | 2048 |
| Quantization | 4-bit NF4 (during training) |
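The step count in the table is consistent with the dataset size. A quick sanity check, assuming a single device (so the effective batch size is 32 Γ— 4 = 128):

```python
import math

# Reproduce the training schedule implied by the table above
# (assumes a single device; effective batch = per-device batch Γ— accumulation).
samples, epochs = 200_000, 3
per_device_bs, grad_accum = 32, 4

effective_bs = per_device_bs * grad_accum            # 128
steps_per_epoch = math.ceil(samples / effective_bs)  # 1563
total_steps = steps_per_epoch * epochs               # 4689, matching the table
print(effective_bs, steps_per_epoch, total_steps)
```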

πŸ’‘ Use Cases

βœ… Recommended

  • νŒŒμΈνŠœλ‹ 연ꡬ 및 μ‹€ν—˜
  • LoRA μ–΄λŒ‘ν„° 비ꡐ 뢄석
  • λ©”λͺ¨λ¦¬ 효율적인 μΆ”λ‘ 
  • λΉ λ₯Έ λͺ¨λΈ μ „ν™˜ (μ—¬λŸ¬ LoRA μ–΄λŒ‘ν„° ꡐ체)
  • ꡐ윑 및 ν•™μŠ΅ λͺ©μ 

⚠️ Alternatives

  • Production deployment: prefer the merged model, MyeongHo0621/Qwen2.5-3B-Korean
  • Local / CPU inference: prefer the GGUF files (Ollama, Llama.cpp)

πŸ”„ Merging the Adapter

To merge the adapter into the base model:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA")

# Merge the adapter weights into the base model
merged_model = model.merge_and_unload()

# Save the merged model (save the tokenizer too so it is usable standalone)
merged_model.save_pretrained("./qwen25-3b-korean-merged")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
tokenizer.save_pretrained("./qwen25-3b-korean-merged")

이미 λ³‘ν•©λœ λͺ¨λΈμ΄ ν•„μš”ν•˜μ‹œλ©΄ MyeongHo0621/Qwen2.5-3B-Korean을 μ‚¬μš©ν•˜μ„Έμš”!


πŸ“Š Performance

| Model | Size | Load Time | Memory (Inference) | Use Case |
|---|---|---|---|---|
| LoRA Adapter | ~479MB | ~5s | ~4-6GB | Research, experiments |
| Merged Model | ~6GB | ~10s | ~4-6GB | Production, vLLM |
| GGUF Q4_K_M | ~2GB | ~3s | ~2-3GB | Local, Ollama |

πŸ”— Related Repositories

Merged Model (Production)

  • MyeongHo0621/Qwen2.5-3B-Korean
    • Merged model (μ¦‰μ‹œ μ‚¬μš© κ°€λŠ₯)
    • GGUF files (Ollama, Llama.cpp)
    • vLLM, SGLang, Transformers 지원

Dataset

  • MyeongHo0621/smol-koreantalk

πŸ“ Citation

@misc{qwen25-korean-qlora-2025,
  author = {MyeongHo Shin},
  title = {Qwen2.5-3B-Korean-QLoRA: Korean LoRA Adapter for Qwen2.5-3B},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean-QLoRA}},
}

πŸ™ Acknowledgments


πŸ“ž Contact


βš–οΈ License

Apache 2.0 - commercial use, modification, and redistribution permitted


πŸ’‘ Tips

Faster Inference

# Save memory by loading the base model in 4-bit
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

Multiple LoRA Adapters

# Switch quickly between several adapters on the same base model
model.load_adapter("another-lora-adapter", adapter_name="other")
model.set_adapter("other")    # activate the second adapter
model.set_adapter("default")  # switch back to this adapter

Training Your Own Adapter

To continue fine-tuning on top of this adapter:

from peft import get_peft_model, LoraConfig

# Merge the current adapter first, then attach fresh LoRA layers on top
base = model.merge_and_unload()

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)

model = get_peft_model(base, peft_config)
# ... training code ...