Qwen2.5-3B-Korean

Model Description

Qwen2.5-3B-Korean is a merged model: Qwen/Qwen2.5-3B-Instruct fine-tuned for Korean.

이 λ¦¬ν¬μ§€ν† λ¦¬λŠ” LoRA μ–΄λŒ‘ν„°κ°€ 이미 λ³‘ν•©λœ μ™„μ „ν•œ λͺ¨λΈκ³Ό GGUF νŒŒμΌμ„ μ œκ³΅ν•©λ‹ˆλ‹€.

If you need the PEFT/LoRA adapter on its own: MyeongHo0621/Qwen2.5-3B-Korean-QLoRA

🎯 Key Features

  • 🇰🇷 Korean Optimization: trained on 200,000 high-quality Korean conversation samples
  • 📦 Ready-to-Use: LoRA already merged; usable out of the box
  • 🚀 Multi-Format: Safetensors (repo root) + GGUF (gguf/)
  • 💻 All Frameworks: Transformers, vLLM, SGLang, Ollama, Llama.cpp
  • ⚖️ Apache 2.0: commercial use permitted

📦 Available Formats

Format         Path                               Use Case                         Size
Safetensors    / (repo root)                      Transformers, vLLM, SGLang       ~6GB
GGUF Q4_K_M    gguf/qwen25-3b-korean-Q4_K_M.gguf  Ollama, Llama.cpp (recommended)  ~2GB
GGUF Q5_K_M    gguf/qwen25-3b-korean-Q5_K_M.gguf  Higher quality                   ~2.5GB
GGUF Q8_0     gguf/qwen25-3b-korean-Q8_0.gguf    Highest quality                  ~3.5GB
GGUF F16      gguf/qwen25-3b-korean-F16.gguf     Benchmarking                     ~6GB
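If you only need one of these artifacts, you can fetch it individually instead of cloning the whole repo; a minimal sketch using huggingface_hub (the filename matches the table above):

from huggingface_hub import hf_hub_download

# Download just the recommended Q4_K_M GGUF instead of the full repository
gguf_path = hf_hub_download(
    repo_id="MyeongHo0621/Qwen2.5-3B-Korean",
    filename="gguf/qwen25-3b-korean-Q4_K_M.gguf",
)
print(gguf_path)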

🚀 Quick Start

1️⃣ Transformers (simplest)

from transformers import AutoModelForCausalLM, AutoTokenizer

# λͺ¨λΈ λ‘œλ”© (Merged λͺ¨λΈ)
model = AutoModelForCausalLM.from_pretrained(
    "MyeongHo0621/Qwen2.5-3B-Korean",
    torch_dtype="auto",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")

# Build the prompt with the chat template
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "ν•œκ΅­μ˜ μˆ˜λ„λŠ” μ–΄λ””μΈκ°€μš”?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True enables sampling so that temperature takes effect
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
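For interactive use, the same setup can stream tokens as they are generated; a small variant using Transformers' TextStreamer:

from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, streamer=streamer)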

2️⃣ vLLM (Production Serving)

from vllm import LLM, SamplingParams

# Merged λͺ¨λΈ λ‘œλ”©
llm = LLM(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",  # μ˜΅μ…˜: 4-bit μ–‘μžν™”
    gpu_memory_utilization=0.6
)

prompts = ["한국의 수도는 어디인가요?"]
params = SamplingParams(temperature=0.7, max_tokens=512)

outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)

Server Mode:

vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
    --quantization bitsandbytes \
    --port 8000
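The server exposes an OpenAI-compatible API on the given port. A minimal client sketch with requests (assuming no API key is configured):

import requests

# Query the OpenAI-compatible /v1/chat/completions endpoint started by `vllm serve`
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "MyeongHo0621/Qwen2.5-3B-Korean",
        "messages": [{"role": "user", "content": "한국의 수도는 어디인가요?"}],
        "temperature": 0.7,
        "max_tokens": 512,
    },
)
print(resp.json()["choices"][0]["message"]["content"])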

3️⃣ SGLang (Fastest)

import sglang as sgl

runtime = sgl.Runtime(
    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes"
)

sgl.set_default_backend(runtime)

@sgl.function
def chat(s, prompt):
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("response", max_tokens=512))

state = chat.run(prompt="한국의 수도는?")
print(state["response"])
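For throughput, the same @sgl.function can also be run over several prompts at once; a sketch assuming sglang's run_batch frontend API:

# Execute the chat program over a batch of prompts
states = chat.run_batch([
    {"prompt": "한국의 수도는?"},
    {"prompt": "김치는 어떻게 만드나요?"},
])
for state in states:
    print(state["response"])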

4️⃣ Ollama (Local Desktop)

# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf

TEMPLATE """<|im_start|>system
You are a helpful Korean assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
EOF

# 3. λͺ¨λΈ 생성 & μ‹€ν–‰
ollama create qwen25-korean -f Modelfile
ollama run qwen25-korean "ν•œκ΅­μ˜ μˆ˜λ„λŠ”?"
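Ollama also serves a local REST API (default port 11434); a minimal sketch against its /api/generate endpoint:

import requests

# Call the local Ollama server for a single non-streaming completion
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen25-korean", "prompt": "한국의 수도는?", "stream": False},
)
print(resp.json()["response"])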

5️⃣ Llama.cpp (CPU/Edge)

# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Inference (GPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    --temp 0.7 \
    -ngl 99

# 3. Inference (CPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    -t 8
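llama.cpp also ships an HTTP server (started with, e.g., ./llama.cpp/server -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf --port 8080; the binary is named llama-server in newer builds). A minimal client sketch against its /completion endpoint:

import requests

# Query the llama.cpp HTTP server; prompt uses the same ChatML format as above
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "<|im_start|>user\n한국의 수도는?<|im_end|>\n<|im_start|>assistant\n",
        "n_predict": 512,
        "temperature": 0.7,
    },
)
print(resp.json()["content"])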

🔧 Training Details

Dataset

  • Source: MyeongHo0621/smol-koreantalk (loading sketch below)
  • Samples: 200,000 Korean conversation pairs
  • Domains: general conversation, instruction following, knowledge Q&A
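To inspect the training corpus yourself, a one-line sketch with the datasets library (the train split name is an assumption):

from datasets import load_dataset

# Load the Korean conversation corpus from the Hub (split name assumed)
ds = load_dataset("MyeongHo0621/smol-koreantalk", split="train")
print(ds[0])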

Training Configuration

Hyperparameter    Value
Method            QLoRA (4-bit NF4)
LoRA Rank         64
LoRA Alpha        128
Learning Rate     2e-4
Batch Size        128 (effective)
Epochs            3
Steps             4689
Max Length        2048
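The configuration above maps onto bitsandbytes + PEFT roughly as follows. This is a minimal sketch, not the original training script; the target modules and compute dtype are assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, matching the "QLoRA (4-bit NF4)" row above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank/alpha from the table; target modules are an assumption
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)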

📊 Repository Structure

MyeongHo0621/Qwen2.5-3B-Korean/
├── config.json                 # model config
├── model.safetensors           # merged model (~6GB)
├── tokenizer.json              # tokenizer
├── tokenizer_config.json
└── gguf/                       # GGUF files
    ├── qwen25-3b-korean-Q4_K_M.gguf  (~2GB) ⭐ recommended
    ├── qwen25-3b-korean-Q5_K_M.gguf  (~2.5GB)
    ├── qwen25-3b-korean-Q8_0.gguf    (~3.5GB)
    └── qwen25-3b-korean-F16.gguf     (~6GB)

🔗 Related Repositories

  • PEFT Adapter: MyeongHo0621/Qwen2.5-3B-Korean-QLoRA (loading sketch below)
    • If you only need the LoRA adapter
    • For fine-tuning research
    • ~479MB (lightweight)
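If you work from the adapter repo instead of this merged model, the adapter can be attached to the base model with PEFT; a minimal sketch:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA")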

πŸ“ Citation

@misc{qwen25-korean-2025,
  author = {MyeongHo Shin},
  title = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
}

πŸ™ Acknowledgments


πŸ“ž Contact


βš–οΈ License

Apache 2.0 - commercial use, modification, and redistribution permitted


Benchmark Results

General Benchmarks

Task           Score    Metric
gsm8k          42.00%   acc
mmlu           58.00%   acc
hellaswag      71.00%   acc_norm
winogrande     65.00%   acc
arc_easy       78.00%   acc
arc_challenge  48.00%   acc_norm

Average Score: 60.33%
