# Qwen2.5-3B-Korean

## Model Description
Qwen2.5-3B-Korean is a merged model: Qwen/Qwen2.5-3B-Instruct fine-tuned for Korean.

This repository provides the complete model with the LoRA adapter already merged in, along with GGUF files.

If you need the PEFT/LoRA adapter itself: MyeongHo0621/Qwen2.5-3B-Korean-QLoRA
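If you would rather start from the adapter repo, a minimal loading sketch (assuming the adapter targets the stock Qwen/Qwen2.5-3B-Instruct base, per the related-repositories section below):

```python
# Sketch: load the base model and attach the separate QLoRA adapter.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "MyeongHo0621/Qwen2.5-3B-Korean-QLoRA")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
```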
## Key Features
- **Korean Optimization**: trained on 200,000 high-quality Korean conversation samples
- **Ready-to-Use**: LoRA adapter already merged; usable immediately
- **Multi-Format**: Safetensors (repo root) + GGUF (`gguf/`)
- **All Frameworks**: Transformers, vLLM, SGLang, Ollama, Llama.cpp
- **Apache 2.0**: commercial use permitted
## Available Formats
| Format | Path | Use Case | Size |
|---|---|---|---|
| Safetensors | `/` (repo root) | Transformers, vLLM, SGLang | ~6GB |
| GGUF Q4_K_M | `gguf/qwen25-3b-korean-Q4_K_M.gguf` | Ollama, Llama.cpp (recommended) | ~2GB |
| GGUF Q5_K_M | `gguf/qwen25-3b-korean-Q5_K_M.gguf` | higher quality | ~2.5GB |
| GGUF Q8_0 | `gguf/qwen25-3b-korean-Q8_0.gguf` | highest quality | ~3.5GB |
| GGUF F16 | `gguf/qwen25-3b-korean-F16.gguf` | benchmarking | ~6GB |
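To fetch a single GGUF file programmatically rather than cloning the whole repository, a small sketch with huggingface_hub (filenames taken from the table above):

```python
# Sketch: download just one GGUF file from the repo.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="MyeongHo0621/Qwen2.5-3B-Korean",
    filename="gguf/qwen25-3b-korean-Q4_K_M.gguf",
)
print(gguf_path)  # local cache path of the downloaded file
```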
## Quick Start

### 1. Transformers (simplest)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model (no adapter step needed)
model = AutoModelForCausalLM.from_pretrained(
    "MyeongHo0621/Qwen2.5-3B-Korean",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("MyeongHo0621/Qwen2.5-3B-Korean")

# Apply the chat template
messages = [
    {"role": "system", "content": "You are a helpful Korean assistant."},
    {"role": "user", "content": "What is the capital of Korea?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
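As a follow-up, generation can be streamed token by token with transformers' `TextStreamer`, reusing the `model`, `tokenizer`, and `inputs` objects from the snippet above:

```python
# Sketch: print tokens as they are generated instead of waiting
# for the full sequence.
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    streamer=streamer,  # tokens are printed to stdout as they arrive
)
```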
### 2. vLLM (Production Serving)
```python
from vllm import LLM, SamplingParams

# Load the merged model
llm = LLM(
    model="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes",  # optional: 4-bit quantization
    gpu_memory_utilization=0.6
)

prompts = ["What is the capital of Korea?"]
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(prompts, params)
for output in outputs:
    print(output.outputs[0].text)
```
Server Mode:

```bash
vllm serve MyeongHo0621/Qwen2.5-3B-Korean \
    --quantization bitsandbytes \
    --port 8000
```
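The server exposes an OpenAI-compatible API, so any OpenAI client can talk to it; a minimal sketch with the openai package (the localhost URL and `EMPTY` key are the usual vLLM defaults, assuming the server above is running):

```python
# Sketch: query the vLLM server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="MyeongHo0621/Qwen2.5-3B-Korean",  # served model name = model path
    messages=[{"role": "user", "content": "What is the capital of Korea?"}],
    temperature=0.7,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```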
### 3. SGLang (Fastest)
```python
import sglang as sgl

runtime = sgl.Runtime(
    model_path="MyeongHo0621/Qwen2.5-3B-Korean",
    quantization="bitsandbytes"
)
sgl.set_default_backend(runtime)

@sgl.function
def chat(s, prompt):
    s += sgl.user(prompt)
    s += sgl.assistant(sgl.gen("response", max_tokens=512))

state = chat.run(prompt="What is the capital of Korea?")
print(state["response"])
```
### 4. Ollama (Local Desktop)
```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./gguf/qwen25-3b-korean-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
You are a helpful Korean assistant.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
EOF

# 3. Create and run the model
ollama create qwen25-korean -f Modelfile
ollama run qwen25-korean "What is the capital of Korea?"
```
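Ollama also exposes a local REST API (default port 11434), which is handy for scripting against the model created above; a minimal Python sketch:

```python
# Sketch: call the Ollama REST API for the model registered above.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen25-korean",
        "prompt": "What is the capital of Korea?",
        "stream": False,  # return the full response in one JSON object
    },
)
print(resp.json()["response"])
```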
### 5. Llama.cpp (CPU/Edge)
```bash
# 1. Download the GGUF file
huggingface-cli download MyeongHo0621/Qwen2.5-3B-Korean \
    gguf/qwen25-3b-korean-Q4_K_M.gguf \
    --local-dir ./

# 2. Inference (GPU)
# (in newer llama.cpp builds the binary is named llama-cli)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\nWhat is the capital of Korea?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    --temp 0.7 \
    -ngl 99

# 3. Inference (CPU)
./llama.cpp/main \
    -m ./gguf/qwen25-3b-korean-Q4_K_M.gguf \
    -p "<|im_start|>user\nWhat is the capital of Korea?<|im_end|>\n<|im_start|>assistant\n" \
    -n 512 \
    -t 8
```
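If you prefer Python over the CLI, the same GGUF file can be loaded through the llama-cpp-python bindings; a sketch whose parameter values mirror the CLI flags above:

```python
# Sketch: run the GGUF model via llama-cpp-python instead of the CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="./gguf/qwen25-3b-korean-Q4_K_M.gguf",
    n_ctx=2048,       # context window (matches the training max length)
    n_gpu_layers=99,  # offload all layers to GPU; set 0 for CPU-only
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of Korea?"}],
    max_tokens=512,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```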
## Training Details

### Dataset
- **Source**: MyeongHo0621/smol-koreantalk (see the loading sketch below)
- **Samples**: 200,000 Korean conversation pairs
- **Domains**: general conversation, everyday topics, knowledge Q&A
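To inspect the training data yourself, a small sketch with the datasets library (the `train` split name is an assumption):

```python
# Sketch: load and peek at the training dataset.
from datasets import load_dataset

ds = load_dataset("MyeongHo0621/smol-koreantalk", split="train")
print(len(ds))  # expected on the order of 200,000 samples
print(ds[0])    # one conversation record
```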
### Training Configuration
| Hyperparameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Learning Rate | 2e-4 |
| Batch Size | 128 (effective) |
| Epochs | 3 |
| Steps | 4689 |
| Max Length | 2048 |
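For reference, a hedged reconstruction of this configuration as bitsandbytes + PEFT objects (the actual run used Unsloth, per the acknowledgments, and `target_modules` is an assumption not stated in the table):

```python
# Sketch: the QLoRA setup implied by the hyperparameter table.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # QLoRA (4-bit NF4)
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype is an assumption
)
lora_config = LoraConfig(
    r=64,            # LoRA Rank
    lora_alpha=128,  # LoRA Alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```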
## Repository Structure
```
MyeongHo0621/Qwen2.5-3B-Korean/
├── config.json                # model configuration
├── model.safetensors          # merged model (~6GB)
├── tokenizer.json             # tokenizer
├── tokenizer_config.json
└── gguf/                      # GGUF files
    ├── qwen25-3b-korean-Q4_K_M.gguf (~2GB)   ← recommended
    ├── qwen25-3b-korean-Q5_K_M.gguf (~2.5GB)
    ├── qwen25-3b-korean-Q8_0.gguf (~3.5GB)
    └── qwen25-3b-korean-F16.gguf (~6GB)
```
## Related Repositories
- **PEFT Adapter**: MyeongHo0621/Qwen2.5-3B-Korean-QLoRA
  - when only the LoRA adapter is needed
  - for fine-tuning research
  - ~479MB (lightweight)
## Citation
```bibtex
@misc{qwen25-korean-2025,
  author       = {MyeongHo Shin},
  title        = {Qwen2.5-3B-Korean: Korean-Optimized Conversational Model},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/MyeongHo0621/Qwen2.5-3B-Korean}},
}
```
## Acknowledgments
- Base Model: Qwen2.5-3B-Instruct by Alibaba Cloud
- Dataset: smol-koreantalk
- Tools: Unsloth, PEFT, vLLM, SGLang, Llama.cpp
## Contact
- Author: MyeongHo Shin
- HuggingFace: @MyeongHo0621
## License

Apache 2.0 - commercial use, modification, and distribution permitted
## Benchmark Results

### General Benchmarks
| Task | Score | Metric |
|---|---|---|
| gsm8k | 42.00% | acc |
| mmlu | 58.00% | acc |
| hellaswag | 71.00% | acc_norm |
| winogrande | 65.00% | acc |
| arc_easy | 78.00% | acc |
| arc_challenge | 48.00% | acc_norm |
**Average Score**: 60.33%