Qwen3-30B Vietnamese Instruct
Fine-tuned Qwen3-30B-A3B for Vietnamese instruction-following
This model is a Vietnamese-optimized version of Qwen3-30B-A3B, fine-tuned on 327K high-quality Vietnamese instruction samples using LoRA (Low-Rank Adaptation).
Model Description
- Model Type: Large Language Model (Mixture-of-Experts)
- Base Model: Qwen3-30B-A3B
- Language: Vietnamese (primary), English (secondary)
- Fine-tuning Method: LoRA (rank=64, alpha=128) with 4-bit quantization
- Training Data: 327,113 Vietnamese instruction-response pairs
- License: Apache 2.0
- Developed by: Vietnamese LLM Project
Intended Use
This model is designed for Vietnamese natural language processing tasks, including:
- Question Answering: Answer questions in Vietnamese
- Instruction Following: Execute tasks described in Vietnamese
- Conversational AI: Engage in multi-turn Vietnamese dialogues
- Text Generation: Generate Vietnamese text based on prompts
- Educational Applications: Tutoring, explanations, and knowledge sharing
Direct Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

# Vietnamese prompts: "You are an intelligent and helpful AI assistant." /
# "Explain the concept of machine learning in Vietnamese."
messages = [
    {"role": "system", "content": "Bạn là trợ lý AI thông minh và hữu ích."},
    {"role": "user", "content": "Giải thích khái niệm machine learning bằng tiếng Việt."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
Use with Unsloth (2-5x faster inference)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="danghuyhoang/qwen3-30b-vietnamese-instruct",
    max_seq_length=2048,
    dtype=None,          # auto-detect (bfloat16 on Ampere or newer)
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable optimized inference mode

# Vietnamese prompt: "How many provinces and cities does Vietnam have?"
messages = [
    {"role": "user", "content": "Việt Nam có bao nhiêu tỉnh thành?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Training Data
The model was fine-tuned on a diverse collection of Vietnamese instruction datasets:
| Dataset | Samples | Source | License |
|---|---|---|---|
| OpenOrca-Viet | 121,178 | 5CD-AI | Apache 2.0 |
| VILM Instruction | Subset | VILM Project | Open |
| Vietnamese UltraChat | Subset | 5CD-AI | MIT |
| Bactrian-X Vietnamese | Subset | MBZUAI | CC BY-NC 4.0 |
| Vietnamese MATH | 40,000 | 5CD-AI | Apache 2.0 |
| Multi-turn Chat | 12,697 | 5CD-AI | Apache 2.0 |
Total: 327,113 samples (320,570 training + 6,543 validation)
Data Preprocessing
- All data converted to ChatML format (see the conversion sketch after this list)
- Vietnamese content validation
- Token length filtering (max 3072 tokens)
- Quality filtering and deduplication
- 98/2 train/validation split
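The ChatML conversion and length filter can be sketched as follows. The field names (`instruction`, `response`) and the helper functions are illustrative assumptions based on this card, not the project's actual preprocessing script.

```python
# Illustrative sketch of the ChatML conversion and 3072-token length filter
# described above. Field names ("instruction", "response") are assumptions;
# the underlying datasets may use different schemas.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

def to_chatml(example):
    # Qwen models use the ChatML convention with <|im_start|>/<|im_end|> markers.
    return (
        "<|im_start|>user\n" + example["instruction"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["response"] + "<|im_end|>\n"
    )

def within_length_budget(example, max_tokens=3072):
    # Drop samples whose ChatML rendering exceeds the training sequence length.
    return len(tokenizer(to_chatml(example))["input_ids"]) <= max_tokens
```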
Training Details
Training Hyperparameters
- Base Model: unsloth/qwen3-30b-a3b
- Training Method: LoRA fine-tuning with 4-bit quantization (a configuration sketch follows this list)
- LoRA Configuration:
- Rank (r): 64
- Alpha: 128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0.0
- Training Hyperparameters:
- Epochs: 1
- Batch size: 36 (per device)
- Gradient accumulation: 1
- Learning rate: 0.00012
- Optimizer: AdamW 8-bit
- LR Scheduler: Linear warmup (50 steps)
- Max sequence length: 3072
- Weight decay: 0.01
- Hardware: 1× NVIDIA A100 80GB
- Training Time: ~47 hours
- Framework: Unsloth + Hugging Face Transformers
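A minimal sketch of how the hyperparameters above map onto the common Unsloth + TRL SFTTrainer pattern is shown below. The one-row placeholder dataset and output directory are assumptions, and depending on your `trl` version some arguments (e.g. `max_seq_length`, `dataset_text_field`) may need to move to `SFTConfig`; this is not the project's actual training script.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model and attach LoRA adapters with the settings listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-30b-a3b",
    max_seq_length=3072,
    load_in_4bit=True,                      # QLoRA-style 4-bit base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Placeholder dataset; the real run used the 320,570-sample ChatML corpus described earlier.
train_dataset = Dataset.from_dict({"text": [
    "<|im_start|>user\nXin chào!<|im_end|>\n<|im_start|>assistant\nChào bạn!<|im_end|>\n"
]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=3072,
    args=TrainingArguments(
        per_device_train_batch_size=36,
        gradient_accumulation_steps=1,
        num_train_epochs=1,
        learning_rate=1.2e-4,
        lr_scheduler_type="linear",
        warmup_steps=50,
        weight_decay=0.01,
        optim="adamw_8bit",                 # 8-bit AdamW via bitsandbytes
        bf16=True,
        output_dir="outputs",               # assumed output path
    ),
)
trainer.train()
```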
Training Loss
Training loss decreased steadily from ~1.5 to ~0.8 over 8,905 steps (one epoch), indicating stable convergence.
Optimization Techniques
- Unsloth: 2-5x faster training with optimized CUDA kernels
- Flash Attention 2: Memory-efficient attention computation
- Gradient Checkpointing: Reduced memory usage
- 4-bit Quantization: QLoRA for memory efficiency
- Mixed Precision: bfloat16 for numerical stability
Evaluation
VMLU Benchmark (Vietnamese Multitask Language Understanding)
The model is evaluated on VMLU, a comprehensive Vietnamese benchmark with 744 questions across multiple subjects.
Evaluation Method: Logit-based scoring, the same protocol commonly used for MMLU (see the scoring sketch below)
| Metric | Score |
|---|---|
| Overall Accuracy | Coming Soon |
| STEM | Coming Soon |
| Humanities | Coming Soon |
| Social Sciences | Coming Soon |
Note: To reproduce evaluation, use the evaluation script from the GitHub repository.
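For illustration, here is a minimal sketch of logit-based multiple-choice scoring (the MMLU-style protocol referenced above): the question and options are scored in a single forward pass, and the predicted answer is the option letter with the highest next-token logit. The prompt layout and the `score_choice` helper are assumptions; the repository's evaluation script remains the reference for reproducing VMLU numbers.

```python
import torch

def score_choice(model, tokenizer, question, options):
    # Format the question with A-D options and ask for the answer letter
    # ("Đáp án" = "Answer" in Vietnamese). The exact layout is illustrative.
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", options)
    ) + "\nĐáp án:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]   # logits at the last position
    # Compare the logits of the four option letters and pick the largest.
    option_ids = [tokenizer(f" {letter}", add_special_tokens=False)["input_ids"][-1]
                  for letter in "ABCD"]
    return "ABCD"[int(torch.argmax(next_token_logits[option_ids]))]
```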
Comparison with Base Model
| Model | VMLU Accuracy |
|---|---|
| Qwen3-30B-A3B (Base) | Baseline |
| Qwen3-30B-Vietnamese (This model) | Coming Soon |
Limitations and Biases
Known Limitations
- Vietnamese-Specific: Optimized for Vietnamese, may have reduced performance on other languages
- Instruction Bias: Trained primarily on instruction-following data, may not excel at creative writing
- Factual Knowledge Cutoff: Based on Qwen3's training data (cutoff date unknown)
- Context Length: Trained with max 3072 tokens, performance may degrade on longer contexts
- Mathematical Reasoning: While improved, may still struggle with complex multi-step math
- Code Generation: Not specifically optimized for coding tasks
Potential Biases
- Training data may reflect biases present in Vietnamese internet content
- Instruction datasets may have geographic/cultural biases
- Model may perform better on formal Vietnamese than colloquial speech
- Limited exposure to Vietnamese dialects and regional variations
Ethical Considerations
- Misinformation: Model may generate plausible but incorrect information
- Harmful Content: Despite safety measures, model may occasionally produce inappropriate content
- Privacy: Do not input personal or sensitive information
- Transparency: Always disclose when content is AI-generated
Responsible Use
Recommended Practices
- Verify factual claims from independent sources
- Use human review for high-stakes applications
- Implement content filtering for production deployments
- Monitor outputs for bias and harmful content
- Provide user disclosure about AI involvement
Not Recommended For
- Medical, legal, or financial advice without expert review
- Content moderation as sole decision-maker
- High-stakes decision-making without human oversight
- Generating content intended to deceive
Hardware Requirements
For Inference
Minimum:
- GPU: RTX 4090 24GB (with 4-bit quantization; see the loading sketch below)
- RAM: 32GB
- Disk: 100GB
Recommended:
- GPU: A100 40GB or equivalent
- RAM: 64GB
- Disk: 150GB
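To fit the model on a single 24 GB card, a bitsandbytes 4-bit load along the following lines should work; actual memory headroom depends on context length and batch size, so treat this as a sketch rather than a guaranteed configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute, roughly in line with the ~20GB figure
# under "File Sizes" below; KV cache and activations add to this.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")
```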
For Fine-tuning
- GPU: A100 80GB (for LoRA fine-tuning)
- RAM: 128GB
- Disk: 500GB
- See training guide for optimal configurations
Technical Specifications
Model Architecture
- Type: Mixture-of-Experts (MoE) Transformer
- Experts: 128 experts, 8 activated per token (the "A3B" sparse-activation variant)
- Parameters: ~30.5 billion total, ~3.3 billion activated per token
- Hidden Size: 2048
- Attention Heads: 32 query heads, 4 key/value heads (grouped-query attention)
- Layers: 48
- Vocabulary Size: 151,936 tokens
- Context Length: 32,768 tokens natively (fine-tuning limited to 3,072)
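The figures above can be checked directly against the published model configuration. A small sketch follows; the attribute names assume the Qwen3-MoE config class in recent transformers releases and are guarded with getattr in case of renames.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")
for name in ("num_hidden_layers", "hidden_size", "num_attention_heads",
             "num_key_value_heads", "num_experts", "num_experts_per_tok",
             "vocab_size", "max_position_embeddings"):
    # getattr with a default in case an attribute is named differently in your version
    print(name, getattr(cfg, name, "n/a"))
```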
File Sizes
- Full Model: ~60GB (bfloat16)
- 4-bit Quantized: ~20GB
- LoRA Adapters Only: ~1-2GB
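These sizes follow from simple per-parameter arithmetic, assuming the ~30.5B total parameter count listed above; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope size estimates; quantization metadata, sharding, and
# tokenizer files add overhead on top of these raw weight figures.
total_params = 30.5e9
print(f"bf16  (2 bytes/param):   {total_params * 2 / 1e9:.0f} GB")    # ~61 GB
print(f"4-bit (0.5 bytes/param): {total_params * 0.5 / 1e9:.0f} GB")  # ~15 GB before overhead
```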
Citation
If you use this model, please cite:
```bibtex
@software{qwen3_vietnamese_2025,
  title   = {Qwen3-30B Vietnamese Instruct},
  author  = {Vietnamese LLM Project},
  year    = {2025},
  url     = {https://huggingface.co/danghuyhoang/qwen3-30b-vietnamese-instruct},
  license = {Apache-2.0}
}
```
Also cite the base Qwen3 model:
```bibtex
@article{qwen3_2025,
  title   = {Qwen3 Technical Report},
  author  = {{Qwen Team}},
  journal = {arXiv preprint},
  year    = {2025}
}
```
Acknowledgments
This model was created using:
- Qwen3-30B-A3B by Alibaba Cloud
- Unsloth for optimized training
- Vietnamese datasets from 5CD-AI, VILM, MBZUAI, and the Vietnamese NLP community
Model Card Contact
For questions or issues:
- GitHub: vietnamese-llm-finetuning
- Issues: GitHub Issues
- Discussions: GitHub Discussions
License: Apache 2.0
Last Updated: 2025-11-08