Qwen3-30B Vietnamese Instruct
Fine-tuned Qwen3-30B-A3B for Vietnamese instruction-following
This model is a Vietnamese-optimized version of Qwen3-30B-A3B, fine-tuned on 327K high-quality Vietnamese instruction samples using LoRA (Low-Rank Adaptation).
Model Description
- Model Type: Large Language Model (Mixture-of-Experts)
- Base Model: Qwen3-30B-A3B
- Language: Vietnamese (primary), English (secondary)
- Fine-tuning Method: LoRA (rank=64, alpha=128) with 4-bit quantization
- Training Data: 327,113 Vietnamese instruction-response pairs
- License: Apache 2.0
- Developed by: Vietnamese LLM Project
Intended Use
This model is designed for Vietnamese natural language processing tasks, including:
- Question Answering: Answer questions in Vietnamese
- Instruction Following: Execute tasks described in Vietnamese
- Conversational AI: Engage in multi-turn Vietnamese dialogues
- Text Generation: Generate Vietnamese text based on prompts
- Educational Applications: Tutoring, explanations, and knowledge sharing
Direct Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

# Vietnamese prompts: "You are an intelligent and helpful AI assistant." /
# "Explain the concept of machine learning in Vietnamese."
messages = [
    {"role": "system", "content": "Bạn là trợ lý AI thông minh và hữu ích."},
    {"role": "user", "content": "Giải thích khái niệm machine learning bằng tiếng Việt."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
Use with Unsloth (2-5x faster inference)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="danghuyhoang/qwen3-30b-vietnamese-instruct",
    max_seq_length=2048,
    dtype=None,          # auto-detect (bfloat16 on Ampere or newer)
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable optimized inference mode

# Vietnamese prompt: "How many provinces and cities does Vietnam have?"
messages = [
    {"role": "user", "content": "Việt Nam có bao nhiêu tỉnh thành?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Training Data
The model was fine-tuned on a diverse collection of Vietnamese instruction datasets:
| Dataset | Samples | Source | License |
|---|---|---|---|
| OpenOrca-Viet | 121,178 | 5CD-AI | Apache 2.0 |
| VILM Instruction | Subset | VILM Project | Open |
| Vietnamese UltraChat | Subset | 5CD-AI | MIT |
| Bactrian-X Vietnamese | Subset | MBZUAI | CC BY-NC 4.0 |
| Vietnamese MATH | 40,000 | 5CD-AI | Apache 2.0 |
| Multi-turn Chat | 12,697 | 5CD-AI | Apache 2.0 |
Total: 327,113 samples (320,570 training + 6,543 validation)
Data Preprocessing
- All data converted to ChatML format (see the conversion sketch after this list)
- Vietnamese content validation
- Token length filtering (max 3072 tokens)
- Quality filtering and deduplication
- 98/2 train/validation split
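The ChatML conversion and length filter can be sketched as follows. The field names (`instruction`, `response`) and the helper functions are illustrative assumptions based on this card, not the project's actual preprocessing script.

```python
# Illustrative sketch of the ChatML conversion and 3072-token length filter
# described above. Field names ("instruction", "response") are assumptions;
# the underlying datasets may use different schemas.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

def to_chatml(example):
    # Qwen models use the ChatML convention with <|im_start|>/<|im_end|> markers.
    return (
        "<|im_start|>user\n" + example["instruction"] + "<|im_end|>\n"
        "<|im_start|>assistant\n" + example["response"] + "<|im_end|>\n"
    )

def within_length_budget(example, max_tokens=3072):
    # Drop samples whose ChatML rendering exceeds the training sequence length.
    return len(tokenizer(to_chatml(example))["input_ids"]) <= max_tokens
```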
Training Details
Training Hyperparameters
- Base Model: unsloth/qwen3-30b-a3b
- Training Method: LoRA fine-tuning with 4-bit quantization (a configuration sketch follows this list)
- LoRA Configuration:
- Rank (r): 64
- Alpha: 128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0.0
- Training Hyperparameters:
- Epochs: 1
- Batch size: 36 (per device)
- Gradient accumulation: 1
- Learning rate: 0.00012
- Optimizer: AdamW 8-bit
- LR Scheduler: Linear warmup (50 steps)
- Max sequence length: 3072
- Weight decay: 0.01
- Hardware: 1× NVIDIA A100 80GB
- Training Time: ~47 hours
- Framework: Unsloth + Hugging Face Transformers
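A minimal sketch of how the hyperparameters above map onto the common Unsloth + TRL SFTTrainer pattern is shown below. The one-row placeholder dataset and output directory are assumptions, and depending on your `trl` version some arguments (e.g. `max_seq_length`, `dataset_text_field`) may need to move to `SFTConfig`; this is not the project's actual training script.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model and attach LoRA adapters with the settings listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-30b-a3b",
    max_seq_length=3072,
    load_in_4bit=True,                      # QLoRA-style 4-bit base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Placeholder dataset; the real run used the 320,570-sample ChatML corpus described earlier.
train_dataset = Dataset.from_dict({"text": [
    "<|im_start|>user\nXin chào!<|im_end|>\n<|im_start|>assistant\nChào bạn!<|im_end|>\n"
]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=3072,
    args=TrainingArguments(
        per_device_train_batch_size=36,
        gradient_accumulation_steps=1,
        num_train_epochs=1,
        learning_rate=1.2e-4,
        lr_scheduler_type="linear",
        warmup_steps=50,
        weight_decay=0.01,
        optim="adamw_8bit",                 # 8-bit AdamW via bitsandbytes
        bf16=True,
        output_dir="outputs",               # assumed output path
    ),
)
trainer.train()
```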
Training Loss
Training loss decreased steadily from ~1.5 to ~0.8 over 8,905 steps (one epoch), indicating stable convergence.
Optimization Techniques
- Unsloth: 2-5x faster training with optimized CUDA kernels
- Flash Attention 2: Memory-efficient attention computation
- Gradient Checkpointing: Reduced memory usage
- 4-bit Quantization: QLoRA for memory efficiency
- Mixed Precision: bfloat16 for numerical stability
Evaluation
VMLU Benchmark (Vietnamese Multitask Language Understanding)
The model is evaluated on VMLU, a comprehensive Vietnamese benchmark with 744 questions across multiple subjects.
Evaluation Method: Logit-based scoring, the same protocol commonly used for MMLU (see the scoring sketch below)
| Metric | Score |
|---|---|
| Overall Accuracy | Coming Soon |
| STEM | Coming Soon |
| Humanities | Coming Soon |
| Social Sciences | Coming Soon |
Note: To reproduce evaluation, use the evaluation script from the GitHub repository.
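For illustration, here is a minimal sketch of logit-based multiple-choice scoring (the MMLU-style protocol referenced above): the question and options are scored in a single forward pass, and the predicted answer is the option letter with the highest next-token logit. The prompt layout and the `score_choice` helper are assumptions; the repository's evaluation script remains the reference for reproducing VMLU numbers.

```python
import torch

def score_choice(model, tokenizer, question, options):
    # Format the question with A-D options and ask for the answer letter
    # ("Đáp án" = "Answer" in Vietnamese). The exact layout is illustrative.
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", options)
    ) + "\nĐáp án:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]   # logits at the last position
    # Compare the logits of the four option letters and pick the largest.
    option_ids = [tokenizer(f" {letter}", add_special_tokens=False)["input_ids"][-1]
                  for letter in "ABCD"]
    return "ABCD"[int(torch.argmax(next_token_logits[option_ids]))]
```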
Comparison with Base Model
| Model | VMLU Accuracy |
|---|---|
| Qwen3-30B-A3B (Base) | Baseline |
| Qwen3-30B-Vietnamese (This model) | Coming Soon |
Limitations and Biases
Known Limitations
- Vietnamese-Specific: Optimized for Vietnamese, may have reduced performance on other languages
- Instruction Bias: Trained primarily on instruction-following data, may not excel at creative writing
- Factual Knowledge Cutoff: Based on Qwen3's training data (cutoff date unknown)
- Context Length: Trained with max 3072 tokens, performance may degrade on longer contexts
- Mathematical Reasoning: While improved, may still struggle with complex multi-step math
- Code Generation: Not specifically optimized for coding tasks
Potential Biases
- Training data may reflect biases present in Vietnamese internet content
- Instruction datasets may have geographic/cultural biases
- Model may perform better on formal Vietnamese than colloquial speech
- Limited exposure to Vietnamese dialects and regional variations
Ethical Considerations
- Misinformation: Model may generate plausible but incorrect information
- Harmful Content: Despite safety measures, model may occasionally produce inappropriate content
- Privacy: Do not input personal or sensitive information
- Transparency: Always disclose when content is AI-generated
Responsible Use
Recommended Practices
- Verify factual claims from independent sources
- Use human review for high-stakes applications
- Implement content filtering for production deployments
- Monitor outputs for bias and harmful content
- Provide user disclosure about AI involvement
Not Recommended For
- Medical, legal, or financial advice without expert review
- Content moderation as sole decision-maker
- High-stakes decision-making without human oversight
- Generating content intended to deceive
Hardware Requirements
For Inference
Minimum:
- GPU: RTX 4090 24GB (with 4-bit quantization; see the loading sketch below)
- RAM: 32GB
- Disk: 100GB
Recommended:
- GPU: A100 40GB or equivalent
- RAM: 64GB
- Disk: 150GB
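To fit the model on a single 24 GB card, a bitsandbytes 4-bit load along the following lines should work; actual memory headroom depends on context length and batch size, so treat this as a sketch rather than a guaranteed configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit weights with bfloat16 compute, roughly in line with the ~20GB figure
# under "File Sizes" below; KV cache and activations add to this.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")
```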
For Fine-tuning
- GPU: A100 80GB (for LoRA fine-tuning)
- RAM: 128GB
- Disk: 500GB
- See training guide for optimal configurations
Technical Specifications
Model Architecture
- Type: Mixture-of-Experts (MoE) Transformer
- Experts: 128 experts, 8 activated per token (the "A3B" sparse-activation variant)
- Parameters: ~30.5 billion total, ~3.3 billion activated per token
- Hidden Size: 2048
- Attention Heads: 32 query heads, 4 key/value heads (grouped-query attention)
- Layers: 48
- Vocabulary Size: 151,936 tokens
- Context Length: 32,768 tokens natively (fine-tuning limited to 3,072)
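The figures above can be checked directly against the published model configuration. A small sketch follows; the attribute names assume the Qwen3-MoE config class in recent transformers releases and are guarded with getattr in case of renames.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")
for name in ("num_hidden_layers", "hidden_size", "num_attention_heads",
             "num_key_value_heads", "num_experts", "num_experts_per_tok",
             "vocab_size", "max_position_embeddings"):
    # getattr with a default in case an attribute is named differently in your version
    print(name, getattr(cfg, name, "n/a"))
```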
File Sizes
- Full Model: ~60GB (bfloat16)
- 4-bit Quantized: ~20GB
- LoRA Adapters Only: ~1-2GB
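These sizes follow from simple per-parameter arithmetic, assuming the ~30.5B total parameter count listed above; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope size estimates; quantization metadata, sharding, and
# tokenizer files add overhead on top of these raw weight figures.
total_params = 30.5e9
print(f"bf16  (2 bytes/param):   {total_params * 2 / 1e9:.0f} GB")    # ~61 GB
print(f"4-bit (0.5 bytes/param): {total_params * 0.5 / 1e9:.0f} GB")  # ~15 GB before overhead
```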
Citation
If you use this model, please cite:
```bibtex
@software{qwen3_vietnamese_2025,
  title   = {Qwen3-30B Vietnamese Instruct},
  author  = {Vietnamese LLM Project},
  year    = {2025},
  url     = {https://huggingface.co/danghuyhoang/qwen3-30b-vietnamese-instruct},
  license = {Apache-2.0}
}
```
Also cite the base Qwen3 model:
```bibtex
@article{qwen3_2025,
  title   = {Qwen3 Technical Report},
  author  = {{Qwen Team}},
  journal = {arXiv preprint},
  year    = {2025}
}
```
Acknowledgments
This model was created using:
- Qwen3-30B-A3B by Alibaba Cloud
- Unsloth for optimized training
- Vietnamese datasets from 5CD-AI, VILM, MBZUAI, and the Vietnamese NLP community
Model Card Contact
For questions or issues:
- GitHub: vietnamese-llm-finetuning
- Issues: GitHub Issues
- Discussions: GitHub Discussions
License: Apache 2.0
Last Updated: 2025-11-08