Qwen3-30B Vietnamese Instruct

Fine-tuned Qwen3-30B-A3B for Vietnamese instruction-following

This model is a Vietnamese-optimized version of Qwen3-30B-A3B, fine-tuned on 327K high-quality Vietnamese instruction samples using LoRA (Low-Rank Adaptation).

Model Description

  • Model Type: Large Language Model (Mixture-of-Experts)
  • Base Model: Qwen3-30B-A3B
  • Language: Vietnamese (primary), English (secondary)
  • Fine-tuning Method: LoRA (rank=64, alpha=128) with 4-bit quantization
  • Training Data: 327,113 Vietnamese instruction-response pairs
  • License: Apache 2.0
  • Developed by: Vietnamese LLM Project

Intended Use

This model is designed for Vietnamese natural language processing tasks, including:

  • Question Answering: Answer questions in Vietnamese
  • Instruction Following: Execute tasks described in Vietnamese
  • Conversational AI: Engage in multi-turn Vietnamese dialogues
  • Text Generation: Generate Vietnamese text based on prompts
  • Educational Applications: Tutoring, explanations, and knowledge sharing

Direct Use

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

messages = [
    # System: "You are an intelligent and helpful AI assistant."
    {"role": "system", "content": "Bạn là trợ lý AI thông minh và hữu ích."},
    # User: "Explain the concept of machine learning in Vietnamese."
    {"role": "user", "content": "Giải thích khái niệm machine learning bằng tiếng Việt."}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate up to 512 new tokens with nucleus sampling
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Use with Unsloth (2-5x faster inference)

from unsloth import FastLanguageModel

# Load in 4-bit; dtype=None lets Unsloth auto-select bfloat16/float16
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="danghuyhoang/qwen3-30b-vietnamese-instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)  # Enable inference mode

messages = [
    # "How many provinces and cities does Vietnam have?"
    {"role": "user", "content": "Việt Nam có bao nhiêu tỉnh thành?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))

Training Data

The model was fine-tuned on a diverse collection of Vietnamese instruction datasets:

Dataset                  Samples    Source         License
OpenOrca-Viet            121,178    5CD-AI         Apache 2.0
VILM Instruction         Subset     VILM Project   Open
Vietnamese UltraChat     Subset     5CD-AI         MIT
Bactrian-X Vietnamese    Subset     MBZUAI         CC BY-NC 4.0
Vietnamese MATH          40,000     5CD-AI         Apache 2.0
Multi-turn Chat          12,697     5CD-AI         Apache 2.0

Total: 320,570 training samples + 6,543 validation samples

Data Preprocessing

  • All data converted to ChatML format (see the sketch after this list)
  • Vietnamese content validation
  • Token length filtering (max 3072 tokens)
  • Quality filtering and deduplication
  • 98/2 train/validation split
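
A minimal sketch of this pipeline, assuming instruction/response field names and exact-match deduplication (both assumptions; the project's actual preprocessing script may differ):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")
MAX_TOKENS = 3072  # token-length cutoff from the data card
seen_hashes = set()

def to_chatml(example):
    # Convert an instruction-response pair into ChatML-style messages.
    return [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]

def keep(example):
    text = tokenizer.apply_chat_template(to_chatml(example), tokenize=False)
    # Token-length filter (max 3072 tokens).
    if len(tokenizer(text).input_ids) > MAX_TOKENS:
        return False
    # Exact-match deduplication (hypothetical; fuzzy dedup is also common).
    h = hash(text)
    if h in seen_hashes:
        return False
    seen_hashes.add(h)
    return True

The 98/2 split can then be produced with, for example, the datasets library's train_test_split(test_size=0.02).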

Training Details

Training Hyperparameters

  • Base Model: unsloth/qwen3-30b-a3b
  • Training Method: LoRA fine-tuning with 4-bit quantization
  • LoRA Configuration:
    • Rank (r): 64
    • Alpha: 128
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    • Dropout: 0.0
  • Training Hyperparameters:
    • Epochs: 1
    • Batch size: 36 (per device)
    • Gradient accumulation: 1
    • Learning rate: 0.00012
    • Optimizer: AdamW 8-bit
    • LR Scheduler: Linear warmup (50 steps)
    • Max sequence length: 3072
    • Weight decay: 0.01
  • Hardware: 1× NVIDIA A100 80GB
  • Training Time: ~47 hours
  • Framework: Unsloth + Hugging Face Transformers (a configuration sketch follows below)
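
As a rough, non-authoritative sketch, the configuration above maps onto Unsloth and TRL roughly as follows (the train_dataset variable and output path are placeholders, not the project's actual script):

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-30b-a3b",
    max_seq_length=3072,
    load_in_4bit=True,  # QLoRA-style 4-bit base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                # LoRA rank
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)

args = TrainingArguments(
    num_train_epochs=1,
    per_device_train_batch_size=36,
    gradient_accumulation_steps=1,
    learning_rate=1.2e-4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,
    output_dir="outputs",  # placeholder
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=train_dataset,  # prepared as described under Data Preprocessing
)
trainer.train()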

Training Loss

Training loss decreased steadily from ~1.5 to ~0.8 over 8,905 steps, consistent with stable convergence over the single epoch.

Optimization Techniques

  • Unsloth: 2-5x faster training with optimized CUDA kernels
  • Flash Attention 2: Memory-efficient attention computation
  • Gradient Checkpointing: Reduced memory usage
  • 4-bit Quantization: QLoRA for memory efficiency
  • Mixed Precision: bfloat16 for numerical stability

Evaluation

VMLU Benchmark (Vietnamese Multitask Language Understanding)

The model is evaluated on VMLU, a comprehensive Vietnamese benchmark with 744 questions across multiple subjects.

Evaluation Method: Logit-based scoring over the answer choices (the same protocol commonly used for MMLU)

Metric              Score
Overall Accuracy    Coming Soon
STEM                Coming Soon
Humanities          Coming Soon
Social Sciences     Coming Soon

Note: To reproduce evaluation, use the evaluation script from the GitHub repository.
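
For reference, logit-based scoring compares the model's next-token logits for the candidate answer letters rather than sampling free-form text; a minimal sketch (the prompt template is an illustration, not VMLU's exact format):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "danghuyhoang/qwen3-30b-vietnamese-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def predict_choice(question, choices):
    # Hypothetical prompt layout; "Đáp án" means "Answer".
    prompt = question + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in zip("ABCD", choices)
    ) + "\nĐáp án:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_logits = model(**inputs).logits[0, -1]  # logits for the next token
    # Score each answer letter by its next-token logit and pick the max.
    letter_ids = [tokenizer.encode(f" {l}", add_special_tokens=False)[0] for l in "ABCD"]
    scores = torch.tensor([next_logits[i] for i in letter_ids])
    return "ABCD"[int(scores.argmax())]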

Comparison with Base Model

Model                                 VMLU Accuracy
Qwen3-30B-A3B (Base)                  Baseline
Qwen3-30B-Vietnamese (This model)     Coming Soon

Limitations and Biases

Known Limitations

  1. Vietnamese-Specific: Optimized for Vietnamese, may have reduced performance on other languages
  2. Instruction Bias: Trained primarily on instruction-following data, may not excel at creative writing
  3. Factual Knowledge Cutoff: Based on Qwen3's training data (cutoff date unknown)
  4. Context Length: Trained with max 3072 tokens, performance may degrade on longer contexts
  5. Mathematical Reasoning: While improved, may still struggle with complex multi-step math
  6. Code Generation: Not specifically optimized for coding tasks

Potential Biases

  • Training data may reflect biases present in Vietnamese internet content
  • Instruction datasets may have geographic/cultural biases
  • Model may perform better on formal Vietnamese than colloquial speech
  • Limited exposure to Vietnamese dialects and regional variations

Ethical Considerations

  • Misinformation: Model may generate plausible but incorrect information
  • Harmful Content: Despite safety measures, model may occasionally produce inappropriate content
  • Privacy: Do not input personal or sensitive information
  • Transparency: Always disclose when content is AI-generated

Responsible Use

Recommended Practices

  • Verify factual claims from independent sources
  • Use human review for high-stakes applications
  • Implement content filtering for production deployments
  • Monitor outputs for bias and harmful content
  • Provide user disclosure about AI involvement

Not Recommended For

  • Medical, legal, or financial advice without expert review
  • Content moderation as sole decision-maker
  • High-stakes decision-making without human oversight
  • Generating content intended to deceive

Hardware Requirements

For Inference

Minimum:

  • GPU: RTX 4090 24GB (with 4-bit quantization; see the loading sketch below)
  • RAM: 32GB
  • Disk: 100GB

Recommended:

  • GPU: A100 40GB or equivalent
  • RAM: 64GB
  • Disk: 150GB
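
For the 24GB minimum configuration, one common way to load in 4-bit is the standard bitsandbytes path in transformers; a sketch (the quantization settings are typical defaults, not a tested recipe for this model):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # nested quantization saves a little more memory
)

model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)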

For Fine-tuning

  • GPU: A100 80GB (for LoRA fine-tuning)
  • RAM: 128GB
  • Disk: 500GB
  • See training guide for optimal configurations

Technical Specifications

Model Architecture

  • Type: Mixture-of-Experts (MoE) Transformer
  • Experts: Sparse mixture-of-experts routing; "A3B" denotes roughly 3B parameters activated per token
  • Parameters: ~30 billion total, with only the activated experts used per forward pass
  • Hidden Size: 4096
  • Attention Heads: 32
  • Layers: 40
  • Vocabulary Size: 151,936 tokens
  • Context Length: 32,768 tokens (training limited to 3072)

File Sizes

  • Full Model: ~60GB (bfloat16)
  • 4-bit Quantized: ~20GB
  • LoRA Adapters Only: ~1-2GB (see the loading sketch below)
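
If only the adapters are distributed, they can be attached to the base model with PEFT; a sketch (it is an assumption that this repository hosts standalone adapters rather than merged weights, so verify before relying on it):

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B", torch_dtype="auto", device_map="auto"
)
# Attach the LoRA adapters on top of the frozen base weights.
model = PeftModel.from_pretrained(base, "danghuyhoang/qwen3-30b-vietnamese-instruct")
model = model.merge_and_unload()  # optional: fold adapters into the base for faster inference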

Citation

If you use this model, please cite:

@software{qwen3_vietnamese_2025,
  title = {Qwen3-30B Vietnamese Instruct},
  author = {Vietnamese LLM Project},
  year = {2025},
  url = {https://huggingface.co/danghuyhoang/qwen3-30b-vietnamese-instruct},
  license = {Apache-2.0}
}

Also cite the base Qwen3 model:

@article{qwen3_2025,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2025}
}

Acknowledgments

This model was created using:

  • Qwen3-30B-A3B by Alibaba Cloud
  • Unsloth for optimized training
  • Vietnamese datasets from 5CD-AI, VILM, MBZUAI, and the Vietnamese NLP community

Model Card Contact

For questions or issues:


License: Apache 2.0

Last Updated: 2025-11-08
