---
language:
- vi
- en
license: apache-2.0
base_model: Qwen/Qwen3-30B-A3B
tags:
- vietnamese
- qwen
- instruction-tuning
- lora
- text-generation
- conversational
- qlora
- unsloth
pipeline_tag: text-generation
library_name: transformers
datasets:
- 5CD-AI/Vietnamese-openorca-2
- vilm/viet-instruct-v2
- 5CD-AI/Vietnamese-UltraChat
- MBZUAI/Bactrian-X
model-index:
- name: Qwen3-30B Vietnamese Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: VMLU
      type: tridm/VMLU
    metrics:
    - type: accuracy
      name: VMLU Accuracy
      value: TBD
---

# Qwen3-30B Vietnamese Instruct

**Fine-tuned Qwen3-30B-A3B for Vietnamese instruction-following**

This model is a Vietnamese-optimized version of Qwen3-30B-A3B, fine-tuned on 327K high-quality Vietnamese instruction samples using LoRA (Low-Rank Adaptation).

## Model Description

- **Model Type**: Large Language Model (Mixture-of-Experts)
- **Base Model**: [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)
- **Language**: Vietnamese (primary), English (secondary)
- **Fine-tuning Method**: LoRA (rank=64, alpha=128) with 4-bit quantization
- **Training Data**: 327,113 Vietnamese instruction-response pairs
- **License**: Apache 2.0
- **Developed by**: Vietnamese LLM Project

## Intended Use

This model is designed for Vietnamese natural language processing tasks, including:

- **Question Answering**: Answer questions in Vietnamese
- **Instruction Following**: Execute tasks described in Vietnamese
- **Conversational AI**: Engage in multi-turn Vietnamese dialogues
- **Text Generation**: Generate Vietnamese text based on prompts
- **Educational Applications**: Tutoring, explanations, and knowledge sharing

### Direct Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "danghuyhoang/qwen3-30b-vietnamese-instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("danghuyhoang/qwen3-30b-vietnamese-instruct")

messages = [
    {"role": "system", "content": "Bạn là trợ lý AI thông minh và hữu ích."},
    {"role": "user", "content": "Giải thích khái niệm machine learning bằng tiếng Việt."}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Use with Unsloth (2-5x faster inference)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="danghuyhoang/qwen3-30b-vietnamese-instruct",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # Enable inference mode

messages = [
    {"role": "user", "content": "Việt Nam có bao nhiêu tỉnh thành?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```

## Training Data

The model was fine-tuned on a diverse collection of Vietnamese instruction datasets:

| Dataset | Samples | Source | License |
|---------|---------|--------|---------|
| OpenOrca-Viet | 121,178 | 5CD-AI | Apache 2.0 |
| VILM Instruction | Subset | VILM Project | Open |
| Vietnamese UltraChat | Subset | 5CD-AI | MIT |
| Bactrian-X Vietnamese | Subset | MBZUAI | CC BY-NC 4.0 |
| Vietnamese MATH | 40,000 | 5CD-AI | Apache 2.0 |
| Multi-turn Chat | 12,697 | 5CD-AI | Apache 2.0 |

**Total**: 320,570 training samples + 6,543 validation samples

### Data Preprocessing

- All data converted to ChatML format (see the sketch below)
- Vietnamese content validation
- Token length filtering (max 3072 tokens)
- Quality filtering and deduplication
- 98/2 train/validation split

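The snippet below is a minimal sketch of what the ChatML conversion step looks like. The field names (`instruction`, `response`, `system`) and the helper `to_chatml` are illustrative assumptions, not the released preprocessing code.

```python
# Hypothetical sketch of the ChatML conversion step. Field names and the
# helper function are illustrative; the actual preprocessing pipeline used
# for this model is not reproduced here.
def to_chatml(example: dict) -> str:
    """Render one instruction-response pair in ChatML."""
    system = example.get("system", "Bạn là trợ lý AI thông minh và hữu ích.")
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{example['instruction']}<|im_end|>\n"
        f"<|im_start|>assistant\n{example['response']}<|im_end|>\n"
    )


sample = {
    "instruction": "Giải thích ngắn gọn khái niệm machine learning.",
    "response": "Machine learning là lĩnh vực giúp máy tính học từ dữ liệu thay vì được lập trình tường minh.",
}
print(to_chatml(sample))
```
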
## Training Details

### Training Hyperparameters

- **Base Model**: unsloth/qwen3-30b-a3b
- **Training Method**: LoRA fine-tuning with 4-bit quantization (QLoRA)
- **LoRA Configuration** (see the sketch after this list):
  - Rank (r): 64
  - Alpha: 128
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0.0
- **Training Hyperparameters**:
  - Epochs: 1
  - Batch size: 36 (per device)
  - Gradient accumulation: 1
  - Learning rate: 1.2e-4
  - Optimizer: AdamW 8-bit
  - LR Scheduler: Linear with warmup (50 steps)
  - Max sequence length: 3072
  - Weight decay: 0.01
- **Hardware**: 1× NVIDIA A100 80GB
- **Training Time**: ~47 hours
- **Framework**: Unsloth + Hugging Face Transformers

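As a rough illustration, the sketch below expresses the LoRA/QLoRA configuration and hyperparameters listed above using the Unsloth API. The `output_dir` value is a placeholder, and the dataset loading and trainer wiring are omitted; the exact training script used for this model is not reproduced here.

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments

# Load the 4-bit quantized base model (QLoRA setup).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen3-30b-a3b",
    max_seq_length=3072,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration described above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,                       # LoRA rank
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
)

# Training hyperparameters from the list above ("outputs" is a placeholder;
# dataset and trainer setup are omitted from this sketch).
args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=1,
    per_device_train_batch_size=36,
    gradient_accumulation_steps=1,
    learning_rate=1.2e-4,
    optim="adamw_8bit",          # AdamW 8-bit via bitsandbytes
    lr_scheduler_type="linear",
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,
)
```
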
### Training Loss

Training loss decreased steadily from ~1.5 to ~0.8 over 8,905 steps, indicating successful learning.

### Optimization Techniques

- **Unsloth**: 2-5x faster training with optimized CUDA kernels
- **Flash Attention 2**: Memory-efficient attention computation
- **Gradient Checkpointing**: Reduced memory usage
- **4-bit Quantization**: QLoRA for memory efficiency
- **Mixed Precision**: bfloat16 for numerical stability

## Evaluation

### VMLU Benchmark (Vietnamese Multitask Language Understanding)

The model is evaluated on VMLU, a comprehensive Vietnamese benchmark with 744 questions across multiple subjects.

**Evaluation Method**: Logit-based scoring (industry standard, same as MMLU)

| Metric | Score |
|--------|-------|
| Overall Accuracy | Coming Soon |
| STEM | Coming Soon |
| Humanities | Coming Soon |
| Social Sciences | Coming Soon |

**Note**: To reproduce the evaluation, use the evaluation script from the [GitHub repository](https://github.com/andreidhoang/vietnamese-llm-finetuning).

### Comparison with Base Model

| Model | VMLU Accuracy |
|-------|---------------|
| Qwen3-30B-A3B (Base) | Baseline |
| Qwen3-30B-Vietnamese (this model) | Coming Soon |

## Limitations and Biases

### Known Limitations

1. **Vietnamese-Specific**: Optimized for Vietnamese; performance on other languages may be reduced
2. **Instruction Bias**: Trained primarily on instruction-following data; may not excel at creative writing
3. **Factual Knowledge Cutoff**: Inherits Qwen3's training data cutoff (exact date unknown)
4. **Context Length**: Trained with a maximum of 3072 tokens; performance may degrade on longer contexts
5. **Mathematical Reasoning**: Improved, but may still struggle with complex multi-step math
6. **Code Generation**: Not specifically optimized for coding tasks

### Potential Biases

- Training data may reflect biases present in Vietnamese internet content
- Instruction datasets may have geographic/cultural biases
- The model may perform better on formal Vietnamese than colloquial speech
- Limited exposure to Vietnamese dialects and regional variations

### Ethical Considerations

- **Misinformation**: The model may generate plausible but incorrect information
- **Harmful Content**: Despite safety measures, the model may occasionally produce inappropriate content
- **Privacy**: Do not input personal or sensitive information
- **Transparency**: Always disclose when content is AI-generated

## Responsible Use

### Recommended Practices

- Verify factual claims against independent sources
- Use human review for high-stakes applications
- Implement content filtering for production deployments
- Monitor outputs for bias and harmful content
- Disclose AI involvement to users

### Not Recommended For

- Medical, legal, or financial advice without expert review
- Content moderation as the sole decision-maker
- High-stakes decision-making without human oversight
- Generating content intended to deceive

## Hardware Requirements

### For Inference

**Minimum**:
- GPU: RTX 4090 24GB (with 4-bit quantization)
- RAM: 32GB
- Disk: 100GB

**Recommended**:
- GPU: A100 40GB or equivalent
- RAM: 64GB
- Disk: 150GB

### For Fine-tuning

- GPU: A100 80GB (for LoRA fine-tuning)
- RAM: 128GB
- Disk: 500GB
- See the training guide for optimal configurations

## Technical Specifications

### Model Architecture

- **Type**: Mixture-of-Experts (MoE) Transformer
- **Experts**: Multiple expert networks (A3B: approximately 3B parameters activated per token)
- **Parameters**: ~30 billion total (sparse activation)
- **Hidden Size**: 4096
- **Attention Heads**: 32
- **Layers**: 40
- **Vocabulary Size**: 151,936 tokens
- **Context Length**: 32,768 tokens (training limited to 3072)

### File Sizes

- **Full Model**: ~60GB (bfloat16)
- **4-bit Quantized**: ~20GB
- **LoRA Adapters Only**: ~1-2GB

## Citation

If you use this model, please cite:

```bibtex
@software{qwen3_vietnamese_2025,
  title   = {Qwen3-30B Vietnamese Instruct},
  author  = {Vietnamese LLM Project},
  year    = {2025},
  url     = {https://huggingface.co/danghuyhoang/qwen3-30b-vietnamese-instruct},
  license = {Apache-2.0}
}
```

Also cite the base Qwen3 model:

```bibtex
@article{qwen3_2025,
  title   = {Qwen3 Technical Report},
  author  = {Qwen Team},
  journal = {arXiv preprint},
  year    = {2025}
}
```

## Acknowledgments

This model was created using:

- **[Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)** by Alibaba Cloud
- **[Unsloth](https://github.com/unslothai/unsloth)** for optimized training
- **Vietnamese datasets** from 5CD-AI, VILM, MBZUAI, and the Vietnamese NLP community

## Model Card Contact

For questions or issues:

- **GitHub**: [vietnamese-llm-finetuning](https://github.com/andreidhoang/vietnamese-llm-finetuning)
- **Issues**: [GitHub Issues](https://github.com/andreidhoang/vietnamese-llm-finetuning/issues)
- **Discussions**: [GitHub Discussions](https://github.com/andreidhoang/vietnamese-llm-finetuning/discussions)

---

**License**: Apache 2.0
**Last Updated**: 2025-11-08