# `quickmt-zh-en` Neural Machine Translation Model 

# Usage

## Install `quickmt`

```bash
git clone https://github.com/quickmt/quickmt.git
pip install ./quickmt/
```

## Download model

```bash
quickmt-model-download quickmt/quickmt-zh-en ./quickmt-zh-en
```

## Use model

```python
from quickmt import Translator

# Auto-detects GPU, set to "cpu" to force CPU inference
t = Translator("./quickmt-zh-en/", device="auto")

# Translate - beam_size=1 is fastest; increase to 5 for higher quality (but slower speed)
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], beam_size=1)

# Get alternative translations by sampling
# You can pass any cTranslate2 `translate_batch` arguments
t(["他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)
```
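
For larger jobs, the same callable accepts a whole batch of source sentences at once. A minimal sketch of file-to-file translation, assuming the translator returns one output string per input line (the file names are placeholders):

```python
from quickmt import Translator

t = Translator("./quickmt-zh-en/", device="auto")

# One source sentence per line; "input.zh" is a placeholder file name
with open("input.zh", encoding="utf-8") as f:
    src = [line.strip() for line in f if line.strip()]

# Translate the whole batch; beam_size=5 trades speed for quality
hyp = t(src, beam_size=5)

# Assumes one output string per input sentence
with open("output.en", "w", encoding="utf-8") as f:
    f.write("\n".join(hyp) + "\n")
```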

# Model Information

* Trained using [`eole`](https://github.com/eole-nlp/eole)
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format (see the direct-usage sketch below)
* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.zh-en/tree/main
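
Because the export is a standard CTranslate2 model, it can also be loaded with `ctranslate2` directly, bypassing the quickmt wrapper. A minimal sketch, assuming the downloaded directory ships the SentencePiece models under the names `src.spm.model` and `tgt.spm.model` (as in the training config below; adjust the paths to whatever the download actually contains):

```python
import ctranslate2
import sentencepiece as spm

model_dir = "./quickmt-zh-en"

# The subword-model file names are assumptions; check the downloaded directory
sp_src = spm.SentencePieceProcessor(model_file=f"{model_dir}/src.spm.model")
sp_tgt = spm.SentencePieceProcessor(model_file=f"{model_dir}/tgt.spm.model")

translator = ctranslate2.Translator(model_dir, device="cpu")

text = "他补充道:“我们现在有 4 个月大没有糖尿病的老鼠,但它们曾经得过该病。”"

# CTranslate2 expects pre-tokenized input, so encode to subword pieces first
tokens = sp_src.encode(text, out_type=str)
results = translator.translate_batch([tokens], beam_size=5)

# Detokenize the best hypothesis back to plain text
print(sp_tgt.decode(results[0].hypotheses[0]))
```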

## Metrics

BLEU and chrF2 were calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the Flores200 `devtest` test set ("zho_Hans"->"eng_Latn").

| Model | BLEU | chrF2 |
| ----- | ---- | ----- |
| quickmt/quickmt-zh-en      | 28.58 | 57.46 |
| Helsinki-NLP/opus-mt-zh-en | 23.35 | 53.60 |
| facebook/m2m100_418M | 18.96 | 50.06 |
| facebook/m2m100_1.2B | 24.68 | 54.68 |
| facebook/nllb-200-distilled-600M | 26.22 | 55.17 | 
| facebook/nllb-200-distilled-1.3B | 28.54 | 57.34 |
| google/madlad400-3b-mt | 28.74 | 58.01 | 
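
A minimal sketch of reproducing scores in this style with sacrebleu's Python API, given one hypothesis and one reference sentence per line (the file names are placeholders):

```python
from sacrebleu.metrics import BLEU, CHRF

# File names are placeholders; both files hold one sentence per line
with open("hyp.en", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("flores200.devtest.eng_Latn", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = BLEU()   # sacrebleu defaults
chrf = CHRF()   # sacrebleu's default chrF is chrF2 (beta=2)

print(bleu.corpus_score(hyps, [refs]))
print(chrf.corpus_score(hyps, [refs]))
```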

## Training Configuration

```yaml
## IO
save_data: zh_en/data_spm
overwrite: True
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]
tensorboard: true
tensorboard_log_dir: tensorboard

### Vocab
src_vocab: zh-en/src.eole.vocab
tgt_vocab: zh-en/tgt.eole.vocab
src_vocab_size: 20000
tgt_vocab_size: 20000
vocab_size_multiple: 8
share_vocab: False
n_sample: 0

data:
    corpus_1:
        path_src: hf://quickmt/quickmt-train-zh-en/zh
        path_tgt: hf://quickmt/quickmt-train-zh-en/en
        path_sco: hf://quickmt/quickmt-train-zh-en/sco

    valid:
        path_src: zh-en/dev.zho
        path_tgt: zh-en/dev.eng

transforms: [sentencepiece, filtertoolong]
transforms_configs:
  sentencepiece:
    src_subword_model: "zh-en/src.spm.model"
    tgt_subword_model: "zh-en/tgt.spm.model"
  filtertoolong:
    src_seq_length: 512
    tgt_seq_length: 512

training:
    # Run configuration
    model_path: quickmt-zh-en
    keep_checkpoint: 4
    save_checkpoint_steps: 1000
    train_steps: 200000
    valid_steps: 1000
    
    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]

    # Batching
    batch_type: "tokens"
    batch_size: 13312
    valid_batch_size: 13312
    batch_size_multiple: 8
    accum_count: [4]
    accum_steps: [0]

    # Optimizer & Compute
    compute_dtype: "bfloat16"
    optim: "pagedadamw8bit"
    learning_rate: 1.0
    warmup_steps: 10000
    decay_method: "noam"
    adam_beta2: 0.998

    # Data loading
    bucket_size: 262144
    num_workers: 4
    prefetch_factor: 100

    # Hyperparams
    dropout_steps: [0]
    dropout: [0.1]
    attention_dropout: [0.1]
    max_grad_norm: 0
    label_smoothing: 0.1
    average_decay: 0.0001
    param_init_method: xavier_uniform
    normalization: "tokens"

model:
    architecture: "transformer"
    layer_norm: standard
    share_embeddings: false
    share_decoder_embeddings: true
    add_ffnbias: true
    mlp_activation_fn: gated-silu
    add_estimator: false
    add_qkvbias: false
    norm_eps: 1e-6
    hidden_size: 1024
    encoder:
        layers: 8
    decoder:
        layers: 2
    heads: 16
    transformer_ff: 4096
    embeddings:
        word_vec_size: 1024
        position_encoding_type: "SinusoidalInterleaved"
```
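
For intuition about the optimizer block above: with `decay_method: "noam"`, `learning_rate: 1.0` acts as a scale factor on the standard Noam schedule, which warms up for `warmup_steps` and then decays with the inverse square root of the step. A small illustrative sketch (the formula is the standard Noam definition; eole's exact implementation may differ in details):

```python
def noam_lr(step: int, scale: float = 1.0, hidden_size: int = 1024,
            warmup_steps: int = 10000) -> float:
    """Noam schedule: scale * hidden_size**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)."""
    step = max(step, 1)
    return scale * hidden_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

print(noam_lr(10_000))   # peak rate at the end of warmup, ~3.1e-4 with these settings
print(noam_lr(200_000))  # decayed rate at the final training step, ~7e-5
```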