Upload folder using huggingface_hub
- .ipynb_checkpoints/README-checkpoint.md +98 -0
- README.md +98 -3
- config.json +10 -0
- eole-config.yaml +99 -0
- eole-model/config.json +133 -0
- eole-model/en.spm.model +3 -0
- eole-model/es.spm.model +3 -0
- eole-model/model.00.safetensors +3 -0
- eole-model/vocab.json +0 -0
- model.bin +3 -0
- source_vocabulary.json +0 -0
- src.spm.model +3 -0
- target_vocabulary.json +0 -0
- tgt.spm.model +3 -0
.ipynb_checkpoints/README-checkpoint.md
ADDED
@@ -0,0 +1,98 @@
(content identical to the new README.md shown below)
README.md
CHANGED
@@ -1,3 +1,98 @@
- ---
-
-
---
language:
- en
- es
tags:
- translation
license: cc-by-4.0
datasets:
- quickmt/quickmt-train.es-en
model-index:
- name: quickmt-es-en
  results:
  - task:
      name: Translation spa-eng
      type: translation
      args: spa-eng
    dataset:
      name: flores101-devtest
      type: flores_101
      args: spa_Latn eng_Latn devtest
    metrics:
    - name: BLEU
      type: bleu
      value: 28.64
    - name: CHRF
      type: chrf
      value: 58.61
    - name: COMET
      type: comet
      value: 86.11
---

# `quickmt-es-en` Neural Machine Translation Model

`quickmt-es-en` is a reasonably fast and reasonably accurate neural machine translation model for translation from `es` into `en`.

## Model Information

* Trained using [`eole`](https://github.com/eole-nlp/eole)
* 185M-parameter "big" transformer with 8 encoder layers and 2 decoder layers
* Separate 20k source and target SentencePiece vocabularies (see `eole-config.yaml` below)
* Exported for fast inference to [CTranslate2](https://github.com/OpenNMT/CTranslate2) format
* Training data: https://huggingface.co/datasets/quickmt/quickmt-train.es-en/tree/main

See the `eole` model configuration in this repository for further details, and the `eole-model` directory for the raw `eole` (PyTorch) model.

## Usage with `quickmt`

If you want to run GPU inference, you must first install the NVIDIA CUDA toolkit.

Next, install the `quickmt` Python library and download the model:

```bash
git clone https://github.com/quickmt/quickmt.git
pip install ./quickmt/

quickmt-model-download quickmt/quickmt-es-en ./quickmt-es-en
```
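
Alternatively, since this is an ordinary Hugging Face repository, you can fetch the files with `huggingface_hub` directly. A minimal sketch using the standard `snapshot_download` API:

```python
from huggingface_hub import snapshot_download

# Download every file in the repo (model.bin, SentencePiece models, vocabularies)
# into a local directory that quickmt / ctranslate2 can load from.
snapshot_download(repo_id="quickmt/quickmt-es-en", local_dir="./quickmt-es-en")
```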

Finally, use the model in Python:

```python
from quickmt import Translator

# Auto-detects GPU, set to "cpu" to force CPU inference
t = Translator("./quickmt-es-en/", device="auto")

# Translate - set beam size to 1 for faster speed (but lower quality)
sample_text = 'La investigación todavía se ubica en su etapa inicial, conforme indicara el Dr. Ehud Ur, docente en la carrera de medicina de la Universidad de Dalhousie, en Halifax, Nueva Escocia, y director del departamento clínico y científico de la Asociación Canadiense de Diabetes.'
t(sample_text, beam_size=5)

> 'The research is still in its early stages, as indicated by Dr. Ehud Ur, a medical professor at the University of Dalhousie, Halifax, Nova Scotia, and director of the clinical and scientific department of the Canadian Diabetes Association.'

# Get alternative translations by sampling
# You can pass any CTranslate2 `translate_batch` arguments
t([sample_text], sampling_temperature=1.2, beam_size=1, sampling_topk=50, sampling_topp=0.9)

> 'The research is still in its initial stages as instructed by Dr. Ehud Ur, a professor at the medical degree, University of Dalhousie, Halifax, Nova Scotia, and director of the clinical and scientific department of the Canadian Diabetes Association.'
```

The model is in `ctranslate2` format and the tokenizers are `sentencepiece` models, so you can use `ctranslate2` directly instead of going through `quickmt`. It is also possible to use this model with, for example, [LibreTranslate](https://libretranslate.com/), which also uses `ctranslate2` and `sentencepiece`.
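
For direct use, here is a minimal sketch with `ctranslate2` and `sentencepiece`, assuming the file layout of this repository (`model.bin` plus `src.spm.model`/`tgt.spm.model` in the model directory):

```python
import ctranslate2
import sentencepiece as spm

model_dir = "./quickmt-es-en"

# Load the CTranslate2 model and the source/target SentencePiece tokenizers
translator = ctranslate2.Translator(model_dir, device="auto")
src_sp = spm.SentencePieceProcessor(model_file=f"{model_dir}/src.spm.model")
tgt_sp = spm.SentencePieceProcessor(model_file=f"{model_dir}/tgt.spm.model")

# Tokenize into subword pieces, translate, then detokenize the best hypothesis
tokens = src_sp.encode("La investigación todavía se ubica en su etapa inicial.", out_type=str)
result = translator.translate_batch([tokens], beam_size=5)
print(tgt_sp.decode(result[0].hypotheses[0]))
```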

## Metrics

`bleu` and `chrf2` are calculated with [sacrebleu](https://github.com/mjpost/sacrebleu) on the [Flores200 `devtest` test set](https://huggingface.co/datasets/facebook/flores) ("spa_Latn"->"eng_Latn"). `comet22` is calculated with the [`comet`](https://github.com/Unbabel/COMET) library and the [default model](https://huggingface.co/Unbabel/wmt22-comet-da). "Time (s)" is the time in seconds to translate the flores-devtest dataset (1012 sentences) on an RTX 4070S GPU with batch size 32 (higher speed is possible using a larger batch size).

|                                  |   bleu |   chrf2 |   comet22 |   Time (s) |
|:---------------------------------|-------:|--------:|----------:|-----------:|
| quickmt/quickmt-es-en            |  28.64 |   58.61 |     86.11 |       1.33 |
| Helsinki-NLP/opus-mt-es-en       |  27.62 |   58.38 |     86.01 |       3.67 |
| facebook/nllb-200-distilled-600M |  30.02 |   59.71 |     86.55 |      21.99 |
| facebook/nllb-200-distilled-1.3B |  31.58 |   60.96 |     87.25 |      38.2  |
| facebook/m2m100_418M             |  22.85 |   55.04 |     82.9  |      18.83 |
| facebook/m2m100_1.2B             |  26.84 |   57.69 |     85.47 |      36.22 |
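
The scores above correspond to commands of roughly this shape (a hedged sketch; the source/hypothesis/reference file names are hypothetical):

```bash
# BLEU and chrF2 with sacrebleu (chrF2 is sacrebleu's default chrF configuration)
sacrebleu ref.en.txt -i hyp.en.txt -m bleu chrf

# COMET with the default wmt22-comet-da model
comet-score -s src.es.txt -t hyp.en.txt -r ref.en.txt
```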
config.json
ADDED
@@ -0,0 +1,10 @@
{
    "add_source_bos": false,
    "add_source_eos": false,
    "bos_token": "<s>",
    "decoder_start_token": "<s>",
    "eos_token": "</s>",
    "layer_norm_epsilon": 1e-06,
    "multi_query_attention": false,
    "unk_token": "<unk>"
}
eole-config.yaml
ADDED
@@ -0,0 +1,99 @@
## IO
save_data: data
overwrite: True
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]
tensorboard: true
tensorboard_log_dir: tensorboard

### Vocab
src_vocab: es.eole.vocab
tgt_vocab: en.eole.vocab
src_vocab_size: 20000
tgt_vocab_size: 20000
vocab_size_multiple: 8
share_vocab: false
n_sample: 0

data:
    corpus_1:
        # path_src: hf://quickmt/quickmt-train.es-en/es
        # path_tgt: hf://quickmt/quickmt-train.es-en/en
        # path_sco: hf://quickmt/quickmt-train.es-en/sco
        path_src: train.es
        path_tgt: train.en
    valid:
        path_src: dev.es
        path_tgt: dev.en

transforms: [sentencepiece, filtertoolong]
transforms_configs:
    sentencepiece:
        src_subword_model: "es.spm.model"
        tgt_subword_model: "en.spm.model"
    filtertoolong:
        src_seq_length: 256
        tgt_seq_length: 256

training:
    # Run configuration
    model_path: quickmt-es-en-eole-model
    train_from: quickmt-es-en-eole-model
    #train_from: model
    keep_checkpoint: 4
    train_steps: 100000
    save_checkpoint_steps: 5000
    valid_steps: 5000

    # Train on a single GPU
    world_size: 1
    gpu_ranks: [0]

    # Batching 10240
    batch_type: "tokens"
    batch_size: 6400
    valid_batch_size: 4096
    batch_size_multiple: 8
    accum_count: [12]
    accum_steps: [0]

    # Optimizer & Compute
    compute_dtype: "fp16"
    optim: "adamw"
    #use_amp: False
    learning_rate: 2.0
    warmup_steps: 4000
    decay_method: "noam"
    adam_beta2: 0.998

    # Data loading
    bucket_size: 128000
    num_workers: 4
    prefetch_factor: 32

    # Hyperparams
    dropout_steps: [0]
    dropout: [0.1]
    attention_dropout: [0.1]
    max_grad_norm: 0
    label_smoothing: 0.1
    average_decay: 0.0001
    param_init_method: xavier_uniform
    normalization: "tokens"

model:
    architecture: "transformer"
    share_embeddings: false
    share_decoder_embeddings: false
    hidden_size: 1024
    encoder:
        layers: 8
    decoder:
        layers: 2
    heads: 8
    transformer_ff: 4096
    embeddings:
        word_vec_size: 1024
        position_encoding_type: "SinusoidalInterleaved"
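
Assuming the standard `eole` command-line entry points, a config like the one above is consumed roughly as follows (a sketch, not an exact training recipe; file names come from the config):

```bash
# Build the source/target vocabularies, then launch training
eole build_vocab -config eole-config.yaml
eole train -config eole-config.yaml
```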
eole-model/config.json
ADDED
@@ -0,0 +1,133 @@
{
    "tgt_vocab": "en.eole.vocab",
    "n_sample": 0,
    "overwrite": true,
    "valid_metrics": ["BLEU"],
    "tgt_vocab_size": 20000,
    "tensorboard": true,
    "tensorboard_log_dir_dated": "tensorboard/Apr-28_20-08-59",
    "vocab_size_multiple": 8,
    "src_vocab_size": 20000,
    "save_data": "data",
    "share_vocab": false,
    "src_vocab": "es.eole.vocab",
    "transforms": ["sentencepiece", "filtertoolong"],
    "tensorboard_log_dir": "tensorboard",
    "report_every": 100,
    "seed": 1234,
    "training": {
        "average_decay": 0.0001,
        "accum_steps": [0],
        "accum_count": [12],
        "attention_dropout": [0.1],
        "train_steps": 100000,
        "warmup_steps": 4000,
        "normalization": "tokens",
        "bucket_size": 128000,
        "compute_dtype": "torch.float16",
        "max_grad_norm": 0.0,
        "batch_type": "tokens",
        "valid_batch_size": 4096,
        "optim": "adamw",
        "world_size": 1,
        "dropout_steps": [0],
        "adam_beta2": 0.998,
        "train_from": "quickmt-es-en-eole-model",
        "gpu_ranks": [0],
        "learning_rate": 2.0,
        "num_workers": 0,
        "dropout": [0.1],
        "batch_size_multiple": 8,
        "label_smoothing": 0.1,
        "batch_size": 6400,
        "model_path": "quickmt-es-en-eole-model",
        "param_init_method": "xavier_uniform",
        "keep_checkpoint": 4,
        "prefetch_factor": 32,
        "decay_method": "noam",
        "valid_steps": 5000,
        "save_checkpoint_steps": 5000
    },
    "model": {
        "share_decoder_embeddings": false,
        "transformer_ff": 4096,
        "position_encoding_type": "SinusoidalInterleaved",
        "heads": 8,
        "share_embeddings": false,
        "hidden_size": 1024,
        "architecture": "transformer",
        "decoder": {
            "transformer_ff": 4096,
            "decoder_type": "transformer",
            "layers": 2,
            "position_encoding_type": "SinusoidalInterleaved",
            "heads": 8,
            "n_positions": null,
            "hidden_size": 1024,
            "tgt_word_vec_size": 1024
        },
        "embeddings": {
            "word_vec_size": 1024,
            "position_encoding_type": "SinusoidalInterleaved",
            "src_word_vec_size": 1024,
            "tgt_word_vec_size": 1024
        },
        "encoder": {
            "transformer_ff": 4096,
            "layers": 8,
            "position_encoding_type": "SinusoidalInterleaved",
            "heads": 8,
            "n_positions": null,
            "encoder_type": "transformer",
            "hidden_size": 1024,
            "src_word_vec_size": 1024
        }
    },
    "data": {
        "corpus_1": {
            "path_src": "train.es",
            "path_tgt": "train.en",
            "transforms": ["sentencepiece", "filtertoolong"],
            "path_align": null
        },
        "valid": {
            "path_src": "dev.es",
            "path_tgt": "dev.en",
            "transforms": ["sentencepiece", "filtertoolong"],
            "path_align": null
        }
    },
    "transforms_configs": {
        "filtertoolong": {
            "tgt_seq_length": 256,
            "src_seq_length": 256
        },
        "sentencepiece": {
            "tgt_subword_model": "${MODEL_PATH}/en.spm.model",
            "src_subword_model": "${MODEL_PATH}/es.spm.model"
        }
    }
}
eole-model/en.spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c26488c6db0bdca05f0e9e8edf43e8bdb4f78fc5c41c51749f88aefa6a1d030b
size 593820
eole-model/es.spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:515603821dd149cb66b99febbe4bbb05b9c7819943621d1f66c28ca2270a47e9
size 603700
eole-model/model.00.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9986aa5e396869b44721a504f83752570705bc23adccaba4345724d6fd2fc5e3
size 823882912
eole-model/vocab.json
ADDED
The diff for this file is too large to render.
model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:309bbb55ecb269d151a6cf72db8df85d8f28a5e79e0510f3b9cdcf2fdcac8cb8
size 401699775
source_vocabulary.json
ADDED
The diff for this file is too large to render.
src.spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:515603821dd149cb66b99febbe4bbb05b9c7819943621d1f66c28ca2270a47e9
size 603700
target_vocabulary.json
ADDED
The diff for this file is too large to render.
tgt.spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c26488c6db0bdca05f0e9e8edf43e8bdb4f78fc5c41c51749f88aefa6a1d030b
size 593820