---
license: apache-2.0
datasets:
- lecslab/glosslm-corpus-split
metrics:
- accuracy
- chrf
- bleu
base_model:
- google/byt5-base
library_name: transformers
---
- Repo: https://github.com/foltaProject/glosslm
- Paper: https://arxiv.org/abs/2403.06399
Usage:
```python
import transformers

# Example inputs: an Uspanteco transcription with its Spanish translation
transcription = "o sey xtok rixoqiil"
translation = "O sea busca esposa."
lang = "Uspanteco"
metalang = "Spanish"
is_segmented = False
prompt = f"""Provide the glosses for the following transcription in {lang}.
Transcription in {lang}: {transcription}
Transcription segmented: {is_segmented}
Translation in {metalang}: {translation}\n
Glosses:
"""
# Load the fine-tuned model and the (byte-level) ByT5 tokenizer
model = transformers.T5ForConditionalGeneration.from_pretrained("lecslab/glosslm")
tokenizer = transformers.ByT5Tokenizer.from_pretrained(
    "google/byt5-base", use_fast=False
)
# Tokenize the prompt, generate glosses, and decode the output
inputs = tokenizer(prompt, return_tensors="pt")
outputs = tokenizer.batch_decode(
    model.generate(**inputs, max_length=1024), skip_special_tokens=True
)
print(outputs[0])
# o sea COM-buscar E3S-esposa
```
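
The metadata above lists accuracy among the reported metrics. As a minimal illustrative sketch (not necessarily the exact evaluation used in the paper), word-level accuracy between a predicted gloss line and a reference gloss line can be computed by comparing aligned whitespace-separated tokens:

```python
# Sketch: word-level gloss accuracy against a reference gloss line.
# This is an illustration only; the paper's evaluation may differ
# (e.g., morpheme-level scoring after splitting on hyphens).
def gloss_accuracy(predicted: str, reference: str) -> float:
    pred_tokens = predicted.split()
    ref_tokens = reference.split()
    correct = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    # Normalize by the reference length; empty references score 0
    return correct / max(len(ref_tokens), 1)

reference = "o sea COM-buscar E3S-esposa"
print(gloss_accuracy("o sea COM-buscar E3S-esposa", reference))  # 1.0
print(gloss_accuracy("o sea COM-buscar E3S-mujer", reference))   # 0.75
```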