---
title: Polish-English Translation (ByT5)
emoji: 🇵🇱↔️🇬🇧
colorFrom: red
colorTo: blue
sdk: gradio
app_file: app.py
license: cc-by-nc-sa-4.0
tags:
- translation
- text2text-generation
- text generation
- language translation
- polish
- english
- byt5
- t5
- tokenizer-free
- nlp
- gradio
sdk_version: 5.34.2
---
# Two-Way Polish 🇵🇱↔️🇬🇧 English Translator with ByT5
This Space provides two-way translation between Polish and English using a single model fine-tuned from **Google's `byt5-300m`**.
The key feature of this model is that it is **tokenizer-free**. It operates directly on raw UTF-8 bytes instead of a learned subword vocabulary, so any input string can be represented without falling back to an unknown token. This makes it robust to input that subword-based models often handle poorly, including:
* Polish diacritics (`ą`, `ć`, `ę`, `ł`, `ń`, `ó`, `ś`, `ź`, `ż`)
* Emojis and special symbols
* Typos or unusual spellings
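
To make the byte-level idea concrete, here is a minimal sketch in plain Python of how text turns into a byte sequence. The helper names are hypothetical, and the id mapping (byte value plus 3, with ids 0 to 2 reserved for padding, end-of-sequence, and unknown) follows the convention of the ByT5 tokenizer in Hugging Face Transformers; treat it as an illustration, not as this Space's actual code.

```python
# Assumed convention (Hugging Face ByT5 tokenizer): ids 0, 1, 2 are reserved
# for <pad>, </s>, <unk>, so a byte value b maps to id b + 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def utf8_bytes(text: str) -> list[int]:
    """Return the raw UTF-8 byte values of a string."""
    return list(text.encode("utf-8"))


def byt5_style_ids(text: str) -> list[int]:
    """Map each byte to an id and append the end-of-sequence id."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode_ids(ids: list[int]) -> str:
    """Invert byt5_style_ids: keep only byte ids and decode them as UTF-8."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8")


# Each Polish diacritic is two bytes in UTF-8, for example "ż" is [197, 188].
for ch in "ąćęłńóśźż":
    print(ch, utf8_bytes(ch))
```

Because every string has a UTF-8 encoding, the round trip `decode_ids(byt5_style_ids(text))` returns the original text for diacritics, emojis, and misspellings alike. The cost is longer input sequences than a subword tokenizer would produce.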
## How to Use
1. **Enter your text:** Type or paste the text you want to translate.
2. **Select the direction:** Choose either `English to Polish` or `Polish to English`. The application adds a special prefix to the text to tell the model which way to translate.
3. **Click Submit:** The translated text will appear in the output box.
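
The three steps above correspond to a simple Gradio form. The following is a rough sketch of how such a form could be wired up, not the Space's actual `app.py`: `build_demo` and `placeholder_translate` are hypothetical names, the labels are assumptions, and the placeholder callback only echoes its input so the layout can be tried without loading a model.

```python
# The two direction labels are taken from the instructions above.
DIRECTIONS = ["English to Polish", "Polish to English"]


def placeholder_translate(text: str, direction: str) -> str:
    """Stand-in callback (hypothetical): echoes the input instead of translating."""
    return f"[{direction}] {text}"


def build_demo(translate_fn):
    """Wire a (text, direction) -> str callback into a text box, a radio, and an output box."""
    import gradio as gr  # imported lazily so this module loads without Gradio installed

    return gr.Interface(
        fn=translate_fn,
        inputs=[
            gr.Textbox(lines=4, label="Text to translate"),
            gr.Radio(choices=DIRECTIONS, value=DIRECTIONS[0], label="Direction"),
        ],
        outputs=gr.Textbox(label="Translation"),
        title="Polish-English Translation (ByT5)",
    )
```

Calling `build_demo(placeholder_translate).launch()` would start the form locally. In a real app the placeholder would be replaced by a function that runs the model.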
## Model and Technical Details
This application is powered by a fine-tuned version of the `google/byt5-300m` model.
* **Model:** [google/byt5-300m](https://huggingface.co/google/byt5-300m)
* **Architecture:** ByT5 (Byte-level T5) is a "tokenizer-free" model that processes text as a sequence of bytes. This eliminates "unknown token" errors and allows a single model to handle multiple tasks and languages flexibly.
* **Method:** Two-way translation is achieved by prepending a task-specific prefix to the input before feeding it to the model (e.g., `translate English to Polish: Hello world!`).
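
The prefix method can be sketched as below. Only the `translate English to Polish: ` prefix appears in this README; the reverse prefix is an assumption that mirrors it. The function names are hypothetical, and `model_id` is left as a parameter because the fine-tuned checkpoint is not named here. The Transformers calls used (`AutoTokenizer`, `AutoModelForSeq2SeqLM`, `generate`, `decode`) are standard, but this is an illustration under those assumptions rather than the Space's own inference code.

```python
PREFIXES = {
    "English to Polish": "translate English to Polish: ",
    # Assumption: the reverse direction uses the mirrored prefix.
    "Polish to English": "translate Polish to English: ",
}


def build_prompt(text: str, direction: str) -> str:
    """Prepend the task prefix for the chosen direction."""
    return PREFIXES[direction] + text.strip()


def load(model_id: str):
    """Load a seq2seq checkpoint and its tokenizer; model_id must be supplied by the caller."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer  # lazy import

    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def translate(text: str, direction: str, model, tokenizer, max_new_tokens: int = 256) -> str:
    """Translate one input. For ByT5, max_new_tokens counts output bytes, not words."""
    inputs = tokenizer(build_prompt(text, direction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```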
---
*Created by gregniuki*