---
title: Polish-English Translation (ByT5)
emoji: 🇵🇱↔️🇬🇧
colorFrom: red
colorTo: blue
sdk: gradio
app_file: app.py
license: cc-by-nc-sa-4.0
tags:
- translation
- text2text-generation
- text generation
- language translation
- polish
- english
- byt5
- t5
- tokenizer-free
- nlp
- gradio
sdk_version: 5.34.2
---

# Two-Way Polish 🇵🇱↔️🇬🇧 English Translator with ByT5

This Space provides two-way translation between Polish and English using a single model: a fine-tuned version of **Google's `byt5-300m`**.

The key feature of this model is that it is **tokenizer-free**. It operates directly on raw UTF-8 bytes instead of relying on a fixed vocabulary. This makes it robust for translation, because any character can be represented, including:

* Polish diacritics (`ą`, `ć`, `ę`, `ł`, `ń`, `ó`, `ś`, `ź`, `ż`)
* Emojis and special symbols
* Typos or unusual spellings

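To make the byte-level idea concrete, here is a minimal sketch in plain Python of how text can become model input without a vocabulary. It assumes the ID layout used by the ByT5 tokenizer in Hugging Face Transformers (IDs 0, 1 and 2 reserved for padding, end-of-sequence and unknown, so each byte value is shifted by 3). The function names are illustrative and are not taken from this Space's code.

```python
from typing import List

# Assumption: ByT5-style layout, where IDs 0-2 are special tokens
# (0 = pad, 1 = end-of-sequence, 2 = unknown) and byte b maps to ID b + 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 and shift every byte past the special IDs."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def from_byte_ids(ids: List[int]) -> str:
    """Drop special IDs, undo the shift, and decode the bytes as UTF-8."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="ignore")


sample = "Zażółć gęślą jaźń 🙂"
ids = to_byte_ids(sample)
# Diacritics and emojis take several bytes each, so there are more IDs
# than characters, but nothing ever falls outside the "vocabulary".
print(len(sample), len(ids))
print(from_byte_ids(ids) == sample)
```

The trade-off is sequence length: a Polish letter such as `ż` occupies two bytes and an emoji four, so byte-level inputs are longer than subword-tokenized ones.
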
## How to Use

1. **Enter your text:** Type or paste the text you want to translate.
2. **Select the direction:** Choose either `English to Polish` or `Polish to English`. The application adds a special prefix to the text to tell the model which way to translate.
3. **Click Submit:** The translated text will appear in the output box.

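The prefix step in the list above can be sketched as a small helper. Only the `translate English to Polish: ` prefix appears in this README; the reverse prefix and the `build_prompt` name are assumptions made for illustration and may differ from what `app.py` actually does.

```python
# Direction labels as shown in the UI, mapped to task prefixes.
PREFIXES = {
    "English to Polish": "translate English to Polish: ",
    # Assumption: the reverse direction mirrors the documented prefix.
    "Polish to English": "translate Polish to English: ",
}


def build_prompt(text: str, direction: str) -> str:
    """Prepend the task prefix for the chosen direction to the user text."""
    if direction not in PREFIXES:
        raise ValueError(f"Unknown direction: {direction!r}")
    return PREFIXES[direction] + text.strip()


print(build_prompt("Hello world!", "English to Polish"))
```

Rejecting unknown directions early keeps a mistyped label from silently producing an unprefixed prompt.
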
## Model and Technical Details

This application is powered by a fine-tuned version of the `google/byt5-300m` model.

* **Model:** [google/byt5-300m](https://huggingface.co/google/byt5-300m)
* **Architecture:** ByT5 (Byte-level T5) is a "tokenizer-free" model that processes text as a sequence of UTF-8 bytes. Because every string maps to a byte sequence, there are no out-of-vocabulary ("unknown") tokens, and a single model can serve both translation directions.
* **Method:** Two-way translation is achieved by prepending a task-specific prefix to the input before feeding it to the model (e.g., `translate English to Polish: Hello world!`).

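For readers who want to reproduce the method outside the Space, the following is a minimal sketch of inference with Hugging Face Transformers. It is not the Space's actual `app.py`: the helper names, the `max_new_tokens` value and the checkpoint name in the usage comment are placeholders, since this README does not name the fine-tuned checkpoint.

```python
def load_model(checkpoint: str):
    """Load a ByT5 checkpoint; requires `transformers` and `torch`."""
    # Imported lazily so the helpers can be defined without the libraries.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)
    return tokenizer, model


def generate_translation(prompt: str, tokenizer, model, max_new_tokens: int = 256) -> str:
    """Run one prefixed prompt through the model and decode the result."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Byte-level outputs are long, so the generation budget is counted in bytes.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (placeholder checkpoint name, replace with the fine-tuned model):
# tokenizer, model = load_model("your-username/your-finetuned-byt5")
# print(generate_translation("translate English to Polish: Hello world!", tokenizer, model))
```

Passing the tokenizer and model in as arguments means they are loaded once and reused across requests, which matters in an interactive app.
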
---

*Created by gregniuki*