---
title: Polish-English Translation (ByT5)
emoji: 🇵🇱↔️🇬🇧
colorFrom: red
colorTo: blue
sdk: gradio
app_file: app.py
license: cc-by-nc-sa-4.0
tags:
- translation
- text2text-generation
- text generation
- language translation
- polish
- english
- byt5
- t5
- tokenizer-free
- nlp
- gradio
sdk_version: 5.34.2
---
# Two-Way Polish 🇵🇱↔️🇬🇧 English Translator with ByT5
This Space provides two-way translation between Polish and English using a single model fine-tuned from **Google's `byt5-300m`**.
The key feature of this model is that it is **tokenizer-free**: it operates directly on raw UTF-8 bytes instead of relying on a fixed vocabulary. Because every character is just a sequence of bytes, no input falls outside the vocabulary, so the model can represent any text, including:
* Polish diacritics (`ą`, `ć`, `ę`, `ł`, `ń`, `ó`, `ś`, `ź`, `ż`)
* Emojis and special symbols
* Typos or unusual spellings
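
To make the byte-level idea concrete, here is a minimal sketch of how text could map to model input IDs. It uses only the Python standard library. The offset of 3 is an assumption based on the ByT5 convention of reserving IDs 0, 1 and 2 for padding, end-of-sequence and unknown; the helper names `to_byte_ids` and `from_byte_ids` are made up for this illustration and are not part of the app.

```python
# Assumption: IDs 0-2 are reserved for special tokens (pad, eos, unk),
# so each UTF-8 byte value is shifted by 3 to get its input ID.
BYTE_OFFSET = 3


def to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and shift every byte by the special-token offset."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")]


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids: undo the shift and decode the bytes as UTF-8."""
    return bytes(i - BYTE_OFFSET for i in ids).decode("utf-8")


print(to_byte_ids("A"))   # [68]        one byte for an ASCII letter
print(to_byte_ids("ł"))   # [200, 133]  two bytes for a Polish diacritic
print(len(to_byte_ids("🙂")))  # 4      four bytes for this emoji
```

Every string round-trips through these two helpers, which is why no character can be "unknown". The cost is sequence length: a diacritic takes two positions and an emoji four.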
## How to Use
1. **Enter your text:** Type or paste the text you want to translate.
2. **Select the direction:** Choose either `English to Polish` or `Polish to English`. The application adds a special prefix to the text to tell the model which way to translate.
3. **Click Submit:** The translated text will appear in the output box.
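
The direction choice in step 2 can be sketched as a simple lookup from the selected label to a prefix. This is an illustration, not the code in `app.py`: the function name `build_prompt` is hypothetical, and the `translate Polish to English: ` prefix is assumed by symmetry with the English-to-Polish form.

```python
# Assumption: both prefixes follow the "translate X to Y: " pattern.
# The Polish-to-English wording is inferred, not confirmed.
PREFIXES = {
    "English to Polish": "translate English to Polish: ",
    "Polish to English": "translate Polish to English: ",
}


def build_prompt(text: str, direction: str) -> str:
    """Prepend the task prefix for the chosen direction to the user's text."""
    if direction not in PREFIXES:
        raise ValueError(f"Unknown direction: {direction!r}")
    return PREFIXES[direction] + text.strip()


print(build_prompt("Hello world!", "English to Polish"))
# translate English to Polish: Hello world!
```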
## Model and Technical Details
This application is powered by a fine-tuned version of the `google/byt5-300m` model.
* **Model:** [google/byt5-300m](https://huggingface.co/google/byt5-300m)
* **Architecture:** ByT5 (Byte-level T5) is a "tokenizer-free" model that processes text as a sequence of UTF-8 bytes. This eliminates "unknown token" errors and allows a single model to handle multiple tasks and languages; the trade-off is that byte sequences are longer than subword sequences for the same text.
* **Method:** Two-way translation is achieved by prepending a task-specific prefix to the input before feeding it to the model (e.g., `translate English to Polish: Hello world!`).
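
For readers who want to call the model outside the Space, the following is a rough sketch using the Hugging Face `transformers` seq2seq API. It makes several assumptions: the fine-tuned checkpoint ID is not given in this README, so `model_id` must be supplied by the caller; the helper names `load` and `translate` are hypothetical; and the generation settings are placeholders rather than the values the app uses.

```python
def load(model_id: str):
    """Load tokenizer and model for a seq2seq checkpoint.

    transformers is imported lazily so translate() can be reused or tested
    without it. model_id is whatever checkpoint you have access to.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(prompt: str, tokenizer, model, max_new_tokens: int = 256) -> str:
    """Run one already-prefixed prompt through the model and decode the result.

    With a byte-level model each generated step is one byte, so
    max_new_tokens limits output bytes, not words (an assumed default here).
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would look like `tokenizer, model = load(model_id)` followed by `translate("translate English to Polish: Hello world!", tokenizer, model)`. Because output length is counted in bytes, a limit that suits a subword model may truncate Polish text, where diacritics take two bytes each.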
---
*Created by gregniuki*