---
title: QuickMT Neural Machine Translation
sdk: docker
emoji: 🚀
app_port: 7860
pinned: true
short_description: High-performance, self-hostable neural machine translation
colorFrom: red
colorTo: gray
---

# `quickmt` Neural Machine Translation Inference Library

## REST Server Features

- **Dynamic Batching**: Multiple concurrent HTTP requests are pooled into a single batch to maximize GPU utilization.
- **Multi-Model Support**: Requests are routed to the appropriate model based on `src_lang` and `tgt_lang`.
- **LRU Cache**: Models are loaded and unloaded automatically based on usage to keep memory consumption bounded.

## Installation

```bash
pip install -r requirements.txt
```

## Running the Web Application

```bash
export MAX_LOADED_MODELS=3
export MAX_BATCH_SIZE=32
export DEVICE=cuda        # or cpu
export COMPUTE_TYPE=int8  # default, auto, int8, float16, etc.
quickmt-gui
```

## Running the REST Server

```bash
export MAX_LOADED_MODELS=3
export MAX_BATCH_SIZE=32
export DEVICE=cuda        # or cpu
export COMPUTE_TYPE=int8  # default, auto, int8, float16, etc.
quickmt-api
```

## API Usage

### Translate

```bash
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"src": "Hello world", "src_lang": null, "tgt_lang": "fr", "beam_size": 2, "patience": 1, "length_penalty": 1, "coverage_penalty": 0, "repetition_penalty": 1}'
```

Returns:

```json
{
  "translation": "Bonjour tout le monde !",
  "src_lang": "en",
  "src_lang_score": 0.16532786190509796,
  "tgt_lang": "fr",
  "processing_time": 2.2334513664245605,
  "model_used": "quickmt/quickmt-en-fr"
}
```

A Python equivalent of this request is sketched at the end of this README.

## Load Testing with Locust

To simulate load from multiple concurrent users:

```bash
locust -f locustfile.py --host http://localhost:8000
```

Then open http://localhost:8089 in your browser.
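The test behaviour is defined by the repository's `locustfile.py`. As a rough illustration of what such a file can look like, here is a minimal sketch that drives only the `/translate` endpoint shown above; the user class, task, and wait times are illustrative assumptions, not the repository's actual file:

```python
# locustfile sketch -- illustrative only; the repository's locustfile.py may differ.
from locust import HttpUser, between, task


class TranslateUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests (illustrative choice).
    wait_time = between(1, 3)

    @task
    def translate(self):
        # Same payload as the curl example above; the example response suggests
        # the server detects the source language when src_lang is null.
        self.client.post(
            "/translate",
            json={
                "src": "Hello world",
                "src_lang": None,
                "tgt_lang": "fr",
                "beam_size": 2,
                "patience": 1,
                "length_penalty": 1,
                "coverage_penalty": 0,
                "repetition_penalty": 1,
            },
        )
```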
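## Calling the API from Python

The `/translate` endpoint can be driven from any HTTP client. Below is a minimal sketch using `requests`; it assumes the server is running locally on port 8000, as in the curl example, and reuses that example's payload verbatim.

```python
import requests

payload = {
    "src": "Hello world",
    "src_lang": None,  # null in JSON; the example response suggests auto-detection
    "tgt_lang": "fr",
    "beam_size": 2,
    "patience": 1,
    "length_penalty": 1,
    "coverage_penalty": 0,
    "repetition_penalty": 1,
}

resp = requests.post("http://localhost:8000/translate", json=payload, timeout=60)
resp.raise_for_status()
result = resp.json()
print(result["translation"])  # e.g. "Bonjour tout le monde !"
print(result["model_used"])   # e.g. "quickmt/quickmt-en-fr"
```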
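Because the server pools concurrent requests via dynamic batching, several requests sent in parallel can be served together in one batch. A sketch of this from the client side, with an illustrative sentence list and a hypothetical `translate` helper:

```python
# Sketch: fire several requests concurrently so the server's dynamic batcher
# can pool them. The sentence list and helper function are illustrative.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/translate"
SENTENCES = ["Hello world", "How are you?", "Good morning", "See you soon"]


def translate(text: str) -> str:
    """POST one sentence to /translate and return the translation."""
    payload = {
        "src": text,
        "src_lang": None,
        "tgt_lang": "fr",
        "beam_size": 2,
        "patience": 1,
        "length_penalty": 1,
        "coverage_penalty": 0,
        "repetition_penalty": 1,
    }
    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["translation"]


# Submit all requests at once; each runs in its own thread.
with ThreadPoolExecutor(max_workers=len(SENTENCES)) as pool:
    for src, tgt in zip(SENTENCES, pool.map(translate, SENTENCES)):
        print(f"{src!r} -> {tgt!r}")
```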