---
title: QuickMT Neural Machine Translation
sdk: docker
emoji: 🚀
app_port: 7860
pinned: true
short_description: High-performance, self-hostable neural machine translation
colorFrom: red
colorTo: gray
---

# `quickmt` Neural Machine Translation Inference Library

## REST Server Features

- **Dynamic Batching**: Multiple concurrent HTTP requests are pooled into a single batch to maximize GPU utilization.
- **Multi-Model Support**: Requests are routed to the appropriate model based on `src_lang` and `tgt_lang`.
- **LRU Cache**: Models are loaded and unloaded automatically based on usage to keep memory consumption bounded.

## Installation

```bash
pip install -r requirements.txt
```

## Running the Web Application

```bash
export MAX_LOADED_MODELS=3
export MAX_BATCH_SIZE=32
export DEVICE=cuda        # or cpu
export COMPUTE_TYPE=int8  # default, auto, int8, float16, etc.
quickmt-gui
```

## Running the REST Server

```bash
export MAX_LOADED_MODELS=3
export MAX_BATCH_SIZE=32
export DEVICE=cuda        # or cpu
export COMPUTE_TYPE=int8  # default, auto, int8, float16, etc.
quickmt-api
```

## API Usage

### Translate

```bash
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"src": "Hello world", "src_lang": null, "tgt_lang": "fr", "beam_size": 2, "patience": 1, "length_penalty": 1, "coverage_penalty": 0, "repetition_penalty": 1}'
```

Returns:

```json
{
  "translation": "Bonjour tout le monde !",
  "src_lang": "en",
  "src_lang_score": 0.16532786190509796,
  "tgt_lang": "fr",
  "processing_time": 2.2334513664245605,
  "model_used": "quickmt/quickmt-en-fr"
}
```

A Python equivalent of this request is sketched at the end of this README.

## Load Testing with Locust

To simulate load from multiple concurrent users:

```bash
locust -f locustfile.py --host http://localhost:8000
```

Then open http://localhost:8089 in your browser.
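The test behaviour is defined by the repository's `locustfile.py`. As a rough illustration of what such a file can look like, here is a minimal sketch that drives only the `/translate` endpoint shown above; the user class, task, and wait times are illustrative assumptions, not the repository's actual file:

```python
# locustfile sketch -- illustrative only; the repository's locustfile.py may differ.
from locust import HttpUser, between, task


class TranslateUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests (illustrative choice).
    wait_time = between(1, 3)

    @task
    def translate(self):
        # Same payload as the curl example above; the example response suggests
        # the server detects the source language when src_lang is null.
        self.client.post(
            "/translate",
            json={
                "src": "Hello world",
                "src_lang": None,
                "tgt_lang": "fr",
                "beam_size": 2,
                "patience": 1,
                "length_penalty": 1,
                "coverage_penalty": 0,
                "repetition_penalty": 1,
            },
        )
```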
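## Calling the API from Python

The `/translate` endpoint can be driven from any HTTP client. Below is a minimal sketch using `requests`; it assumes the server is running locally on port 8000, as in the curl example, and reuses that example's payload verbatim.

```python
import requests

payload = {
    "src": "Hello world",
    "src_lang": None,  # null in JSON; the example response suggests auto-detection
    "tgt_lang": "fr",
    "beam_size": 2,
    "patience": 1,
    "length_penalty": 1,
    "coverage_penalty": 0,
    "repetition_penalty": 1,
}

resp = requests.post("http://localhost:8000/translate", json=payload, timeout=60)
resp.raise_for_status()
result = resp.json()
print(result["translation"])  # e.g. "Bonjour tout le monde !"
print(result["model_used"])   # e.g. "quickmt/quickmt-en-fr"
```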
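Because the server pools concurrent requests via dynamic batching, several requests sent in parallel can be served together in one batch. A sketch of this from the client side, with an illustrative sentence list and a hypothetical `translate` helper:

```python
# Sketch: fire several requests concurrently so the server's dynamic batcher
# can pool them. The sentence list and helper function are illustrative.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/translate"
SENTENCES = ["Hello world", "How are you?", "Good morning", "See you soon"]


def translate(text: str) -> str:
    """POST one sentence to /translate and return the translation."""
    payload = {
        "src": text,
        "src_lang": None,
        "tgt_lang": "fr",
        "beam_size": 2,
        "patience": 1,
        "length_penalty": 1,
        "coverage_penalty": 0,
        "repetition_penalty": 1,
    }
    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["translation"]


# Submit all requests at once; each runs in its own thread.
with ThreadPoolExecutor(max_workers=len(SENTENCES)) as pool:
    for src, tgt in zip(SENTENCES, pool.map(translate, SENTENCES)):
        print(f"{src!r} -> {tgt!r}")
```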