---
title: IndicTrans2 Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---

# IndicTrans2 Translation API

A translation API that supports 22+ Indian languages using the IndicTrans2 model.

## Features

### 🌍 Multi-Language Support
- 22+ Indian languages supported
- English to Indian language translation
- High-quality AI-powered translations

### 📄 Enhanced PDF Processing
- **Multiple extraction methods**: PyPDF2, PyMuPDF, and OCR fallback
- **Smart text chunking**: Memory-efficient processing for large documents
- **OCR support**: Handles scanned PDFs and images
- **Duplicate removal**: Cleans up extracted text automatically
- **PDF generation**: Download translated documents as PDF
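
The chunking and duplicate-removal steps listed above can be illustrated with a minimal sketch. The function names `remove_duplicate_lines` and `chunk_text`, the 500-character default, and the sentence-boundary splitting rule are assumptions made for illustration; they are not taken from this project's code.

```python
import re


def remove_duplicate_lines(text: str) -> str:
    """Drop repeated lines (e.g. page headers), keeping the first occurrence."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = " ".join(line.split()).lower()  # normalise whitespace and case
        if not key:
            kept.append("")  # keep blank lines so paragraph breaks survive
            continue
        if key in seen:
            continue
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single over-long sentence so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks small bounds how much text is sent to the model at once, which is one plausible way to keep memory use predictable for large documents.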

### 🚀 Performance Optimizations
- **Memory management**: Optimized for GPU memory usage
- **Batch processing**: Efficient handling of large texts
- **Float16 precision**: Reduced memory footprint on GPU
- **Smart caching**: Faster subsequent requests
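
As a rough illustration of how batch processing and caching can work together, the sketch below sends only unseen sentences to the model, in fixed-size batches, and serves repeats from an in-memory dictionary. `CachingTranslator`, `batched`, and the `translate_batch` callable are hypothetical names, and the cache here is unbounded; the service's actual caching strategy is not documented in this README.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


class CachingTranslator:
    """Wraps a batch translation function with a per-sentence cache."""

    def __init__(
        self,
        translate_batch: Callable[[List[str], str], List[str]],
        batch_size: int = 8,
    ) -> None:
        self._translate_batch = translate_batch  # hypothetical model call
        self._batch_size = batch_size
        self._cache: Dict[Tuple[str, str], str] = {}

    def translate(self, sentences: Sequence[str], target_lang: str) -> List[str]:
        # Collect unseen sentences once each, preserving their order.
        missing = list(
            dict.fromkeys(
                s for s in sentences if (s, target_lang) not in self._cache
            )
        )
        for batch in batched(missing, self._batch_size):
            outputs = self._translate_batch(batch, target_lang)
            for source, translated in zip(batch, outputs):
                self._cache[(source, target_lang)] = translated
        return [self._cache[(s, target_lang)] for s in sentences]
```

A production cache would normally need a size limit or eviction policy; that is left out here to keep the sketch short.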

### 🔧 Memory Management
- Real-time memory monitoring via `/memory-info`
- Manual memory clearing via `/clear-memory`
- Automatic memory cleanup between batches
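
A minimal client sketch for the two memory endpoints, using only the Python standard library. The base URL is an assumption (substitute the address of your deployment), and the sketch assumes the endpoints return JSON; the response fields are not documented here, so the result is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: replace with your deployment URL


def build_memory_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request for GET /memory-info ("info") or POST /clear-memory ("clear")."""
    routes = {"info": ("GET", "/memory-info"), "clear": ("POST", "/clear-memory")}
    method, path = routes[action]
    data = b"" if method == "POST" else None  # empty body for the POST call
    return urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_memory_request("info"))` would fetch the current memory report, and `send(build_memory_request("clear"))` would ask the server to clear its GPU memory cache.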

## API Endpoints

- `GET /` - API status and information
- `GET /health` - Health check and component status
- `GET /languages` - List of supported languages
- `POST /translate` - Batch translation
- `POST /translate-simple` - Simple text translation
- `POST /translate-pdf` - PDF translation with enhanced processing
- `POST /translate-pdf-enhanced` - Advanced PDF translation with download
- `GET /memory-info` - Memory usage information
- `POST /clear-memory` - Clear GPU memory cache

## Recent Improvements

### Memory Optimization
- Reduced memory usage by 60-80%
- Fixed memory allocation errors for large PDFs
- Optimized model loading with float16 precision

### Enhanced PDF Processing
- Multiple extraction methods with automatic fallback
- OCR support for scanned documents
- Smart text chunking for memory efficiency
- Duplicate text removal
- PDF generation for translated documents

### Better Error Handling
- Graceful fallback for failed translation batches
- Detailed error messages with memory information
- Automatic retry mechanisms
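
The fallback and retry behaviour described above could look roughly like the following sketch. It is an assumption about the general pattern, not the service's implementation: `translate_with_fallback` and `translate_batch` are hypothetical names, and falling back to the untranslated source text is one possible choice among several.

```python
import time
from typing import Callable, List, Sequence, Tuple


def translate_with_fallback(
    batches: Sequence[List[str]],
    translate_batch: Callable[[List[str]], List[str]],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> Tuple[List[str], List[str]]:
    """Translate each batch, retrying on failure and falling back to the source text."""
    results: List[str] = []
    errors: List[str] = []
    for index, batch in enumerate(batches):
        for attempt in range(max_retries + 1):
            try:
                results.extend(translate_batch(batch))
                break
            except Exception as exc:  # broad on purpose: any failure triggers a retry
                if attempt == max_retries:
                    # Give up on this batch only; keep the original text in place.
                    results.extend(batch)
                    errors.append(f"batch {index}: {exc}")
                else:
                    time.sleep(backoff_seconds * (attempt + 1))
    return results, errors
```

Because a failed batch is replaced by its source text, the output stays aligned with the input and the remaining batches still get translated.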

## Usage

The API is ready to use with any HTTP client. See the `/docs` endpoint for interactive documentation.
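
As a starting point, the sketch below builds a JSON request for `POST /translate-simple` with the Python standard library. The base URL and the payload field names (`text`, `target_language`) are assumptions; check the `/docs` endpoint for the actual request schema before relying on them.

```python
import json
import urllib.request


def build_translate_request(
    text: str,
    target_language: str,
    base_url: str = "http://localhost:7860",  # assumption: your deployment URL
) -> urllib.request.Request:
    """Build a POST /translate-simple request with a JSON body."""
    # Field names are assumed for illustration; confirm them against /docs.
    payload = {"text": text, "target_language": target_language}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate-simple",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_translate_request("Hello, world", "Hindi"))` against a running instance would then return the decoded response.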

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Khasi, Malayalam, Manipuri, Marathi, Maithili, Mizo, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, and more.