---
title: IndicTrans2 Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---

# IndicTrans2 Translation API

A translation API for 22+ Indian languages, built on the IndicTrans2 model.

## Features

### 🌍 Multi-Language Support
- 22+ Indian languages supported
- English to Indian language translation
- High-quality AI-powered translations

### 📄 Enhanced PDF Processing
- **Multiple extraction methods**: PyPDF2, PyMuPDF, and OCR fallback
- **Smart text chunking**: Memory-efficient processing for large documents
- **OCR support**: Handles scanned PDFs and images
- **Duplicate removal**: Cleans up extracted text automatically
- **PDF generation**: Download translated documents as PDF
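
The text chunking mentioned above can be illustrated with a minimal sketch. It is not the code used by this API: the function name `chunk_text`, the `max_chars` limit, and the sentence-boundary rule are all assumptions chosen for illustration. The idea is to keep each chunk under a size limit while breaking at sentence ends where possible, so that each chunk can be translated on its own.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: breaks at sentence ends (., !, ?) where possible
    and hard-splits any single sentence that exceeds the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A sentence longer than the limit cannot fit in any chunk,
        # so flush what we have and cut the sentence into fixed pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=9))
```

A character limit is used here only because it needs no tokenizer; a token-based limit would track the model's input size more closely.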

### 🚀 Performance Optimizations
- **Memory management**: Optimized for GPU memory usage
- **Batch processing**: Efficient handling of large texts
- **Float16 precision**: Reduced memory footprint on GPU
- **Smart caching**: Faster subsequent requests
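
As a rough illustration of why float16 reduces the memory footprint, the arithmetic below compares weight storage at 4 bytes per parameter (float32) and 2 bytes per parameter (float16). The parameter count is a hypothetical round number, not a figure taken from this project, and the estimate covers model weights only, not activations or caches.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical parameter count, chosen only for illustration.
num_params = 1_000_000_000

fp32 = weight_memory_gib(num_params, "float32")
fp16 = weight_memory_gib(num_params, "float16")
print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
# prints "float32: 3.73 GiB, float16: 1.86 GiB"
```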

### 🔧 Memory Management
- Real-time memory monitoring via `/memory-info`
- Manual memory clearing via `/clear-memory`
- Automatic memory cleanup between batches
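
A cleanup step of the kind described above might look like the following sketch. It assumes a PyTorch backend, which this README does not state explicitly, and the function name `clear_memory` and its return format are hypothetical. The sketch runs Python's garbage collector and then, if PyTorch with CUDA is available, releases cached GPU memory with `torch.cuda.empty_cache()`.

```python
import gc


def clear_memory() -> dict:
    """Free unreferenced Python objects and, if possible, the CUDA cache.

    Hypothetical helper; the API's own cleanup logic may differ.
    """
    collected = gc.collect()
    try:
        import torch  # optional dependency (assumption: PyTorch backend)
    except ImportError:
        return {"gc_collected": collected, "cuda_cache_cleared": False}

    if torch.cuda.is_available():
        # Returns cached, unused blocks to the GPU driver.
        torch.cuda.empty_cache()
        return {"gc_collected": collected, "cuda_cache_cleared": True}
    return {"gc_collected": collected, "cuda_cache_cleared": False}


print(clear_memory())
```

Calling such a function between batches trades a little latency for a lower peak in cached GPU memory.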

## API Endpoints

- `GET /` - API status and information
- `GET /health` - Health check and component status
- `GET /languages` - List of supported languages
- `POST /translate` - Batch translation
- `POST /translate-simple` - Simple text translation
- `POST /translate-pdf` - PDF translation with enhanced processing
- `POST /translate-pdf-enhanced` - Advanced PDF translation with download
- `GET /memory-info` - Memory usage information
- `POST /clear-memory` - Clear GPU memory cache
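
The sketch below shows one way to call `POST /translate-simple` using only the Python standard library. The request schema is not documented in this README, so the JSON field names (`text`, `target_lang`), the language code `hin_Deva`, and the base URL are assumptions; check the `/docs` endpoint for the actual schema before relying on them.

```python
import json
import urllib.request

# Assumption: the API is reachable locally on port 7860.
BASE_URL = "http://localhost:7860"


def build_translate_request(
    text: str, target_lang: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /translate-simple."""
    # Hypothetical field names; the real schema is listed under /docs.
    payload = {"text": text, "target_lang": target_lang}
    return urllib.request.Request(
        url=f"{base_url}/translate-simple",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_translate_request("Hello, world", "hin_Deva")
print(request.get_method(), request.full_url)
# To actually call a running server: print(send(request))
```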

## Recent Improvements

### Memory Optimization
- Reduced memory usage by 60-80%
- Fixed memory allocation errors for large PDFs
- Optimized model loading with float16 precision

### Enhanced PDF Processing
- Multiple extraction methods with automatic fallback
- OCR support for scanned documents
- Smart text chunking for memory efficiency
- Duplicate text removal
- PDF generation for translated documents
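
The fallback and duplicate-removal behaviour listed above can be sketched generically. This is an illustration under stated assumptions, not the project's implementation: `extract_with_fallback` and `remove_duplicate_lines` are hypothetical names, and the extractors are passed in as plain callables so the sketch does not depend on any particular PDF library.

```python
from typing import Callable, Iterable


def extract_with_fallback(
    pdf_path: str,
    extractors: Iterable[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each extractor in order; return (name, text) of the first that
    yields non-empty text. Raise RuntimeError if none succeeds."""
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(pdf_path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def remove_duplicate_lines(text: str) -> str:
    """Drop repeated non-blank lines, ignoring case and extra whitespace."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = " ".join(line.split()).lower()
        if key and key in seen:
            continue
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


# Stand-in extractors for demonstration only.
def failing(path: str) -> str:
    raise ValueError("cannot parse")


def working(path: str) -> str:
    return "Title\nBody\nTitle\n"


name, text = extract_with_fallback("sample.pdf", [("first", failing), ("second", working)])
print(name, repr(remove_duplicate_lines(text)))
```

In a real pipeline the callables would wrap the PyPDF2, PyMuPDF, and OCR steps in that order, with OCR last because it is usually the slowest.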

### Better Error Handling
- Graceful fallback for failed translation batches
- Detailed error messages with memory information
- Automatic retry mechanisms
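
One possible shape for retrying with graceful fallback is sketched below. The names `translate_batches`, `translate_fn`, `max_retries`, and `delay_seconds` are hypothetical, and the fallback policy (keeping the untranslated source text for a batch that keeps failing) is an assumption; the API may handle failed batches differently.

```python
import time
from typing import Callable


def translate_batches(
    batches: list[list[str]],
    translate_fn: Callable[[list[str]], list[str]],
    max_retries: int = 2,
    delay_seconds: float = 0.0,
) -> tuple[list[str], list[int]]:
    """Translate each batch, retrying on failure.

    Returns the combined results and the indices of batches that failed
    every attempt. Failed batches fall back to their source text so the
    output stays aligned with the input.
    """
    results: list[str] = []
    failed: list[int] = []
    for index, batch in enumerate(batches):
        for attempt in range(max_retries + 1):
            try:
                results.extend(translate_fn(batch))
                break
            except Exception:
                if attempt < max_retries:
                    # Wait a little longer after each failed attempt.
                    time.sleep(delay_seconds * (attempt + 1))
        else:
            # Every attempt failed: keep the source text for this batch.
            results.extend(batch)
            failed.append(index)
    return results, failed


# Stand-in translator for demonstration: fails on batches containing "bad".
def fake_translate(batch: list[str]) -> list[str]:
    if "bad" in batch:
        raise RuntimeError("translation failed")
    return [s.upper() for s in batch]


print(translate_batches([["a", "b"], ["bad"], ["c"]], fake_translate, max_retries=1))
```

Returning the failed indices lets the caller report which parts of a document were left untranslated.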

## Usage

The API is ready to use with any HTTP client. See the `/docs` endpoint for interactive documentation.

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Khasi, Malayalam, Manipuri, Marathi, Maithili, Mizo, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, and more.