Syedkaif29 / PdfTransic


title: IndicTrans2 Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit

IndicTrans2 Translation API

A translation API for 22+ Indian languages, built on the IndicTrans2 model.

Features

🌍 Multi-Language Support

22+ Indian languages supported
English to Indian language translation
High-quality AI-powered translations

📄 Enhanced PDF Processing

Multiple extraction methods: PyPDF2, PyMuPDF, and OCR fallback
Smart text chunking: Memory-efficient processing for large documents
OCR support: Handles scanned PDFs and images
Duplicate removal: Cleans up extracted text automatically
PDF generation: Download translated documents as PDF
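
The chunking and duplicate-removal steps listed above could look roughly like the sketch below. This is an illustration only, not the project's actual code: the function names, the 500-character limit, and the sentence-boundary rule are all assumptions.

```python
import re


def remove_duplicate_lines(text: str) -> str:
    """Drop repeated non-empty lines, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for line in text.splitlines():
        key = " ".join(line.split()).lower()  # normalize whitespace and case
        if key and key in seen:
            continue  # repeated line, e.g. a page header extracted on every page
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after ., !, ? or the danda (।) when followed by whitespace.
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Keeping each chunk under a fixed size bounds the memory needed per translation call, which is the point of chunking large documents before they reach the model.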

🚀 Performance Optimizations

Memory management: Optimized for GPU memory usage
Batch processing: Efficient handling of large texts
Float16 precision: Reduced memory footprint on GPU
Smart caching: Faster subsequent requests

🔧 Memory Management

Real-time memory monitoring via /memory-info
Manual memory clearing via /clear-memory
Automatic memory cleanup between batches
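
A minimal sketch of what cleanup between batches might involve, assuming a PyTorch backend (the README does not name the framework). `translate_batch` is a hypothetical callable standing in for the real model call, and the batch size of 4 is arbitrary.

```python
import gc

try:
    import torch  # optional: only needed for GPU cache clearing
except ImportError:
    torch = None


def free_memory() -> None:
    """Run Python garbage collection and release cached GPU memory if possible."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def translate_in_batches(texts, translate_batch, batch_size=4):
    """Translate texts batch by batch, freeing memory after each batch.

    translate_batch is a hypothetical callable: list[str] -> list[str].
    """
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results.extend(translate_batch(batch))
        free_memory()  # cleanup between batches
    return results
```

`torch.cuda.empty_cache()` returns cached blocks to the GPU driver. It does not free tensors that are still referenced, which is why the sketch runs garbage collection first.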

API Endpoints

GET / - API status and information
GET /health - Health check and component status
GET /languages - List of supported languages
POST /translate - Batch translation
POST /translate-simple - Simple text translation
POST /translate-pdf - PDF translation with enhanced processing
POST /translate-pdf-enhanced - Advanced PDF translation with download
GET /memory-info - Memory usage information
POST /clear-memory - Clear GPU memory cache
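
As a rough illustration of calling these endpoints with only the Python standard library, the sketch below builds requests for GET /health and POST /translate-simple. The base URL is a placeholder, and the JSON field names `text` and `target_language` are hypothetical, since the README does not document the request schema. The /docs endpoint should list the real fields.

```python
import json
import urllib.request

# Assumption: placeholder base URL; replace with the real deployment address.
BASE_URL = "http://localhost:7860"


def build_get(path: str) -> urllib.request.Request:
    """Build a GET request for one of the read-only endpoints."""
    return urllib.request.Request(BASE_URL + path, method="GET")


def build_post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health_request = build_get("/health")

# Assumption: "text" and "target_language" are hypothetical field names.
translate_request = build_post_json(
    "/translate-simple", {"text": "Hello", "target_language": "Hindi"}
)
```

Calling `send(health_request)` against a running instance would perform the health check. Nothing is sent until `send` is called.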

Recent Improvements

Memory Optimization

Reduced memory usage by 60-80%
Fixed memory allocation errors for large PDFs
Optimized model loading with float16 precision
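
The float16 part of the saving is simple arithmetic: each parameter takes 2 bytes instead of 4, so the weights need half the memory. The sketch below works that out. The parameter count used in the usage note is illustrative only, because the README does not state the model size, and the estimate covers weights alone, not activations or caches.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Estimate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30
```

For a hypothetical model with 2**30 parameters (about 1.07 billion), `weight_memory_gib(2**30, "float32")` gives 4.0 GiB and `weight_memory_gib(2**30, "float16")` gives 2.0 GiB.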

Enhanced PDF Processing

Multiple extraction methods with automatic fallback
OCR support for scanned documents
Smart text chunking for memory efficiency
Duplicate text removal
PDF generation for translated documents
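
One way to implement extraction with automatic fallback is to try each extractor in order and accept the first one that returns a usable amount of text. The sketch below is an assumption about the approach, not the project's code: the `min_chars` threshold and all function names are hypothetical, and the OCR step is left out (it could be appended as a third callable).

```python
def extract_with_pypdf2(path: str) -> str:
    """Extract text with PyPDF2 (imported lazily so it stays optional)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_pymupdf(path: str) -> str:
    """Extract text with PyMuPDF, which is imported under the name fitz."""
    import fitz

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_text(path: str, extractors, min_chars: int = 20) -> str:
    """Return the first extraction result with at least min_chars of text."""
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception:
            continue  # this method failed; fall through to the next one
        if len(text.strip()) >= min_chars:
            return text
    return ""  # every method failed or returned too little text
```

A call such as `extract_text("document.pdf", [extract_with_pypdf2, extract_with_pymupdf])` tries PyPDF2 first and falls back to PyMuPDF. Scanned PDFs typically yield little or no text from either, which is the case an OCR fallback is meant to cover.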

Better Error Handling

Graceful fallback for failed translation batches
Detailed error messages with memory information
Automatic retry mechanisms
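
Retry with graceful fallback could be sketched as follows. The retry count, the exponential backoff, and the choice to return the untranslated source text for items that keep failing are all assumptions made for illustration. `translate_batch` is again a hypothetical callable taking and returning a list of strings.

```python
import time


def translate_with_retry(batch, translate_batch, retries=2, delay=0.5):
    """Retry a failing batch, then fall back to translating items one by one."""
    for attempt in range(retries + 1):
        try:
            return translate_batch(batch)
        except Exception:
            if attempt < retries:
                time.sleep(delay * (2 ** attempt))  # exponential backoff
    # Graceful fallback: translate each item alone so one bad input
    # does not discard the whole batch.
    results = []
    for text in batch:
        try:
            results.extend(translate_batch([text]))
        except Exception:
            results.append(text)  # assumption: keep the source text on failure
    return results
```

Falling back to single-item calls also lowers peak memory per call, which helps when the original failure was an out-of-memory error.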

Usage

The API works with any HTTP client. See the /docs endpoint for interactive documentation.

Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Khasi, Malayalam, Manipuri, Marathi, Maithili, Mizo, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, and more.