PdfTransic / README.md
Syedkaif29
metadata
title: IndicTrans2 Translation API
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit

IndicTrans2 Translation API

A powerful translation API supporting 22+ Indian languages using the IndicTrans2 model.

Features

🌍 Multi-Language Support

  • 22+ Indian languages supported
  • English to Indian language translation
  • High-quality AI-powered translations

📄 Enhanced PDF Processing

  • Multiple extraction methods: PyPDF2, PyMuPDF, and OCR fallback
  • Smart text chunking: Memory-efficient processing for large documents
  • OCR support: Handles scanned PDFs and images
  • Duplicate removal: Cleans up extracted text automatically
  • PDF generation: Download translated documents as PDF
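
The extraction fallback described in this list can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `extract_with_fallback`, the `min_chars` threshold, and the three stand-in extractors are hypothetical names chosen for the example, and the real PyPDF2, PyMuPDF, and OCR calls are replaced by stubs so the sketch runs without those libraries.

```python
from typing import Callable, Iterable, Optional, Tuple

# An extractor takes raw PDF bytes and returns whatever text it found.
Extractor = Callable[[bytes], str]


def extract_with_fallback(
    pdf_bytes: bytes,
    extractors: Iterable[Tuple[str, Extractor]],
    min_chars: int = 20,  # hypothetical threshold for "usable" text
) -> Tuple[Optional[str], str]:
    """Try each extractor in order and return (method_name, text) for the
    first one that yields at least `min_chars` non-whitespace characters."""
    for name, extractor in extractors:
        try:
            text = extractor(pdf_bytes)
        except Exception:
            # A failing backend should not abort the request; try the next one.
            continue
        if len("".join(text.split())) >= min_chars:
            return name, text
    return None, ""


# Stand-in extractors (hypothetical) that mimic typical outcomes.
def broken_extractor(_: bytes) -> str:
    raise ValueError("cannot parse this file")


def empty_extractor(_: bytes) -> str:
    return "   "  # e.g. a scanned PDF with no text layer


def ocr_extractor(_: bytes) -> str:
    return "Text recovered from a scanned page by OCR."


method, text = extract_with_fallback(
    b"%PDF-1.4 placeholder bytes",
    [
        ("pypdf2", broken_extractor),
        ("pymupdf", empty_extractor),
        ("ocr", ocr_extractor),
    ],
)
print(method, "->", text)
```

Ordering the extractors from cheapest to most expensive means OCR only runs when the faster text-layer methods fail or return nothing useful.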

🚀 Performance Optimizations

  • Memory management: Optimized for GPU memory usage
  • Batch processing: Efficient handling of large texts
  • Float16 precision: Reduced memory footprint on GPU
  • Smart caching: Faster subsequent requests
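
As a rough illustration of the batch processing idea, the sketch below splits input into fixed-size batches and runs a cleanup hook after each one. The names `batched`, `translate_in_batches`, and the `cleanup` parameter are hypothetical; the stand-in translator just upper-cases text. On a GPU deployment the hook might call `torch.cuda.empty_cache()`, but that is an assumption about this service, not something this README states.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 4,
    cleanup: Callable[[], None] = lambda: None,
) -> List[str]:
    """Translate batch by batch, calling `cleanup` after every batch so
    temporary memory can be released before the next one starts."""
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        results.extend(translate_batch(batch))
        cleanup()
    return results


# Stand-in translator and cleanup hook so the sketch runs anywhere.
cleanup_calls: List[int] = []
translated = translate_in_batches(
    ["a", "b", "c", "d", "e"],
    translate_batch=lambda batch: [s.upper() for s in batch],
    batch_size=2,
    cleanup=lambda: cleanup_calls.append(1),
)
print(translated, len(cleanup_calls))
```

Smaller batches lower peak memory at the cost of more model calls, so the batch size is a trade-off to tune per deployment.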

🔧 Memory Management

  • Real-time memory monitoring via /memory-info
  • Manual memory clearing via /clear-memory
  • Automatic memory cleanup between batches
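
A client could reach the two memory endpoints along these lines. This is a sketch under assumptions: the base URL (`http://localhost:7860`) is a placeholder for wherever the service runs, the helper names are hypothetical, and the responses are assumed to be JSON. The HTTP methods follow the endpoint list in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: replace with your deployment URL


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for one of the memory endpoints without sending it."""
    data = b"" if method == "POST" else None  # POST with an empty body
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


def call(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


memory_info_request = build_request("/memory-info")
clear_memory_request = build_request("/clear-memory", method="POST")

# Against a running instance:
# print(call(memory_info_request))
# print(call(clear_memory_request))
```

Checking `/memory-info` before and after `/clear-memory` is a simple way to see how much cached memory was released.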

API Endpoints

  • GET / - API status and information
  • GET /health - Health check and component status
  • GET /languages - List of supported languages
  • POST /translate - Batch translation
  • POST /translate-simple - Simple text translation
  • POST /translate-pdf - PDF translation with enhanced processing
  • POST /translate-pdf-enhanced - Advanced PDF translation with download
  • GET /memory-info - Memory usage information
  • POST /clear-memory - Clear GPU memory cache

Recent Improvements

Memory Optimization

  • Reduced memory usage by 60-80%
  • Fixed memory allocation errors for large PDFs
  • Optimized model loading with float16 precision

Enhanced PDF Processing

  • Multiple extraction methods with automatic fallback
  • OCR support for scanned documents
  • Smart text chunking for memory efficiency
  • Duplicate text removal
  • PDF generation for translated documents
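
Text chunking and duplicate removal might look something like the sketch below. The function names, the sentence-splitting regex, and the `max_chars` limit are all illustrative assumptions; the service's actual chunking rules are not documented here.

```python
import re
from typing import List


def remove_duplicate_lines(text: str) -> str:
    """Keep the first occurrence of each line, comparing lines after
    collapsing whitespace and ignoring case. Blank lines are dropped."""
    seen = set()
    kept: List[str] = []
    for line in text.splitlines():
        key = " ".join(line.split()).casefold()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(line.strip())
    return "\n".join(kept)


def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Greedily pack sentences into chunks of at most `max_chars` characters.
    A single sentence longer than the limit becomes its own chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# A repeated page header is removed before the text is chunked.
page = "Header\nFirst sentence. Second sentence.\nHeader\nThird sentence."
cleaned = remove_duplicate_lines(page)
chunks = chunk_text(cleaned, max_chars=35)
print(chunks)
```

Splitting on sentence boundaries instead of fixed character offsets keeps each chunk translatable on its own.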

Better Error Handling

  • Graceful fallback for failed translation batches
  • Detailed error messages with memory information
  • Automatic retry mechanisms
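
One way to combine retries with a graceful fallback is sketched below. This is an assumption about the general approach, not the service's code: `translate_batch_with_retry`, its parameters, and the choice to return the untranslated source text on failure are all hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple


def translate_batch_with_retry(
    batch: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,  # seconds; doubled after each failed attempt
) -> Tuple[List[str], Optional[str]]:
    """Return (translations, None) on success. If every attempt fails,
    fall back to the untranslated batch plus an error message."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return translate_batch(batch), None
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return list(batch), f"batch failed after {max_attempts} attempts: {last_error}"


# A stand-in translator that fails twice, then succeeds by reversing text.
attempts = {"count": 0}


def flaky_translate(batch: List[str]) -> List[str]:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("out of memory")
    return [s[::-1] for s in batch]


def always_fails(batch: List[str]) -> List[str]:
    raise RuntimeError("out of memory")


ok_result, ok_error = translate_batch_with_retry(["abc"], flaky_translate)
bad_result, bad_error = translate_batch_with_retry(["abc"], always_fails, max_attempts=2)
print(ok_result, ok_error)
print(bad_result, bad_error)
```

Returning the source text for a failed batch lets the rest of the document finish translating instead of failing the whole request.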

Usage

The API is ready to use with any HTTP client. See the /docs endpoint for interactive documentation.
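
As a starting point, a request to `/translate-simple` could be built like this using only the Python standard library. The base URL and the JSON field names (`text`, `target_language`) are assumptions made for illustration; check the `/docs` endpoint for the actual request schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: replace with your deployment URL


def build_translate_request(text: str, target_language: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /translate-simple."""
    # Hypothetical field names; the real schema is listed under /docs.
    payload = {"text": text, "target_language": target_language}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/translate-simple",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_translate_request("Hello, world!", "Hindi")

# Against a running instance:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(json.loads(response.read().decode("utf-8")))
```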

Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Khasi, Malayalam, Manipuri, Marathi, Maithili, Mizo, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, and more.