Image Preprocessing Service

This service automatically processes various image formats during upload to ensure compatibility and optimal storage.

Overview

The ImagePreprocessor service detects the format of each incoming file and converts anything that is not already PNG or JPEG to one of those two formats before it is stored. Unless conversion fails (see Error Handling), every stored image is therefore in a standard, web-compatible format.

Supported Input Formats

Direct Storage (No Preprocessing)

PNG (image/png) - Stored as-is; already a target format
JPEG (image/jpeg, plus the non-standard alias image/jpg) - Stored as-is; already a target format

Formats Requiring Preprocessing

HEIC/HEIF Files

Input: HEIC/HEIF files from modern smartphones
Processing: Convert to RGB and flatten alpha channel
Output: PNG or JPEG

WebP Files

Input: WebP format (Google's web image format)
Processing: Convert to RGB and flatten alpha channel
Output: PNG or JPEG

GIF Files

Input: GIF files (static or animated)
Processing: Extract first frame for animated GIFs, convert to RGB
Output: PNG or JPEG

TIFF/GeoTIFF Files

Input: TIFF or GeoTIFF files
Processing: Render RGB view, handle various color spaces
Output: PNG or JPEG

PDF Files

Input: PDF documents
Processing: Rasterize first page at 2x zoom for quality
Output: PNG or JPEG
Performance Note: PDF processing is slower than the other conversions because the document must be parsed and then rasterized

How It Works

1. MIME Type Detection

The service first detects the file format using:

File extension analysis
File signature (magic bytes) detection
Fallback to generic binary if unknown
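
The detection steps above can be sketched with the standard library alone. This is a minimal illustration, not the service's actual code: the names `detect_mime_type` and `sniff_signature`, the lookup tables, and the use of `application/octet-stream` as the "generic binary" type are all assumptions. The source also does not say which check takes precedence; this sketch tries the extension first.

```python
from pathlib import Path

# Hypothetical lookup tables; the real service may recognise more types.
EXTENSION_TYPES = {
    ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".gif": "image/gif", ".webp": "image/webp", ".heic": "image/heic",
    ".heif": "image/heif", ".tif": "image/tiff", ".tiff": "image/tiff",
    ".pdf": "application/pdf",
}
PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"%PDF-", "application/pdf"),
]
HEIF_BRANDS = {b"heic": "image/heic", b"heix": "image/heic", b"mif1": "image/heif"}


def sniff_signature(content):
    """Return a MIME type based on leading magic bytes, or None if unrecognised."""
    for prefix, mime in PREFIX_SIGNATURES:
        if content.startswith(prefix):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if content[:4] == b"RIFF" and content[8:12] == b"WEBP":
        return "image/webp"
    # HEIC/HEIF files start with an "ftyp" box whose brand sits at offset 8.
    if content[4:8] == b"ftyp":
        return HEIF_BRANDS.get(content[8:12])
    return None


def detect_mime_type(filename, content):
    """Extension first, then magic bytes, then a generic binary fallback."""
    by_extension = EXTENSION_TYPES.get(Path(filename).suffix.lower())
    if by_extension:
        return by_extension
    return sniff_signature(content) or "application/octet-stream"
```

Trusting the extension first is cheap but can be fooled by a misnamed file; a stricter variant would sniff the signature first and use the extension only as a tie-breaker.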

2. Preprocessing Decision

If format is already PNG/JPEG → No processing needed
If format requires conversion → Apply appropriate processor
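
The decision itself amounts to a set membership test followed by a lookup. The sketch below is an assumption about how such a dispatch could look; `PASSTHROUGH_TYPES`, `PROCESSORS`, and `choose_processor` are hypothetical names, and the standalone `needs_preprocessing` function only mirrors the name of the service method, not its implementation.

```python
# MIME types stored without conversion, per "Direct Storage" above.
# image/jpg is a non-standard alias that some clients send.
PASSTHROUGH_TYPES = {"image/png", "image/jpeg", "image/jpg"}

# Hypothetical processor labels keyed by input MIME type.
PROCESSORS = {
    "image/heic": "heic",
    "image/heif": "heic",
    "image/webp": "webp",
    "image/gif": "gif",
    "image/tiff": "tiff",
    "application/pdf": "pdf",
}


def needs_preprocessing(mime_type):
    """True when the type is not already PNG or JPEG."""
    return mime_type.lower() not in PASSTHROUGH_TYPES


def choose_processor(mime_type):
    """Return a processor label, or None when the file can be stored as-is."""
    if not needs_preprocessing(mime_type):
        return None
    # Unknown types fall through to a generic processor in this sketch.
    return PROCESSORS.get(mime_type.lower(), "generic")
```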

3. Format Conversion

Each format has a specialized processor that:

Opens the file with the appropriate library (Pillow for raster formats, PyMuPDF for PDFs)
Converts to RGB color space
Flattens alpha channels
Optimizes output quality
Generates new filename with correct extension
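
"Flattening" an alpha channel means compositing each pixel over an opaque background so that the result has only RGB channels. The pure-Python sketch below shows the per-pixel arithmetic; it assumes a white background, which the source does not specify, and the function names are hypothetical. In the service this step would be done by Pillow over the whole image rather than pixel by pixel.

```python
def flatten_pixel(rgba, background=(255, 255, 255)):
    """Composite one RGBA pixel (channels 0-255) over an opaque background.

    Integer form of out = fg * a + bg * (1 - a), rounded to nearest.
    """
    r, g, b, a = rgba
    return tuple(
        (channel * a + bg * (255 - a) + 127) // 255
        for channel, bg in zip((r, g, b), background)
    )


def flatten_image(rows, background=(255, 255, 255)):
    """Apply flatten_pixel to a nested list of RGBA pixels."""
    return [[flatten_pixel(pixel, background) for pixel in row] for row in rows]
```

A fully transparent pixel becomes the background colour, a fully opaque pixel keeps its colour, and anything in between is a weighted mix of the two.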

4. Storage

Processed image is stored with new filename
Original filename is preserved in metadata
SHA256 hash is calculated from processed content
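
The storage step can be illustrated with `hashlib` from the standard library. The record layout and the name `build_storage_record` are assumptions made for this sketch; only the three facts listed above (new filename, preserved original filename, hash of the processed bytes) come from the service description.

```python
import hashlib


def build_storage_record(processed_content, new_filename, original_filename):
    """Assemble hypothetical metadata for a processed upload."""
    return {
        "filename": new_filename,
        "original_filename": original_filename,
        # The hash covers the processed bytes, not the bytes the user uploaded.
        "sha256": hashlib.sha256(processed_content).hexdigest(),
        "size_bytes": len(processed_content),
    }
```

Because the hash is taken after conversion, it will not match a checksum the uploader computed on the original file.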

Integration Points

Upload Endpoint (`/api/images/`)

All file uploads go through preprocessing
Supports drag & drop and file picker
Handles both crisis maps and drone imagery

Contribution Endpoint (`/api/contribute/from-url`)

Images contributed from existing URLs are also preprocessed
Ensures consistency across all image sources

Configuration

Target Format

Default: PNG (better quality, lossless)
Alternative: JPEG (smaller file size, lossy)
Quality: 95% for JPEG (configurable)
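
As an illustration of how the target format determines the stored filename and MIME type, the sketch below maps each format to an extension and type. The table, the choice of `.jpg` over `.jpeg`, and the name `output_name_and_type` are assumptions rather than the service's actual behavior.

```python
from pathlib import Path

# Hypothetical mapping from target format to (extension, MIME type).
OUTPUT_FORMATS = {
    "PNG": (".png", "image/png"),
    "JPEG": (".jpg", "image/jpeg"),
}


def output_name_and_type(original_filename, target_format="PNG"):
    """Return the new filename and MIME type for the chosen target format."""
    try:
        extension, mime_type = OUTPUT_FORMATS[target_format.upper()]
    except KeyError:
        raise ValueError(f"Unsupported target format: {target_format!r}") from None
    # Swap only the final suffix and drop any directory part.
    return Path(original_filename).with_suffix(extension).name, mime_type
```

For example, `output_name_and_type("original.heic")` yields `("original.png", "image/png")`.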

Error Handling

If preprocessing fails, falls back to original content
Logs errors for debugging
Continues upload process
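
The fallback behavior described above can be sketched as a thin wrapper around whatever conversion function is in use. The wrapper name, its signature, and the logger name are hypothetical; the sketch only shows the pattern of logging the failure and returning the original upload unchanged.

```python
import logging

logger = logging.getLogger("image_preprocessor_sketch")


def preprocess_with_fallback(convert, content, filename, mime_type):
    """Run convert(content, filename); on any failure keep the original upload.

    convert is expected to return (processed_bytes, new_filename, new_mime_type).
    """
    try:
        return convert(content, filename)
    except Exception:
        # Log with traceback for debugging, then let the upload continue.
        logger.exception("Preprocessing failed for %s; storing original", filename)
        return content, filename, mime_type
```

One consequence of this design is that a file which failed conversion is stored in its original format, so callers that require PNG or JPEG may want to check the returned MIME type.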

Dependencies

Pillow (PIL): Core image processing
PyMuPDF: PDF rasterization
Python standard library: MIME type detection, file handling

Benefits

Format Consistency: All stored images are in web-compatible formats
Quality Assurance: Automatic optimization and color space conversion
User Experience: Users can upload any common image format
Storage Efficiency: Optimized file sizes and formats
Compatibility: PNG and JPEG are supported by all major browsers and platforms

Example Usage

from app.services.image_preprocessor import ImagePreprocessor

# Check whether a given input MIME type needs preprocessing
input_mime_type = "image/heic"
if ImagePreprocessor.needs_preprocessing(input_mime_type):
    print(f"Converting {input_mime_type} to PNG...")

# Process an image; file_content holds the raw bytes of the uploaded file
processed_content, new_filename, mime_type = ImagePreprocessor.preprocess_image(
    file_content,
    "original.heic",
    target_format='PNG',
    quality=95  # JPEG quality setting (see Target Format)
)

Error Handling

The service gracefully handles errors:

Unsupported formats: Falls back to generic processing
Corrupted files: Logs error and continues with original
Processing failures: Maintains upload functionality
Memory issues: Handles large files efficiently

Performance Considerations

PDF Processing Performance

PDF conversion is the most computationally expensive operation due to:

Complex Format: PDFs require parsing, interpretation, and rendering
Rasterization: Vector-to-pixel conversion is CPU-intensive
Memory Usage: Large PDFs can consume significant memory
Quality vs Speed: Higher zoom factors increase quality but decrease speed
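
The cost of a higher zoom factor can be estimated with simple arithmetic. PDF page sizes are measured in points (72 per inch), so a zoom factor of 1 corresponds to 72 DPI and a zoom factor of 2 to 144 DPI. The helper below is a hypothetical estimator, not part of the service, and the exact pixel rounding used by the renderer may differ slightly.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def raster_estimate(width_pt, height_pt, zoom, channels=3):
    """Estimate output size for a page rendered at the given zoom factor."""
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    return {
        "dpi": POINTS_PER_INCH * zoom,
        "width_px": width_px,
        "height_px": height_px,
        # Uncompressed RGB buffer size, before PNG or JPEG encoding.
        "raw_bytes": width_px * height_px * channels,
    }


# An A4 page is 595 x 842 points.
a4_at_2x = raster_estimate(595, 842, 2.0)
```

At 2x zoom an A4 page becomes 1190 x 1684 pixels, or about 6 MB of raw RGB data. Doubling the zoom factor quadruples the pixel count, which is why higher zoom factors cost both time and memory.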

Performance Tuning Options

from app.services.image_preprocessor import ImagePreprocessor

# Fast mode - lower quality, much faster
ImagePreprocessor.configure_pdf_processing(quality_mode='fast')

# Balanced mode - good quality, reasonable speed (default)
ImagePreprocessor.configure_pdf_processing(quality_mode='balanced')

# Quality mode - highest quality, slower processing
ImagePreprocessor.configure_pdf_processing(quality_mode='quality')

# Custom configuration
ImagePreprocessor.configure_pdf_processing(
    zoom_factor=1.2,      # Lower zoom = faster
    compress_level=4,     # Lower compression = faster
    quality_mode='balanced'
)

Expected Processing Times

Small PDFs (<1MB): 2-5 seconds
Medium PDFs (1-5MB): 5-15 seconds
Large PDFs (5-25MB): 15-60 seconds
Complex PDFs: May take longer due to graphics complexity

Future Enhancements

Batch processing: Process multiple images simultaneously
Format preferences: User-configurable output formats
Quality settings: Adjustable compression levels
Metadata preservation: Keep EXIF and other metadata
Progressive processing: Stream large files
