# Image Preprocessing Service

This service automatically processes various image formats during upload to ensure compatibility and optimal storage.

## Overview

The `ImagePreprocessor` service automatically detects and converts various image formats to PNG or JPEG before storing them in the system. This ensures that all images are in a standard, web-compatible format.

## Supported Input Formats

### Direct Storage (No Preprocessing)
- **PNG** (`image/png`) - Already optimal format
- **JPEG** (`image/jpeg`, `image/jpg`) - Already optimal format

### Formats Requiring Preprocessing

#### HEIC/HEIF Files
- **Input**: HEIC/HEIF files from modern smartphones
- **Processing**: Convert to RGB and flatten alpha channel
- **Output**: PNG or JPEG

#### WebP Files
- **Input**: WebP format (Google's web image format)
- **Processing**: Convert to RGB and flatten alpha channel
- **Output**: PNG or JPEG

#### GIF Files
- **Input**: GIF files (static or animated)
- **Processing**: Extract first frame for animated GIFs, convert to RGB
- **Output**: PNG or JPEG

#### TIFF/GeoTIFF Files
- **Input**: TIFF or GeoTIFF files
- **Processing**: Render RGB view, handle various color spaces
- **Output**: PNG or JPEG

#### PDF Files
- **Input**: PDF documents
- **Processing**: Rasterize first page at 2x zoom for quality
- **Output**: PNG or JPEG
- **Performance Note**: PDF processing is inherently slower due to complex format parsing and rasterization

## How It Works

### 1. MIME Type Detection
The service first detects the file format using:
- File extension analysis
- File signature (magic bytes) detection
- Fallback to generic binary if unknown
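
The detection order above could be sketched as follows, using only the standard library. This is a minimal illustration, not the service's actual code: the names `detect_mime_type`, `sniff_signature`, `EXTENSION_TYPES`, `PREFIX_SIGNATURES`, and `HEIF_BRANDS` are hypothetical, and the signature table is deliberately non-exhaustive.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical extension table (assumption: the real service may use `mimetypes`)
EXTENSION_TYPES = {
    ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".gif": "image/gif", ".webp": "image/webp",
    ".tif": "image/tiff", ".tiff": "image/tiff",
    ".heic": "image/heic", ".heif": "image/heif",
    ".pdf": "application/pdf",
}

# Well-known leading magic bytes for formats that start with a fixed prefix
PREFIX_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),   # little-endian TIFF
    (b"MM\x00*", "image/tiff"),   # big-endian TIFF
    (b"%PDF-", "application/pdf"),
]

# A few common HEIF brands; this set is an illustrative subset
HEIF_BRANDS = {b"heic", b"heix", b"mif1"}


def sniff_signature(content: bytes) -> Optional[str]:
    """Return a MIME type based on magic bytes, or None if unrecognized."""
    for prefix, mime in PREFIX_SIGNATURES:
        if content.startswith(prefix):
            return mime
    # WebP is a RIFF container whose form type (bytes 8-11) is "WEBP"
    if content[:4] == b"RIFF" and content[8:12] == b"WEBP":
        return "image/webp"
    # HEIC/HEIF: "ftyp" box at offset 4, major brand at offset 8
    if content[4:8] == b"ftyp" and content[8:12] in HEIF_BRANDS:
        return "image/heic"
    return None


def detect_mime_type(filename: str, content: bytes) -> str:
    """Extension first, then file signature, then generic binary."""
    ext = PurePath(filename).suffix.lower()
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    return sniff_signature(content) or "application/octet-stream"
```

Checking the signature as well as the extension lets a file with a missing or unknown extension still be routed to the right processor.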

### 2. Preprocessing Decision
- If format is already PNG/JPEG → No processing needed
- If format requires conversion → Apply appropriate processor
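
As a rough sketch of this decision, the check can be a set lookup followed by a dispatch table. The `PROCESSORS` mapping, the processor labels, and `choose_processor` are hypothetical names introduced for illustration; only `needs_preprocessing` mirrors a method name that appears in this README, and its real implementation may differ.

```python
# MIME types stored directly, per the "Direct Storage" list above
DIRECT_STORAGE_TYPES = {"image/png", "image/jpeg", "image/jpg"}

# Hypothetical processor labels keyed by detected MIME type
PROCESSORS = {
    "image/heic": "heic",
    "image/heif": "heic",
    "image/webp": "webp",
    "image/gif": "gif",
    "image/tiff": "tiff",
    "application/pdf": "pdf",
}


def needs_preprocessing(mime_type: str) -> bool:
    """True unless the file is already PNG or JPEG."""
    return mime_type.lower() not in DIRECT_STORAGE_TYPES


def choose_processor(mime_type: str) -> str:
    """Pick a processor label; unknown types fall back to generic processing."""
    mime_type = mime_type.lower()
    if not needs_preprocessing(mime_type):
        return "none"
    return PROCESSORS.get(mime_type, "generic")
```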

### 3. Format Conversion
Each format has a specialized processor that:
- Opens the file using the appropriate library (Pillow or PyMuPDF)
- Converts to RGB color space
- Flattens alpha channels
- Optimizes output quality
- Generates new filename with correct extension
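
To make "flattens alpha channels" concrete, here is the per-pixel arithmetic in plain Python. It is only a sketch of the compositing formula: the white background is an assumption (this README does not state which background color the service uses), and `flatten_pixel` and `flatten_pixels` are hypothetical helpers. With Pillow, the same effect is typically achieved by pasting the RGBA image onto a new RGB image using its alpha band as the mask.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def flatten_pixel(pixel: RGBA, background: RGB = (255, 255, 255)) -> RGB:
    """Composite one RGBA pixel over an opaque background color.

    Each channel is out = (c * alpha + bg * (255 - alpha)) / 255,
    computed with integer math and rounded to the nearest value.
    """
    *color, alpha = pixel
    return tuple(
        (c * alpha + b * (255 - alpha) + 127) // 255
        for c, b in zip(color, background)
    )


def flatten_pixels(pixels: Iterable[RGBA], background: RGB = (255, 255, 255)) -> List[RGB]:
    """Flatten a sequence of RGBA pixels into RGB pixels."""
    return [flatten_pixel(p, background) for p in pixels]
```

A fully transparent pixel takes the background color, a fully opaque pixel keeps its own color, and everything in between is blended.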

### 4. Storage
- Processed image is stored with new filename
- Original filename is preserved in metadata
- SHA256 hash is calculated from processed content
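
The storage step might look roughly like the sketch below. The record layout and the `build_storage_record` helper are assumptions made for illustration; the only facts taken from this README are that the new filename carries the output extension, the original filename is kept as metadata, and the hash is computed over the processed bytes.

```python
import hashlib
from pathlib import PurePath

# Output extension per target format (the ".jpg" spelling is an assumption)
EXTENSIONS = {"PNG": ".png", "JPEG": ".jpg"}


def build_storage_record(original_filename: str,
                         processed_content: bytes,
                         target_format: str = "PNG") -> dict:
    """Build a hypothetical metadata record for a processed image."""
    stem = PurePath(original_filename).stem
    return {
        "filename": stem + EXTENSIONS[target_format],
        "original_filename": original_filename,
        # Hash the processed bytes, not the original upload
        "sha256": hashlib.sha256(processed_content).hexdigest(),
        "size": len(processed_content),
    }
```

Because the hash covers the processed content, two different source files that convert to identical output bytes will share the same digest.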

## Integration Points

### Upload Endpoint (`/api/images/`)
- All file uploads go through preprocessing
- Supports drag & drop and file picker
- Handles both crisis maps and drone imagery

### Contribution Endpoint (`/api/contribute/from-url`)
- Images contributed from existing URLs are also preprocessed
- Ensures consistency across all image sources

## Configuration

### Target Format
- **Default**: PNG (better quality, lossless)
- **Alternative**: JPEG (smaller file size, lossy)
- **Quality**: 95% for JPEG (configurable)
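
One way to carry these settings around is a small immutable settings object. This is a sketch under assumptions: `OutputSettings` and its members are hypothetical and not part of the documented `ImagePreprocessor` API. The `save_kwargs` output matches the `format` and `quality` keyword arguments that Pillow's `Image.save` accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSettings:
    """Hypothetical container for the target format options above."""
    target_format: str = "PNG"   # default: lossless PNG
    quality: int = 95            # only used for JPEG

    def __post_init__(self):
        if self.target_format not in ("PNG", "JPEG"):
            raise ValueError(f"Unsupported target format: {self.target_format}")
        if not 1 <= self.quality <= 100:
            raise ValueError("quality must be between 1 and 100")

    @property
    def mime_type(self) -> str:
        return "image/png" if self.target_format == "PNG" else "image/jpeg"

    @property
    def extension(self) -> str:
        return ".png" if self.target_format == "PNG" else ".jpg"

    def save_kwargs(self) -> dict:
        """Keyword arguments suitable for Pillow's Image.save()."""
        if self.target_format == "JPEG":
            return {"format": "JPEG", "quality": self.quality}
        return {"format": "PNG"}  # quality does not apply to lossless PNG
```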

### Error Handling
- If preprocessing fails, falls back to original content
- Logs errors for debugging
- Continues upload process
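
The fallback behavior can be sketched as a thin wrapper around whichever processor is selected. The wrapper name `preprocess_with_fallback`, the logger name, and the `(content, filename, mime_type)` result shape are assumptions for illustration; the shape mirrors the three values returned in the usage example in this README.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("image_preprocessor")  # hypothetical logger name

Result = Tuple[bytes, str, str]  # (content, filename, mime_type)


def preprocess_with_fallback(processor: Callable[[bytes, str], Result],
                             content: bytes,
                             filename: str,
                             mime_type: str) -> Result:
    """Run a processor; on any failure, log it and keep the original upload."""
    try:
        return processor(content, filename)
    except Exception:
        # Log the full traceback for debugging, then continue the upload
        logger.exception("Preprocessing failed for %s; storing original", filename)
        return content, filename, mime_type
```

The trade-off in this design is that a failed conversion leaves a non-PNG/JPEG file in storage, so downstream consumers should not assume every stored file was converted.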

## Dependencies

- **Pillow (PIL)**: Core image processing
- **PyMuPDF**: PDF rasterization
- **Python standard library**: MIME type detection, file handling

## Benefits

1. **Format Consistency**: All stored images are in web-compatible formats
2. **Quality Assurance**: Automatic optimization and color space conversion
3. **User Experience**: Users can upload any common image format
4. **Storage Efficiency**: Optimized file sizes and formats
5. **Compatibility**: Ensures images work across all platforms and browsers

## Example Usage

```python
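from app.services.image_preprocessor import ImagePreprocessor

# Read the raw upload bytes (the path is illustrative)
with open("original.heic", "rb") as f:
    file_content = f.read()

# Check whether the input type needs preprocessing
input_mime_type = "image/heic"
if ImagePreprocessor.needs_preprocessing(input_mime_type):
    print(f"Converting {input_mime_type} to PNG...")

# Process the image; returns the converted bytes, the new filename,
# and the MIME type of the output
processed_content, new_filename, mime_type = ImagePreprocessor.preprocess_image(
    file_content,
    "original.heic",
    target_format='PNG',
    quality=95
)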
```

## Error Handling

The service gracefully handles errors:
- **Unsupported formats**: Falls back to generic processing
- **Corrupted files**: Logs error and continues with original
- **Processing failures**: Maintains upload functionality
- **Memory issues**: Handles large files efficiently

## Performance Considerations

### PDF Processing Performance
PDF conversion is the most computationally expensive operation due to:
- **Complex Format**: PDFs require parsing, interpretation, and rendering
- **Rasterization**: Vector-to-pixel conversion is CPU-intensive
- **Memory Usage**: Large PDFs can consume significant memory
- **Quality vs Speed**: Higher zoom factors increase quality but decrease speed
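
The cost of a higher zoom factor follows from simple arithmetic: PDF page sizes are expressed in points (72 per inch), so a zoom of 2 corresponds to 144 DPI, and the pixel count grows with the square of the zoom. The sketch below estimates the rasterized size of a page. `raster_estimate` is a hypothetical helper, and the uncompressed RGB figure is only a lower bound on what a renderer actually allocates.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def raster_estimate(width_pt: float, height_pt: float,
                    zoom: float = 2.0, channels: int = 3) -> dict:
    """Estimate pixel dimensions and raw buffer size for one rendered page."""
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    return {
        "dpi": POINTS_PER_INCH * zoom,
        "width_px": width_px,
        "height_px": height_px,
        # Uncompressed RGB buffer: one byte per channel per pixel
        "raw_bytes": width_px * height_px * channels,
    }


# US Letter is 612 x 792 points
letter_2x = raster_estimate(612, 792, zoom=2.0)
# 1224 x 1584 pixels at 144 DPI, 5,816,448 bytes of raw RGB

letter_fast = raster_estimate(612, 792, zoom=1.2)
# Roughly 36% of the pixels of the 2x render, since (1.2 / 2.0) ** 2 = 0.36
```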

### Performance Tuning Options
```python
from app.services.image_preprocessor import ImagePreprocessor

# Fast mode - lower quality, much faster
ImagePreprocessor.configure_pdf_processing(quality_mode='fast')

# Balanced mode - good quality, reasonable speed (default)
ImagePreprocessor.configure_pdf_processing(quality_mode='balanced')

# Quality mode - highest quality, slower processing
ImagePreprocessor.configure_pdf_processing(quality_mode='quality')

# Custom configuration
ImagePreprocessor.configure_pdf_processing(
    zoom_factor=1.2,      # Lower zoom = faster
    compress_level=4,     # Lower compression = faster
    quality_mode='balanced'
)
```

### Expected Processing Times
- **Small PDFs (<1MB)**: 2-5 seconds
- **Medium PDFs (1-5MB)**: 5-15 seconds
- **Large PDFs (5-25MB)**: 15-60 seconds
- **Complex PDFs**: May take longer due to graphics complexity

## Future Enhancements

- **Batch processing**: Process multiple images simultaneously
- **Format preferences**: User-configurable output formats
- **Quality settings**: Adjustable compression levels
- **Metadata preservation**: Keep EXIF and other metadata
- **Progressive processing**: Stream large files