---
license: cc-by-4.0
language:
- en
---
# Model Card for AutoML Cuisine Classification

This model card documents the **AutoML Cuisine Classification** model trained with AutoGluon Multimodal on a classmate’s dataset of food images.

The task is to predict whether a food image belongs to **Asian** or **Western** cuisine (binary classification).

---
## Model Details

- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon Multimodal
- **Repository:** bareethul/image-dataset-model
- **License:** CC BY 4.0

---
## Intended Use

### Direct Use

- Educational demonstration of AutoML on an image classification task.
- Comparison of different backbones (ResNet18, MobileNetV3, EfficientNet-B0).
- Exploration of the effects of augmentation and model selection under a constrained compute budget.

### Out-of-Scope Use

- Not intended for production deployment in food classification systems.
- May not generalize to cuisines other than “Asian vs. Western,” or to settings outside restaurant and home-cooked food.
- Not meant for health, dietary, or allergy-related automation.

---
## Dataset

- **Source:** [maryzhang/hw1-24679-image-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-image-dataset) (a loading sketch follows this list)
- **Task:** Binary image classification (label 0 = Western cuisine, label 1 = Asian cuisine)
- **Size:**
  - Original images: 40
  - Augmented images: 320
  - Total: 360 images
- **Features:**
  - `image`: Image (RGB, as provided)
  - `label`: Integer 0 or 1
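The snippet below is a hedged sketch of pulling the dataset from the Hub with the `datasets` library; the split name `train` is an assumption and should be adjusted to the splits actually published in the dataset repository.

```python
from datasets import load_dataset

# Load the food-image dataset from the Hugging Face Hub.
# NOTE: the split name "train" is an assumption.
ds = load_dataset("maryzhang/hw1-24679-image-dataset", split="train")

example = ds[0]
print(example["image"].size)   # PIL image size (width, height)
print(example["label"])        # 0 = Western cuisine, 1 = Asian cuisine
```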
---
## Training Setup

- **AutoML framework:** AutoGluon Multimodal (`MultiModalPredictor`); a minimal fit sketch follows this list
- **Evaluation metric:** Accuracy
- **Budget:** 600 seconds (10 minutes) for quick runs; ~1800 seconds (30 minutes) for the full run to reach higher accuracy
- **Hardware:** Google Colab (typical GPU runtime)
- **Search space:**
  - Backbones: `resnet18`, `mobilenetv3_small_100`, `efficientnet_b0`
- **Preprocessing / Augmentation:** As provided in the dataset (augmented split); resizing and standard image transforms applied during dataset loading
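The following is a minimal sketch of such a run, not the exact training script; the DataFrame layout (an image-path column plus an integer `label`) and the example file paths are assumptions consistent with the setup above.

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Assumed layout: one column of image paths and one integer label column.
train_df = pd.DataFrame({
    "image": ["images/asian_01.jpg", "images/western_01.jpg"],  # hypothetical paths
    "label": [1, 0],
})

# Accuracy as the validation metric, 600 s budget for a quick run
# (~1800 s was used for the full run).
predictor = MultiModalPredictor(label="label", eval_metric="accuracy")
predictor.fit(train_data=train_df, time_limit=600)
```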
---
### Search Space and Hyperparameters

AutoGluon automatically searched across the following dimensions (a hedged HPO sketch follows the list):

- **Architectures (depth/width):** ResNet18 (shallow), MobileNetV3-Small (compact width), EfficientNet-B0 (deeper, wider baseline).
- **Optimizers:** Variants of AdamW and SGD.
- **Learning rate / weight decay:** Learning rates in the range ~1e-5 to 1e-3, with weight decay applied.
- **Regularization:** Implicit dropout layers (in the backbones) and weight decay.
- **Augmentation:**
  - Random crops and flips (default torchvision pipeline).
  - RandAugment (random distortions).
  - Mixup (interpolated samples).
- **Early stopping:** Triggered automatically when the validation metric stops improving.
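One way such a backbone and learning-rate search could be expressed is sketched below. The configuration keys (`model.timm_image.checkpoint_name`, `optimization.learning_rate`) follow AutoGluon's documented HPO interface but may differ across versions, and the trial count, searcher, and scheduler are assumptions rather than the settings actually used.

```python
import pandas as pd
from ray import tune
from autogluon.multimodal import MultiModalPredictor

# Hypothetical training table: image-path column plus integer label.
train_df = pd.DataFrame({
    "image": ["images/asian_01.jpg", "images/western_01.jpg"],
    "label": [1, 0],
})

# Search over the three candidate backbones and a log-uniform learning-rate range.
search_space = {
    "model.timm_image.checkpoint_name": tune.choice(
        ["resnet18", "mobilenetv3_small_100", "efficientnet_b0"]
    ),
    "optimization.learning_rate": tune.loguniform(1e-5, 1e-3),
}

predictor = MultiModalPredictor(label="label", eval_metric="accuracy")
predictor.fit(
    train_data=train_df,
    time_limit=1800,
    hyperparameters=search_space,
    hyperparameter_tune_kwargs={"num_trials": 6, "searcher": "random", "scheduler": "FIFO"},
)
```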
---
### Augmentation Pipeline

- Random resized crop to **224 × 224 pixels**
- Random horizontal flip
- Color jitter (brightness, contrast, saturation, hue)
- RandAugment (random transformations applied with a strength parameter)
- Mixup with α = 0.2 (blending images and labels); the full pipeline is sketched below
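A minimal torchvision-style sketch of this pipeline, assuming standard ImageNet normalization; the jitter strengths are assumptions, and `mixup_batch` is a hypothetical helper rather than part of the original training code.

```python
import numpy as np
import torch
from torchvision import transforms

# Augmentations listed above; ColorJitter and RandAugment operate on PIL images,
# so they are applied before ToTensor.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def mixup_batch(images, labels, alpha=0.2):
    """Blend a batch with a shuffled copy of itself (mixup, alpha = 0.2)."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    return mixed, labels, labels[perm], lam
```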
### Input Resolution

- All images resized to **224 × 224** before being passed to the network
### Expected Preprocessing

- RGB image normalization (mean/std) using ImageNet statistics
- One-hot encoding of labels for classification
- Train/validation split: 80/20, stratified (a split sketch follows this list)
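A brief sketch of the normalization and the stratified split, assuming a pandas DataFrame with `image` and `label` columns; the file names and random seed are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from torchvision import transforms

# ImageNet mean/std normalization applied to RGB tensors.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Hypothetical table of image paths and binary labels.
df = pd.DataFrame({
    "image": [f"img_{i:03d}.jpg" for i in range(10)],
    "label": [i % 2 for i in range(10)],
})

# 80/20 split, stratified so both classes appear in the validation set.
train_df, val_df = train_test_split(df, test_size=0.2, stratify=df["label"], random_state=42)
```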
---
## Results

### Best Architecture

- AutoGluon selected **EfficientNet-B0** as the best-performing backbone in terms of validation accuracy.
- The other backbones tested, **ResNet18** and **MobileNetV3-Small**, reached slightly lower validation accuracy.

### Best Hyperparameters

- Optimizer: AdamW
- Learning rate: ~0.001 (the exact value depends on AutoGluon’s internal selection)
- Weight decay: ~1e-4
- Regularization: implicit (from the backbone architecture)
- Augmentation: dataset’s augmented split plus standard image transforms
- Early stopping: triggered automatically when validation accuracy stopped improving
### Training Curves & Early-Stop Rationale

- Validation accuracy with EfficientNet-B0 rose steadily and then plateaued
- Early stopping was triggered once the validation metric stopped improving, per AutoGluon’s stopping criterion
- This prevented overfitting while still allowing the model to reach its best validation performance
### Test Metrics

On the held-out **original split** (~40 images):

- **Test Accuracy:** 1.0
- **Weighted F1:** 1.0 (an evaluation sketch follows these metrics)
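A hedged sketch of how these metrics could be recomputed from the saved predictor; the save directory, file paths, and test-table layout are assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
from autogluon.multimodal import MultiModalPredictor

# Hypothetical held-out table built from the ~40 original (non-augmented) images.
test_df = pd.DataFrame({
    "image": ["original/asian_01.jpg", "original/western_01.jpg"],
    "label": [1, 0],
})

# Load from the local directory where the predictor was saved.
predictor = MultiModalPredictor.load("path/to/saved_predictor")
preds = predictor.predict(test_df.drop(columns=["label"]))

print("Accuracy:   ", accuracy_score(test_df["label"], preds))
print("Weighted F1:", f1_score(test_df["label"], preds, average="weighted"))
```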
---
## Error Analysis

- The model reached accuracy and weighted F1 of 1.0 on the test split. This perfect score is most likely explained by the dataset’s small size and possible overlap between the augmented training images and the test data; it reflects dataset limitations rather than true generalization.
## Limitations, Biases, and Ethical Notes

- Small dataset size increases the risk of overfitting.
- Augmented data may not capture all real-world variance (lighting, background, etc.).
- The binary “Asian vs. Western” framing is coarse; many cuisines and dishes don’t fit neatly into either class.
- Labels reflect simplified categories, so cultural and geographic nuance is lost.

---
## Known Failure Modes

- Struggles on images with unusual lighting or backgrounds
- Misclassifies foods with **fusion characteristics** (e.g., Asian-inspired Western dishes)
- Sensitive to **out-of-distribution inputs** (images outside the dataset’s augmentation domain)
- Performs poorly when the food is occluded or partially cropped

---
## AI Usage Disclosure

Assistance tools were used to:

- streamline coding,
- improve documentation clarity, and
- refine the model card presentation.

---
## Example Inference

```python
from autogluon.multimodal import MultiModalPredictor

# Load the trained predictor; MultiModalPredictor.load expects a local save
# directory, so download the files from the Hub repository first if needed.
predictor = MultiModalPredictor.load("bareethul/image-dataset-model")

# Run inference on an image file (passed via an image-path column)
pred = predictor.predict({"image": ["path/to/your_test_food_image.jpg"]})
print("Prediction:", pred)  # 0 = Western cuisine, 1 = Asian cuisine
```