---
license: cc-by-4.0
language:
- en
---
# Model Card for AutoML Cuisine Classification
This model card documents the **AutoML Cuisine Classification** model trained with AutoGluon Multimodal on a classmate’s dataset of food images.
The task is to predict whether a food image belongs to **Asian** or **Western** cuisine (binary classification).
---
## Model Details
- **Developed by:** Bareethul Kader
- **Framework:** AutoGluon Multimodal
- **Repository:** bareethul/image-dataset-model
- **License:** CC BY 4.0
---
## Intended Use
### Direct Use
- Educational demonstration of AutoML on an image classification task.
- Comparison of different backbones (ResNet18, MobileNetV3, EfficientNet-B0).
- Exploring effects of augmentation and model selection under constrained compute budget.
### Out of Scope Use
- Not intended for production deployments in food classification systems.
- May not generalize beyond the coarse “Asian vs. Western” distinction, or to settings other than restaurant and home-cooked food photos.
- Not meant for health, dietary, or allergy-related automation.
---
## Dataset
- **Source:** [maryzhang/hw1-24679-image-dataset](https://huggingface.co/datasets/maryzhang/hw1-24679-image-dataset)
- **Task:** Binary image classification (label 0 = Western cuisine, label 1 = Asian cuisine)
- **Size:**
- Original images: 40
- Augmented images: 320
- Total: 360 images
- **Features:**
- `image`: Image (RGB, as provided)
- `label`: Integer 0 or 1
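A minimal sketch of loading this dataset with the `datasets` library; the split name `train` is an assumption about how the dataset is organized:
```python
from datasets import load_dataset

# Pull the source dataset from the Hugging Face Hub
ds = load_dataset("maryzhang/hw1-24679-image-dataset")
print(ds)                 # inspect the available splits

example = ds["train"][0]  # assumed split name
example["image"].show()   # PIL image
print(example["label"])   # 0 = Western, 1 = Asian
```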
---
## Training Setup
- **AutoML framework:** AutoGluon Multimodal (`MultiModalPredictor`)
- **Evaluation metric:** Accuracy
- **Budget:** 600 seconds (10 minutes) for quick runs; ~1,800 seconds for the full, more accurate run.
- **Hardware:** Google Colab (standard GPU runtime)
- **Search Space:**
- Backbones: `resnet18`, `mobilenetv3_small_100`, `efficientnet_b0`
- **Preprocessing / Augmentation:** As provided in the dataset (augmented split); resizing and standard image transforms applied when loading
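As a rough sketch, one quick run under this setup could look like the following; `train_df` (a DataFrame with image-path and label columns) and the exact hyperparameter key are assumptions, not taken from the card:
```python
from autogluon.multimodal import MultiModalPredictor

# Hedged sketch of a single 10-minute run
predictor = MultiModalPredictor(label="label", eval_metric="accuracy")
predictor.fit(
    train_data=train_df,  # "image" column holds file paths, "label" holds 0/1
    time_limit=600,       # seconds; ~1800 for the longer full run
    hyperparameters={
        # swap in "mobilenetv3_small_100" or "efficientnet_b0" to compare backbones
        "model.timm_image.checkpoint_name": "resnet18",
    },
)
```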
---
### Search Space and Hyperparameters
AutoGluon automatically searched across the following dimensions:
- **Architectures (depth/width):** ResNet18 (shallow), MobileNetV3-Small (compact width), EfficientNet-B0 (deeper, wider baseline).
- **Optimizers:** Variants of AdamW and SGD.
- **Learning rate / weight decay:** Learning rates searched in roughly the 1e-5 to 1e-3 range, with weight decay and decay schedules applied.
- **Regularization:** Implicit dropout layers (in backbones) and weight decay.
- **Augmentation:**
- Random crops and flips (default torchvision pipeline).
- RandAugment (with random distortions).
- Mixup (interpolated samples).
- **Early stopping:** Triggered automatically when validation metric stops improving.
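One way such a search could be expressed is sketched below; this is illustrative only, since the config key names differ across AutoGluon versions and the exact ranges here are assumptions:
```python
from ray import tune

# Hedged search-space sketch for MultiModalPredictor HPO
search_space = {
    "model.timm_image.checkpoint_name": tune.choice(
        ["resnet18", "mobilenetv3_small_100", "efficientnet_b0"]
    ),
    "optimization.learning_rate": tune.loguniform(1e-5, 1e-3),
    "optimization.weight_decay": tune.loguniform(1e-5, 1e-3),
}
predictor.fit(
    train_data=train_df,
    hyperparameters=search_space,
    hyperparameter_tune_kwargs={"searcher": "random", "num_trials": 3},
    time_limit=600,
)
```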
---
### Augmentation Pipeline
- Random resized crop to **224 × 224 pixels**
- Random horizontal flip
- Color jitter (brightness, contrast, saturation, hue)
- RandAugment (random transformations applied with strength parameter)
- Mixup with α = 0.2 (blending images/labels)
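A minimal torchvision sketch of this pipeline; the jitter strengths and RandAugment settings are assumptions, and mixup is shown as a standalone batch-level helper:
```python
import torch
import torchvision.transforms as T

# Train-time transforms matching the pipeline described above
train_transform = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    T.RandAugment(),  # random ops with a tunable magnitude
    T.ToTensor(),
])

def mixup(images, labels, alpha=0.2):
    """Blend pairs of images and one-hot labels with Beta(alpha, alpha) weights."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_images, mixed_labels
```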
### Input Resolution
- All images resized to **224 × 224** before being passed to the network
### Expected Preprocessing
- RGB image normalization (mean/std) using ImageNet statistics
- One-hot encoding of labels for classification
- Train/validation split: 80/20 stratified
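In code, the normalization and split could look like this; `df` is an assumed DataFrame with a `label` column, not something defined in the card:
```python
import torchvision.transforms as T
from sklearn.model_selection import train_test_split

# ImageNet statistics, the usual choice for ImageNet-pretrained backbones
normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])

# Stratified 80/20 split so both classes keep their proportions
train_df, val_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)
```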
---
## Results
### Best Architecture
- AutoGluon selected **EfficientNet-B0** as the best performing backbone in terms of validation accuracy.
- Other backbones tested included **ResNet18** and **MobileNetV3-Small**, which had slightly lower validation accuracy.
### Best Hyperparameters
- Optimizer: AdamW
- Learning rate: ~0.001 (exact value depends on AutoGluon’s internal selection)
- Weight decay: ~1e-4
- Regularization: implicit (from backbone architecture)
- Augmentation: dataset’s augmented split + standard image transforms
- Early stopping: triggered automatically when validation stopped improving
### Training Curves & Early-Stop Rationale
- Validation accuracy with EfficientNet-B0 rose steadily and plateaued
- Early stopping was triggered once validation accuracy stopped improving, per AutoGluon's default patience criterion
- Prevented overfitting while still allowing model to reach its best validation performance
### Test Metrics
On the held-out **original split** (~40 images):
- **Test Accuracy:** 1.0
- **Weighted F1:** 1.0
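These metrics can be reproduced with scikit-learn; `test_df` (image paths plus ground-truth labels) is an assumed DataFrame:
```python
from sklearn.metrics import accuracy_score, f1_score

preds = predictor.predict(test_df)  # predict on the held-out originals
print("Accuracy:   ", accuracy_score(test_df["label"], preds))
print("Weighted F1:", f1_score(test_df["label"], preds, average="weighted"))
```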
---
## Error Analysis
- The model reached accuracy and F1 of 1.0 on the test split. This most likely reflects the dataset's small size and possible overlap between the held-out originals and their augmented variants seen during training; the result indicates dataset limitations rather than true generalization.
## Limitations, Biases, and Ethical Notes
- Small dataset size → high overfitting risk.
- Augmented data may not capture all real-world variance (lighting, background, etc.).
- Binary classification “Asian vs Western” is coarse; many cuisines and dishes don’t neatly fit.
- Labeling reflects simplified categories; cultural/geographic nuance lost.
---
## Known Failure Modes
- Struggles on images with unusual lighting/backgrounds
- Misclassifies foods with **fusion characteristics** (e.g., Asian-inspired Western dishes)
- Sensitive to **out-of-distribution inputs** (images outside the dataset’s augmentation domain)
- Performs poorly when food is occluded or partially cropped
---
## AI Usage Disclosure
AI assistance tools were used to:
- streamline coding,
- improve documentation clarity, and
- refine the model card presentation.
---
## Example Inference
```python
from autogluon.multimodal import MultiModalPredictor
from huggingface_hub import snapshot_download

# Download the saved predictor files from the Hub, then load locally
# (MultiModalPredictor.load expects a local directory produced by predictor.save)
local_dir = snapshot_download(repo_id="bareethul/image-dataset-model")
predictor = MultiModalPredictor.load(local_dir)

# Run inference on an image file
pred = predictor.predict({"image": ["path/to/your_test_food_image.jpg"]})
print("Prediction:", pred)  # 0 = Western cuisine, 1 = Asian cuisine
```
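If class probabilities are needed rather than hard labels, `predictor.predict_proba` returns per-class scores for the same inputs.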