food11-vit

This model is a fine-tuned version of google/vit-base-patch16-224 on the Food11 dataset.

Model description

A ViT-Base vision transformer (16x16 patches, 224x224 input) fine-tuned to classify food images into 11 categories. Training used transfer learning from the pretrained checkpoint and was run with PyTorch Lightning.

Intended uses & limitations

This model is intended for food image classification over a fixed set of 11 coarse food categories. It always predicts one of these categories, so it cannot distinguish fine-grained variants within a category and may not generalize to out-of-distribution food images.

Classes

Bread
Dairy product
Dessert
Egg
Fried food
Meat
Noodles-Pasta
Rice
Seafood
Soup
Vegetable-Fruit
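
The class list above can be turned into the index-to-label mapping needed to decode model outputs. The following is a minimal sketch, not the author's code: it assumes the model's output indices 0 to 10 follow the order in which the classes are listed here, which this card does not state, and `top_k` is a hypothetical helper name. It uses only the standard library and expects a plain list of 11 logits.

```python
import math

# Class names in the order listed in this card.
# Assumption: output index i of the classifier corresponds to CLASSES[i];
# verify against the checkpoint's own label mapping before relying on it.
CLASSES = [
    "Bread", "Dairy product", "Dessert", "Egg", "Fried food", "Meat",
    "Noodles-Pasta", "Rice", "Seafood", "Soup", "Vegetable-Fruit",
]

id2label = dict(enumerate(CLASSES))
label2id = {name: idx for idx, name in id2label.items()}


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]
```

Because the model always assigns one of the 11 labels, inspecting the top few probabilities rather than only the argmax can help spot inputs the model is unsure about.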

Training and evaluation data

The model was trained on the training split of the Food11 dataset (9,866 images) and validated on the validation split (3,430 images). The test split was not used, so the accuracy figures reported below are validation accuracy.

Training procedure

Training hyperparameters

The following hyperparameters were used:

learning_rate: 2e-5
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: AdamW
lr_scheduler_type: linear
num_epochs: 5
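
To illustrate what a linear learning-rate schedule does with the values listed above, here is a small sketch of the usual definition: optional linear warmup followed by linear decay to zero. The card does not mention warmup, so `warmup_steps=0` is an assumption, and `linear_lr` is a hypothetical helper, not a function from the training code.

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at an optimizer step under linear warmup, then linear decay.

    base_lr defaults to the learning_rate listed in this card.
    warmup_steps=0 is an assumption; the card does not state a warmup length.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 1000  # illustrative step count, not taken from the card
    for s in (0, 250, 500, 750, 1000):
        print(s, linear_lr(s, total))
```

With no warmup, the rate starts at 2e-5 and falls in a straight line, reaching half its initial value at the midpoint of training and zero at the final step.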

Training results

Epoch	Step	Training Loss	Validation Loss	Validation Accuracy
1	308	1.2517	0.1991	0.9531
2	617	0.4728	0.1376	0.9621
3	926	0.2027	0.1281	0.9621
4	1235	0.2861	0.1395	0.9589
5	1544	0.2943	0.1223	0.9659
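
The Step column can be cross-checked against the dataset size and batch size given earlier. The sketch below assumes a single device, no gradient accumulation, the final partial batch kept, and a zero-based step counter; none of these is stated in the card, but under them the arithmetic reproduces the logged steps. The rows are copied from the table above.

```python
import math

TRAIN_IMAGES = 9866  # training split size from this card
BATCH_SIZE = 32      # train_batch_size from this card
NUM_EPOCHS = 5       # num_epochs from this card

# (epoch, step, training loss, validation loss, validation accuracy)
RESULTS = [
    (1, 308, 1.2517, 0.1991, 0.9531),
    (2, 617, 0.4728, 0.1376, 0.9621),
    (3, 926, 0.2027, 0.1281, 0.9621),
    (4, 1235, 0.2861, 0.1395, 0.9589),
    (5, 1544, 0.2943, 0.1223, 0.9659),
]

# Assumption: the last, smaller batch is kept, so round up.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)

# Assumption: steps are counted from zero, so the last step of epoch e
# has index steps_per_epoch * e - 1.
expected_steps = [steps_per_epoch * e - 1 for e in range(1, NUM_EPOCHS + 1)]
logged_steps = [row[1] for row in RESULTS]

# Epoch with the highest validation accuracy.
best_epoch = max(RESULTS, key=lambda row: row[4])[0]

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("expected last step per epoch:", expected_steps)
    print("matches logged steps:", expected_steps == logged_steps)
    print("best epoch by validation accuracy:", best_epoch)
```

The final epoch has both the highest validation accuracy and the lowest validation loss in the table, even though the training loss is not monotonically decreasing.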

Framework versions

Transformers 4.39.3
PyTorch 2.1.2
Datasets 2.18.0
Tokenizers 0.15.1
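
To reproduce the environment, it can help to compare installed packages against the versions listed above. This is a minimal sketch using only the standard library; `version_mismatches` is a hypothetical helper, and the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual pip names for these libraries. The PyTorch Lightning version is not given in this card, so it is not checked.

```python
from importlib import metadata

# Versions as listed in this card, keyed by pip distribution name.
PINNED = {
    "transformers": "4.39.3",
    "torch": "2.1.2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for packages that differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Treat local build tags such as "2.1.2+cu121" as matching "2.1.2".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in version_mismatches().items():
        print(f"{name}: expected {expected}, found {installed}")
```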
