food11-vit

This model is a fine-tuned version of google/vit-base-patch16-224 on the Food11 dataset.

Model description

A ViT-Base vision transformer fine-tuned via transfer learning to classify food images into 11 categories. Training was run with PyTorch Lightning.
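
As a rough illustration of the transfer-learning setup, the sketch below loads the base checkpoint and swaps its classification head for an 11-class one. It is not the author's training code: `build_model` and `vit_sequence_length` are hypothetical helper names, and the use of `ignore_mismatched_sizes=True` assumes the original 1000-class ImageNet head is discarded and re-initialized.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens the encoder sees: one per patch plus the [CLS] token."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


def build_model(num_labels: int = 11):
    """Load the base ViT and attach a fresh head with `num_labels` outputs.

    Hypothetical helper: transformers is imported lazily so the arithmetic
    helper above can be used without it installed.
    """
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224",
        num_labels=num_labels,
        # The checkpoint ships a 1000-class head; allow it to be replaced.
        ignore_mismatched_sizes=True,
    )
```

With the default arguments, `vit_sequence_length()` covers a 14 x 14 grid of 16-pixel patches plus the classification token.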

Intended uses & limitations

This model is intended for food image classification tasks with a fixed set of 11 common food types. It may not generalize to out-of-distribution food images or fine-grained food variants.
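
A minimal inference sketch, assuming the checkpoint is published under the repo id `Skorm/food11-vit` and loads with the standard `image-classification` pipeline. The helper names and the 0.5 confidence threshold are hypothetical. Rejecting low-confidence predictions is only a crude guard against out-of-distribution inputs, since softmax scores can be high even for images outside the 11 classes.

```python
from typing import Dict, List, Optional

MODEL_ID = "Skorm/food11-vit"  # assumed repo id


def confident_label(predictions: List[Dict], threshold: float = 0.5) -> Optional[str]:
    """Return the top label, or None if its score is below `threshold`."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


def classify(image_path: str, threshold: float = 0.5) -> Optional[str]:
    """Classify one image file; returns None for low-confidence predictions."""
    # Imported lazily so confident_label works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    predictions = classifier(image_path)  # list of {"label": ..., "score": ...}
    return confident_label(predictions, threshold)
```

For repeated calls it would be cheaper to build the pipeline once and reuse it; the sketch recreates it per call only to stay short.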

Classes

  • Bread
  • Dairy product
  • Dessert
  • Egg
  • Fried food
  • Meat
  • Noodles-Pasta
  • Rice
  • Seafood
  • Soup
  • Vegetable-Fruit
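
The class list can be turned into the `id2label` / `label2id` mappings that classification configs typically carry. The index order below simply follows the list above and is an assumption; the mapping stored in the checkpoint's own config is authoritative.

```python
CLASSES = [
    "Bread",
    "Dairy product",
    "Dessert",
    "Egg",
    "Fried food",
    "Meat",
    "Noodles-Pasta",
    "Rice",
    "Seafood",
    "Soup",
    "Vegetable-Fruit",
]

# Assumed ordering: index i corresponds to the i-th entry of the list above.
id2label = dict(enumerate(CLASSES))
label2id = {name: idx for idx, name in id2label.items()}
```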

Training and evaluation data

The model was trained on the Food11 training split (9,866 images) and validated on the validation split (3,430 images). The test split was not used, so all reported metrics are validation metrics.

Training procedure

Training hyperparameters

The following hyperparameters were used:

  • learning_rate: 2e-5
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW
  • lr_scheduler_type: linear
  • num_epochs: 5
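
To make the schedule concrete, here is a small pure-Python sketch of a linear learning-rate decay under these hyperparameters. It assumes no warmup steps (none are stated) and that the last partial batch of each epoch is kept. `linear_lr` is a hypothetical helper, not the scheduler object used in training.

```python
import math

TRAIN_IMAGES = 9866   # training split size
BATCH_SIZE = 32
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_STEPS = 0      # assumption: warmup is not stated

# Optimizer steps per epoch, keeping the final partial batch.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, total_steps // 2, total_steps):
        print(step, linear_lr(step))
```

Under these assumptions there are 309 steps per epoch and 1,545 in total, which is consistent with logged step values of 308, 617, and so on if steps are counted from zero.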

Training results

Epoch  Step  Training Loss  Validation Loss  Validation Accuracy
1      308   1.2517         0.1991           0.9531
2      617   0.4728         0.1376           0.9621
3      926   0.2027         0.1281           0.9621
4      1235  0.2861         0.1395           0.9589
5      1544  0.2943         0.1223           0.9659
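
The table can be read programmatically to pick a checkpoint and to translate accuracy into an approximate error count. The snippet below is an illustrative sketch: it assumes accuracy is computed over all 3,430 validation images, and the error counts are rounded estimates rather than logged values.

```python
VAL_IMAGES = 3430  # validation split size

# (epoch, step, training_loss, validation_loss, validation_accuracy)
results = [
    (1, 308, 1.2517, 0.1991, 0.9531),
    (2, 617, 0.4728, 0.1376, 0.9621),
    (3, 926, 0.2027, 0.1281, 0.9621),
    (4, 1235, 0.2861, 0.1395, 0.9589),
    (5, 1544, 0.2943, 0.1223, 0.9659),
]

best_by_accuracy = max(results, key=lambda row: row[4])
best_by_val_loss = min(results, key=lambda row: row[3])


def approx_errors(val_accuracy: float) -> int:
    """Estimated number of misclassified validation images."""
    return round((1 - val_accuracy) * VAL_IMAGES)


if __name__ == "__main__":
    for epoch, _, _, val_loss, val_acc in results:
        print(f"epoch {epoch}: val_loss={val_loss:.4f}, ~{approx_errors(val_acc)} errors")
```

Both selection criteria point to epoch 5, which corresponds to roughly 117 misclassified validation images.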

Framework versions

  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.1

Model size: 85.8M params (Safetensors, tensor type F32)