DistillCLIP
This model is a distilled version of CLIP-ViT-B/32, trained on Conceptual Captions 3M. It achieves the following results on the evaluation set:
- Loss: 0.0064
- Intra-modal Loss: 0.0056
- Inter-modal Loss: 0.0008
Model description
DistillCLIP is a distilled version of CLIP. Specifically, the teacher model was CLIP-ViT-B/32.
The knowledge distillation scheme is as follows:
CLIP is distilled with two losses: $L_{inter}$ and $L_{intra}$. These losses respectively distill the inter-modal (image-text) and intra-modal (image-image, text-text) similarity maps with MSE losses. The final distillation loss is the sum of the two losses, or $L = L_{inter} + L_{intra}$.
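The two losses described above can be sketched as follows. This is a minimal illustration, not the training code: it assumes the similarity maps are cosine similarities computed across a batch, that $L_{intra}$ is the sum of the image-image and text-text MSE terms, and it uses NumPy rather than PyTorch for brevity. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    # Scale each embedding to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def similarity_maps(image_emb, text_emb):
    # Build the batch-by-batch similarity maps for one model.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return {
        "image_text": img @ txt.T,   # inter-modal
        "image_image": img @ img.T,  # intra-modal
        "text_text": txt @ txt.T,    # intra-modal
    }

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def distillation_loss(student_img, student_txt, teacher_img, teacher_txt):
    # The maps are (batch, batch), so student and teacher embedding
    # widths do not need to match.
    s = similarity_maps(student_img, student_txt)
    t = similarity_maps(teacher_img, teacher_txt)
    l_inter = mse(s["image_text"], t["image_text"])
    # Assumption: the two intra-modal terms are simply added.
    l_intra = mse(s["image_image"], t["image_image"]) + mse(s["text_text"], t["text_text"])
    return l_inter + l_intra, l_inter, l_intra
```

Because only similarity maps are compared, this formulation does not require a projection layer to align the student and teacher embedding dimensions.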
The image encoder is a ViT-S/16, while the text encoder is a 6-layer Transformer encoder. At the start of training, the image encoder was initialized with ImageNet-21K pretrained weights, while the text encoder was initialized with every odd-indexed layer of the teacher text encoder (assuming layers are zero-indexed).
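To make the text encoder initialization concrete, the short sketch below lists which teacher layers would be copied. It assumes the teacher text encoder has 12 layers, which is the depth of the CLIP-ViT-B/32 text Transformer; the helper name is hypothetical.

```python
TEACHER_TEXT_LAYERS = 12  # assumed depth of the CLIP-ViT-B/32 text encoder

def odd_indexed_layers(num_layers):
    # Zero-indexed layers 1, 3, 5, ... of the teacher.
    return [i for i in range(num_layers) if i % 2 == 1]

student_init_layers = odd_indexed_layers(TEACHER_TEXT_LAYERS)
print(student_init_layers)  # [1, 3, 5, 7, 9, 11]
```

Under this assumption the selection yields six layers, matching the 6-layer student text encoder.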
Intended uses & limitations
Primary intended uses
Research on vision-language models, e.g. natural-language-supervised image classification, visual question answering, and text-to-image synthesis
Primary intended users
Researchers in the field of vision-language representation learning
Out-of-scope use cases
In-the-wild applications, e.g. industrial deployment
Training and evaluation data
The model was trained and evaluated on Conceptual Captions 3M.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 84
- eval_batch_size: 84
- seed: 42
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10000
- training_steps: 33513
- mixed_precision_training: Native AMP
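The learning-rate settings above can be illustrated with a small sketch. It assumes the `cosine` scheduler has the shape used by the Transformers library (linear warmup from zero, then half a cosine cycle decaying to zero); the function name `lr_at` is hypothetical.

```python
import math

BASE_LR = 3e-5
WARMUP_STEPS = 10_000
TOTAL_STEPS = 33_513

def lr_at(step):
    # Linear warmup from 0 to BASE_LR over the first WARMUP_STEPS steps.
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 5_000, 10_000, 20_000, 33_513):
    print(step, lr_at(step))
```

Note that under this assumption the warmup covers roughly the first 30% of training, so the peak learning rate of 3e-05 is reached only at step 10000.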
Training results
Training Loss | Epoch | Step | Validation Loss | Intra-modal Loss | Inter-modal Loss |
---|---|---|---|---|---|
0.0259 | 0.01 | 500 | 0.0223 | 0.0194 | 0.0029 |
0.0197 | 0.03 | 1000 | 0.0178 | 0.0152 | 0.0026 |
0.017 | 0.04 | 1500 | 0.0153 | 0.0129 | 0.0023 |
0.0153 | 0.06 | 2000 | 0.0133 | 0.0112 | 0.0021 |
0.0142 | 0.07 | 2500 | 0.0135 | 0.0116 | 0.0019 |
0.0134 | 0.09 | 3000 | 0.0138 | 0.0119 | 0.0018 |
0.0127 | 0.1 | 3500 | 0.0117 | 0.0099 | 0.0018 |
0.012 | 0.12 | 4000 | 0.0116 | 0.0099 | 0.0017 |
0.0115 | 0.13 | 4500 | 0.0113 | 0.0097 | 0.0016 |
0.0111 | 0.15 | 5000 | 0.0112 | 0.0098 | 0.0014 |
0.0108 | 0.16 | 5500 | 0.0112 | 0.0097 | 0.0015 |
0.0106 | 0.18 | 6000 | 0.0107 | 0.0093 | 0.0014 |
0.0105 | 0.19 | 6500 | 0.0102 | 0.0089 | 0.0013 |
0.0101 | 0.21 | 7000 | 0.0100 | 0.0087 | 0.0013 |
0.0098 | 0.22 | 7500 | 0.0101 | 0.0089 | 0.0013 |
0.0098 | 0.24 | 8000 | 0.0100 | 0.0088 | 0.0013 |
0.0098 | 0.25 | 8500 | 0.0100 | 0.0089 | 0.0012 |
0.0094 | 0.27 | 9000 | 0.0095 | 0.0084 | 0.0011 |
0.0092 | 0.28 | 9500 | 0.0092 | 0.0080 | 0.0011 |
0.0091 | 0.3 | 10000 | 0.0097 | 0.0086 | 0.0011 |
0.0091 | 0.31 | 10500 | 0.0098 | 0.0087 | 0.0011 |
0.0087 | 0.33 | 11000 | 0.0090 | 0.0079 | 0.0011 |
0.0085 | 0.34 | 11500 | 0.0089 | 0.0079 | 0.0010 |
0.0088 | 0.36 | 12000 | 0.0086 | 0.0075 | 0.0010 |
0.0082 | 0.37 | 12500 | 0.0084 | 0.0075 | 0.0010 |
0.0082 | 0.39 | 13000 | 0.0080 | 0.0070 | 0.0009 |
0.008 | 0.4 | 13500 | 0.0080 | 0.0071 | 0.0010 |
0.008 | 0.42 | 14000 | 0.0088 | 0.0078 | 0.0010 |
0.0078 | 0.43 | 14500 | 0.0086 | 0.0076 | 0.0010 |
0.0077 | 0.45 | 15000 | 0.0081 | 0.0071 | 0.0010 |
0.0076 | 0.46 | 15500 | 0.0077 | 0.0068 | 0.0009 |
0.0075 | 0.48 | 16000 | 0.0076 | 0.0067 | 0.0009 |
0.0074 | 0.49 | 16500 | 0.0075 | 0.0066 | 0.0009 |
0.0072 | 0.51 | 17000 | 0.0070 | 0.0061 | 0.0009 |
0.0072 | 0.52 | 17500 | 0.0075 | 0.0066 | 0.0009 |
0.0071 | 0.54 | 18000 | 0.0072 | 0.0063 | 0.0009 |
0.0071 | 0.55 | 18500 | 0.0071 | 0.0063 | 0.0009 |
0.007 | 0.57 | 19000 | 0.0076 | 0.0067 | 0.0009 |
0.0069 | 0.58 | 19500 | 0.0074 | 0.0065 | 0.0009 |
0.0068 | 0.6 | 20000 | 0.0067 | 0.0059 | 0.0009 |
0.0069 | 0.61 | 20500 | 0.0067 | 0.0058 | 0.0008 |
0.0067 | 0.63 | 21000 | 0.0069 | 0.0061 | 0.0008 |
0.0067 | 0.64 | 21500 | 0.0071 | 0.0062 | 0.0008 |
0.0065 | 0.66 | 22000 | 0.0069 | 0.0061 | 0.0008 |
0.0065 | 0.67 | 22500 | 0.0066 | 0.0058 | 0.0008 |
0.0065 | 0.69 | 23000 | 0.0070 | 0.0062 | 0.0008 |
0.0064 | 0.7 | 23500 | 0.0068 | 0.0059 | 0.0008 |
0.0064 | 0.72 | 24000 | 0.0064 | 0.0056 | 0.0008 |
0.0063 | 0.73 | 24500 | 0.0066 | 0.0058 | 0.0008 |
0.0063 | 0.75 | 25000 | 0.0065 | 0.0057 | 0.0008 |
0.0062 | 0.76 | 25500 | 0.0066 | 0.0058 | 0.0008 |
0.0062 | 0.78 | 26000 | 0.0064 | 0.0056 | 0.0008 |
0.0062 | 0.79 | 26500 | 0.0065 | 0.0057 | 0.0008 |
0.0061 | 0.81 | 27000 | 0.0065 | 0.0057 | 0.0008 |
0.0061 | 0.82 | 27500 | 0.0063 | 0.0055 | 0.0008 |
0.0059 | 0.84 | 28000 | 0.0064 | 0.0057 | 0.0008 |
0.006 | 0.85 | 28500 | 0.0064 | 0.0056 | 0.0008 |
0.006 | 0.87 | 29000 | 0.0065 | 0.0057 | 0.0008 |
0.006 | 0.88 | 29500 | 0.0065 | 0.0057 | 0.0008 |
0.006 | 0.9 | 30000 | 0.0065 | 0.0057 | 0.0008 |
0.006 | 0.91 | 30500 | 0.0064 | 0.0056 | 0.0008 |
0.0059 | 0.93 | 31000 | 0.0064 | 0.0056 | 0.0008 |
0.006 | 0.94 | 31500 | 0.0064 | 0.0056 | 0.0008 |
0.0059 | 0.95 | 32000 | 0.0064 | 0.0056 | 0.0008 |
0.0058 | 0.97 | 32500 | 0.0064 | 0.0056 | 0.0008 |
0.0059 | 0.98 | 33000 | 0.0064 | 0.0056 | 0.0008 |
0.0059 | 1.0 | 33500 | 0.0064 | 0.0056 | 0.0008 |
Framework versions
- Transformers 4.29.2
- Pytorch 2.0.0
- Datasets 2.13.1
- Tokenizers 0.13.3