---
library_name: keras-hub
---
### Model Overview
SigLIP model pre-trained on WebLi at resolution 224x224. It was introduced in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Zhai et al. and first released in this [repository](https://github.com/google-research/big_vision).
SigLIP is a [CLIP](https://huggingface.co/docs/transformers/model_doc/clip)-style multimodal model with a better loss function. The sigmoid loss operates solely on individual image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows the batch size to be scaled up further, while also performing better at smaller batch sizes.
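The pairwise sigmoid loss described above can be sketched in NumPy as follows. This is a minimal illustration based on the formulation in the paper, not the KerasHub or big_vision implementation. The function name `sigmoid_pairwise_loss` is hypothetical, and the default `temperature` and `bias` values (10 and -10, the initial values reported in the paper) are used here only for illustration.

```python
import numpy as np


def sigmoid_pairwise_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Illustrative pairwise sigmoid loss over a batch of image-text pairs.

    image_emb, text_emb: arrays of shape (n, d); row i of each forms a
    matching pair. Hypothetical helper, not part of KerasHub.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)

    # One logit for every (image, text) combination in the batch.
    logits = temperature * (image_emb @ text_emb.T) + bias

    # Labels: +1 on the diagonal (matching pairs), -1 everywhere else.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0

    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp.
    per_pair = np.logaddexp(0.0, -labels * logits)

    # Each pair is an independent binary problem: no softmax over the batch.
    return per_pair.sum() / n
```

Every entry of `per_pair` depends only on its own image-text pair, which is why no batch-wide normalization is needed.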
A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).
Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
## Links
* [SigLIP Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/siglip-quickstart-notebook-with-hub)
* [SigLIP API Documentation](https://keras.io/keras_hub/api/models/siglip/)
* [SigLIP Model Card](https://arxiv.org/abs/2303.15343)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
## Installation
Keras and KerasHub can be installed with:
```
pip install -U -q keras-hub
pip install -U -q keras
```
Jax, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment see the [Keras Getting Started](https://keras.io/getting_started/) page.
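Keras 3 chooses its backend from the `KERAS_BACKEND` environment variable, which has to be set before `keras` is first imported. A minimal sketch of selecting a backend from Python follows. The helper `select_backend` and the `BACKENDS_NAMED_ABOVE` tuple are hypothetical names that only wrap the environment variable.

```python
import os

# The three backends mentioned in this card (illustrative list).
BACKENDS_NAMED_ABOVE = ("jax", "tensorflow", "torch")


def select_backend(name):
    """Set KERAS_BACKEND; call this before the first `import keras`."""
    if name not in BACKENDS_NAMED_ABOVE:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {BACKENDS_NAMED_ABOVE}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("jax")
# `import keras` and `import keras_hub` should come after this point.
```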
## Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
| Preset name | Parameters | Description |
|---|---|---|
| siglip_base_patch16_224 | 203.16M | 200 million parameter, image size 224, pre-trained on WebLi. |
| siglip_base_patch16_256 | 203.20M | 200 million parameter, image size 256, pre-trained on WebLi. |
| siglip_base_patch16_384 | 203.45M | 200 million parameter, image size 384, pre-trained on WebLi. |
| siglip_base_patch16_512 | 203.79M | 200 million parameter, image size 512, pre-trained on WebLi. |
| siglip_base_patch16_256_multilingual | 370.63M | 370 million parameter, image size 256, pre-trained on WebLi. |
| siglip2_base_patch16_224 | 375.19M | 375 million parameter, patch size 16, image size 224, pre-trained on WebLi. |
| siglip2_base_patch16_256 | 375.23M | 375 million parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_base_patch32_256 | 376.86M | 376 million parameter, patch size 32, image size 256, pre-trained on WebLi. |
| siglip2_base_patch16_384 | 376.86M | 376 million parameter, patch size 16, image size 384, pre-trained on WebLi. |
| siglip_large_patch16_256 | 652.15M | 652 million parameter, image size 256, pre-trained on WebLi. |
| siglip_large_patch16_384 | 652.48M | 652 million parameter, image size 384, pre-trained on WebLi. |
| siglip_so400m_patch14_224 | 877.36M | 877 million parameter, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip_so400m_patch14_384 | 877.96M | 877 million parameter, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_large_patch16_256 | 881.53M | 881 million parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_large_patch16_384 | 881.86M | 881 million parameter, patch size 16, image size 384, pre-trained on WebLi. |
| siglip2_large_patch16_512 | 882.31M | 882 million parameter, patch size 16, image size 512, pre-trained on WebLi. |
| siglip_so400m_patch16_256_i18n | 1.13B | 1.1 billion parameter, image size 256, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch14_224 | 1.14B | 1.1 billion parameter, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_256 | 1.14B | 1.1 billion parameter, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch14_384 | 1.14B | 1.1 billion parameter, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_384 | 1.14B | 1.1 billion parameter, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_512 | 1.14B | 1.1 billion parameter, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_256 | 1.87B | 1.8 billion parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_384 | 1.87B | 1.8 billion parameter, patch size 16, image size 384, pre-trained on WebLi. |
## Example Usage
```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("siglip2_so400m_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "siglip2_so400m_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("siglip2_so400m_patch16_512")

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load the image and preprocess it as a batch of one
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})
```
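Because SigLIP is trained with a sigmoid loss, image-text scores are usually read as independent match probabilities via a sigmoid rather than a softmax over the candidate captions. The sketch below shows that post-processing step on a plain NumPy array of logits. It assumes the backbone call returns a dictionary containing image-to-text logits under a key such as `"vision_logits"`; that key name is an assumption here and should be checked against the SigLIP API documentation. The helper `match_probabilities` and the example values are hypothetical.

```python
import numpy as np


def match_probabilities(logits):
    """Map raw image-text logits to independent match probabilities."""
    logits = np.asarray(logits, dtype="float64")
    # Numerically stable sigmoid: exp is only applied to non-positive values.
    exp_neg_abs = np.exp(-np.abs(logits))
    return np.where(
        logits >= 0,
        1.0 / (1.0 + exp_neg_abs),
        exp_neg_abs / (1.0 + exp_neg_abs),
    )


# Made-up logits for one image against three captions (illustration only).
# With the real model this might be the array under "vision_logits",
# assuming that key exists in the backbone output.
example_logits = np.array([[-8.0, 2.0, -5.0]])
probs = match_probabilities(example_logits)

# Index of the caption with the highest match probability.
best_caption = int(np.argmax(probs, axis=-1)[0])
```

With real model outputs, `keras.ops.convert_to_numpy` can turn the backend tensor into a NumPy array before it is passed to the helper.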
## Example Usage with Hugging Face URI
```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# instantiate the model and preprocessing tools
siglip = SigLIPBackbone.from_preset("hf://keras/siglip2_so400m_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "hf://keras/siglip2_so400m_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset(
    "hf://keras/siglip2_so400m_patch16_512"
)

# obtain tokens for some input text
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# load the image and preprocess it as a batch of one
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype(float))

# query the model for similarities
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})
```