---
library_name: keras-hub
---
## Model Overview
SigLIP models pre-trained on WebLi at image resolutions from 224x224 to 512x512. SigLIP was introduced in the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Zhai et al. and first released in this [repository](https://github.com/google-research/big_vision).
SigLIP is a multimodal model that follows [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) but replaces its softmax-based contrastive loss with a pairwise sigmoid loss. The sigmoid loss operates solely on individual image-text pairs and does not require a global view of the pairwise similarities for normalization. This makes it possible to scale the batch size further, while also performing better at smaller batch sizes.
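To make the pairwise idea concrete, here is a minimal NumPy sketch of the sigmoid loss, written from the pseudocode in the SigLIP paper. It is an illustration, not the training code used for these checkpoints; the function name `sigmoid_loss` is hypothetical, and the defaults for the log-temperature `t_prime` and bias `b` are the initial values reported in the paper (both are learned during training).

```python
import numpy as np


def sigmoid_loss(img_emb, txt_emb, t_prime=np.log(10.0), b=-10.0):
    """Pairwise sigmoid loss sketch (after Zhai et al., 2023).

    img_emb, txt_emb: arrays of shape (n, d); row i of each is a matching pair.
    t_prime: log of the temperature; b: bias. Defaults are the paper's init values.
    """
    # L2-normalize both sets of embeddings.
    x = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    y = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Logits for every image-text combination in the batch: shape (n, n).
    logits = np.exp(t_prime) * (x @ y.T) + b
    # Labels: +1 for matching pairs (diagonal), -1 for all other pairs.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0
    # -log(sigmoid(z * logit)) == log(1 + exp(-z * logit)), computed stably.
    per_pair = np.logaddexp(0.0, -labels * logits)
    # Sum over all pairs and normalize by the batch size.
    return per_pair.sum() / n


# Toy check: aligned pairs should score a lower loss than mismatched ones.
emb = np.eye(4)
aligned = sigmoid_loss(emb, emb)
mismatched = sigmoid_loss(emb, emb[::-1])
```

Each term in the sum depends only on its own image-text pair; there is no softmax normalization across a row or column of the similarity matrix, which is the property described in the paragraph above.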
A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).

Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).

## Links

* [SigLIP Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/siglip-quickstart-notebook-with-hub)
* [SigLIP API Documentation](https://keras.io/keras_hub/api/models/siglip/)
* [SigLIP Paper](https://arxiv.org/abs/2303.15343)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.

## Presets

The following model checkpoints are provided by the Keras team. Code examples are available below.

| Preset name | Parameters | Description |
|-------------|------------|-------------|
| siglip_base_patch16_224 | 203.16M | 200 million parameters, patch size 16, image size 224, pre-trained on WebLi. |
| siglip_base_patch16_256 | 203.20M | 200 million parameters, patch size 16, image size 256, pre-trained on WebLi. |
| siglip_base_patch16_384 | 203.45M | 200 million parameters, patch size 16, image size 384, pre-trained on WebLi. |
| siglip_base_patch16_512 | 203.79M | 200 million parameters, patch size 16, image size 512, pre-trained on WebLi. |
| siglip_base_patch16_256_multilingual | 370.63M | 370 million parameters, patch size 16, image size 256, multilingual, pre-trained on WebLi. |
| siglip2_base_patch16_224 | 375.19M | 375 million parameters, patch size 16, image size 224, pre-trained on WebLi. |
| siglip2_base_patch16_256 | 375.23M | 375 million parameters, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_base_patch32_256 | 376.86M | 376 million parameters, patch size 32, image size 256, pre-trained on WebLi. |
| siglip2_base_patch16_384 | 376.86M | 376 million parameters, patch size 16, image size 384, pre-trained on WebLi. |
| siglip_large_patch16_256 | 652.15M | 652 million parameters, patch size 16, image size 256, pre-trained on WebLi. |
| siglip_large_patch16_384 | 652.48M | 652 million parameters, patch size 16, image size 384, pre-trained on WebLi. |
| siglip_so400m_patch14_224 | 877.36M | 877 million parameters, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip_so400m_patch14_384 | 877.96M | 877 million parameters, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_large_patch16_256 | 881.53M | 881 million parameters, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_large_patch16_384 | 881.86M | 881 million parameters, patch size 16, image size 384, pre-trained on WebLi. |
| siglip2_large_patch16_512 | 882.31M | 882 million parameters, patch size 16, image size 512, pre-trained on WebLi. |
| siglip_so400m_patch16_256_i18n | 1.13B | 1.1 billion parameters, patch size 16, image size 256, shape-optimized version, multilingual, pre-trained on WebLi. |
| siglip2_so400m_patch14_224 | 1.14B | 1.1 billion parameters, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_256 | 1.14B | 1.1 billion parameters, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch14_384 | 1.14B | 1.1 billion parameters, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_384 | 1.14B | 1.1 billion parameters, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_512 | 1.14B | 1.1 billion parameters, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_256 | 1.87B | 1.8 billion parameters, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_384 | 1.87B | 1.8 billion parameters, patch size 16, image size 384, pre-trained on WebLi. |
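Each preset name encodes the vision encoder's patch size P and input resolution S as `patch{P}_{S}`. As a rough guide to compute cost, a ViT-style encoder whose patch embedding uses a stride equal to P sees about `(S // P) ** 2` image tokens. The helper below is a hypothetical illustration, not a KerasHub API, and assumes the naming pattern used in this table; floor division reflects that a stride-P patch embedding drops leftover pixels when S is not a multiple of P (for example 384 with patch size 14).

```python
import re


def num_image_patches(preset_name):
    """Hypothetical helper: estimate image tokens for a SigLIP preset name.

    Assumes the name contains `patch{P}_{S}` (patch size P, image size S).
    """
    match = re.search(r"patch(\d+)_(\d+)", preset_name)
    if match is None:
        raise ValueError(f"Cannot parse patch/image size from {preset_name!r}")
    patch_size, image_size = int(match.group(1)), int(match.group(2))
    # Patches per side; any remainder pixels are not covered by a patch.
    per_side = image_size // patch_size
    return per_side * per_side


base_tokens = num_image_patches("siglip_base_patch16_224")
so400m_tokens = num_image_patches("siglip_so400m_patch14_384")
```

Higher resolutions and smaller patches both increase the token count, so the 512-pixel, patch-16 presets process four times as many image tokens as the 256-pixel, patch-16 ones.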

## Example Usage
```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# Instantiate the model and preprocessing tools.
siglip = SigLIPBackbone.from_preset("siglip2_so400m_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "siglip2_so400m_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset("siglip2_so400m_patch16_512")

# Tokenize the candidate captions.
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# Load a local image ("cat.jpg" is a placeholder path) and preprocess it.
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype("float32"))

# Query the model for image-text similarity logits.
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})

# SigLIP is trained with a sigmoid loss, so each logit maps to an
# independent match probability for one image-caption pair.
probs = keras.ops.sigmoid(outputs["vision_logits"])
```
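Because SigLIP is trained with a sigmoid loss, each image-text logit can be read on its own through a sigmoid, whereas CLIP-style models normalize over all candidate texts with a softmax. The NumPy sketch below contrasts the two on made-up logits (hypothetical values, not output from any preset); with a real model, a logits array of shape `(num_images, num_texts)` would take their place.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Hypothetical logits for two images against three captions
# ("mountains", "cat on tortoise", "house"); not real model output.
logits = np.array([
    [-12.0, 3.0, -9.5],    # image that matches the second caption
    [-12.0, -11.0, -10.0], # image that matches none of the captions
])

sigmoid_probs = sigmoid(logits)  # independent per-pair probabilities
softmax_probs = softmax(logits)  # CLIP-style: each row forced to sum to 1
```

For the second image, the sigmoid keeps every probability near zero, while the softmax still assigns most of its mass to one caption simply because each row must sum to 1.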

## Example Usage with Hugging Face URI

```Python
import keras
import numpy as np
from keras_hub.models import SigLIPBackbone, SigLIPTokenizer
from keras_hub.layers import SigLIPImageConverter

# Instantiate the model and preprocessing tools from the Hugging Face Hub.
siglip = SigLIPBackbone.from_preset("hf://keras/siglip2_so400m_patch16_512")
tokenizer = SigLIPTokenizer.from_preset(
    "hf://keras/siglip2_so400m_patch16_512", sequence_length=64
)
image_converter = SigLIPImageConverter.from_preset(
    "hf://keras/siglip2_so400m_patch16_512"
)

# Tokenize the candidate captions.
tokens = tokenizer.tokenize(["mountains", "cat on tortoise", "house"])

# Load a local image ("cat.jpg" is a placeholder path) and preprocess it.
image = keras.utils.load_img("cat.jpg")
image = image_converter(np.array([image]).astype("float32"))

# Query the model for image-text similarity logits.
outputs = siglip({
    "images": image,
    "token_ids": tokens,
})

# SigLIP is trained with a sigmoid loss, so each logit maps to an
# independent match probability for one image-caption pair.
probs = keras.ops.sigmoid(outputs["vision_logits"])
```