---
tags:
- clip
- vision-language
- image-text
- pytorch
license: apache-2.0
---
# CLIP Model

This is a fine-tuned CLIP model for vision-language tasks.

## Model Description

This model was fine-tuned from a base CLIP model and includes a custom temperature scaling parameter, stored as a separate `temperature.pth` file in the repository.

## Usage
```python
from transformers import CLIPModel, CLIPProcessor
from huggingface_hub import hf_hub_download
import torch

# Load the model and processor
model = CLIPModel.from_pretrained("aprendesc/CLIP_model_v0")
processor = CLIPProcessor.from_pretrained("aprendesc/CLIP_model_v0")

# Load the custom temperature parameter, if present in the repository
try:
    temperature_path = hf_hub_download(repo_id="aprendesc/CLIP_model_v0", filename="temperature.pth")
    temperature = torch.load(temperature_path, map_location="cpu")
    print(f"Temperature parameter: {temperature}")
except Exception:
    temperature = None
    print("No temperature parameter found")
```
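The snippet below is a minimal zero-shot image-text matching sketch, continuing from the loading code above (it reuses `model` and `processor`). It relies only on the standard `transformers` CLIP API; the image path and candidate captions are placeholders, so substitute your own data.

```python
from PIL import Image
import torch

# Placeholder inputs: replace with your own image and candidate captions
image = Image.open("example.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

# Tokenize the texts and preprocess the image in one call
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity score between the image and each text
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=-1)
print(probs)
```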
## Training Details

- Base model: CLIP
- Custom temperature scaling included (see the sketch after this list)
- Fine-tuned for specific vision-language tasks
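How the saved temperature is meant to be applied is not documented in this card, so the following is only an assumption: a common convention is to divide the similarity logits by a scalar temperature before the softmax to calibrate the output probabilities. The `temperature` and `logits_per_image` variables come from the snippets in the Usage section.

```python
# Assumed calibration step: divide similarity logits by the saved temperature
# before the softmax. Verify against how this checkpoint was actually trained,
# and note that temperature.pth may hold a tensor or another structure.
if temperature is not None:
    calibrated_probs = (logits_per_image / temperature).softmax(dim=-1)
    print(calibrated_probs)
```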