---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- memorability
- computer_vision
- perceptual_tasks
- CLIP
- LaMem
- THINGS
---
# Don’t Judge Before You CLIP: Memorability Prediction Model
PerceptCLIP-Memorability is a model designed to predict image memorability (the likelihood that an image will be remembered). This is the official model from the paper ["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260). Our model applies LoRA adaptation to the CLIP visual encoder, together with an additional MLP head, to achieve state-of-the-art results.
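The released checkpoint should be loaded as shown in the Usage section below, but the architecture is easy to outline. The following is a minimal, hypothetical sketch of such a model; the LoRA rank, target modules, and MLP head sizes are illustrative assumptions, not the released configuration:

```python
# Hypothetical architecture sketch: CLIP ViT-L/14 vision encoder + LoRA + MLP head.
# LoRA rank/alpha, target modules, and head width are assumptions for illustration.
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

class PerceptCLIPMemorability(nn.Module):
    def __init__(self, lora_rank=16, lora_alpha=32):
        super().__init__()
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        lora_cfg = LoraConfig(r=lora_rank, lora_alpha=lora_alpha,
                              target_modules=["q_proj", "v_proj"])
        self.encoder = get_peft_model(backbone, lora_cfg)  # only LoRA params stay trainable
        hidden = backbone.config.hidden_size               # 1024 for ViT-L/14
        self.head = nn.Sequential(nn.Linear(hidden, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, pixel_values):
        feats = self.encoder(pixel_values=pixel_values).pooler_output
        return self.head(feats).squeeze(-1)  # one memorability score per image
```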
## Training Details
- *Dataset*: [LaMem](http://memorability.csail.mit.edu/download.html) (Large-Scale Image Memorability)
- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation*
- *Loss Function*: Mean Squared Error (MSE) loss for memorability prediction (see the training sketch after this list)
- *Optimizer*: AdamW
- *Learning Rate*: 5e-05
- *Batch Size*: 32
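
Put together, these settings correspond to a training loop along the following lines. This is a minimal sketch, not the released training code: `lamem_train_loader` is a placeholder `DataLoader` yielding image batches with their ground-truth memorability scores, and `PerceptCLIPMemorability` refers to the hypothetical architecture sketch above.

```python
# Hypothetical training-loop sketch using the settings listed above.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PerceptCLIPMemorability().to(device)   # placeholder model (see sketch above)
criterion = nn.MSELoss()                       # MSE loss on memorability scores
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for images, scores in lamem_train_loader:      # placeholder DataLoader, batch_size=32
    images, scores = images.to(device), scores.float().to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), scores)
    loss.backward()
    optimizer.step()
```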
## Usage
To use the model for inference:
```python
from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Memorability", filename="perceptCLIP_Memorability.pth")
model = torch.load(model_path, map_location=device).to(device).eval()  # newer PyTorch may require weights_only=False
# Load an image
image = Image.open("image_path.jpg").convert("RGB")
# Preprocess and predict
def Mem_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711))
    ])
    return transform

image = Mem_preprocess()(image).unsqueeze(0).to(device)

with torch.no_grad():
    mem_score = model(image).item()
print(f"Predicted Memorability Score: {mem_score:.4f}")