---
license: mit
tags:
- clip
- visual-search
- image-retrieval
- fashion
library_name: clip
datasets:
- deepfashion
pipeline_tag: feature-extraction
model-index:
- name: StyleFinder
  results:
  - task:
      type: image-retrieval
      name: Fashion Visual Search
    dataset:
      type: deepfashion
      name: DeepFashion In-shop Clothes Retrieval
    metrics:
    - type: recall@1
      value: 53.95
      name: Rank-1 Accuracy (RN50)
    - type: map
      value: 0.4265
      name: mAP (RN50)
    - type: recall@1
      value: 46.24
      name: Rank-1 Accuracy (ViT-B/16)
    - type: map
      value: 0.3481
      name: mAP (ViT-B/16)
---
# StyleFinder: AI-Powered Fashion Visual Search
StyleFinder is a deep-learning image retrieval system built on CLIP and fine-tuned on the DeepFashion In-shop Clothes Retrieval dataset. Users upload an image and retrieve visually similar fashion items with either zero-shot or fine-tuned CLIP variants.
## Supported Models
| Model | Stage | Description |
|---|---|---|
| ViT-B/16 | Stage 3 v4 | Best fine-tuned transformer-based model |
| RN50 | Stage 3 v3 | Best fine-tuned CNN-based model |
| ViT-B/16 | Zero-shot | Official OpenAI pretrained CLIP |
| RN50 | Zero-shot | Official OpenAI pretrained CLIP |
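
The zero-shot rows are the unmodified OpenAI checkpoints. As a minimal sketch (assuming the standard OpenAI `clip` pip package, which is not bundled with this repo), they can be loaded directly:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"

# Zero-shot baselines: official OpenAI weights, no DeepFashion fine-tuning.
model_vit, preprocess_vit = clip.load("ViT-B/16", device=device)
model_rn50, preprocess_rn50 = clip.load("RN50", device=device)
```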
## Evaluation Results
| Metric | ViT-B/16 (v4) | RN50 (v3) |
|---|---|---|
| Rank-1 | 46.24% | 53.95% |
| mAP | 0.3481 | 0.4265 |
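
Both metrics follow the standard retrieval definitions. A minimal sketch of Rank-1 accuracy, assuming a precomputed query-by-gallery cosine-similarity matrix and per-image item IDs (all names below are illustrative, not part of this repo):

```python
import torch

def rank1_accuracy(similarity: torch.Tensor,
                   query_ids: torch.Tensor,
                   gallery_ids: torch.Tensor) -> float:
    """Fraction of queries whose top-ranked gallery image shows the same item.

    similarity:  (num_queries, num_gallery) cosine-similarity matrix
    query_ids:   (num_queries,) item ID of each query image
    gallery_ids: (num_gallery,) item ID of each gallery image
    """
    top1 = similarity.argmax(dim=1)  # best gallery index per query
    return (gallery_ids[top1] == query_ids).float().mean().item()
```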
## Precomputed Gallery Features
Gallery embeddings are stored as .pt files for fast cosine similarity search.
| File Name | Description |
|---|---|
| `vitb16_stage3_v4_gallery.pt` | Fine-tuned ViT-B/16 gallery |
| `rn50_stage3_v3_gallery.pt` | Fine-tuned RN50 gallery |
| `vitb16_zeroshot_gallery.pt` | Official CLIP ViT-B/16 gallery |
| `rn50_zeroshot_gallery.pt` | Official CLIP RN50 gallery |
These files are stored in the `gallery_features/` directory and can be loaded with `load_gallery_features()`.
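
As an illustrative sketch of how such a file can be used directly (the dict keys `features` and `paths` are assumptions about the stored layout, and the exact signature of `load_gallery_features()` may differ):

```python
import torch

# Load one precomputed gallery; the keys below are assumed, not confirmed.
gallery = torch.load("gallery_features/vitb16_stage3_v4_gallery.pt", map_location="cpu")
feats = gallery["features"]   # (N, D) gallery image embeddings
paths = gallery["paths"]      # N gallery image paths

# L2-normalize once so that a plain dot product equals cosine similarity.
feats = feats / feats.norm(dim=-1, keepdim=True)
```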
## How to Use

### Load a Model
```python
from model_loader import load_model

model, preprocess = load_model(arch="vitb16", stage="stage3")  # or rn50 / zeroshot
```
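
Building on that, a hedged end-to-end retrieval sketch (file names and the gallery dict keys are assumptions about this repo's layout; it also assumes `load_model` returns a CLIP-style model exposing `encode_image`):

```python
import torch
from PIL import Image
from model_loader import load_model

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_model(arch="vitb16", stage="stage3")
model = model.to(device).eval()

# Encode the uploaded query image with the fine-tuned CLIP image encoder.
image = preprocess(Image.open("query.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    query = model.encode_image(image).float()
query = query / query.norm(dim=-1, keepdim=True)

# Load the matching precomputed gallery and rank items by cosine similarity.
gallery = torch.load("gallery_features/vitb16_stage3_v4_gallery.pt", map_location=device)
feats = gallery["features"].float()          # (N, D); key name is assumed
feats = feats / feats.norm(dim=-1, keepdim=True)

scores = query @ feats.T                     # (1, N) cosine similarities
top_scores, top_idx = scores.topk(5, dim=1)  # 5 most similar gallery items
print(top_idx.tolist(), top_scores.tolist())
```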