---
library_name: transformers
license: mit
language:
- en
pipeline_tag: object-detection
base_model:
- microsoft/conditional-detr-resnet-50
tags:
- object-detection
- fashion
- search
---
This model is a fine-tuned version of microsoft/conditional-detr-resnet-50.

You can find details of the model in this GitHub repo: [fashion-visual-search](https://github.com/yainage90/fashion-visual-search)

The companion fashion image feature extractor model is available at [yainage90/fashion-image-feature-extractor](https://huggingface.co/yainage90/fashion-image-feature-extractor).

This model was trained on a combination of two datasets: [modanet](https://github.com/eBay/modanet) and [fashionpedia](https://fashionpedia.github.io/home/).

The labels are `['bag', 'bottom', 'dress', 'hat', 'shoes', 'outer', 'top']`.
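If you want to verify the label mapping at runtime, it is stored in the model config (standard `transformers` behavior); a minimal sketch:

```python
from transformers import AutoModelForObjectDetection

ckpt = 'yainage90/fashion-object-detection'
model = AutoModelForObjectDetection.from_pretrained(ckpt)
# Maps class indices to the label names listed above.
print(model.config.id2label)
```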

The best score, mAP 0.7542, was achieved at epoch 96 of 100, so there may still be a little room for further performance improvement.

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Pick the best available device: CUDA, Apple MPS, or CPU.
device = 'cpu'
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')

ckpt = 'yainage90/fashion-object-detection'
image_processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForObjectDetection.from_pretrained(ckpt).to(device)

image = Image.open('<path/to/image>').convert('RGB')

with torch.no_grad():
    inputs = image_processor(images=[image], return_tensors="pt")
    outputs = model(**inputs.to(device))
    # PIL's image.size is (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([[image.size[1], image.size[0]]])
    results = image_processor.post_process_object_detection(outputs, threshold=0.4, target_sizes=target_sizes)[0]

    items = []
    for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
        score = score.item()
        label = label.item()
        box = [i.item() for i in box]  # [xmin, ymin, xmax, ymax] in pixel coordinates
        print(f"{model.config.id2label[label]}: {round(score, 3)} at {box}")
        items.append((score, label, box))
```
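
For a visual-search pipeline, the detected boxes can then be cropped and passed to an embedding model such as the feature extractor linked above. A minimal sketch, continuing from the `image`, `model`, and `items` variables in the snippet above (the embedding step itself is omitted here, since it depends on the downstream model):

```python
# Crop each detection so it can be fed to a feature extractor.
crops = []
for score, label, box in items:
    xmin, ymin, xmax, ymax = box
    # PIL's crop takes (left, upper, right, lower) in pixel coordinates.
    crop = image.crop((xmin, ymin, xmax, ymax))
    crops.append((model.config.id2label[label], crop))
```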

![sample_image](sample_image.png)