---
license: mit
tags:
  - object-detection
  - computer-vision
  - feet
  - pytorch
  - faster-rcnn
library_name: torchvision
pipeline_tag: object-detection
model-index:
  - name: Faster R-CNN Foot Detector
    results: []
---

# 🦶 Faster R-CNN Foot Detection Model

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/648a824a8ca6cf9857d1349c/5yqaFK6aNuC_suQE2o_BA.jpeg)

by [Tony Assi](https://www.tonyassi.com/)

This is a fine-tuned [Faster R-CNN](https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn.html) model from Torchvision that detects **feet or shoes** in an image.

It was trained on a small custom dataset of foot annotations and is intended as a starting point for foot/shoe detection in street, fashion, or movement-based applications.

Web demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/tonyassi/foot-detection)

---

## 🧠 Model Details

- Base model: `fasterrcnn_resnet50_fpn` (pretrained on COCO)
- Fine-tuned on 40 images of feet/shoes
- Class labels:
  - `1`: foot/shoe
- Bounding box outputs with confidence scores
- Optimized for CPU (but works with MPS and CUDA)
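
To make the output format concrete, here is a minimal sketch of reading a single detection. It assumes boxes follow Torchvision's `[x0, y0, x1, y1]` pixel convention (top-left and bottom-right corners). The helper name `box_summary` and the sample numbers are hypothetical and are not actual model output.

```python
def box_summary(box, score):
    """Summarize one detection given as [x0, y0, x1, y1] plus a confidence score."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    return {
        "center": ((x0 + x1) / 2, (y0 + y1) / 2),
        "width": width,
        "height": height,
        "area": width * height,
        "score": score,
    }

# Hypothetical values for illustration only
summary = box_summary([10.0, 20.0, 110.0, 70.0], 0.9)
print(summary)
```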

---


## ⚡️ Quick Start

To download this repository:

```bash
git clone https://github.com/tonyassi/FootDetection.git
cd FootDetection
```

Install:

```bash
pip install -r requirements.txt
```


Usage:

```python
from FootDetection import FootDetection
from PIL import Image

# Initialize model (first run will auto-download weights)
foot_detection = FootDetection("cpu")  # or "cuda" for GPU, "mps" for Apple Silicon

# Load image
img = Image.open("image.jpg").convert("RGB")

# Run detection
results = foot_detection.detect(img, threshold=0.1)
print(results)

# Draw boxes
img_with_boxes = foot_detection.draw_boxes(img)
img_with_boxes.show()
img_with_boxes.save("annotated_image.jpg")
```

---

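## 📦 Usage

To run the model directly with Torchvision, without the `FootDetection` wrapper, install the dependencies:

```bash
pip install torch torchvision pillow huggingface_hub
```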

```python
import os
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from PIL import Image, ImageDraw
from torchvision.transforms import functional as F
from huggingface_hub import hf_hub_download

# ===== CONFIG =====
device = torch.device("cpu")  # or "mps" if stable
checkpoint_dir = "checkpoints"
checkpoint_file = "fasterrcnn_foot.pth"
local_path = os.path.join(checkpoint_dir, checkpoint_file)

# ===== Ensure Checkpoint Exists =====
if not os.path.exists(local_path):
    os.makedirs(checkpoint_dir, exist_ok=True)
    print("Downloading model from Hugging Face...")
    local_path = hf_hub_download(
        repo_id="tonyassi/foot-detection",
        filename=checkpoint_file,
        local_dir=checkpoint_dir
    )

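# ===== Load Model =====
# Build the architecture. The COCO weights loaded here are overwritten by the checkpoint below.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
# Replace the box predictor head: 2 classes = background (0) + foot/shoe (1)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, 2)
model.load_state_dict(torch.load(local_path, map_location=device))
model.to(device)
model.eval()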

# ===== Function: Foot Detection =====
def foot_detection(image, threshold=0.1):
    """Takes a PIL image, returns bounding boxes + scores above threshold"""
    image_tensor = F.to_tensor(image).unsqueeze(0).to(device)
    with torch.no_grad():
        outputs = model(image_tensor)[0]

    boxes = []
    scores = []
    for box, score in zip(outputs["boxes"], outputs["scores"]):
        if score >= threshold:
            boxes.append(box.tolist())
            scores.append(score.item())

    return {
        "boxes": boxes,
        "scores": scores
    }

# ===== Function: Draw Bounding Boxes =====
def draw_bounding_box(image, detection):
    """Draws boxes and scores on a copy of the image"""
    image_copy = image.copy()
    draw = ImageDraw.Draw(image_copy)

    for box, score in zip(detection["boxes"], detection["scores"]):
        x0, y0, x1, y1 = box
        draw.rectangle([x0, y0, x1, y1], outline="red", width=3)
        draw.text((x0, y0), f"{score:.2f}", fill="red")

    return image_copy


# ==== Load and prepare image ====
image_path = "test.jpg"  # replace with your image path
image = Image.open(image_path).convert("RGB")

# ==== Run detection ====
detections = foot_detection(image, threshold=0.3)

# ==== Draw results ====
result_image = draw_bounding_box(image, detections)
result_image.show()  # or result_image.save("output.jpg")
```
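
With a low threshold, a detector fine-tuned on a small dataset may return more boxes than there are feet in the image. As a rough sketch, the hypothetical helper below (`top_detections`, not part of this repository) keeps the `k` highest-scoring entries from the dictionary returned by `foot_detection` and clamps each box to the image bounds. The sample detections are made up for illustration.

```python
import math

def top_detections(detection, image_size, k=2):
    """Keep the k highest-scoring boxes, clamped to the image as integer pixels."""
    width, height = image_size
    # Pair each box with its score and sort by score, highest first
    ranked = sorted(
        zip(detection["boxes"], detection["scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]

    boxes, scores = [], []
    for (x0, y0, x1, y1), score in ranked:
        # Round outward, then clamp so the box stays inside the image
        clamped = (
            max(0, math.floor(x0)),
            max(0, math.floor(y0)),
            min(width, math.ceil(x1)),
            min(height, math.ceil(y1)),
        )
        boxes.append(clamped)
        scores.append(score)
    return {"boxes": boxes, "scores": scores}

# Hypothetical detections for a 100x80 image
sample = {
    "boxes": [[-5.0, 10.2, 40.7, 60.0], [50.0, 20.0, 120.0, 90.0], [1.0, 1.0, 5.0, 5.0]],
    "scores": [0.4, 0.9, 0.1],
}
best = top_detections(sample, image_size=(100, 80), k=2)
print(best)
```

With a PIL image, `image.size` gives `(width, height)`, and each returned tuple can be passed to `image.crop(box)` to extract the detected region.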