Spaces:
Sleeping
Sleeping
File size: 6,969 Bytes
c08ab4e 56a3735 c08ab4e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 |
import os
os.environ["STREAMLIT_WATCHER_TYPE"] = "none"
os.environ["STREAMLIT_WATCH_DISABLE"] = "true"
import streamlit as st
from PIL import Image
import torch
import torchvision.transforms as transforms
import pandas as pd
from utils.preprocessing import get_transforms
from models.resnet_model import ResNet18
# Class names in order
class_names = [
'calling', 'clapping', 'cycling', 'dancing', 'drinking', 'eating', 'fighting',
'hugging', 'laughing', 'listening_to_music', 'running', 'sitting', 'sleeping',
'texting', 'using_laptop'
]
@st.cache_resource
def load_model():
if not os.path.exists("models/best_model.pth"):
st.error("Model weights not found. Please ensure 'models/best_model.pth' exists.")
st.stop()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ResNet18(num_classes=15)
model.load_state_dict(torch.load("models/best_model.pth", map_location=device))
model.to(device)
model.eval()
return model, device
def predict(image, model, device):
transform = get_transforms()
image_t = transform(image).unsqueeze(0).to(device)
with torch.no_grad():
outputs = model(image_t)
probs = torch.softmax(outputs, dim=1)
conf, predicted = torch.max(probs, dim=1)
return class_names[predicted.item()], float(conf.item()) # type: ignore
def main():
st.title("Human Action Recognition App")
tab1, tab2, tab3 = st.tabs(["About", "Predict", "Metrics & Test Predictions"])
with tab1:
st.header("About This App")
st.markdown("""
### 🧠 Human Action Recognition (HAR)
This application classifies **human actions** from static images using a deep learning model trained on a curated dataset of 15 different activities.
#### 🔍 Purpose
To demonstrate how computer vision and deep learning can be used to **recognize and classify human behaviors** in images — useful for applications such as surveillance, activity monitoring, and human-computer interaction.
#### 🧰 Model
- **Architecture:** ResNet18 (Residual Neural Network with 18 layers)
- **Pretrained:** On ImageNet for general features
- **Fine-tuned:** On a specialized Human Action Recognition dataset for task-specific learning
#### 📚 Dataset
- **Source:** [Bingsu/Human_Action_Recognition](https://huggingface.co/datasets/Bingsu/Human_Action_Recognition)
- **Categories:** 15 action classes
- `calling`, `clapping`, `cycling`, `dancing`, `drinking`, `eating`, `fighting`,
`hugging`, `laughing`, `listening_to_music`, `running`, `sitting`,
`sleeping`, `texting`, `using_laptop`
---
🖼️ Simply upload an image, and the model will analyze and classify the dominant action being performed.
""")
with tab2:
st.header("Predict Human Action from Image")
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
try:
image = Image.open(uploaded_file).convert("RGB")
except Exception:
st.error("Error loading image. Please upload a valid JPG or PNG file.")
return
st.image(image, caption="Uploaded Image", use_container_width=True)
model, device = load_model()
prediction, confidence = predict(image, model, device)
pred_label = prediction.replace('_', ' ').title()
st.success(f"Predicted Action: **{pred_label}**")
st.info(f"Confidence: {confidence*100:.2f}%")
# Show transformed input
transform = get_transforms()
transformed_image = transform(image)
st.image(transforms.ToPILImage()(transformed_image), caption="Transformed Input (for model)", use_container_width=True)
# Detailed explanation based on predicted output
st.markdown(f"""
### About the Transformed Input for **{pred_label}**
Before the model makes its prediction, the uploaded image undergoes several preprocessing steps to prepare it for analysis:
- **Resizing and cropping:** The image is resized and cropped to a consistent size (usually 224x224 pixels) so that the model receives uniform input dimensions.
- **Normalization:** Pixel color values are scaled based on mean and standard deviation values (typically from ImageNet dataset statistics). This helps the model generalize better by standardizing the input distribution.
- **Conversion to Tensor:** The image is converted from a PIL image to a PyTorch tensor, which is the required input format for the model.
This processed image is exactly what the model "sees" when it predicts the action **{pred_label}**. Understanding this helps ensure the model's input is consistent and reliable.
""")
with tab3:
st.header("Training & Validation Metrics")
st.markdown("""
**Training Accuracy (96.5%)**
During training, the model correctly identified human actions in 96.5% of the images. This indicates it has effectively learned the patterns and features present in the training data.
**Validation Accuracy (96.6%)**
When evaluated on new, unseen images, the model correctly classified 96.6% of them. This demonstrates its ability to generalize knowledge beyond simply memorizing the training examples.
**Training Loss (0.12)**
The average prediction error during training was low (0.12), meaning the model’s guesses are generally close to the true labels.
**Validation Loss (0.10)**
On unseen data, the prediction error was even lower (0.10), suggesting the model is not overfitting but genuinely learning to understand the task.
""")
st.markdown("---")
st.header("Test Set Predictions Preview")
st.markdown("""
The table below presents a sample of the model’s predictions on the test dataset, which consists of images the model has not encountered during training. The columns typically include:
- **Filename:** The name of the test image file
- **Predicted Label:** The human action predicted by the model
- **Confidence:** The model’s confidence score for each prediction
Reviewing this information aids in evaluating the model’s real-world performance and helps identify potential failure cases.
""")
csv_path = "test_predictions.csv"
if os.path.exists(csv_path):
df = pd.read_csv(csv_path)
st.dataframe(df.head(20)) # show first 20 rows
st.success(f"Loaded {len(df)} test predictions.")
else:
st.warning(f"Test predictions CSV file not found at: {csv_path}")
if __name__ == "__main__":
main()
|