metadata
title: Real Time AI Video Summarization Service
emoji: 📺
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
short_description: Multi-agent performs STT and summarizes real-time video
tags:
  - agent-demo-track

Real-time AI Video Summarization Service: Multi-Agent Workflow Implementation

💡 Service Overview

This application analyzes and summarizes video content in real time, powered by an AI agent workflow. Multiple specialized AI agents work together, each performing its own role, to deliver a transcript, a summary, and background information for the video.

🤖 AI Agent Workflow

The application comprises three specialized AI agents working in collaboration:

  1. Speech Recognition Agent: Based on Amazon Transcribe, this agent converts the speech in the video to text and distinguishes between multiple speakers.

  2. Summarization Agent: Leveraging the Claude 3.5 Haiku model, this agent analyzes the transcribed text and extracts key content. It excels at understanding context and identifying crucial concepts.

  3. Knowledge Retrieval Agent: Powered by Google Gemini, this agent extracts keywords from the transcribed text, runs a Google Search on each keyword, and summarizes the additional information it finds. This provides context and background knowledge related to the video content.

The three agents run asynchronously under the coordination of a mediator (the backend controller), processing incoming data in sequence and sharing their results. They work autonomously, without user intervention, and update their output in real time.
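
The coordination described above can be sketched with `asyncio`. This is a minimal illustration, not the service's actual code: the three agent functions are hypothetical stubs standing in for the real Amazon Transcribe, Claude, and Gemini calls, and running summarization and knowledge retrieval concurrently on each new transcript segment is an assumption about how the mediator might schedule them.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Shared results that the mediator collects from the agents."""
    transcript: list = field(default_factory=list)
    summaries: list = field(default_factory=list)
    keyword_notes: list = field(default_factory=list)


async def speech_recognition_agent(audio_chunk: str) -> str:
    # Hypothetical stub: a real agent would call a speech-to-text service.
    await asyncio.sleep(0)
    return f"spk_0: {audio_chunk}"


async def summarization_agent(transcript: list) -> str:
    # Hypothetical stub: a real agent would call an LLM.
    await asyncio.sleep(0)
    return f"Summary of {len(transcript)} segment(s)"


async def knowledge_retrieval_agent(transcript: list) -> dict:
    # Hypothetical stub: treat the longest word of the latest segment
    # as the "keyword" and attach a placeholder note to it.
    await asyncio.sleep(0)
    text = transcript[-1].split(": ", 1)[-1]
    keyword = max(text.split(), key=len)
    return {keyword: f"background notes for '{keyword}'"}


async def mediator(audio_chunks: list) -> WorkflowState:
    """Feed each chunk through the agents and collect their results."""
    state = WorkflowState()
    for chunk in audio_chunks:
        # Step 1: speech recognition must finish before the other agents run.
        state.transcript.append(await speech_recognition_agent(chunk))
        # Step 2: summarization and retrieval both read the transcript,
        # so they can run concurrently (an assumption of this sketch).
        summary, notes = await asyncio.gather(
            summarization_agent(state.transcript),
            knowledge_retrieval_agent(state.transcript),
        )
        state.summaries.append(summary)
        state.keyword_notes.append(notes)
    return state


state = asyncio.run(mediator(["agents call tools", "bedrock orchestrates them"]))
print(state.summaries[-1])
```

Keeping all shared results in one state object means the agents never call each other directly; only the mediator decides what runs next.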

🛠 Key Features

  • Autonomous Agent Collaboration: Each agent works independently in its specialized domain and shares results
  • Real-time Speech Recognition: The speech recognition agent converts video audio to text
  • Intelligent Content Summarization: The summarization agent understands context and extracts essential content
  • Automatic Background Knowledge: The knowledge retrieval agent provides relevant information from web searches
  • Multiple Speaker Identification: Identification and distinction of various speakers in conversational content
  • Real-time Updates: Results from the entire agent workflow refresh every 10 seconds
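
To illustrate the multiple speaker identification feature, here is a small sketch of how speaker-labeled words could be merged into readable turns. The input format is a hypothetical simplification (speaker label and word pairs), not the actual Amazon Transcribe response schema, and `to_speaker_turns` is a name invented for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, simplified input: (speaker label, word) pairs in spoken order.
# A real speech-to-text response is richer; this only illustrates the grouping.
words = [
    ("spk_0", "Welcome"),
    ("spk_0", "everyone."),
    ("spk_1", "Thanks"),
    ("spk_1", "for"),
    ("spk_1", "having"),
    ("spk_1", "me."),
    ("spk_0", "Let's"),
    ("spk_0", "begin."),
]


def to_speaker_turns(labeled_words):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(labeled_words, key=itemgetter(0)):
        text = " ".join(word for _, word in group)
        turns.append((speaker, text))
    return turns


turns = to_speaker_turns(words)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```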

📋 Supported Content

Currently, the agent analysis system supports the following three AWS-related videos:

  1. Agents for Amazon Bedrock: Technical lecture about Amazon Bedrock agents
  2. Bundesliga Fan Experience: Case study on how Bundesliga uses AI to enhance fan experiences
  3. Discover New AWS Services with AWS Heroes: Introduction to new AWS services in 2024

🚀 How to Use

  1. Wait until the thumbnail images for each video fully appear.

  2. Select the video title located just below the thumbnail image, then click the video play button. (You can select any video, but we recommend choosing "Data, AI & Soccer How Bundesliga is transforming the fan experience" due to language considerations.)

  3. Press the Auto Update button at the bottom. The Real-Time Script, AI Summary Result, and Keyword Search Result will then be updated every 10 seconds as the agent workflow runs.

    • The Real-Time Script is the output of the Speech Recognition Agent, which converts the video's audio to text using Amazon Transcribe.
    • The AI Summary Result is the execution result of the Summarization Agent.
    • The Keyword Search Result is the execution result of the Knowledge Retrieval Agent.
  4. By pressing the Refresh button, you can immediately check the results up to that point.
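
The difference between Auto Update and Refresh can be sketched as a periodic loop versus a single immediate fetch. This is an assumption-laden illustration in plain `asyncio`, not the app's Gradio code: `auto_update`, `fetch_results`, and `render` are hypothetical names, and the demo uses a very short interval so that it finishes quickly.

```python
import asyncio
from typing import Callable

UPDATE_INTERVAL_SECONDS = 10  # interval stated in this README


async def auto_update(
    fetch_results: Callable[[], dict],
    render: Callable[[dict], None],
    stop: asyncio.Event,
    interval: float = UPDATE_INTERVAL_SECONDS,
) -> None:
    """Fetch and render results every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        render(fetch_results())
        try:
            # Sleep for one interval, but wake up early if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def demo() -> list:
    rendered = []
    calls = 0

    def fetch_results() -> dict:
        # Hypothetical stand-in for reading the latest agent results.
        nonlocal calls
        calls += 1
        return {"update": calls}

    stop = asyncio.Event()
    # A short interval keeps the demo fast; the service itself uses 10 seconds.
    task = asyncio.create_task(
        auto_update(fetch_results, rendered.append, stop, interval=0.01)
    )
    await asyncio.sleep(0.1)
    stop.set()
    await task
    # The Refresh button corresponds to one immediate fetch-and-render.
    rendered.append(fetch_results())
    return rendered


rendered = asyncio.run(demo())
print(f"rendered {len(rendered)} updates")
```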

🔧 Technology Stack

  • User Interface: Gradio 5.33.0
  • Agent Technologies:
    • Speech Recognition: Amazon Transcribe
    • Content Summarization: Amazon Bedrock (Claude 3.5 Haiku)
    • Knowledge Retrieval: Google Gemini 2.0 Flash

📌 Notes

  • Initial results take approximately 30 seconds to appear after the agent workflow starts.
  • Automatic updates occur at 10-second intervals.
  • Each agent's analysis results are accumulated and stored as history.
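
The last note, about accumulating results as history, could look roughly like the following. This is a sketch under assumptions: the README does not say how history is stored, so the in-memory, append-only `ResultHistory` class and its method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HistoryEntry:
    created_at: datetime
    content: str


class ResultHistory:
    """Append-only, in-memory history of results, keyed by agent name."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def add(self, agent: str, content: str) -> None:
        # Each update is appended, never overwritten, so earlier results remain.
        entry = HistoryEntry(datetime.now(timezone.utc), content)
        self._entries[agent].append(entry)

    def latest(self, agent: str) -> Optional[str]:
        # Use .get so that a lookup does not create an empty history.
        entries = self._entries.get(agent)
        return entries[-1].content if entries else None

    def contents(self, agent: str) -> list:
        return [entry.content for entry in self._entries.get(agent, [])]


history = ResultHistory()
history.add("summary", "Intro to agents")
history.add("summary", "Intro to agents; tool use explained")
print(history.latest("summary"))
```

With this shape, the UI can show only the latest entry per agent while the full list remains available for review.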

🔗 Related Links

📜 License

This project is released under the MIT License.