dyryu1208 committed
Commit 920dfd0 · 1 Parent(s): 24f0286
Files changed (10)
  1. .DS_Store +0 -0
  2. README.md +74 -4
  3. analyze_claude.py +40 -0
  4. app.py +439 -0
  5. google_search.py +82 -0
  6. prompts.py +201 -0
  7. realtime_video_analysis.py +146 -0
  8. requirements.txt +12 -0
  9. run_backend.py +184 -0
  10. transcribe_texts +0 -0
.DS_Store ADDED
Binary file (6.15 kB)
 
README.md CHANGED
@@ -1,14 +1,84 @@
  ---
  title: Real Time AI Video Summarization Service
- emoji: 📈
- colorFrom: indigo
- colorTo: green
+ emoji: 📺
+ colorFrom: purple
+ colorTo: indigo
  sdk: gradio
  sdk_version: 5.33.0
  app_file: app.py
  pinned: false
  license: mit
  short_description: Multi-agent performs STT and summarizes real-time video
+ tags:
+ - agent-demo-track
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Real-time AI Video Summarization Service: Multi-Agent Workflow Implementation
+ 
+ ## 💡 Service Overview
+ 
+ This application is a real-time analysis and summarization service for video content, powered by an AI agent workflow. Multiple specialized AI agents work together, each performing its distinct role to deliver comprehensive analytical results.
+ 
+ ## 🤖 AI Agent Workflow
+ 
+ The application comprises three specialized AI agents working in collaboration:
+ 
+ 1. **Speech Recognition Agent**: Built on Amazon Transcribe, this agent converts video speech to text and specializes in distinguishing between multiple speakers.
+ 
+ 2. **Summarization Agent**: Leveraging the Claude 3.5 Haiku model, this agent analyzes the transcribed text and extracts key content. It excels at understanding context and identifying crucial concepts.
+ 
+ 3. **Knowledge Retrieval Agent**: Powered by Google Gemini, this agent extracts key keywords from the transcribed text, performs a Google Search for each keyword, and summarizes the additional information it finds. This provides valuable context and background knowledge related to the video content.
+ 
+ These three agents operate asynchronously, processing data sequentially and sharing results under the coordination of a mediator (backend controller). They perform their tasks autonomously, without user intervention, and update in real time.
+ 
+ ## 🛠 Key Features
+ 
+ - **Autonomous Agent Collaboration**: Each agent works independently in its specialized domain and shares results
+ - **Real-time Speech Recognition**: The speech recognition agent converts video audio to text
+ - **Intelligent Content Summarization**: The summarization agent understands context and extracts essential content
+ - **Automatic Background Knowledge**: The knowledge retrieval agent provides relevant information from web searches
+ - **Multiple Speaker Identification**: Identification and distinction of various speakers in conversational content
+ - **Real-time Updates**: The entire agent workflow's results refresh at 10-second intervals
+ 
+ ## 📋 Supported Content
+ 
+ Currently, the agent analysis system supports the following three AWS-related videos:
+ 
+ 1. **Agents for Amazon Bedrock**: Technical lecture about Amazon Bedrock agents
+ 2. **Bundesliga Fan Experience**: Case study on how the Bundesliga uses AI to enhance fan experiences
+ 3. **Discover New AWS Services with AWS Heroes**: Introduction to new AWS services in 2024
+ 
+ ## 🚀 How to Use
+ 
+ 1. Wait until the thumbnail images for each video fully appear.
+ 2. Select the video title located just below the thumbnail image, then click the video play button. (You can select any video, but we recommend choosing "Data, AI & Soccer How Bundesliga is transforming the fan experience" due to language considerations.)
+ 3. When you press the Auto Update button at the bottom, the Real-Time Script, AI Summary Result, and Keyword Search Result will be updated every 10 seconds in real time according to the agent workflow.
+ * The Real-Time Script is the output of the Speech Recognition Agent, which converts video content to text using AWS Transcribe.
+ * The AI Summary Result is the output of the Summarization Agent.
+ * The Keyword Search Result is the output of the Knowledge Retrieval Agent.
+ 
+ 4. By pressing the Refresh button, you can immediately check the results accumulated up to that point.
+ 
+ ## 🔧 Technology Stack
+ 
+ - **User Interface**: Gradio 5.33.0
+ - **Agent Technologies**:
+   - Speech Recognition: Amazon Transcribe
+   - Content Summarization: AWS Bedrock (Claude 3.5 Haiku)
+   - Knowledge Retrieval: Google Gemini 2.0 Flash
+ 
+ ## 📌 Notes
+ 
+ - Initial results take approximately 30 seconds to appear after the agent workflow starts.
+ - Automatic updates occur at 10-second intervals.
+ - Each agent's analysis results are accumulated and stored as history.
+ 
+ ## 🔗 Related Links
+ 
+ - [AWS Bedrock](https://aws.amazon.com/bedrock/)
+ - [Amazon Transcribe](https://aws.amazon.com/transcribe/)
+ - [Google Gemini AI](https://ai.google.dev/)
+ 
+ ## 📜 License
+ 
+ This project is released under the MIT License.
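The sequential hand-off between the three agents, coordinated by the backend mediator, can be sketched as a minimal pipeline. The stub functions below merely stand in for Amazon Transcribe, Claude 3.5 Haiku, and Gemini; all names and return values here are illustrative, not code from this repository:

```python
# Minimal sketch of the mediator-coordinated three-agent pipeline.
# Each stub stands in for a real agent (Transcribe / Claude / Gemini).

def speech_recognition_agent(audio_chunk: str) -> str:
    # Real agent: stream audio to Amazon Transcribe, return speaker-labeled text.
    return f"transcript of {audio_chunk}"

def summarization_agent(transcript: str) -> str:
    # Real agent: send the accumulated transcript to Claude via Bedrock.
    return f"summary of ({transcript})"

def knowledge_retrieval_agent(transcript: str) -> str:
    # Real agent: extract keywords with Gemini and ground them via Google Search.
    return f"keywords for ({transcript})"

def mediator(audio_chunks):
    """Run the agents in sequence for each chunk, accumulating history."""
    history = []
    for chunk in audio_chunks:
        transcript = speech_recognition_agent(chunk)
        history.append({
            "transcript": transcript,
            "summary": summarization_agent(transcript),
            "search": knowledge_retrieval_agent(transcript),
        })
    return history

history = mediator(["chunk-0", "chunk-1"])
print(history[0]["summary"])  # summary of (transcript of chunk-0)
```

In the real app, the mediator role is played by `run_backend.py`, and the per-chunk results accumulate into the `analysis_results` and `search_results` histories that the UI polls.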
analyze_claude.py ADDED
@@ -0,0 +1,42 @@
+ import boto3
+ import json
+ from prompts import *
+ 
+ bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
+ CLAUDE_MODEL_ID = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
+ 
+ def analyze_with_claude(stt_data, content_type="Agents for Amazon Bedrock"):
+     """
+     Use Claude to summarize the STT data.
+     Select the appropriate prompt according to content_type.
+     """
+ 
+     if content_type == "Agents for Amazon Bedrock":
+         prompt_template = BEDROCK_CLAUDE_PROMPT
+     elif content_type == "Bundesliga Fan Experience":
+         prompt_template = BUNDESLIGA_CLAUDE_PROMPT
+     elif content_type == "AWS_2024_recap":
+         prompt_template = AWS_CLAUDE_PROMPT
+     else:
+         raise ValueError(f"Unsupported content_type: {content_type}")
+ 
+     formatted_prompt = prompt_template.format(stt_data=stt_data)
+ 
+     body = json.dumps({
+         "anthropic_version": "bedrock-2023-05-31",
+         "max_tokens": 1000,
+         "messages": [
+             {
+                 "role": "user",
+                 "content": formatted_prompt
+             }
+         ]
+     })
+ 
+     response = bedrock.invoke_model(
+         modelId=CLAUDE_MODEL_ID,
+         body=body
+     )
+ 
+     response_body = json.loads(response.get('body').read())
+     return response_body['content'][0]['text']
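The request body and response parsing above follow the Anthropic Messages format used by Bedrock's `invoke_model`; the round-trip shape can be sanity-checked offline with a mocked response (the response payload below is an illustrative example of the API's shape, not real model output):

```python
import json

# Same request-body shape analyze_with_claude() sends to invoke_model.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [{"role": "user", "content": "Summarize: hello"}],
})
assert json.loads(body)["messages"][0]["role"] == "user"

# Mocked response bytes in the shape the Messages API returns via Bedrock.
mock_body = json.dumps({
    "content": [{"type": "text", "text": "A short greeting."}],
    "stop_reason": "end_turn",
}).encode("utf-8")

# Same parsing as the last two lines of analyze_with_claude().
response_body = json.loads(mock_body)
print(response_body["content"][0]["text"])  # A short greeting.
```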
app.py ADDED
@@ -0,0 +1,439 @@
+ import gradio as gr
+ from PIL import Image
+ import threading
+ import time
+ import os
+ import shutil
+ from pathlib import Path
+ from huggingface_hub import hf_hub_download
+ from run_backend import main as run_backend_main, analysis_results, search_results
+ 
+ # File prerequisites
+ def prepare_dataset_files():
+     # Create local directory
+     os.makedirs("data", exist_ok=True)
+ 
+     # Check current directory
+     print(f"Current working directory: {os.getcwd()}")
+ 
+     # Define list of required files
+     files_to_download = [
+         "aws.mp4", "aws.png", "aws.wav",
+         "aws_bundesliga.mp4", "aws_bundesliga.png", "aws_bundesliga.wav",
+         "summit_sungwoo.mp4", "summit_sungwoo.png", "summit_sungwoo.wav"
+     ]
+ 
+     repo_id = "cloudplayer/hackathon_data"
+     repo_type = "dataset"
+ 
+     try:
+         for file_name in files_to_download:
+             local_path = os.path.join("data", file_name)
+ 
+             # Skip if file already exists
+             if os.path.exists(local_path):
+                 print(f"File already exists, skipping download: {local_path}")
+                 continue
+ 
+             # Download file from Hub
+             downloaded_path = hf_hub_download(
+                 repo_id=repo_id,
+                 filename=file_name,
+                 repo_type=repo_type,
+                 local_dir="data",
+                 local_dir_use_symlinks=False  # Download actual file
+             )
+ 
+             print(f"Downloaded file: {downloaded_path}")
+ 
+         # Check downloaded files
+         print(f"Files in data directory: {os.listdir('data')}")
+         return True
+     except Exception as e:
+         print(f"Error downloading files: {e}")
+         import traceback
+         traceback.print_exc()
+         return False
+ 
+ prepare_dataset_files()
+ 
+ # Set the directory path based on the current file
+ BASE_DIR = "."
+ DATA_DIR = "./data"
+ TRANSCRIPT_FILE = "./transcribe_texts"
+ 
+ # Analysis status management
+ analysis_running = False
+ analysis_thread = None
+ last_update_time = 0
+ 
+ def read_transcript():
+     """Read the transcript file"""
+     try:
+         with open(TRANSCRIPT_FILE, "r", encoding="utf-8") as f:
+             return f.read()
+     except Exception as e:
+         return "Loading Script..."
+ 
+ def get_current_content():
+     """Get current content"""
+     global last_update_time
+ 
+     if not analysis_running:
+         return "", "", "", "", ""
+ 
+     try:
+         current_time = time.time()
+         if current_time - last_update_time < 1.0:  # If less than 1 second has passed, do not update
+             return None
+ 
+         last_update_time = current_time
+         transcript = read_transcript()
+         current_analysis = analysis_results[-1] if analysis_results else ""
+         current_search = search_results[-1] if search_results else ""
+ 
+         # Update previous results
+         if len(analysis_results) > 1:
+             prev_analysis_text = "\n\n".join([
+                 f"#### Summary #{i+1}\n{result}"
+                 for i, result in enumerate(analysis_results[:-1])
+             ])
+         else:
+             prev_analysis_text = "No previous analysis results."
+ 
+         if len(search_results) > 1:
+             prev_search_text = "\n\n".join([
+                 f"#### Search Result #{i+1}\n{result}"
+                 for i, result in enumerate(search_results[:-1])
+             ])
+         else:
+             prev_search_text = "No previous search results."
+ 
+         return transcript, current_analysis, current_search, prev_analysis_text, prev_search_text
+     except Exception as e:
+         print(f"Error occurred while updating content: {e}")
+         return None
+ 
+ def start_analysis(party, video_path):
+     """Start analysis"""
+     global analysis_running, analysis_thread, last_update_time
+ 
+     if not analysis_running:
+         analysis_running = True
+         last_update_time = time.time()
+         # Initialize the transcript file
+         try:
+             with open(TRANSCRIPT_FILE, "w", encoding="utf-8") as f:
+                 f.write("")
+         except Exception as e:
+             pass
+ 
+         # Start the analysis thread
+         analysis_thread = threading.Thread(target=run_backend_main, args=(party,))
+         analysis_thread.daemon = True
+         analysis_thread.start()
+ 
+     return gr.Markdown(f"# {party} Analysis"), gr.update(value=video_path)
+ 
+ def create_ui():
+     """Create the UI"""
+     with gr.Blocks(title="Real-Time AI Video Summarization Service", theme=gr.themes.Soft()) as demo:
+         # State variables
+         party = gr.State("")
+         container_visible = gr.State(True)
+         selection_visible = gr.State(True)
+         auto_update = gr.State(False)  # Auto-update state
+         update_trigger = gr.State(0)  # Update trigger
+ 
+         # Add timer component (10-second interval)
+         timer = gr.Timer(10.0, active=False)
+ 
+         # Add the user guide at the top
+         gr.Markdown("""
+ ## How to Use:
+ 
+ 1. Wait until the thumbnail images for each video fully appear.
+ 2. Select the video title located just below the thumbnail image, then click the video play button in "Sample Video".
+ (You can select any video, but we recommend choosing "Data, AI & Soccer How Bundesliga is transforming the fan experience" due to language considerations.)
+ 3. When you press the Auto Update button at the bottom, the Real-Time Script, AI Summary Result, and Keyword Search Result will be updated every 10 seconds in real time according to the agent workflow.
+ * The Real-Time Script is the output of the Speech Recognition Agent, which converts video content to text using AWS Transcribe.
+ * The AI Summary Result is the output of the Summarization Agent.
+ * The Keyword Search Result is the output of the Knowledge Retrieval Agent.
+ 
+ 4. By pressing the Refresh button, you can immediately check the results up to that point.
+ """)
+ 
+         with gr.Column(visible=lambda: container_visible.value) as aws_container:
+             gr.Markdown("### AWS Lecture - Select the video to perform AI summarization")
+ 
+             with gr.Row(equal_height=True):
+                 with gr.Column(scale=1, min_width=400):
+                     aws_image_2 = gr.Image(
+                         value=str(DATA_DIR + "/aws_bundesliga.png"),
+                         label="aws_bundesliga",
+                         show_label=True,
+                         height=300,  # Fixed image height
+                         width=400,  # Fixed image width
+                         elem_id="aws_bundesliga"
+                     )
+                     aws_button_2 = gr.Button(
+                         "Data, AI & Soccer How Bundesliga is transforming the fan experience",
+                         variant="primary",
+                         size="lg",
+                         elem_id="aws_button_2"
+                     )
+ 
+                 with gr.Column(scale=1, min_width=400):
+                     aws_image_1 = gr.Image(
+                         value=str(DATA_DIR + "/summit_sungwoo.png"),
+                         label="summit_sungwoo",
+                         show_label=True,
+                         height=300,  # Fixed image height
+                         width=400,  # Fixed image width
+                         elem_id="summit_sungwoo"
+                     )
+                     aws_button_1 = gr.Button(
+                         "The Future of AI is Here! Agents for Amazon Bedrock",
+                         variant="primary",
+                         size="lg",
+                         elem_id="aws_button_1"
+                     )
+ 
+                 with gr.Column(scale=1, min_width=400):
+                     aws_image_3 = gr.Image(
+                         value=str(DATA_DIR + "/aws.png"),
+                         label="aws",
+                         show_label=True,
+                         height=300,
+                         width=400,
+                         elem_id="aws"
+                     )
+                     aws_button_3 = gr.Button(
+                         "Discover the New AWS Services with AWS Heroes in 2024",
+                         variant="primary",
+                         size="lg",
+                         elem_id="aws_button_3"
+                     )
+ 
+         # Add CSS styles
+         gr.Markdown("""
+ <style>
+ #summit_sungwoo, #aws_bundesliga, #aws{
+     object-fit: contain !important;
+     background-color: #f8f9fa;
+     border-radius: 10px;
+     padding: 10px;
+     box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ }
+ #aws_button_1, #aws_button_2, #aws_button_3 {
+     margin-top: 20px;
+     width: 100%;
+     height: 50px;
+     font-size: 1.2em;
+     font-weight: bold;
+     border-radius: 8px;
+     transition: all 0.3s ease;
+ }
+ #aws_button_1:hover, #aws_button_2:hover, #aws_button_3:hover {
+     transform: translateY(-2px);
+     box-shadow: 0 4px 8px rgba(0,0,0,0.2);
+ }
+ </style>
+ """)
+ 
+         # Analysis container (initially hidden)
+         with gr.Column(visible=lambda: selection_visible.value) as analysis_container:
+             title = gr.Markdown("# Video Analysis")
+ 
+             with gr.Row():
+                 # Left: video
+                 with gr.Column(scale=3):
+                     video = gr.Video(
+                         label="Sample Video",
+                         show_label=True,
+                         interactive=False,
+                         value=str(DATA_DIR + "/summit_sungwoo.mp4"),  # Default value
+                         elem_id="debate_video"
+                     )
+ 
+                 # Right: analysis result tabs
+                 with gr.Column(scale=2):
+                     with gr.Tabs() as tabs:
+                         with gr.TabItem("Real-Time Script"):
+                             transcript = gr.Textbox(
+                                 label="Real-Time Script",
+                                 show_label=True,
+                                 lines=20,
+                                 interactive=False,
+                                 value="Loading Script...",  # Initial value
+                                 elem_id="transcript_box"
+                             )
+ 
+                         with gr.TabItem("AI Summary Result"):
+                             analysis = gr.Markdown(
+                                 value="Loading Analysis Result...",  # Initial value
+                                 elem_id="analysis_result"
+                             )
+                             with gr.Accordion("View Previous Analysis Results", open=False):
+                                 prev_analysis = gr.Markdown(
+                                     value="No previous analysis results.",  # Initial value
+                                     elem_id="prev_analysis"
+                                 )
+ 
+                         with gr.TabItem("Keyword Search Result"):
+                             search = gr.Markdown(
+                                 value="Loading Search Result...",  # Initial value
+                                 elem_id="search_result"
+                             )
+                             with gr.Accordion("View Previous Keyword Search Results", open=False):
+                                 prev_search = gr.Markdown(
+                                     value="No previous search results.",  # Initial value
+                                     elem_id="prev_search"
+                                 )
+ 
+             # Show status
+             status = gr.Markdown("Analysis is in progress...")
+ 
+             # Update buttons
+             with gr.Row():
+                 update_button = gr.Button(
+                     "Refresh",
+                     variant="secondary",
+                     size="lg",
+                     elem_id="update_button"
+                 )
+                 auto_update_button = gr.Button(
+                     "Auto Update",
+                     variant="secondary",
+                     size="lg",
+                     elem_id="auto_update_button"
+                 )
+ 
+         # Add CSS styles for the analysis page
+         gr.Markdown("""
+ <style>
+ #debate_video {
+     border-radius: 10px;
+     box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ }
+ #transcript_box {
+     font-family: 'Noto Sans KR', sans-serif;
+     line-height: 1.6;
+ }
+ #analysis_result, #search_result, #prev_analysis, #prev_search {
+     font-family: 'Noto Sans KR', sans-serif;
+     line-height: 1.8;
+     padding: 15px;
+     background-color: #f8f9fa;
+     border-radius: 8px;
+ }
+ #update_button, #auto_update_button {
+     margin: 10px;
+     transition: all 0.3s ease;
+ }
+ #update_button:hover, #auto_update_button:hover {
+     transform: translateY(-2px);
+     box-shadow: 0 4px 8px rgba(0,0,0,0.2);
+ }
+ </style>
+ """)
+ 
+         def on_aws_select(content_name, video_file):
+             """Handle AWS lecture selection"""
+             party.value = content_name
+             video_path = str(DATA_DIR + f"/{video_file}")
+             container_visible.value = False
+             selection_visible.value = True
+             return start_analysis(content_name, video_path)
+ 
+         def trigger_update():
+             """Increment the update trigger"""
+             update_trigger.value += 1
+             return update_trigger.value
+ 
+         def update_content(trigger):
+             """Update content"""
+             if not analysis_running:
+                 return (
+                     "Analysis has not started.",
+                     "No analysis results.",
+                     "No search results.",
+                     "No previous analysis results.",
+                     "No previous search results.",
+                     trigger
+                 )
+ 
+             result = get_current_content()
+             if result is None:
+                 return (
+                     transcript.value,
+                     analysis.value,
+                     search.value,
+                     prev_analysis.value,
+                     prev_search.value,
+                     trigger
+                 )
+             return (*result, trigger)
+ 
+         def toggle_auto_update():
+             """Toggle auto update"""
+             auto_update.value = not auto_update.value
+             if auto_update.value:
+                 # Start auto update - increment the trigger
+                 trigger_update()
+                 return "Auto Update has started. It will be updated every 10 seconds.", gr.Timer(active=True)
+             else:
+                 return "Auto Update has stopped.", gr.Timer(active=False)
+ 
+         aws_button_1.click(
+             fn=lambda: on_aws_select("Agents for Amazon Bedrock", "summit_sungwoo.mp4"),
+             outputs=[title, video]
+         )
+ 
+         aws_button_2.click(
+             fn=lambda: on_aws_select("Bundesliga Fan Experience", "aws_bundesliga.mp4"),
+             outputs=[title, video]
+         )
+ 
+         aws_button_3.click(
+             fn=lambda: on_aws_select("AWS_2024_recap", "aws.mp4"),
+             outputs=[title, video]
+         )
+ 
+         # Update button click event
+         update_button.click(
+             fn=trigger_update,
+             outputs=[update_trigger]
+         )
+ 
+         # Auto Update button click event
+         auto_update_button.click(
+             fn=toggle_auto_update,
+             outputs=[status, timer]
+         )
+ 
+         # Timer tick event
+         timer.tick(
+             fn=lambda: trigger_update() if auto_update.value else None,
+             outputs=[update_trigger]
+         )
+ 
+         # Update trigger change event
+         update_trigger.change(
+             fn=update_content,
+             inputs=[update_trigger],
+             outputs=[transcript, analysis, search, prev_analysis, prev_search, update_trigger]
+         )
+ 
+         # Initial load - set the update trigger
+         demo.load(
+             fn=lambda: (update_trigger.value + 1),
+             outputs=[update_trigger]
+         )
+ 
+     return demo
+ 
+ if __name__ == "__main__":
+     demo = create_ui()
+     demo.queue()  # Enable the queue
+     demo.launch(share=True)
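The "previous results" accordion text is built by the join inside `get_current_content`: all entries except the latest are numbered and concatenated. Extracted as a pure helper (a hypothetical refactor with the same logic, for illustration only), its behavior is easy to verify:

```python
# Hypothetical pure-function extraction of the accordion-history formatting
# used by get_current_content() in app.py (same join logic).

def format_previous_summaries(analysis_results):
    """Number and join every result except the most recent one."""
    if len(analysis_results) > 1:
        return "\n\n".join(
            f"#### Summary #{i+1}\n{result}"
            for i, result in enumerate(analysis_results[:-1])
        )
    return "No previous analysis results."

print(format_previous_summaries(["latest only"]))
# No previous analysis results.
print(format_previous_summaries(["first", "latest"]))
# #### Summary #1
# first
```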
google_search.py ADDED
@@ -0,0 +1,83 @@
+ import os
+ import json
+ from google import genai
+ from google.genai import types
+ from prompts import *
+ 
+ try:
+     from dotenv import load_dotenv
+     load_dotenv()
+ except ImportError:
+     pass
+ 
+ def format_search_results(results):
+     """Format search results"""
+     formatted_output = ""
+ 
+     for i in range(1, 4):
+         formatted_output += f"### {i}. {results[f'keyword{i}']}\n"
+         formatted_output += f"{results[f'summary{i}']}"
+         formatted_output += "\n"
+ 
+     return formatted_output
+ 
+ def grounding_with_google_search(stt_data, content_type="Agents for Amazon Bedrock"):
+     """
+     Extract keywords and perform a Google Search
+     """
+     # Create a client for Google GenAI
+     client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
+ 
+     # Define a sample schema for search results
+     sample_schema = {
+         "keyword1": "Core Keyword 1",
+         "summary1": "Summarize Search Result about Core Keyword 1",
+         "keyword2": "Core Keyword 2",
+         "summary2": "Summarize Search Result about Core Keyword 2",
+         "keyword3": "Core Keyword 3",
+         "summary3": "Summarize Search Result about Core Keyword 3"
+     }
+ 
+     # Select the appropriate system prompt based on content_type
+     if content_type == "Agents for Amazon Bedrock":
+         system_prompt = BEDROCK_SEARCH_PROMPT
+     elif content_type == "Bundesliga Fan Experience":
+         system_prompt = BUNDESLIGA_SEARCH_PROMPT
+     elif content_type == "AWS_2024_recap":
+         system_prompt = AWS_SEARCH_PROMPT
+     else:
+         raise ValueError(f"Unsupported content_type: {content_type}")
+ 
+     # Format the system prompt with the sample schema
+     system_prompt = system_prompt.format(sample_schema=sample_schema)
+ 
+     # Prepare the human message with the input script
+     human_message = f"""
+     ## Input Script
+     {stt_data}
+     """
+ 
+     # Generate content using the Google GenAI client
+     response = client.models.generate_content(
+         model="gemini-2.0-flash-001",
+         contents=human_message,
+         config=types.GenerateContentConfig(
+             system_instruction=system_prompt,
+             response_mime_type="application/json",
+             tools=[
+                 types.Tool(
+                     google_search=types.GoogleSearchRetrieval(
+                         dynamic_retrieval_config=types.DynamicRetrievalConfig(
+                             mode=types.DynamicRetrievalConfigMode.MODE_UNSPECIFIED,
+                             dynamic_threshold=0.0
+                         )
+                     )
+                 )
+             ]
+         )
+     )
+ 
+     # Parse the response text and format the search results
+     text = response.text
+     results = json.loads(text)
+     return format_search_results(results)
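Because `format_search_results` indexes `keyword1..keyword3` and `summary1..summary3` directly, the Gemini response must match `sample_schema` exactly or a `KeyError` follows. The formatting can be exercised offline with a mocked JSON payload (the keyword/summary values below are illustrative, not real search output):

```python
import json

def format_search_results(results):
    """Same formatting logic as in google_search.py."""
    formatted_output = ""
    for i in range(1, 4):
        formatted_output += f"### {i}. {results[f'keyword{i}']}\n"
        formatted_output += f"{results[f'summary{i}']}"
        formatted_output += "\n"
    return formatted_output

# Mocked model output matching the sample_schema keys.
mock_json = json.dumps({
    "keyword1": "Bedrock", "summary1": "Managed foundation-model service.",
    "keyword2": "Agents", "summary2": "Bedrock's task-orchestration feature.",
    "keyword3": "Transcribe", "summary3": "Speech-to-text service.",
})

out = format_search_results(json.loads(mock_json))
print(out.splitlines()[0])  # ### 1. Bedrock
```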
prompts.py ADDED
@@ -0,0 +1,201 @@
+ # AWS SUMMIT AI Summary
+ BEDROCK_CLAUDE_PROMPT = """
+ You are a real-time analyst for a technical lecture on AWS Bedrock and generative AI.
+ The presenter is explaining their company, the technologies they use, and how these technologies are implemented.
+ Analyze the lecture content below and summarize the key points concisely.
+ Read the IMPORTANT section carefully, and ensure all summaries are in English.
+ 
+ * IMPORTANT:
+ 1. The input data is a script converted from real-time speech, so typos may occur.
+ - Correct typos related to technical terms to the correct terms
+ - Correct misnamed AWS services or company names to their accurate forms
+ - Exclude content that is difficult to understand in context
+ 
+ 2. Focus points for summary:
+ - The main business and role of the presenter's company/organization
+ - Key technologies explained in the presentation (AWS Bedrock, generative AI, agents, etc.)
+ - Main steps of the technology implementation process
+ - Examples or use cases of technology application mentioned in the presentation
+ 
+ Here is the lecture content:
+ 
+ {stt_data}
+ 
+ 1. Describe the company the presenter is affiliated with.
+ 2. What technology is the presenter explaining?
+ 3. Describe the process or implementation method of the technology.
+ """
+ 
+ # AWS SUMMIT AI Summary Google Search Keywords
+ BEDROCK_SEARCH_PROMPT = """
+ You are a Focused Search and Analysis Assistant for AWS technical presentations.
+ 
+ Your tasks:
+ 1. Read the Input script which was extracted from real-time voice records of an AWS Bedrock technical presentation.
+ 
+ 2. Extract exactly 3 most significant elements from the script, focusing specifically on:
+ - The presenter's company/organization and its business
+ - AWS Bedrock and generative AI technologies mentioned
+ - Implementation methods and processes described
+ 
+ 3. For each extracted element:
+ - The data you enter is scripted data transcribed from real-time speech and may contain typos. Process typos that make sense in context.
+ - Correct any misnamed AWS services or company names to their accurate forms
+ - Exclude any words that do not make sense in context
+ - Search for relevant information that provides clear context about the element
+ - Provide comprehensive summaries in English. All summaries MUST be provided in English only.
+ - Each element should be one word and short.
+ 
+ 4. Priority should be given to:
+ - Company/organization name of the presenter and its core business
+ - Specific AWS Bedrock features and generative AI technologies mentioned
+ - Technical implementation steps or processes described
+ - Any examples or use cases mentioned in the presentation
+ 
+ Output Format:
+ {sample_schema}
+ 
+ * keyword1 should relate to the presenter's company or organization
+ * keyword2 should relate to the core technology discussed (AWS Bedrock/generative AI)
+ * keyword3 should relate to implementation methods or processes
+ * Summary 1, 2, 3 are the searches and answers for each keyword. Include a detailed description of at least 2-3 sentences that would help understand the context of the presentation.
+ """
+ 
+ BUNDESLIGA_CLAUDE_PROMPT = """
+ You are a real-time analyst for a podcast discussing how the Bundesliga uses data and AI to innovate fan experiences.
+ The podcast features a dialogue format with two speakers (Questioner 1, Responder 1) discussing how the Bundesliga is using data and AI.
+ Analyze the conversation below and summarize the main discussion points and Q&A.
+ Read the IMPORTANT section carefully, and ensure all summaries are in English.
+ 
+ * IMPORTANT:
+ 1. The input data is a script converted from real-time speech, so typos may occur.
+ - Correct typos related to football terms, technical terms, and Bundesliga-related terms
+ - Consider the context of the dialogue between the questioner and responder
+ - Exclude content that is difficult to understand in context
+ 
+ 2. Focus points for summary:
+ - The core of the current discussion topic
+ - The main points of the questions posed by the questioner
+ - The key answers and information provided by the responder
+ - Important examples of data/AI usage in the Bundesliga discussed in the conversation
+ 
+ 3. Conversation structure analysis:
+ - Clearly distinguish and identify question-answer pairs
+ - Identify the interests of the questioner and the expertise of the responder
+ - Consider the flow and logical development of the conversation
+ 
+ Here is the podcast conversation content:
+ {stt_data}
+ 
+ 1. What is the current topic of discussion in the podcast?
+ 2. What are the main questions from the questioner and the main answers from the responder?
+ """
+ 
+ BUNDESLIGA_SEARCH_PROMPT = """
+ You are a Focused Search and Analysis Assistant for sports podcast interviews.
+ 
+ Your tasks:
+ 1. Read the Input script which was extracted from real-time voice records of a podcast interview between an interviewer (questioner) and an interviewee (responder) discussing how the Bundesliga uses data and AI to innovate the fan experience. Note that the podcast content is in English.
+ 
+ 2. Extract exactly 3 most significant elements from the script, focusing specifically on:
+ - The main discussion topic being addressed in the conversation
+ - Key questions posed by the interviewer
+ - Important answers and insights provided by the responder
+ 
+ 3. For each extracted element:
+ - The data you enter is transcribed from an English podcast interview
+ - First understand the question-answer exchange structure correctly
+ - Process any sports terminology, team names, or technical terms that may contain typos
+ - Exclude unclear statements or tangential discussions
+ - Search for relevant information that provides context to the discussion topics
+ - Provide comprehensive summaries in English. All summaries MUST be provided in English only.
+ - Each element should be one word and short.
+ 
+ 4. Priority should be given to:
+ - Main topics of discussion in the interview
+ - Specific questions asked by the interviewer about data/AI in the Bundesliga
+ - Key insights, examples, or explanations provided by the responder
+ - Discussion points that reveal how the Bundesliga is using technology
+ 
+ 5. Language handling:
+ - Extract keywords in English and provide all summaries in English
+ - Translate any technical terms appropriately into English
+ - Ensure the English summaries are natural and fluent
+ 
+ Output Format:
+ {sample_schema}
+ 
+ * keyword1 should relate to the main discussion topic
+ * keyword2 should relate to a key question from the interviewer
+ * keyword3 should relate to an important answer/insight from the responder
+ * Summary 1, 2, 3 are the English searches and answers for each keyword. Include a detailed description of at least 2-3 sentences that helps understand the context of the podcast discussion.
+ """
+ 
+ AWS_CLAUDE_PROMPT = """
+ You are a real-time analyst for a YouTube video covering major cloud services introduced at the 2024 AWS re:Invent event.
+ The video features a host (Speaker 0) and AWS Heroes (Speakers 1, 2, 3).
+ Identify the ongoing topics in the conversation and summarize the statements made by each AWS Hero.
+ Read the IMPORTANT section carefully, and ensure all summaries are in English.
+ 
+ * IMPORTANT:
+ 1. The input data is a script converted from real-time speech, so typos may occur.
+ - Interpret typos that make sense in context with the correct meaning
+ - Exclude content that doesn't make sense
+ 
+ 2. Speaker information may not be accurate, so:
+ - Determine the actual speaker based on the context and flow of the conversation
+ - Check continuity with previous statements
+ - Use distinctive speech patterns of the host and heroes
+ 
+ 3. Focus points for summary:
+ - Clearly identify the changing topics in real time
+ - Summarize the key technologies of AWS services mentioned by each hero
+ - If a hero consistently mentions a specific service, output only that hero's statements
+ - Use the following format for each hero:
+ - • Hero Name (Company Name, Job Title)
+ - Understand the intent of statements even from inaccurate text
+ 
+ Here is the video conversation content:
+ 
+ {stt_data}
+ 
+ 1. What is the current topic of discussion?
+ 2. Summarize the main statements about AWS services made by each hero.
+ """
+ 
+ AWS_SEARCH_PROMPT = """
+ You are a Specialized AWS Cloud Services Analysis Assistant.
+ 
+ Your tasks:
+ 1. Read the Input script which was extracted from 2024 AWS re:Invent event videos.
+ 
+ 2. Extract exactly 3 most significant elements from the script, including:
+ - AWS cloud services and product names (e.g., EC2, S3, Lambda)
+ - Cloud computing technologies and concepts
175
+ - New features or service announcements
176
+ - AWS Heroes or presenters' names
177
+ - Cloud architecture patterns or best practices
178
+ - Security or cost optimization strategies
179
+
180
+ 3. For each extracted element:
181
+ - The data you enter is scripted data transcribed from real-time speech and may contain typos. Process typos that make sense in context (e.g., "lambda" might be "Lambda").
182
+ - Correct technical terminology when transcription errors occur due to English-Korean pronunciation differences
183
+ - Search for relevant technical background information
184
+ - Provide comprehensive summaries in English. All summaries MUST be provided in English only.
185
+ - Focus on technical context and cloud computing significance
186
+ - Each element should be one word or short phrase, preferably the official AWS service name or technical term.
187
+
188
+ 4. Priority should be given to:
189
+ - Newly announced AWS services or features
190
+ - Frequently mentioned cloud architectures or services
191
+ - Technical terms or cloud concepts that need explanation
192
+ - Key AWS Heroes or AWS leadership mentioned
193
+ - Case studies or demonstrations highlighted in the content
194
+ - Differentiated AWS technologies or approaches
195
+
196
+ Output Format:
197
+ {sample_schema}
198
+
199
+ * keyword1, 2, 3 are the main AWS-related keywords pulled from the script data.
200
+ * Summary 1, 2, 3 are the searches and answers for each keyword. Include a detailed technical description of at least 2-3 sentences in English, explaining the service functionality and cloud computing context.
201
+ """
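Each of the prompts above ends with an Output Format section built from a `{sample_schema}` placeholder (and `AWS_CLAUDE_PROMPT` uses `{stt_data}`). A minimal sketch of how such a template might be filled at call time — the template and JSON schema below are illustrative stand-ins, not the repo's actual values:

```python
# Sketch: filling a {sample_schema} placeholder with str.format.
# PROMPT_TEMPLATE and the schema are hypothetical examples for illustration.
import json

PROMPT_TEMPLATE = """
Extract exactly 3 key elements from the script.

Output Format:
{sample_schema}
"""

# A hypothetical keyword/summary schema matching the prompts' description.
sample_schema = json.dumps(
    {
        "keyword1": "...", "summary1": "...",
        "keyword2": "...", "summary2": "...",
        "keyword3": "...", "summary3": "...",
    },
    indent=2,
)

prompt = PROMPT_TEMPLATE.format(sample_schema=sample_schema)
print(prompt)
```

Note that `str.format` only substitutes braces in the template itself, so braces inside the substituted JSON string are passed through untouched.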
realtime_video_analysis.py ADDED
@@ -0,0 +1,146 @@
+ import nest_asyncio
+ import asyncio
+ import aiofiles
+ from amazon_transcribe.client import TranscribeStreamingClient
+ from amazon_transcribe.handlers import TranscriptResultStreamHandler
+ from amazon_transcribe.model import TranscriptEvent
+ import logging
+
+ # Enable support for nested asyncio event loops
+ nest_asyncio.apply()
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger('transcription_system')
+
+ class TranscriptHandler(TranscriptResultStreamHandler):
+     """Handler class for processing Amazon Transcribe events"""
+
+     def __init__(self, output_stream, output_file_path="./transcribe_texts"):
+         super().__init__(output_stream)
+         self.results = []
+         self.processing_complete = False
+         self.output_file_path = output_file_path
+
+         # Initialize the output file at the start
+         with open(self.output_file_path, 'w', encoding="utf-8") as f:
+             f.write("")
+
+     async def handle_transcript_event(self, transcript_event: TranscriptEvent):
+         results = transcript_event.transcript.results
+         for result in results:
+             if not result.is_partial and result.channel_id == 'ch_1':
+                 for alt in result.alternatives:
+                     start_time = None
+                     end_time = None
+                     current_speaker = None
+                     utterance = []
+
+                     for item in alt.items:
+                         if start_time is None:
+                             start_time = item.start_time
+
+                         if item.speaker is not None:
+                             if current_speaker is None:
+                                 current_speaker = item.speaker
+
+                             # Flush the buffered utterance when the speaker changes
+                             if current_speaker != item.speaker:
+                                 transcript = f"Speaker {current_speaker}: {''.join(utterance).strip()}"
+                                 print(f"\n{transcript}")
+                                 self.results.append(transcript)
+
+                                 self._append_to_file(transcript)
+
+                                 current_speaker = item.speaker
+                                 start_time = item.start_time
+                                 utterance = []
+
+                         if item.item_type == 'pronunciation' and utterance:
+                             utterance.append(' ')
+                         utterance.append(item.content)
+                         end_time = item.end_time
+
+                     # Output the last utterance
+                     if utterance:
+                         transcript = f"Speaker {current_speaker}: {''.join(utterance).strip()}"
+                         print(f"\n{transcript}")
+                         self.results.append(transcript)
+                         self._append_to_file(transcript)
+
+     def _append_to_file(self, transcript):
+         """Append the STT script to a file"""
+         try:
+             with open(self.output_file_path, 'a', encoding='utf-8') as f:
+                 f.write(transcript + "\n")
+         except Exception as e:
+             logger.error(f"Error occurred while writing to file: {str(e)}")
+
+     def set_complete(self):
+         """Indicate that transcription processing is complete"""
+         self.processing_complete = True
+
+         try:
+             with open(self.output_file_path, 'a', encoding="utf-8") as f:
+                 f.write("\n----STT work complete---\n")
+         except Exception as e:
+             logger.error(f"Error occurred while writing completion marker: {str(e)}")
+
+ async def process_audio_file(file_path, region="ap-northeast-2", sample_rate=32000, language=None, content_type=None):
+     """Asynchronous function to process audio files and generate transcripts"""
+     logger.info(f"Starting transcription for file '{file_path}'")
+
+     if language is None:
+         if content_type == "Bundesliga Fan Experience" or "bundesliga" in (file_path or "").lower():
+             language = "en-US"
+             logger.info("English content detected: changing language setting to 'en-US'")
+         else:
+             language = "ko-KR"
+             logger.info("Default language setting: 'ko-KR'")
+
+     client = TranscribeStreamingClient(region=region)
+
+     stream = await client.start_stream_transcription(
+         language_code=language,
+         media_sample_rate_hz=sample_rate,
+         media_encoding="pcm",
+         enable_partial_results_stabilization=True,
+         partial_results_stability="high",
+         show_speaker_label=True,
+         enable_channel_identification=True,
+         number_of_channels=2
+     )
+
+     handler = TranscriptHandler(stream.output_stream)
+
+     async def write_chunks():
+         try:
+             async with aiofiles.open(file_path, 'rb') as afp:
+                 # Skip the WAV header
+                 await afp.seek(44)
+
+                 while True:
+                     chunk = await afp.read(1024*16)
+                     if not chunk:
+                         break
+                     await stream.input_stream.send_audio_event(audio_chunk=chunk)
+                     await asyncio.sleep(0.125)
+
+                 await stream.input_stream.end_stream()
+         except Exception as e:
+             logger.error(f"Error occurred while writing chunks: {str(e)}")
+
+     await asyncio.gather(write_chunks(), handler.handle_events())
+
+     handler.set_complete()
+     logger.info(f"Transcription complete: {len(handler.results)} utterance segments processed")
+     return handler
+
+ def run_transcription(file_path, content_type=None):
+     """Synchronous wrapper function to run in a ThreadPoolExecutor"""
+     loop = asyncio.new_event_loop()
+     asyncio.set_event_loop(loop)
+     try:
+         handler = loop.run_until_complete(process_audio_file(file_path, content_type=content_type))
+         return handler  # Return the handler object itself
+     finally:
+         loop.close()
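`write_chunks` skips a fixed 44-byte WAV header before streaming raw PCM, which holds only for canonical headers; WAV files carrying extra RIFF chunks (e.g. `LIST`/`INFO` metadata) have longer headers, and a fixed seek would feed header bytes into the transcription stream. A hedged sketch, not from the repo, of locating the `data` chunk by walking the RIFF structure instead:

```python
# Sketch: find the byte offset of the PCM samples in a RIFF/WAVE file
# instead of assuming a 44-byte header.
import io
import struct

def pcm_data_offset(fp) -> int:
    """Return the byte offset of the first PCM sample in a RIFF/WAVE stream."""
    fp.seek(0)
    riff, _size, wave_id = struct.unpack("<4sI4s", fp.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = fp.read(8)
        if len(header) < 8:
            raise ValueError("no 'data' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return fp.tell()
        # Chunks are word-aligned: skip the payload plus any pad byte.
        fp.seek(chunk_size + (chunk_size % 2), 1)

# A canonical 44-byte-header file: the samples start at offset 44.
buf = io.BytesIO(
    b"RIFF" + struct.pack("<I", 36) + b"WAVE"
    + b"fmt " + struct.pack("<I", 16) + bytes(16)
    + b"data" + struct.pack("<I", 0)
)
print(pcm_data_offset(buf))  # 44
```

The returned offset could replace the hard-coded `await afp.seek(44)` when header layouts vary.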
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ boto3==1.38.1
+ botocore==1.38.1
+ amazon-transcribe
+ aiofiles
+ nest-asyncio
+ streamlit
+ pillow
+ gradio
+ python-dotenv
+ google-genai
+ huggingface-hub
run_backend.py ADDED
@@ -0,0 +1,184 @@
+ import concurrent.futures
+ import os
+ import time
+ import logging
+ import threading
+ from realtime_video_analysis import run_transcription
+ from analyze_claude import analyze_with_claude
+ from google_search import grounding_with_google_search
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger('backend_system')
+
+ analysis_results = []
+ search_results = []
+
+ def periodic_claude_analysis(party, output_file_path="./transcribe_texts"):
+     """
+     Function to read and analyze transcripts from a file
+     - Wait 30 seconds before the first analysis
+     - Analyze at 10-second intervals thereafter
+     """
+     logger.info("Starting Claude analysis task")
+     analysis_count = 0
+
+     while not os.path.exists(output_file_path):
+         logger.info("Waiting for transcription file to be created...")
+         time.sleep(2)
+
+     logger.info("Waiting 30 seconds for initial transcription collection...")
+     time.sleep(30)
+     logger.info("Wait complete, starting analysis")
+
+     while True:
+         try:
+             if not os.path.exists(output_file_path):
+                 logger.warning("Transcription file is missing. Waiting...")
+                 time.sleep(5)
+                 continue
+
+             with open(output_file_path, "r", encoding="utf-8") as f:
+                 current_content = f.read()
+
+             if current_content.strip():
+                 analysis_count += 1
+                 logger.info(f"Starting analysis #{analysis_count}: Read content from file")
+
+                 try:
+                     analysis_result = analyze_with_claude(current_content, party)
+                     print("\n" + "="*50)
+                     print(f"Analysis result #{analysis_count} - {time.strftime('%Y-%m-%d %H:%M:%S')}")
+                     print("="*50)
+                     print(analysis_result)
+                     print("="*50 + "\n")
+                     analysis_results.append(analysis_result)
+
+                 except Exception as e:
+                     logger.error(f"Error occurred during Claude summarization: {str(e)}")
+
+             else:
+                 logger.info("No content in file. Waiting...")
+
+             if "----STT work complete---" in current_content:
+                 break
+
+         except Exception as e:
+             logger.error(f"Error occurred while reading file: {str(e)}")
+
+         time.sleep(10)
+
+     logger.info("Claude analysis task complete")
+
+ def periodic_google_search(party, output_file_path="./transcribe_texts"):
+     """
+     Function to read the transcript from a file and perform keyword extraction and search with Gemini
+     - Wait 30 seconds before the first search
+     - Search at 10-second intervals thereafter
+     """
+     logger.info("Starting Google search task")
+     search_count = 0
+
+     # Wait until the file is created
+     while not os.path.exists(output_file_path):
+         logger.info("Waiting for transcription file...")
+         time.sleep(2)
+
+     # Initial 30-second wait
+     logger.info("Waiting 30 seconds for initial transcription collection...")
+     time.sleep(30)
+     logger.info("Wait complete, starting Google search")
+
+     # Once the file exists, read and search periodically
+     while True:
+         try:
+             # Check if the file exists
+             if not os.path.exists(output_file_path):
+                 logger.warning("Transcription file is missing. Waiting...")
+                 time.sleep(5)
+                 continue
+
+             # Read the entire file content
+             with open(output_file_path, 'r', encoding='utf-8') as f:
+                 content = f.read()
+                 all_lines = content.splitlines()
+
+             # Use only the last 5 lines of the file content for google_search
+             last_lines = all_lines[-5:] if len(all_lines) >= 5 else all_lines
+             current_content = "\n".join(last_lines).strip()
+
+             # Log content (for debugging)
+             logger.debug("Read last 5 lines of STT file for search context")
+
+             # If there is content, perform the search
+             if current_content:
+                 search_count += 1
+                 logger.info(f"Google Search #{search_count} Start: Analyzing last 5 lines in STT file")
+
+                 try:
+                     # Keyword extraction and search with Gemini
+                     search_result = grounding_with_google_search(current_content, party)
+
+                     # Output search results
+                     print("\n" + "="*50)
+                     print(f"Google Search Result #{search_count} - {time.strftime('%Y-%m-%d %H:%M:%S')}")
+                     print("="*50)
+                     print(search_result)
+                     print("="*50 + "\n")
+                     search_results.append(search_result)
+
+                 except Exception as e:
+                     logger.error(f"Error occurred during Google search: {str(e)}")
+             else:
+                 logger.info("No content in file. Waiting...")
+
+             # Check if the completion marker is present
+             if "----STT work complete---" in current_content:
+                 logger.info("Completion marker detected. Google search complete.")
+                 break
+
+         except Exception as e:
+             logger.error(f"Error occurred while reading file: {str(e)}")
+
+         # Wait 10 seconds
+         time.sleep(10)
+
+     logger.info("Google search task complete")
+
+ def main(party=None):
+     """Main function - run parallel tasks"""
+
+     # Select the audio file based on the button clicked
+     if party == "더불어민주당":
+         audio_file = None
+     elif party == "Agents for Amazon Bedrock":
+         audio_file = './data/summit_sungwoo.wav'
+     elif party == "Bundesliga Fan Experience":
+         audio_file = './data/aws_bundesliga.wav'
+     elif party == "AWS_2024_recap":
+         audio_file = './data/aws.wav'
+     else:  # Default or "국민의힘"
+         audio_file = None
+         party = "국민의힘"
+
+     output_file_path = './transcribe_texts'
+
+     logger.info("Backend system started")
+
+     # Run parallel tasks using ThreadPoolExecutor
+     with concurrent.futures.ThreadPoolExecutor() as executor:
+         # Submit three tasks simultaneously
+         task1 = executor.submit(run_transcription, audio_file, party)
+         task2 = executor.submit(periodic_claude_analysis, party, output_file_path)
+         task3 = executor.submit(periodic_google_search, party, output_file_path)
+
+         # Wait for all three tasks to complete
+         task1.result()
+         task2.result()
+         task3.result()
+
+     logger.info("All tasks complete")
+     return "Analysis complete"
+
+ if __name__ == "__main__":
+     results = main()
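`periodic_google_search` re-reads the transcript file on every cycle and feeds only its tail to the search agent. A standalone sketch of that tail-reading step (file name hypothetical), joining the lines with newlines so speaker turns stay separated:

```python
# Sketch: read the last N lines of a growing transcript file.
# The demo file name below is hypothetical, used only for illustration.
from pathlib import Path

def tail_lines(path: str, n: int = 5) -> str:
    """Return the last n lines of a transcript file, newline-joined."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[-n:]).strip()

# Simulate a transcript produced by the STT handler.
p = Path("demo_transcribe_texts")
p.write_text(
    "Speaker 0: hi\nSpeaker 1: hello\nSpeaker 0: topic A\n"
    "Speaker 1: detail B\nSpeaker 0: detail C\nSpeaker 1: wrap up\n",
    encoding="utf-8",
)
print(tail_lines(str(p)))
```

Re-reading the whole file each cycle is simple and robust for short sessions; for long-running streams an incremental read from a remembered offset would avoid quadratic I/O.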
transcribe_texts ADDED
File without changes