# CS-UY 4613: Project

Yufei Zhen

macOS: Ventura 13.3.1 (a), GPU: Apple M2 Max

## Setup

* video source: [https://www.youtube.com/@pantelism](https://www.youtube.com/@pantelism)

* **option 1** (repository source: [https://github.com/PacktPublishing/LLM-Engineers-Handbook](https://github.com/PacktPublishing/LLM-Engineers-Handbook))



In [None]:
# !git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git

Cloning into 'LLM-Engineers-Handbook'...
remote: Enumerating objects: 1970, done.
remote: Counting objects: 100% (515/515), done.
remote: Compressing objects: 100% (138/138), done.
remote: Total 1970 (delta 414), reused 377 (delta 377), pack-reused 1455 (from 2)
Receiving objects: 100% (1970/1970), 4.77 MiB | 21.22 MiB/s, done.
Resolving deltas: 100% (1263/1263), done.


In [None]:
# !poetry env use 3.11
# !poetry install --without aws
# !poetry run pre-commit install

In [1]:
import torch
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"CUDA available: {torch.cuda.is_available()}")

MPS available: True
CUDA available: False
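
The two availability flags above can drive device selection. The following is a minimal sketch, assuming a preference order of MPS, then CUDA, then CPU; `pick_device` is a hypothetical helper and not part of the repository.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Return a PyTorch device string, preferring MPS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


# Flags as reported by the cell above on this machine (Apple M2 Max).
device = pick_device(mps_available=True, cuda_available=False)
print(device)
```

In a live session, the arguments would come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, and the returned string can be passed to `torch.device(...)`.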


## RAG Architecture

- New components are integrated into the existing `rag` package: [https://github.com/PacktPublishing/LLM-Engineers-Handbook/tree/main/llm_engineering/application/rag](https://github.com/PacktPublishing/LLM-Engineers-Handbook/tree/main/llm_engineering/application/rag)

- Directory overview: 

```
.
├── ... 
├── clips/               # Generated video clip responses
├── llm_engineering/     # Core project package
│   ├── application/
│   │   ├── ...
│   │   ├── rag          # Main RAG architecture
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── multimodel_dispatcher.py (new)
│   │   │   ├── pipeline.py (new)
│   │   │   ├── prompt_templates.py
│   │   │   ├── query_expansion.py
│   │   │   ├── reranking.py
│   │   │   ├── retriever.py (modified)
│   │   │   ├── self_query.py
│   │   │   ├── topic_retriever.py (new)
│   │   │   ├── video_ingester.py (new)
│   │   │   ├── video_processor.py (new)
│   ├── domain/
│   │   ├── ...
│   │   ├── queries.py (modified)
│   │   ├── video_chunks.py (new)
├── demonstration.ipynb (YOU'RE HERE)
```

## Video Ingestion

In [1]:
video_db = "/Users/yufeizhen/Desktop/project/videos"

In [2]:
from llm_engineering.application.rag.video_ingester import VideoIngester

ingester = VideoIngester(video_root=video_db)
# ingester.process_video_library(force_reprocess=True)
ingester.process_video_library()

2025-05-04 03:25:21.777 | INFO     | llm_engineering.settings:load_settings:94 - Loading settings from the ZenML secret store.
2025-05-04 03:25:22.015 | INFO     | llm_engineering.infrastructure.db.mongo:__new__:20 - Connection to MongoDB with URI successful: mongodb://llm_engineering:llm_engineering@127.0.0.1:27017
PyTorch version 2.2.2 available.
2025-05-04 03:25:23.410 | INFO     | llm_engineering.infrastructure.db.qdrant:__new__:29 - Connection to Qdrant DB with URI successful: str
Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Initializing fallback TextEmbedder
Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Loading CLIP model: openai/clip-vit-base-patch32
CLIP model loaded successfully
Initialized embedders
Loaded NLP model
Loaded BERTopic
Processing videos from: /Users/yufeizhen/Desktop/project/videos
Already processed 8 videos
Previously processed videos:
  - 9CGGh6ivg68
  - FCQ-rih6cHY
  - TV-DjM8242s
  - WXoOohWU28Y
  - eFgkZKhNUdM
  - eQ6UE968Xe4
  - lb_5AdUpfuA
  - rCVlIVKqqGE
Found 8 video folders
Will process 0 videos (8 skipped)
Skipping TV-DjM8242s (already processed)
Skipping eFgkZKhNUdM (already processed)
Skipping eQ6UE968Xe4 (already processed)
Skipping rCVlIVKqqGE (already processed)
Skipping lb_5AdUpfuA (already processed)
Skipping FCQ-rih6cHY (already processed)
Skipping 9CGGh6ivg68 (already processed)
Skipping WXoOohWU28Y (already processed)

All videos processed!
Total processed videos: 8
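
The log above shows the ingester skipping videos it has already handled unless `force_reprocess=True` is passed. A minimal sketch of that decision follows, assuming each video lives in a folder named after its ID, as the log suggests. `plan_ingestion` and the ID `new_video` are hypothetical and only illustrate the idea; the actual `VideoIngester` logic is not shown in this notebook.

```python
from pathlib import Path


def plan_ingestion(video_folders, processed_ids, force_reprocess=False):
    """Split video folders into (to_process, skipped) lists of video IDs.

    video_folders: iterable of folder names or paths, one per video.
    processed_ids: IDs recorded by an earlier ingestion run.
    """
    processed = set(processed_ids)
    to_process, skipped = [], []
    for folder in video_folders:
        video_id = Path(folder).name  # assumption: folder name is the video ID
        if video_id in processed and not force_reprocess:
            skipped.append(video_id)
        else:
            to_process.append(video_id)
    return to_process, skipped


folders = ["videos/TV-DjM8242s", "videos/eFgkZKhNUdM", "videos/new_video"]
done = {"TV-DjM8242s", "eFgkZKhNUdM"}
to_process, skipped = plan_ingestion(folders, done)
print(f"Will process {len(to_process)} videos ({len(skipped)} skipped)")
```

Passing `force_reprocess=True` in this sketch moves every folder into the `to_process` list, mirroring the commented-out call in the cell above.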


In [3]:
from qdrant_client import QdrantClient

client = QdrantClient(path="/Users/yufeizhen/Desktop/project/qdrant_storage")
print("Total stored vectors:", client.count("video_chunks").count)

Total stored vectors: 403
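
Each stored point is an embedding vector that a query embedding is later compared against. The sketch below shows a brute-force top-k search in plain Python, assuming cosine similarity as the metric; the collection's configured distance is not shown in this notebook, and this is not how Qdrant implements search internally. The payload names and toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, points, k=3):
    """Return the k highest-scoring (score, payload) pairs, best first."""
    scored = [(cosine_similarity(query, vec), payload) for payload, vec in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real CLIP or MiniLM vectors have hundreds of dimensions.
points = [
    ("chunk_a", [1.0, 0.0]),
    ("chunk_b", [0.0, 1.0]),
    ("chunk_c", [1.0, 1.0]),
]
for score, payload in top_k([1.0, 0.1], points, k=2):
    print(f"{payload}: {score:.3f}")
```

A vector database replaces the linear scan with an index, but the ranking idea is the same: score every candidate against the query and keep the best few.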


## Video Q&A

In [3]:
from llm_engineering.application.rag.pipeline import VideoQAEngine

engine = VideoQAEngine(video_root=video_db)

def respond(question):
    clips = engine.ask(question)
    return [(str(clip["path"]), f"Relevance: {clip['score']:.2f}") for clip in clips]

Initializing VideoQAEngine
Video root: /Users/yufeizhen/Desktop/project/videos
Qdrant storage path: /Users/yufeizhen/Desktop/project/qdrant_storage
Connected to Qdrant storage at: /Users/yufeizhen/Desktop/project/qdrant_storage
Available collections: collections=[CollectionDescription(name='video_chunks')]
Found video_chunks collection with 403 points
Initializing fallback TextEmbedder
Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Loading CLIP model: openai/clip-vit-base-patch32
CLIP model loaded successfully
VideoQAEngine initialized successfully


In [4]:
question = "Using only the videos, explain the the binary cross entropy loss function."

In [5]:
respond(question)


--- Processing query: 'Using only the videos, explain the the binary cross entropy loss function.' ---
Retrieving relevant video segments...
Encoding query with CLIP: 'Using only the videos, explain the the binary cros...'
Cleaned text for CLIP: Using only the videos, explain the the binary cros...
Query embedded successfully
Sending search request to Qdrant (attempt 1/5)
Creating fresh connection to Qdrant...
Search successful, found 3 results
Retrieval completed in 0.07 seconds
Found 3 relevant video segments

Processing result 1/3:
  Video ID: eFgkZKhNUdM
  Timestamps: 1270.0s - 1302.0s
  Score: 0.8472
  Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4
  Creating clip to: clips/clip_eFgkZKhNUdM_1270_0.847.mp4
  Clip created successfully

Processing result 2/3:
  Video ID: eFgkZKhNUdM
  Timestamps: 642.0s - 647.0s
  Score: 0.8467
  Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4
  Crea

[('clips/clip_eFgkZKhNUdM_1270_0.847.mp4', 'Relevance: 0.85'),
 ('clips/clip_eFgkZKhNUdM_642_0.847.mp4', 'Relevance: 0.85'),
 ('clips/clip_eFgkZKhNUdM_874_0.838.mp4', 'Relevance: 0.84')]
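
The clip filenames in the output follow a visible pattern: video ID, integer start second, and the score rounded to three decimals. The sketch below reproduces that pattern and builds an ffmpeg command that would cut the segment. The naming rule is inferred from the log lines only, and the ffmpeg invocation is an assumption: the notebook does not show how the pipeline actually cuts clips. Both helper names are hypothetical.

```python
from pathlib import Path


def clip_path(video_id, start_s, score, out_dir="clips"):
    """Build a clip path matching the pattern seen in the log output."""
    return Path(out_dir) / f"clip_{video_id}_{int(start_s)}_{score:.3f}.mp4"


def ffmpeg_cut_command(source, start_s, end_s, destination):
    """Return an ffmpeg argument list that copies [start_s, end_s] without re-encoding."""
    duration = end_s - start_s
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-ss", f"{start_s:.1f}",   # seek to the segment start
        "-i", str(source),
        "-t", f"{duration:.1f}",   # keep this many seconds
        "-c", "copy",              # stream copy, no re-encode
        str(destination),
    ]


dest = clip_path("eFgkZKhNUdM", 1270.0, 0.8472)
cmd = ffmpeg_cut_command("videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4", 1270.0, 1302.0, dest)
print(dest.as_posix())
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` when ffmpeg is installed. Stream copy is fast but cuts on keyframes, so clip boundaries may be slightly off; re-encoding trades speed for exact timestamps.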

## Gradio App

In [4]:
import gradio as gr

interface = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(label="Ask about the video content"),
    outputs=gr.Gallery(label="Relevant Video Clips"),
    examples=[
        ["Using only the videos, explain how ResNets work."],
        ["Using only the videos, explain the advantages of CNNs over fully connected networks."],
        ["Using only the videos, explain the binary cross entropy loss function."]
    ]
)

HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"


In [5]:
interface.launch(share=True)

* Running on local URL:  http://127.0.0.1:7860
HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"
HTTP Request: GET https://api.gradio.app/v3/tunnel-request "HTTP/1.1 200 OK"
* Running on public URL: https://382d4d0bacff86ee02.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
HTTP Request: HEAD https://382d4d0bacff86ee02.gradio.live "HTTP/1.1 200 OK"





--- Processing query: 'Using only the videos, explain the the binary cross entropy loss function.' ---
Retrieving relevant video segments...
Encoding query with CLIP: 'Using only the videos, explain the the binary cros...'
Cleaned text for CLIP: Using only the videos, explain the the binary cross entropy loss function....
Cleaned text for CLIP: Using only the videos, explain the the binary cros...
Query embedded successfully
Sending search request to Qdrant (attempt 1/5)
Search successful, found 3 results
Retrieval completed in 0.34 seconds
Found 3 relevant video segments

Processing result 1/3:
  Video ID: eFgkZKhNUdM
  Timestamps: 1270.0s - 1302.0s
  Score: 0.8472
  Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4
  Creating clip to: clips/clip_eFgkZKhNUdM_1270_0.847.mp4
  Clip created successfully

Processing result 2/3:
  Video ID: eFgkZKhNUdM
  Timestamps: 642.0s - 647.0s
  Score: 0.8467
  Found alternative video path: /Users/yufei

In [9]:
import gradio as gr
from llm_engineering.application.rag.pipeline import VideoQAEngine

# Initialize the VideoQAEngine with the video database
video_db = "/Users/yufeizhen/Desktop/project/videos"
engine = VideoQAEngine(video_root=video_db)

# Define the chat function that processes messages and returns relevant video clips
def chat(message, history):
    # Process message to get relevant clips
    clips = engine.ask(message)
    
    # Format for display
    clips_gallery = [(str(clip["path"]), "Relevance: {:.2f}".format(clip['score'])) for clip in clips]
    
    # Return both a text response and the clips
    return "Here are the relevant video clips for: '{}'".format(message), clips_gallery

# Create a more flexible interface using Blocks
with gr.Blocks(theme="soft") as demo:
    gr.Markdown("# Chat with your Video Library")
    gr.Markdown("Ask questions about the video content and get relevant clips. You can continue the conversation with follow-up questions.")
    
    # Create chatbot for conversation history
    chatbot = gr.Chatbot(height=300)
    
    # Create gallery to display video clips
    gallery = gr.Gallery(label="Relevant Video Clips", show_label=True)
    
    # Create message input
    msg = gr.Textbox(
        placeholder="Ask about the video content...", 
        label="Your Question",
        show_label=False
    )
    
    # Define clear button
    clear = gr.Button("Clear")
    
    # Example questions
    gr.Examples(
        examples=[
            "Using only the videos, explain how ResNets work.",
            "Using only the videos, explain the advantages of CNNs over fully connected networks.",
            "Using only the videos, explain the binary cross entropy loss function."
        ],
        inputs=msg
    )
    
    # Define the chat function that updates both chatbot and gallery
    def respond(message, chat_history):
        # Get text response and clips
        response, clips = chat(message, chat_history)
        
        # Update chat history
        chat_history.append((message, response))
        
        # Return updated chat history and gallery
        return "", chat_history, clips
    
    # Set up the event handlers
    msg.submit(respond, [msg, chatbot], [msg, chatbot, gallery])
    clear.click(lambda: ([], [], None), None, [chatbot, gallery, msg])

Initializing VideoQAEngine
Video root: /Users/yufeizhen/Desktop/project/videos
Qdrant storage path: /Users/yufeizhen/Desktop/project/qdrant_storage
Connected to Qdrant storage at: /Users/yufeizhen/Desktop/project/qdrant_storage
Available collections: collections=[CollectionDescription(name='video_chunks')]
Found video_chunks collection with 403 points
Initializing fallback TextEmbedder
Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Loading CLIP model: openai/clip-vit-base-patch32
CLIP model loaded successfully
VideoQAEngine initialized successfully

HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"


In [10]:
demo.launch(share=True)

* Running on local URL:  http://127.0.0.1:7861
HTTP Request: GET http://127.0.0.1:7861/gradio_api/startup-events "HTTP/1.1 200 OK"
HTTP Request: HEAD http://127.0.0.1:7861/ "HTTP/1.1 200 OK"
HTTP Request: GET https://api.gradio.app/v3/tunnel-request "HTTP/1.1 200 OK"
* Running on public URL: https://48d861a2319613eb9b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
HTTP Request: HEAD https://48d861a2319613eb9b.gradio.live "HTTP/1.1 200 OK"





--- Processing query: 'Using only the videos, explain the the binary cross entropy loss function.' ---
Retrieving relevant video segments...
Encoding query with CLIP: 'Using only the videos, explain the the binary cros...'
Cleaned text for CLIP: Using only the videos, explain the the binary cros...
Query embedded successfully
Sending search request to Qdrant (attempt 1/5)
Creating fresh connection to Qdrant...
Search successful, found 3 results
Retrieval completed in 0.07 seconds
Found 3 relevant video segments

Processing result 1/3:
  Video ID: eFgkZKhNUdM
  Timestamps: 1270.0s - 1302.0s
  Score: 0.8472
  Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4
  Creating clip to: clips/clip_eFgkZKhNUdM_1270_0.847.mp4
