anuragbb committed on
Commit ea56bfd · verified · 1 Parent(s): 79eb3c9

Update knowledge-base/Projects/VideoComplianceEngine.md

knowledge-base/Projects/VideoComplianceEngine.md CHANGED
@@ -1,99 +1,21 @@
- # Video Compliance AI 🎬⚖️

- Video Compliance AI is an end-to-end agentic system that automatically audits YouTube videos for brand and regulatory compliance. Give it a YouTube URL and it returns a detailed legal audit report, flagging FTC violations, misleading claims, missing disclosures (#ad, #sponsored), and more. No human needs to watch the video.

- Built on Azure with LangGraph agent orchestration, Azure Video Indexer for multimodal ingestion, Azure AI Search as the RAG knowledge base, and deployed as a production FastAPI service with full telemetry.

- ## 🎯 Problem It Solves

- * Brand teams manually review hundreds of influencer videos for regulatory compliance, which is slow, inconsistent, and expensive
- * FTC regulations require specific disclosures in exact locations; missing them is a legal risk
- * This system automates the full pipeline: ingest a video URL → extract speech + on-screen text → retrieve relevant rules → LLM reasons over evidence → structured audit report

- ## 🏗️ System Architecture: 5 Layers

- ### Layer 1: Data Ingestion (`video_indexer.py`)
- * **yt-dlp** downloads the YouTube video to a local `.mp4` file
- * **Azure Video Indexer (AVI)** receives the file and performs **Speech-to-Text** (transcript) and **OCR** (on-screen text recognition) simultaneously; this is multimodal ingestion
- * AVI returns structured JSON with timestamped transcript lines and OCR frames
- * The service polls AVI every 30 seconds until processing completes (the **polling pattern for async jobs**)
- * Authentication uses **Azure DefaultAzureCredential** in two steps: ARM token → VI account token
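
The polling step described in Layer 1 can be sketched roughly as follows. This is a minimal illustration, not the code in `video_indexer.py`: the function name `poll_until_processed`, the injected `fetch_status` callable, the `max_attempts` limit, and the `"Processed"` / `"Failed"` state strings are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


def poll_until_processed(
    fetch_status: Callable[[], Dict[str, str]],
    interval_seconds: float = 30.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status repeatedly until the job finishes, fails, or times out."""
    for _ in range(max_attempts):
        result = fetch_status()
        state = result.get("state")
        if state == "Processed":  # assumed name of the terminal success state
            return result
        if state == "Failed":  # assumed name of the terminal failure state
            raise RuntimeError("video indexing failed")
        # Still in progress: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real 30-second waits, and the attempt cap prevents a stuck job from blocking the request forever.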
21
 
22
- ### Layer 2: Knowledge Base (`index_documents.py` + Azure AI Search)
23
- * **PyPDFLoader** reads the FTC Disclosures guide and YouTube Ad Specifications PDFs
24
- * **RecursiveCharacterTextSplitter** cuts them into 1000-character chunks with 200-character overlap
25
- * **AzureOpenAIEmbeddings** (text-embedding-3-small) converts each chunk to a 1536-dimensional vector
26
- * All vectors stored in **Azure AI Search** β€” supports both keyword and semantic (vector) search
27
- * This is a one-time offline step β€” run `index_documents.py` once to populate the knowledge base
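
To make the 1000/200 chunking numbers concrete, here is a simplified fixed-window sketch. It is a stand-in for illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `chunk_text` below slices purely by character position.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts 800 characters after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a rule sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.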

- ### Layer 3: Orchestration (`workflow.py` + LangGraph)
- * **LangGraph** builds a Directed Acyclic Graph (DAG): a fixed, ordered pipeline
- * Two nodes: `indexer` (video ingestion) and `auditor` (RAG + LLM analysis)
- * All nodes share a single **`VideoAuditState` TypedDict**, like a shared whiteboard
- * `operator.add` on list fields allows multiple nodes to append violations without overwriting each other

- ### Layer 4: Reasoning (`nodes.py`, `audit_content_node`)
- * **RAG Retrieval:** Transcript + OCR combined into a query → `similarity_search(k=3)` retrieves top 3 rule chunks from Azure AI Search
- * **Prompt Construction:** Retrieved rules injected into the system prompt as context, which grounds the LLM in the actual rulebook
- * **LLM Call:** AzureChatOpenAI at temperature=0 (deterministic, critical for compliance) reasons over evidence + rules
- * **Structured Output:** LLM returns strict JSON with `compliance_results`, `status`, `final_report`. Regex strips markdown fences before parsing

- ### Layer 5: Deployment & Observability (`server.py` + `telemetry.py`)
- * **FastAPI** exposes `POST /audit` with Pydantic models validating both request and response schemas
- * **Azure Monitor + OpenTelemetry** auto-instruments every API call and captures latency, error rates, and dependency traces with zero manual logging
- * Telemetry setup is fail-safe: if the connection string is missing, the app still runs normally
-
-
- ## 🔑 Key Technical Decisions
-
- ### LangGraph State with Reducers
- ```python
- compliance_results: Annotated[List[ComplianceIssue], operator.add]
- ```
- `operator.add` tells LangGraph to append new items rather than overwrite, which is safe for concurrent node writes.
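
The effect of an `Annotated[..., operator.add]` reducer can be imitated in plain Python. The sketch below is not LangGraph's implementation; `AuditState` and `merge_update` are hypothetical names used only to show how a reducer stored in the annotation metadata turns an update into an append rather than an overwrite.

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict, get_type_hints


class AuditState(TypedDict):
    video_url: str
    compliance_results: Annotated[List[dict], operator.add]


def merge_update(state: Dict[str, Any], update: Dict[str, Any], schema=AuditState) -> Dict[str, Any]:
    """Merge a node's partial update into the state, applying reducers where declared."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # The first metadata item of an Annotated field is treated as its reducer.
        reducer = next(iter(getattr(hints[key], "__metadata__", ())), None)
        if callable(reducer) and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list + list
        else:
            merged[key] = value  # plain fields are overwritten
    return merged
```

With this scheme, two nodes that each return `{"compliance_results": [...]}` both contribute to the final list, while a field such as `video_url` simply takes the latest value.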
-
- ### Two-Step Azure Authentication
- 1. `DefaultAzureCredential` → ARM (Azure Resource Manager) token → proves Azure identity
- 2. Exchange ARM token → Video Indexer account token with Contributor permissions
- In development: uses Azure CLI auth. In production: switches automatically to Managed Identity, with zero code change.
-
- ### RAG Query Strategy
- ```python
- query_text = f"{transcript} {' '.join(ocr_text)}"
- docs = vector_store.similarity_search(query_text, k=3)
- ```
- Transcript and OCR are combined because a violation might be spoken, shown on screen, or both.
-
- ### Deterministic Compliance via Temperature=0
- For compliance auditing, `temperature=0` makes the model's output as repeatable as possible, so the same video should produce the same findings from run to run. This matters for legal reproducibility, although identical output is not strictly guaranteed.
-
- ### Defensive JSON Parsing
- ```python
- if "```" in content:
-     content = re.search(r"```(?:json)?(.*?)```", content, re.DOTALL).group(1)
- ```
- Even with explicit JSON instructions, LLMs sometimes wrap responses in markdown fences. This regex strips them before `json.loads()`.
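
A slightly more defensive variant of the fence-stripping step might look like the sketch below. The helper name `parse_llm_json` is hypothetical. Unlike the two-line snippet above, it falls back to the raw text when no complete fence pair is found, so an unclosed fence surfaces as a JSON decode error instead of an `AttributeError` on `None`.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks, built this way to keep this example easy to embed
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_llm_json(content: str) -> Any:
    """Parse an LLM reply as JSON, unwrapping one markdown code fence if present."""
    match = FENCE_RE.search(content)
    # Use the fenced body when a complete fence pair exists, else the raw reply.
    payload = match.group(1) if match else content
    return json.loads(payload.strip())
```

Any reply that still fails to parse raises `json.JSONDecodeError`, which a caller could count toward the schema conformance rate.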
-
- ## 📊 Evaluation Metrics
-
-
- * Precision: Of all violations flagged, what % are real? (TP / (TP + FP))
- * Recall: Of all real violations, what % were caught? (TP / (TP + FN))
- * F1 Score: Harmonic mean of precision and recall
- * RAG Retrieval Accuracy: Did top-3 chunks contain the relevant compliance rule?
- * Schema Conformance Rate: % of LLM responses that parse successfully
- * Latency P95: 95th percentile end-to-end response time (target < 60s)
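
The first three metrics follow directly from true-positive, false-positive, and false-negative counts. A small sketch, using a hypothetical helper `precision_recall_f1` and made-up counts in the usage note, shows the arithmetic:

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from violation counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with 8 correctly flagged violations, 2 false alarms, and 4 missed violations, precision is 8/10, recall is 8/12, and F1 is 16/22.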
-
- ## ⚙️ Tech Stack
-
- * **Agent Orchestration:** LangGraph (DAG-style stateful workflow)
- * **LLM:** Azure OpenAI (GPT-4) via AzureChatOpenAI
- * **Embeddings:** AzureOpenAIEmbeddings, text-embedding-3-small (1536 dimensions)
- * **Vector Store:** Azure AI Search (semantic + keyword search)
- * **Video Processing:** Azure Video Indexer (Speech-to-Text + OCR)
- * **Video Download:** yt-dlp
- * **API Framework:** FastAPI + uvicorn + Pydantic
- * **Observability:** Azure Monitor + OpenTelemetry (auto-instrumented)
- * **Authentication:** Azure DefaultAzureCredential
- * **PDF Loading:** LangChain PyPDFLoader
- * **Knowledge Base:** FTC Disclosures 101 PDF + YouTube Ad Specifications PDF
 
+ # Video Compliance AI

+ Video Compliance AI is an agentic system that automatically audits YouTube videos for FTC regulatory compliance. Give it a YouTube URL and it returns a structured audit report detailing any violations found: missing disclosures like #ad or #sponsored, misleading claims, and unsubstantiated endorsements. No human needs to watch the video.

+ ## What It Does

+ Brand teams and agencies manually review influencer videos for regulatory compliance, a process that is slow, inconsistent, and expensive. This system automates the full pipeline: download the video, extract speech and on-screen text, retrieve relevant FTC rules, reason over the evidence, and return a structured JSON audit report.

+ ## How It Works

+ yt-dlp downloads the video, and Azure Video Indexer then performs speech-to-text transcription and OCR on on-screen text simultaneously. This multimodal ingestion captures both spoken claims and text overlays. The transcript and OCR output are combined into a query that retrieves the top 3 relevant rule chunks from Azure AI Search, which holds the FTC Disclosures guide and YouTube Ad Specifications as indexed vectors. GPT-4 then reasons over the video content against those retrieved rules and returns a structured compliance verdict.

+ The entire pipeline is orchestrated by a two-node LangGraph DAG: an indexer node handles ingestion and an auditor node handles RAG reasoning. Both nodes share a TypedDict state object.
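
The two-node flow over a shared state can be pictured with a plain-Python sketch. It deliberately avoids the LangGraph API and the Azure services: the node bodies are placeholders returning canned data, and `run_pipeline` and the state field names other than the two node roles are assumptions for illustration.

```python
from typing import Callable, Dict, List, TypedDict


class VideoAuditState(TypedDict, total=False):
    video_url: str
    transcript: str
    ocr_text: List[str]
    status: str


def indexer_node(state: VideoAuditState) -> VideoAuditState:
    # Placeholder for yt-dlp + Azure Video Indexer: returns canned text.
    return {"transcript": "I use this serum every day", "ocr_text": ["#ad"]}


def auditor_node(state: VideoAuditState) -> VideoAuditState:
    # Placeholder for retrieval + GPT-4 reasoning: a trivial keyword check.
    evidence = " ".join([state.get("transcript", ""), *state.get("ocr_text", [])])
    disclosed = any(tag in evidence.lower() for tag in ("#ad", "#sponsored"))
    return {"status": "PASS" if disclosed else "FAIL"}


def run_pipeline(state: Dict, nodes: List[Callable[[Dict], Dict]]) -> Dict:
    """Run nodes in a fixed order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state
```

Each node reads what it needs from the state and returns only the keys it changes, which is the same contract the indexer and auditor nodes follow in the real graph.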
 
 
 
 
 
+ ## What Makes It Interesting

+ The system is deployed as a production FastAPI service with Azure Monitor and OpenTelemetry auto-instrumentation capturing request latency, error rates, and dependency traces. LangSmith provides per-node agent trace inspection for debugging. Authentication uses DefaultAzureCredential: in development it uses the Azure CLI login, and in production it switches automatically to Managed Identity with no code change.

+ ## Tech Stack

+ Python, LangGraph, LangSmith, Azure Video Indexer, Azure AI Search, Azure OpenAI (GPT-4), Azure Monitor, OpenTelemetry, FastAPI, Pydantic, yt-dlp.