Update knowledge-base/Projects/VideoComplianceEngine.md
knowledge-base/Projects/VideoComplianceEngine.md
CHANGED
|
@@ -1,99 +1,21 @@
|
|
| 1 |
-
# Video Compliance AI
|
| 2 |
|
| 3 |
-
Video Compliance AI is an
|
| 4 |
|
| 5 |
-
|
| 6 |
|
| 7 |
-
|
| 8 |
|
| 9 |
-
|
| 10 |
-
* FTC regulations require specific disclosures in exact locations; missing them is a legal risk
|
| 11 |
-
* This system automates the full pipeline: ingest a video URL → extract speech + on-screen text → retrieve relevant rules → LLM reasons over evidence → structured audit report
|
| 12 |
|
| 13 |
-
|
| 14 |
|
| 15 |
-
|
| 16 |
-
* **yt-dlp** downloads the YouTube video to a local `.mp4` file
|
| 17 |
-
* **Azure Video Indexer (AVI)** receives the file and performs **Speech-to-Text** (transcript) and **OCR** (on-screen text recognition) simultaneously; this is multimodal ingestion
|
| 18 |
-
* AVI returns structured JSON with timestamped transcript lines and OCR frames
|
| 19 |
-
* The service polls AVI every 30 seconds until processing completes (the **polling pattern for async jobs**)
|
| 20 |
-
* Authentication uses **Azure DefaultAzureCredential** in two steps: ARM token → VI account token
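
The polling step in the list above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the project's code: `poll_until_done`, `fetch_state`, and the state labels `"Processed"` and `"Failed"` are hypothetical, and the real status request to Azure Video Indexer is injected as a callable rather than shown.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_state: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_state() repeatedly until it reports a terminal state."""
    # Hypothetical terminal states; the real service may use other labels.
    terminal = {"Processed", "Failed"}
    waited = 0.0
    while True:
        state = fetch_state()
        if state in terminal:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"job still in state {state!r} after {waited:.0f}s")
        sleep(interval_s)  # wait before asking again
        waited += interval_s
```

Passing `sleep` as a parameter keeps the helper testable without real waiting, and the timeout guards against a job that never reaches a terminal state.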
|
| 21 |
|
| 22 |
-
##
|
| 23 |
-
* **PyPDFLoader** reads the FTC Disclosures guide and YouTube Ad Specifications PDFs
|
| 24 |
-
* **RecursiveCharacterTextSplitter** cuts them into 1000-character chunks with 200-character overlap
|
| 25 |
-
* **AzureOpenAIEmbeddings** (text-embedding-3-small) converts each chunk to a 1536-dimensional vector
|
| 26 |
-
* All vectors stored in **Azure AI Search**, which supports both keyword and semantic (vector) search
|
| 27 |
-
* This is a one-time offline step: run `index_documents.py` once to populate the knowledge base
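
To make the 1000-character / 200-character overlap setting concrete, here is a simplified illustration. It is a plain fixed-window splitter, not `RecursiveCharacterTextSplitter` (which also tries to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, each chunk starts 800 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a rule sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.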
|
| 28 |
|
| 29 |
-
|
| 30 |
-
* **LangGraph** builds a Directed Acyclic Graph (DAG), a fixed ordered pipeline
|
| 31 |
-
* Two nodes: `indexer` (video ingestion) and `auditor` (RAG + LLM analysis)
|
| 32 |
-
* All nodes share a single **`VideoAuditState` TypedDict**, which works like a shared whiteboard
|
| 33 |
-
* `operator.add` on list fields allows multiple nodes to append violations without overwriting each other
|
| 34 |
|
| 35 |
-
##
|
| 36 |
-
* **RAG Retrieval:** Transcript + OCR combined into a query → `similarity_search(k=3)` retrieves top 3 rule chunks from Azure AI Search
|
| 37 |
-
* **Prompt Construction:** Retrieved rules injected into the system prompt as context, which grounds the LLM in the actual rulebook
|
| 38 |
-
* **LLM Call:** AzureChatOpenAI at temperature=0 (deterministic, critical for compliance) reasons over evidence + rules
|
| 39 |
-
* **Structured Output:** LLM returns strict JSON: `compliance_results`, `status`, `final_report`. Regex strips markdown fences before parsing
|
| 40 |
|
| 41 |
-
|
| 42 |
-
* **FastAPI** exposes `POST /audit` with Pydantic models validating both request and response schemas
|
| 43 |
-
* **Azure Monitor + OpenTelemetry** auto-instruments every API call and captures latency, error rates, and dependency traces with zero manual logging
|
| 44 |
-
* Telemetry setup is fail-safe: if the connection string is missing, the app still runs normally
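
The fail-safe behaviour described above could look roughly like the following. This is a sketch under assumptions: `setup_telemetry` is a hypothetical name, the environment variable is the one the Azure Monitor OpenTelemetry distribution conventionally reads, and the actual instrumentation call is injected as `configure` so the example runs without any Azure packages installed.

```python
import logging
import os
from typing import Callable

logger = logging.getLogger("telemetry")

# Assumed variable name; adjust if the project reads a different setting.
CONN_VAR = "APPLICATIONINSIGHTS_CONNECTION_STRING"


def setup_telemetry(configure: Callable[[str], None]) -> bool:
    """Enable telemetry if possible; never raise, so the app can still start."""
    conn = os.environ.get(CONN_VAR)
    if not conn:
        logger.warning("%s not set; running without telemetry", CONN_VAR)
        return False
    try:
        configure(conn)  # in a real app this would wrap the exporter setup
    except Exception:
        logger.exception("telemetry setup failed; continuing without it")
        return False
    return True
```

Returning a boolean instead of raising lets the application log whether telemetry is active while keeping startup independent of the monitoring backend.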
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
## Key Technical Decisions
|
| 48 |
-
|
| 49 |
-
### LangGraph State with Reducers
|
| 50 |
-
```python
|
| 51 |
-
compliance_results: Annotated[List[ComplianceIssue], operator.add]
|
| 52 |
-
```
|
| 53 |
-
`operator.add` tells LangGraph to append new items rather than overwrite, which is safe for concurrent node writes.
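
The reducer idea can be illustrated without LangGraph itself. The sketch below is a simplified imitation of the merge behaviour, not LangGraph's implementation: `AuditState` is a hypothetical, trimmed-down stand-in for `VideoAuditState`, and `apply_update` is an illustrative helper.

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict, get_type_hints


class AuditState(TypedDict):
    # Hypothetical, simplified stand-in for VideoAuditState.
    status: str
    compliance_results: Annotated[List[dict], operator.add]


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state, honoring reducers."""
    hints = get_type_hints(AuditState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)  # list + list appends
        else:
            merged[key] = value  # no reducer: the last write wins
    return merged
```

Fields annotated with a reducer accumulate across node updates, while plain fields such as `status` are simply replaced by the most recent value.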
|
| 54 |
-
|
| 55 |
-
### Two-Step Azure Authentication
|
| 56 |
-
1. `DefaultAzureCredential` → ARM (Azure Resource Manager) token, which proves Azure identity
|
| 57 |
-
2. Exchange ARM token → Video Indexer account token with Contributor permissions
|
| 58 |
-
In development: uses Azure CLI auth. In production: switches automatically to Managed Identity with zero code change.
|
| 59 |
-
|
| 60 |
-
### RAG Query Strategy
|
| 61 |
-
```python
|
| 62 |
-
query_text = f"{transcript} {' '.join(ocr_text)}"
|
| 63 |
-
docs = vector_store.similarity_search(query_text, k=3)
|
| 64 |
-
```
|
| 65 |
-
Transcript and OCR are combined because a violation might be spoken, shown on screen, or both.
|
| 66 |
-
|
| 67 |
-
### Deterministic Compliance via Temperature=0
|
| 68 |
-
For compliance auditing, `temperature=0` makes the output as repeatable as the model allows, so the same video should produce the same findings. This matters for legal reproducibility.
|
| 69 |
-
|
| 70 |
-
### Defensive JSON Parsing
|
| 71 |
-
```python
|
| 72 |
-
if "```" in content:
|
| 73 |
-
    content = re.search(r"```(?:json)?(.*?)```", content, re.DOTALL).group(1)
|
| 74 |
-
```
|
| 75 |
-
Even with explicit JSON instructions, LLMs sometimes wrap responses in markdown fences. This regex strips them before `json.loads()`.
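
A slightly more defensive variant of the same idea is sketched below. It is an illustration rather than the project's code: `parse_llm_json` is a hypothetical name, and the fence string is built programmatically only so this example does not contain a literal fence. Unlike the snippet above, it unwraps only when a complete pair of fences is found, so a reply with a single stray fence produces a normal JSON error instead of an `AttributeError` on `None`.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # a markdown code fence, built here to avoid writing it literally
FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_llm_json(content: str) -> Any:
    """Parse JSON from a model reply, with or without a markdown fence."""
    match = FENCED.search(content)
    if match:  # only unwrap when both the opening and closing fence are present
        content = match.group(1)
    return json.loads(content.strip())
```

A caller can catch `json.JSONDecodeError` around this function to count schema-conformance failures.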
|
| 76 |
-
|
| 77 |
-
## Evaluation Metrics
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
* Precision: Of all violations flagged, what % are real? (TP / (TP + FP))
|
| 81 |
-
* Recall: Of all real violations, what % were caught? (TP / (TP + FN))
|
| 82 |
-
* F1 Score: Harmonic mean of precision and recall
|
| 83 |
-
* RAG Retrieval Accuracy: Did top-3 chunks contain the relevant compliance rule?
|
| 84 |
-
* Schema Conformance Rate: % of LLM responses that parse successfully
|
| 85 |
-
* Latency P95: 95th percentile end-to-end response time (target < 60s)
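
The first three metrics in the list above can be computed from sets of violation identifiers. This is a minimal sketch: `detection_metrics` and the idea of identifying each violation by a string ID are assumptions made for illustration.

```python
from typing import Dict, Set


def detection_metrics(predicted: Set[str], actual: Set[str]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for flagged vs. real violations."""
    tp = len(predicted & actual)   # flagged and real
    fp = len(predicted - actual)   # flagged but not real
    fn = len(actual - predicted)   # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, flagging four violations of which two are real, while missing one real violation, gives a precision of 0.5 and a recall of 2/3.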
|
| 86 |
-
|
| 87 |
-
## Tech Stack
|
| 88 |
-
|
| 89 |
-
* **Agent Orchestration:** LangGraph (DAG-style stateful workflow)
|
| 90 |
-
* **LLM:** Azure OpenAI (GPT-4) via AzureChatOpenAI
|
| 91 |
-
* **Embeddings:** AzureOpenAIEmbeddings with text-embedding-3-small (1536 dimensions)
|
| 92 |
-
* **Vector Store:** Azure AI Search (semantic + keyword search)
|
| 93 |
-
* **Video Processing:** Azure Video Indexer (Speech-to-Text + OCR)
|
| 94 |
-
* **Video Download:** yt-dlp
|
| 95 |
-
* **API Framework:** FastAPI + uvicorn + Pydantic
|
| 96 |
-
* **Observability:** Azure Monitor + OpenTelemetry (auto-instrumented)
|
| 97 |
-
* **Authentication:** Azure DefaultAzureCredential
|
| 98 |
-
* **PDF Loading:** LangChain PyPDFLoader
|
| 99 |
-
* **Knowledge Base:** FTC Disclosures 101 PDF + YouTube Ad Specifications PDF
|
|
|
|
| 1 |
+
# Video Compliance AI
|
| 2 |
|
| 3 |
+
Video Compliance AI is an agentic system that automatically audits YouTube videos for FTC regulatory compliance. Give it a YouTube URL and it returns a structured audit report detailing any violations found: missing disclosures such as #ad or #sponsored, misleading claims, and unsubstantiated endorsements. No human needs to watch the video.
|
| 4 |
|
| 5 |
+
## What It Does
|
| 6 |
|
| 7 |
+
Brand teams and agencies manually review influencer videos for regulatory compliance, a process that is slow, inconsistent, and expensive. This system automates the full pipeline: download the video, extract speech and on-screen text, retrieve relevant FTC rules, reason over the evidence, and return a structured JSON audit report.
|
| 8 |
|
| 9 |
+
## How It Works
|
| 10 |
|
| 11 |
+
yt-dlp downloads the video, and Azure Video Indexer then performs speech-to-text transcription and OCR on on-screen text simultaneously. This multimodal ingestion captures both spoken claims and text overlays. The transcript and OCR output are combined into a query that retrieves the top 3 relevant rule chunks from Azure AI Search, which holds the FTC Disclosures guide and YouTube Ad Specifications as indexed vectors. GPT-4 then reasons over the video content against those retrieved rules and returns a structured compliance verdict.
|
| 12 |
|
| 13 |
+
The entire pipeline is orchestrated by a two-node LangGraph DAG: an indexer node handles ingestion and an auditor node handles RAG reasoning. Both nodes share a TypedDict state object.
|
| 14 |
|
| 15 |
+
## What Makes It Interesting
|
| 16 |
|
| 17 |
+
The system is deployed as a production FastAPI service with Azure Monitor and OpenTelemetry auto-instrumentation capturing request latency, error rates, and dependency traces. LangSmith provides per-node agent trace inspection for debugging. Authentication uses DefaultAzureCredential: in development it uses Azure CLI, and in production it switches automatically to Managed Identity with no code change.
|
| 18 |
|
| 19 |
+
## Tech Stack
|
| 20 |
|
| 21 |
+
Python, LangGraph, LangSmith, Azure Video Indexer, Azure AI Search, Azure OpenAI (GPT-4), Azure Monitor, OpenTelemetry, FastAPI, Pydantic, yt-dlp.