Spaces: AnshulPrasad/transcript-rag-summarizer
Anshul Prasad committed on Feb 27

Commit 66fc578 (1 parent: 9820e6d)

update.

Files changed (8)

.gitattributes +1 -0
.github/workflows/main.yml +1 -1
.github/workflows/space-keepalive.yml +1 -1
README.md +6 -16
config.py +4 -4
frontend/index.html +3 -3
pyproject.toml +1 -1
uv.lock +30 -30

.gitattributes CHANGED

@@ -1,2 +1,3 @@
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.faiss filter=lfs diff=lfs merge=lfs -text

 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.faiss filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text
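
As a rough illustration of what the added `*.webp` rule covers, the sketch below checks file names against the three LFS patterns in this hunk. It approximates gitattributes matching with Python's `fnmatch` on the basename, which should hold for simple slash-free `*.ext` patterns only. The helper name `is_lfs_tracked` and the sample image path are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes hunk above.
LFS_PATTERNS = ["*.pkl", "*.faiss", "*.webp"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: slash-free patterns are compared with the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    # The image path is a hypothetical location, used only for illustration.
    for sample in ["data/transcripts.pkl", "frontend/assets/images/image1.webp", "config.py"]:
        print(sample, is_lfs_tracked(sample))
```

Git remains the authority on attribute matching; `git check-attr filter -- <path>` reports what actually applies to a given file.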

.github/workflows/main.yml CHANGED

@@ -16,4 +16,4 @@ jobs:
       - name: Push to hub
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
-        run: git push https://AnshulPrasad:$HF_TOKEN@huggingface.co/spaces/AnshulPrasad/Acharya_Prashant main

       - name: Push to hub
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: git push https://AnshulPrasad:$HF_TOKEN@huggingface.co/spaces/AnshulPrasad/transcript-rag-summarizer main

.github/workflows/space-keepalive.yml CHANGED

@@ -12,7 +12,7 @@ jobs:
       - name: Ping Space (3 retries)
         run: |
           for i in 1 2 3; do
-            curl -fsS -o /dev/null -L 'https://huggingface.co/spaces/AnshulPrasad/Acharya_Prashant' && exit 0
             sleep 15
           done
           echo "All attempts failed" >&2

       - name: Ping Space (3 retries)
         run: |
           for i in 1 2 3; do
+            curl -fsS -o /dev/null -L 'https://huggingface.co/spaces/AnshulPrasad/transcript-rag-summarizer' && exit 0
             sleep 15
           done
           echo "All attempts failed" >&2
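
The retry loop in this step can be sketched in Python as follows. This is a minimal illustration, not part of the workflow: the function names `ping_with_retries` and `http_ping` are hypothetical, and the `sleep` parameter is injected only so the logic can be exercised without waiting. The defaults mirror the three attempts and the 15-second pause in the shell loop.

```python
import time
import urllib.request
from typing import Callable


def http_ping(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers successfully (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def ping_with_retries(
    ping: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `ping` up to `attempts` times, pausing after each failure."""
    for _ in range(attempts):
        if ping():
            return True
        sleep(delay_seconds)
    return False
```

A caller would pass something like `lambda: http_ping(space_url)` and treat a `False` result the way the workflow treats "All attempts failed".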

README.md CHANGED

@@ -1,16 +1,6 @@
----
-title: ask Acharya Prashant
-emoji: 📚
-colorFrom: indigo
-colorTo: blue
-sdk: docker
-app_file: app.py
-pinned: false
----
-# Acharya Prashant RAG Assistant
-A retrieval-augmented question-answering (RAG) system built on Acharya Prashant's YouTube subtitles.
 The project provides:
 - A FastAPI backend (`/ask`) for question answering.
@@ -39,7 +29,7 @@ The project provides:
 2. Query is embedded using `all-MiniLM-L6-v2`.
 3. Top-K transcript chunks are retrieved from the FAISS index.
 4. Retrieved context is token-trimmed (`MAX_CONTEXT_TOKENS`).
-5. Groq chat completion API generates the final answer using a system prompt aligned to Acharya Prashant's tone.
 Core runtime flow:
 - `app.py` loads `data/file_paths.pkl` and `data/transcripts.pkl` at startup.
@@ -148,13 +138,13 @@ Open `http://localhost:7860`.
 Build:
 ```bash
-docker build -t acharya-prashant-rag .
 ```
 Run:
 ```bash
-docker run --rm -p 7860:7860 -e GROQ_API_KEY="your_groq_api_key" acharya-prashant-rag
 ```
 ## API Reference
@@ -195,7 +185,7 @@ curl -X POST "http://localhost:7860/ask" \
 `main.py` includes stages for data preparation and querying.
 Pipeline stages:
-1. Download subtitles from channels (`utils/download_vtt.py`)
 2. Convert `.vtt` to cleaned `.txt` (`utils/vtt_to_txt.py`, `utils/preprocess.py`)
 3. Load and persist transcript corpus (`data/*.pkl`)
 4. Create FAISS index (`api/embed_transcripts.py`)

+# RAG Q&A Assistant
+A retrieval-augmented question-answering (RAG) system built on curated YouTube subtitle transcripts.
 The project provides:
 - A FastAPI backend (`/ask`) for question answering.
 2. Query is embedded using `all-MiniLM-L6-v2`.
 3. Top-K transcript chunks are retrieved from the FAISS index.
 4. Retrieved context is token-trimmed (`MAX_CONTEXT_TOKENS`).
+5. Groq chat completion API generates the final answer using a domain-aligned system prompt.
 Core runtime flow:
 - `app.py` loads `data/file_paths.pkl` and `data/transcripts.pkl` at startup.
 Build:
 ```bash
+docker build -t rag-qa-assistant .
 ```
 Run:
 ```bash
+docker run --rm -p 7860:7860 -e GROQ_API_KEY="your_groq_api_key" rag-qa-assistant
 ```
 ## API Reference
 `main.py` includes stages for data preparation and querying.
 Pipeline stages:
+1. Download subtitles from configured channels (`utils/download_vtt.py`)
 2. Convert `.vtt` to cleaned `.txt` (`utils/vtt_to_txt.py`, `utils/preprocess.py`)
 3. Load and persist transcript corpus (`data/*.pkl`)
 4. Create FAISS index (`api/embed_transcripts.py`)
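
The retrieval steps listed in the README (embed the query, take the Top-K chunks, trim the context to a token budget) can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions: toy vectors replace `all-MiniLM-L6-v2` embeddings, brute-force cosine similarity replaces the FAISS index, and whitespace-separated words replace real tokens. The function names are hypothetical and do not come from the repository.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k chunks most similar to the query (brute force, not FAISS)."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def trim_context(chunks: List[str], max_tokens: int) -> str:
    """Keep chunks in order until the budget is spent (assumption: 1 word = 1 token)."""
    kept, used = [], 0
    for chunk in chunks:
        remaining = max_tokens - used
        if remaining <= 0:
            break
        tokens = chunk.split()[:remaining]
        kept.append(" ".join(tokens))
        used += len(tokens)
    return "\n\n".join(kept)


if __name__ == "__main__":
    chunks = ["alpha beta gamma", "delta epsilon", "zeta"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    best = top_k([1.0, 0.1], vectors, k=2)
    print(trim_context([chunks[i] for i in best], max_tokens=4))
```

In the project itself the budget would be `MAX_CONTEXT_TOKENS`, and the trimmed context would then be passed to the chat completion call together with the system prompt.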

config.py CHANGED

@@ -2,8 +2,8 @@ import os
 from pathlib import Path
 CHANNEL_URLS = [
-    "https://www.youtube.com/@AcharyaPrashant",
-    "https://www.youtube.com/@ShriPrashant",
 ]
 VTT_DIR = Path("data/subtitles_vtt")
@@ -18,11 +18,11 @@ GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
 MODEL = "llama-3.1-8b-instant"
 MAX_CONTEXT_TOKENS = 4500
 SYSTEM_PROMPT = """
-You are speaking as Acharya Prashant.
 Your role is to explain questions related to life, self-knowledge, suffering,
 fear, desire, relationships, and meaning from the perspective of Advaita Vedanta
-and the Upanishadic tradition, as taught by Acharya Prashant.
 Guidelines:
 - Speak in a calm, direct, and uncompromising tone.

 from pathlib import Path
 CHANNEL_URLS = [
+    "https://www.youtube.com/@CHANNEL_ID_1",
+    "https://www.youtube.com/@CHANNEL_ID_2",
 ]
 VTT_DIR = Path("data/subtitles_vtt")
 MODEL = "llama-3.1-8b-instant"
 MAX_CONTEXT_TOKENS = 4500
 SYSTEM_PROMPT = """
+You are speaking as Spiritual Guru.
 Your role is to explain questions related to life, self-knowledge, suffering,
 fear, desire, relationships, and meaning from the perspective of Advaita Vedanta
+and the Upanishadic tradition, as taught by Spiritual Guru.
 Guidelines:
 - Speak in a calm, direct, and uncompromising tone.
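
Because this change replaces the real channel URLs with `@CHANNEL_ID_1` and `@CHANNEL_ID_2` placeholders, and `GROQ_API_KEY` is read with `os.environ.get` (which yields `None` when the variable is unset), a startup check along the following lines may be useful. It is a hypothetical sketch: `validate_config` and `PLACEHOLDER_MARKER` do not appear in the repository.

```python
import os
from typing import List, Mapping, Optional

# Assumption: placeholder URLs can be recognised by this substring.
PLACEHOLDER_MARKER = "CHANNEL_ID_"


def validate_config(channel_urls: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    for url in channel_urls:
        if PLACEHOLDER_MARKER in url:
            problems.append(f"placeholder channel URL: {url}")
    return problems


if __name__ == "__main__":
    urls = ["https://www.youtube.com/@CHANNEL_ID_1", "https://www.youtube.com/@CHANNEL_ID_2"]
    for problem in validate_config(urls):
        print("config problem:", problem)
```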

frontend/index.html CHANGED

@@ -2,7 +2,7 @@
 <html lang="en">
 <head>
   <meta charset="UTF-8">
-  <title>Ask Acharya Prashant</title>
   <meta name="viewport" content="width=device-width, initial-scale=1">
   <!-- Markdown renderer -->
   <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
@@ -125,8 +125,8 @@
   <!-- HERO / BANNER -->
   <header class="hero">
-    <img src="assets/images/Acharya_Prashant.webp" alt="Acharya Prashant">
-    <h1>Ask Acharya Prashant</h1>
   </header>
   <!-- Q&A CARD -->

 <html lang="en">
 <head>
   <meta charset="UTF-8">
+  <title>Ask Assistant</title>
   <meta name="viewport" content="width=device-width, initial-scale=1">
   <!-- Markdown renderer -->
   <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
   <!-- HERO / BANNER -->
   <header class="hero">
+    <img src="assets/images/image1.webp" alt="Assistant">
+    <h1>Ask Assistant</h1>
   </header>
   <!-- Q&A CARD -->

pyproject.toml CHANGED

@@ -1,5 +1,5 @@
 [project]
-name = "acharya-prashant"
 version = "0.1.0"
 description = "Add your description here"
 readme = "README.md"

 [project]
+name = "transcript-rag-summarizer"
 version = "0.1.0"
 description = "Add your description here"
 readme = "README.md"

uv.lock CHANGED

@@ -8,36 +8,6 @@ resolution-markers = [
     "python_full_version < '3.12' and sys_platform == 'darwin'",
 ]
-[[package]]
-name = "acharya-prashant"
-version = "0.1.0"
-source = { virtual = "." }
-dependencies = [
-    { name = "faiss-cpu" },
-    { name = "fastapi" },
-    { name = "groq" },
-    { name = "pytz" },
-    { name = "sentence-transformers" },
-    { name = "tiktoken" },
-    { name = "torch", version = "2.10.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
-    { name = "torch", version = "2.10.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform != 'darwin'" },
-    { name = "transformers" },
-    { name = "uvicorn" },
-]
-[package.metadata]
-requires-dist = [
-    { name = "faiss-cpu", specifier = "==1.9.0" },
-    { name = "fastapi", specifier = "==0.116.1" },
-    { name = "groq", specifier = ">=1.0.0" },
-    { name = "pytz", specifier = "==2025.2" },
-    { name = "sentence-transformers", specifier = "==3.0.1" },
-    { name = "tiktoken", specifier = ">=0.12.0" },
-    { name = "torch", specifier = ">=2.10.0", index = "https://download.pytorch.org/whl/cpu" },
-    { name = "transformers", specifier = "==4.57.1" },
-    { name = "uvicorn", specifier = "==0.38.0" },
-]
 [[package]]
 name = "annotated-types"
 version = "0.7.0"
@@ -1311,6 +1281,36 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
 ]
 [[package]]
 name = "transformers"
 version = "4.57.1"

     "python_full_version < '3.12' and sys_platform == 'darwin'",
 ]
 [[package]]
 name = "annotated-types"
 version = "0.7.0"
     { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
 ]
+[[package]]
+name = "transcript-rag-summarizer"
+version = "0.1.0"
+source = { virtual = "." }
+dependencies = [
+    { name = "faiss-cpu" },
+    { name = "fastapi" },
+    { name = "groq" },
+    { name = "pytz" },
+    { name = "sentence-transformers" },
+    { name = "tiktoken" },
+    { name = "torch", version = "2.10.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
+    { name = "torch", version = "2.10.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform != 'darwin'" },
+    { name = "transformers" },
+    { name = "uvicorn" },
+]
+[package.metadata]
+requires-dist = [
+    { name = "faiss-cpu", specifier = "==1.9.0" },
+    { name = "fastapi", specifier = "==0.116.1" },
+    { name = "groq", specifier = ">=1.0.0" },
+    { name = "pytz", specifier = "==2025.2" },
+    { name = "sentence-transformers", specifier = "==3.0.1" },
+    { name = "tiktoken", specifier = ">=0.12.0" },
+    { name = "torch", specifier = ">=2.10.0", index = "https://download.pytorch.org/whl/cpu" },
+    { name = "transformers", specifier = "==4.57.1" },
+    { name = "uvicorn", specifier = "==0.38.0" },
+]
 [[package]]
 name = "transformers"
 version = "4.57.1"