---
title: SCDM Chatbot App
emoji: 🚀
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: false
license: mit
---

## SCDM Chatbot (Streamlit + LangChain + Groq)

ChatGPT-like assistant for SCDM content. It answers questions, summarizes, and generates quizzes over PDFs in `data/pdf/`, always showing clear, human-readable sources (document title, page, and a clickable link from `data/source_links.json`).

### Features
- Q&A with retrieval-augmented generation (RAG) and readable citations
- Summarization (single or multi-document context)
- Quiz generation (MCQs with answers, explanations, and citations)
- “Auto” intent routing (classifies input as Q&A, Summarize, or Quiz)
- Clean source display: full-paragraph block quotes with title, page, and link
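
This README does not say how Auto mode classifies input; it may well ask the LLM. As a rough, swapped-in illustration only, a keyword heuristic could act as a cheap fallback router. `route_intent` and its keyword patterns are hypothetical and are not the app's actual logic:

```python
import re

# Hypothetical keyword patterns; the real Auto mode may use the LLM to classify instead.
QUIZ_PATTERN = re.compile(r"\b(quiz\w*|mcqs?|multiple[- ]choice)\b", re.IGNORECASE)
SUMMARY_PATTERN = re.compile(r"\b(summar\w*|overview)\b", re.IGNORECASE)


def route_intent(user_input: str) -> str:
    """Map free text to 'Quiz', 'Summarize', or 'Q&A' with a keyword heuristic."""
    if QUIZ_PATTERN.search(user_input):
        return "Quiz"
    if SUMMARY_PATTERN.search(user_input):
        return "Summarize"
    # Anything else is treated as a question for the RAG Q&A chain.
    return "Q&A"
```

Ambiguous prompts fall through to Q&A, and a prompt that mentions both a quiz and a summary routes to Quiz because that pattern is checked first.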

### Requirements
- Python 3.10–3.12 recommended
- A Groq API key (`GROQ_API_KEY`)
- macOS/Linux/Windows (CPU only; no GPU required)
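
If setup misbehaves, a quick pre-flight check of the two hard requirements can save time. A minimal sketch using only the standard library; `preflight_problems` is a hypothetical helper, not part of the app:

```python
import os
import sys

# Recommended range from the Requirements list (3.10 through 3.12).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def preflight_problems(version=None, env=None) -> list[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    version = tuple((version or sys.version_info)[:2])
    env = os.environ if env is None else env
    problems = []
    if not (MIN_VERSION <= version <= MAX_VERSION):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the recommended 3.10-3.12 range."
        )
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set (configure .env or export it).")
    return problems
```

Calling `preflight_problems()` with no arguments checks the running interpreter and the current environment. A key stored only in `.env` shows up in `os.environ` only after something loads that file.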

### Quickstart
1) Create a virtual environment
```bash
python3 -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
```
2) Install dependencies
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
3) Configure environment
```bash
cp .env.example .env
# Edit .env and set: GROQ_API_KEY=your_key_here
```
4) Build the index (extracts paragraphs with page metadata and embeds them)
```bash
python ingest.py
```
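
`ingest.py` itself is not shown here. The sketch below illustrates only the paragraph-splitting step, assuming page texts have already been extracted (for example with PyMuPDF's `page.get_text()`). The helper name `split_pages_into_paragraphs`, the blank-line splitting rule, and the 40-character threshold are hypothetical; embedding the paragraphs into FAISS is not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    source: str  # PDF file name
    page: int    # 1-based page number, as shown in citations
    text: str


def split_pages_into_paragraphs(source: str, page_texts: list[str],
                                min_chars: int = 40) -> list[Paragraph]:
    """Split each page's text on blank lines and keep the page number with each paragraph.

    Assumes page_texts[i] holds the text of page i + 1, e.g. from PyMuPDF:
    [page.get_text() for page in fitz.open(path)].
    """
    paragraphs = []
    for page_number, text in enumerate(page_texts, start=1):
        for block in re.split(r"\n\s*\n", text):
            # Re-join hard-wrapped lines and collapse runs of whitespace.
            cleaned = " ".join(block.split())
            # Skip very short blocks such as running headers or stray page numbers.
            if len(cleaned) >= min_chars:
                paragraphs.append(Paragraph(source, page_number, cleaned))
    return paragraphs
```

Keeping whole paragraphs (rather than fixed-size character chunks) is what lets the app quote complete paragraphs in its source cards.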
5) Run the app
```bash
streamlit run app.py
```

### Usage
- Select a model in the sidebar (default: `llama-3.3-70b-versatile`; also available: `llama-3.1-8b-instant`).
- Choose a mode: Auto, Q&A, Summarize, or Quiz. Auto attempts to classify your intent.
- Ask things like:
  - “Tell me about CDM to CDS”
  - “Summarize the key QbD responsibilities for CDS and cite sources.”
  - “Create a 5-question quiz on RBQM with citations.”
- Sources appear below each answer as expanders with:
  - Document title and page number
  - Clickable URL like `...pdf#page=10`
  - Full paragraph block quotes for readability
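
Quiz mode produces MCQs with answers, explanations, and citations, but the exact output format is not documented here. A minimal sketch of one way to hold and render a single question, assuming a hypothetical `QuizQuestion` dataclass whose field names are illustrative, not the app's schema:

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class QuizQuestion:
    """One multiple-choice question (hypothetical schema)."""
    question: str
    options: list[str]  # answer choices, labelled A, B, C, ...
    answer_index: int   # position of the correct choice in `options`
    explanation: str
    citation: str       # e.g. "(Title, p. 10)"

    def __post_init__(self) -> None:
        # Reject questions whose answer does not point at an existing option.
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must refer to one of the options")
        if len(self.options) > len(ascii_uppercase):
            raise ValueError("too many options to label with letters")

    def to_markdown(self, number: int) -> str:
        """Render the question, lettered options, and the cited answer as Markdown."""
        lines = [f"**Q{number}. {self.question}**"]
        lines += [f"- {ascii_uppercase[i]}. {opt}" for i, opt in enumerate(self.options)]
        answer = ascii_uppercase[self.answer_index]
        lines.append(f"*Answer: {answer}.* {self.explanation} {self.citation}")
        return "\n".join(lines)
```

Validating `answer_index` up front catches malformed LLM output before it reaches the UI.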

### Adding/Updating Documents
1) Place PDFs in `data/pdf/`.
2) Add or update entries in `data/source_links.json` that map each PDF file name to its public link.
3) Rebuild the index:
```bash
python ingest.py
```
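
Before rebuilding, it can help to confirm that every PDF has a link and every link has a PDF. A sketch assuming `source_links.json` is a flat JSON object of file name to URL (the README describes a mapping but not its exact shape); `check_source_links` is a hypothetical helper:

```python
import json
from pathlib import Path


def check_source_links(pdf_dir: str = "data/pdf",
                       links_file: str = "data/source_links.json") -> dict[str, list[str]]:
    """Compare PDFs on disk with entries in the links file.

    Assumes a flat object such as {"file.pdf": "https://...", ...}.
    """
    pdf_names = {p.name for p in Path(pdf_dir).glob("*.pdf")}
    link_names = set(json.loads(Path(links_file).read_text(encoding="utf-8")))
    return {
        "missing_link": sorted(pdf_names - link_names),  # PDFs with no public URL
        "missing_pdf": sorted(link_names - pdf_names),   # URLs for files not on disk
    }
```

A PDF listed under `missing_link` would still be indexed, but its citations could not include a clickable link.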

### Project Structure
```
scdm_chatbot/
  app.py                 # Streamlit UI and chains (Q&A, Summarize, Quiz)
  ingest.py              # PDF → paragraph extraction → FAISS index
  requirements.txt       # Python dependencies
  .env.example           # Env var template (GROQ_API_KEY)
  data/
    pdf/                 # Input PDFs
    source_links.json    # File name → source URL mapping
  index/                 # Generated FAISS index and manifest
  user_requirements.txt  # Problem statement and expected use cases
```

### Troubleshooting
- Groq error mentioning `reasoning_format` or `Completions.create`: update packages
```bash
pip install --upgrade groq langchain-groq langchain
```
- `Vector index not found`: run ingestion
```bash
python ingest.py
```
- `GROQ_API_KEY is not set`: configure `.env` or export the variable
```bash
export GROQ_API_KEY=your_key_here
```
- PDF parsing issues: ensure files are valid PDFs; the app uses PyMuPDF to extract text and split it into paragraphs with page numbers.
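
A cheap first check for "valid PDF" is the file header: PDF files begin with the bytes `%PDF-`. The sketch below flags files in `data/pdf/` that do not; the helper names are hypothetical. Passing this check does not guarantee extractable text: scanned, image-only PDFs have no text layer and yield nothing without OCR.

```python
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Heuristic: a real PDF starts with the '%PDF-' header bytes."""
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def suspicious_files(pdf_dir: str = "data/pdf") -> list[str]:
    """List *.pdf files in pdf_dir whose header does not look like a PDF."""
    return sorted(p.name for p in Path(pdf_dir).glob("*.pdf") if not looks_like_pdf(p))
```

Typical culprits are HTML error pages or empty files saved with a `.pdf` extension after a failed download.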

### Notes on Citations
- The app displays sources as human-readable cards containing full paragraphs, so quoted text is not cut off mid-sentence the way raw retrieval chunks can be.
- Citations include the title, the page (e.g., “(Title, p. 10)”), and a clickable link derived from `data/source_links.json`.
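
A sketch of how one retrieved paragraph could be rendered in this citation style, assuming each paragraph carries its title, 1-based page number, and PDF file name as metadata, and that `source_links.json` has already been loaded into a dict. `format_source` and its parameters are hypothetical. The `#page=N` fragment is the page-open convention honored by most PDF viewers, including browser viewers.

```python
def format_source(title: str, page: int, file_name: str, paragraph: str,
                  links: dict[str, str]) -> str:
    """Render one source as Markdown: citation, page-anchored link, full-paragraph quote."""
    citation = f"({title}, p. {page})"
    base_url = links.get(file_name)
    if base_url:
        # Drop any existing fragment, then add '#page=N' so the viewer opens at that page.
        url = base_url.split("#", 1)[0] + f"#page={page}"
        link = f"[open at p. {page}]({url})"
    else:
        link = f"`{file_name}` (no link in source_links.json)"
    # Prefix every line so a multi-line paragraph stays inside one block quote.
    quote = "\n".join(f"> {line}" for line in (paragraph.splitlines() or [""]))
    return f"**{citation}** {link}\n\n{quote}"
```

Falling back to the bare file name keeps the citation readable even when a PDF has no entry in the links file.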

### Commands Cheat Sheet
```bash
# Setup
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # set GROQ_API_KEY

# Index and run
python ingest.py
streamlit run app.py

# Update core libs if needed
pip install --upgrade groq langchain-groq langchain
```