A newer version of the Gradio SDK is available: 6.17.3
RAG Implementation Log
Progress Tracking
Implementation of the RAG system based on PLAN.md β COMPLETE β
2026-03-26 β Full Implementation
Phase 1: Environment & Dependencies β
- Created
requirements.txtwith all necessary Python packages - Created
config.pywith:- Document source paths (lectures, datasheets, app notes, source code)
- Ollama configuration (base URL, model selection)
- Chunking parameters (size, overlap for code and prose)
- ChromaDB persistence settings
Phase 2: Document Ingestion β
Implemented
ingest/code_loader.py- Loads instructor solution source files (.cpp, .h, .c)
- Skips student templates (noise reduction)
- Adds source headers for citation tracking
Implemented
ingest/pptx_extract.py- Extracts text from PowerPoint slides
- One document per slide for granular retrieval
- Preserves slide numbers for citations
Implemented
ingest/pdf_ocr.py- Fast path: pdfplumber for native-text PDFs
- Fallback: PaddleOCR for image-heavy/scanned PDFs
- Sparsity detection to choose best extraction method
- Page-level granularity for citations
Implemented
ingest/chunker.py- Overlapping text chunks (langchain RecursiveCharacterTextSplitter)
- Different strategies for code vs prose
- Code separators: function/class boundaries
- Chunk metadata includes source, page, assignment info
Phase 3: Embedding & Vector Store β
Implemented
vectorstore/embedder.py- Calls Ollama
/api/embeddingsendpoint - Wraps nomic-embed-text model (768-dim vectors)
- Includes error handling with zero-vector fallback
- Calls Ollama
Implemented
vectorstore/store.py- ChromaDB persistent client management
- Custom OllamaEmbeddingFunction class for integration
add_documents()β store chunks with embeddingsquery()β retrieve top-k similar chunks- Cosine similarity metric for document retrieval
Phase 4: Query Pipeline β
Implemented
query/retriever.py- Simple wrapper around vector store queries
- Configurable top-k retrieval (default 5)
Implemented
query/prompt_builder.py- System prompt guides LLM to use context only
- Formats retrieved chunks with source citations
- Builds structured messages for Ollama chat API
Implemented
query/generator.py- Calls Ollama
/api/chatendpoint - Handles errors gracefully
- Returns response text directly
- Calls Ollama
Phase 5: CLI Scripts β
Implemented
scripts/ingest_all.py- Orchestrates full pipeline: extraction β chunking β embedding β storage
- Walks all document directories recursively
- Separates code vs prose for appropriate chunking
--dry-runflag for OCR quality testing- Prints summary statistics per category
Implemented
scripts/query_cli.py- Interactive loop for asking questions
- Shows retrieved chunks on
--verboseflag - Displays source citations with each answer
- Clean formatting for terminal output
Implemented
scripts/launch_ui.py- Gradio web interface on localhost:7860
- Text input for questions
- Toggle to show/hide retrieved sources
- User-friendly markdown output for answers
Phase 6: Testing β
Implemented
tests/test_ingest.py- Verifies code loader finds instructor files
- Checks that student directories are skipped
- Tests chunking respects size bounds
- Code chunks use appropriate larger sizes
Implemented
tests/test_retrieval.py- Tests ChromaDB collection initialization
- Validates add_documents and query interface
- Checks retrieval respects top-k parameter
- Tests retrieve function structure
Implemented
tests/test_end_to_end.py- Full pipeline interface tests
- Prompt building with context validation
- Generation interface verification
- Graceful skipping when Ollama unavailable
Documentation & Configuration β
Created
.gitignoreto exclude:- Virtual environment
- ChromaDB persistent storage
- Cache and build artifacts
Created
README.mdwith:- Quick start guide
- Installation instructions
- Configuration options
- Example queries
- Architecture diagram
- Troubleshooting guide
- Known limitations
Implementation Statistics
- Total Python files: 17
- Total lines of code: ~1400
- Phases completed: 6/6 β
Directory Structure (Final)
rag/
βββ README.md # User guide
βββ PLAN.md # Architecture plan
βββ LOG.md # This file
βββ requirements.txt # Python dependencies
βββ config.py # Centralized configuration
βββ .gitignore # Git exclusions
βββ ingest/ # Document extraction
β βββ __init__.py
β βββ code_loader.py
β βββ pptx_extract.py
β βββ pdf_ocr.py
β βββ chunker.py
βββ vectorstore/ # Vector storage
β βββ __init__.py
β βββ embedder.py
β βββ store.py
βββ query/ # Query pipeline
β βββ __init__.py
β βββ retriever.py
β βββ prompt_builder.py
β βββ generator.py
βββ scripts/ # CLI tools
β βββ ingest_all.py
β βββ query_cli.py
β βββ launch_ui.py
βββ tests/ # Test suite
β βββ __init__.py
β βββ test_ingest.py
β βββ test_retrieval.py
β βββ test_end_to_end.py
βββ chroma_db/ # Vector storage (gitignored)
βββ [ChromaDB data]
Next Steps for Usage
- Install dependencies:
pip install -r requirements.txt - Ensure Ollama is running:
ollama serve - Ingest documents:
python scripts/ingest_all.py - Query:
- CLI:
python scripts/query_cli.py - Web UI:
python scripts/launch_ui.py
- CLI:
Key Design Decisions
- Pdfplumber + PaddleOCR fallback β Fast for native PDFs, handles scanned documents
- ChromaDB β Embedded vector store, no server needed, persistent on disk
- Ollama local inference β Privacy-respecting, no API costs, full control
- Instructor-only code indexing β Reduces noise, focuses on solutions
- Page/slide-level granularity β Precise citations, better UX
- Separate code chunking strategy β Respects function boundaries
- Modular architecture β Each component independently testable
Implementation Status: READY FOR TESTING β
All core functionality implemented. System is ready for:
- Installing dependencies
- Running ingestion pipeline
- Testing with CLI and web UI
- Integration into course workflow