# SPARKNET Implementation Report

## Agentic Document Intelligence Platform

**Report Date:** January 2025

**Version:** 0.1.0

---

## Executive Summary

SPARKNET is an enterprise-grade **Agentic Document Intelligence Platform** that follows FAANG best practices in four areas:

- **Modular Architecture**: Clean separation of concerns with well-defined interfaces
- **Local-First Privacy**: All processing happens locally via Ollama by default
- **Evidence Grounding**: Every extraction includes verifiable source references
- **Production-Ready**: Type-safe, tested, configurable, and scalable

---

## 1. What Has Been Implemented

### 1.1 Core Subsystems

| Subsystem | Location | Status | Description |
|-----------|----------|--------|-------------|
| **Document Intelligence** | `src/document_intelligence/` | Complete | Vision-first document understanding |
| **Legacy Document Pipeline** | `src/document/` | Complete | OCR, layout, chunking pipeline |
| **RAG Subsystem** | `src/rag/` | Complete | Vector search with grounded retrieval |
| **Multi-Agent System** | `src/agents/` | Complete | ReAct-style agents with tools |
| **LLM Integration** | `src/llm/` | Complete | Ollama client with routing |
| **CLI** | `src/cli/` | Complete | Full command-line interface |
| **API** | `api/` | Complete | FastAPI REST endpoints |
| **Demo UI** | `demo/` | Complete | Streamlit dashboard |

### 1.2 Document Intelligence Module (`src/document_intelligence/`)

**Architecture (FAANG-inspired: Google DocAI pattern):**

```
src/document_intelligence/
├── chunks/                # Core data models (BoundingBox, DocumentChunk, TableChunk)
│   ├── models.py          # Pydantic models with full type safety
│   └── __init__.py
├── io/                    # Document loading with caching
│   ├── base.py            # Abstract interfaces
│   ├── pdf.py             # PyMuPDF-based PDF loading
│   ├── image.py           # PIL image loading
│   └── cache.py           # LRU page caching
├── models/                # ML model interfaces
│   ├── base.py            # BaseModel, BatchableModel
│   ├── ocr.py             # OCRModel interface
│   ├── layout.py          # LayoutModel interface
│   ├── table.py           # TableModel interface
│   └── vlm.py             # VisionLanguageModel interface
├── parsing/               # Document parsing pipeline
│   ├── parser.py          # DocumentParser orchestrator
│   └── chunking.py        # SemanticChunker
├── grounding/             # Visual evidence
│   ├── evidence.py        # EvidenceBuilder, EvidenceTracker
│   └── crops.py           # Image cropping utilities
├── extraction/            # Field extraction
│   ├── schema.py          # ExtractionSchema, FieldSpec
│   ├── extractor.py       # FieldExtractor
│   └── validator.py       # ExtractionValidator
├── tools/                 # Agent tools
│   ├── document_tools.py  # ParseDocumentTool, ExtractFieldsTool, etc.
│   └── rag_tools.py       # IndexDocumentTool, RetrieveChunksTool, RAGAnswerTool
└── agent_adapter.py       # EnhancedDocumentAgent integration
```

**Key Features:**

- **Zero-Shot Capability**: Works across document formats without format-specific training
- **Schema-Driven Extraction**: Define fields using JSON Schema or Pydantic
- **Abstention Policy**: Abstains instead of guessing when confidence is low
- **Visual Grounding**: Every extraction includes page, bbox, snippet, and confidence

### 1.3 RAG Subsystem (`src/rag/`)

**Architecture (FAANG-inspired: Meta FAISS + Google Vertex AI pattern):**

```
src/rag/
├── store.py          # VectorStore interface + ChromaVectorStore
├── embeddings.py     # OllamaEmbedding + OpenAIEmbedding (feature-flagged)
├── indexer.py        # DocumentIndexer for chunked documents
├── retriever.py      # DocumentRetriever with evidence support
├── generator.py      # GroundedGenerator with citations
├── docint_bridge.py  # Bridge to document_intelligence subsystem
└── __init__.py       # Clean exports
```

**Key Features:**

- **Local-First Embeddings**: Ollama `nomic-embed-text` by default
- **Cloud Opt-In**: OpenAI embeddings disabled by default, feature-flagged
- **Metadata Filtering**: Filter by `document_id`, `chunk_type`, `page_range`
- **Citation Generation**: Answers include `[1]`, `[2]` references
- **Confidence-Based Abstention**: Returns "I don't know" when uncertain

### 1.4 Multi-Agent System (`src/agents/`)

**Agents Implemented:**

| Agent | Purpose | Model |
|-------|---------|-------|
| `ExecutorAgent` | Task execution with tools | llama3.1:8b |
| `DocumentAgent` | ReAct-style document analysis | llama3.1:8b |
| `PlannerAgent` | Task decomposition | mistral |
| `CriticAgent` | Output validation | phi3 |
| `MemoryAgent` | Context management | llama3.2 |
| `VisionOCRAgent` | Vision-based OCR | llava (optional) |

### 1.5 CLI Commands

```bash
# Document Intelligence
sparknet docint parse document.pdf -o result.json
sparknet docint extract invoice.pdf --preset invoice
sparknet docint ask document.pdf "What is the total?"
sparknet docint classify document.pdf

# RAG Operations
sparknet docint index document.pdf                # Index into vector store
sparknet docint index-stats                       # Show index statistics
sparknet docint retrieve "payment terms" -k 10    # Semantic search
sparknet docint ask doc.pdf "question" --use-rag  # RAG-powered Q&A

# Legacy Document Commands
sparknet document parse invoice.pdf
sparknet document extract contract.pdf -f "party_name"
sparknet rag index *.pdf --collection my_docs
sparknet rag search "query" --top 10
```

---

## 2. How to Execute SPARKNET

### 2.1 Prerequisites

```bash
# 1. System requirements
#    - Python 3.10+
#    - NVIDIA GPU with CUDA 12.0+ (optional but recommended)
#    - 16GB+ RAM
#    - 50GB+ disk space

# 2. Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh

# 3. Start the Ollama server
ollama serve
```

### 2.2 Installation

```bash
cd /home/mhamdan/SPARKNET

# Option A: Use the existing virtual environment
source sparknet/bin/activate

# Option B: Create a new environment
python3 -m venv sparknet
source sparknet/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -r demo/requirements.txt

# Install SPARKNET in development mode
pip install -e .
```

### 2.3 Download Required Models

```bash
# Embedding model (required for RAG)
ollama pull nomic-embed-text:latest

# LLM models (at least one required)
ollama pull llama3.2:latest   # Fast, 2GB
ollama pull llama3.1:8b       # General purpose, 5GB
ollama pull mistral:latest    # Good reasoning, 4GB

# Optional: larger models for complex tasks
ollama pull qwen2.5:14b       # Complex reasoning, 9GB
```

### 2.4 Running the Demo UI

**Method 1: Using the launcher script**

```bash
cd /home/mhamdan/SPARKNET
./run_demo.sh 8501
```

**Method 2: Direct Streamlit command**

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
streamlit run demo/app.py --server.port 8501
```

**Method 3: Bind to a specific IP (for remote access)**

```bash
streamlit run demo/app.py \
  --server.address 172.24.50.21 \
  --server.port 8501 \
  --server.headless true
```

**Access at:** http://172.24.50.21:8501 or http://localhost:8501

### 2.5 Running the API Server

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

**API Endpoints:**

- `GET /health`: Health check
- `POST /api/documents/parse`: Parse a document
- `POST /api/documents/extract`: Extract fields
- `POST /api/rag/index`: Index a document
- `POST /api/rag/query`: Query the RAG pipeline

### 2.6 Running Examples

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Document Intelligence demo
python examples/document_intelligence_demo.py

# RAG end-to-end pipeline
python examples/document_rag_end_to_end.py

# Simple agent task
python examples/simple_task.py

# Document agent
python examples/document_agent.py
```

### 2.7 Running Tests

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/unit/test_document_intelligence.py -v
pytest tests/unit/test_rag_integration.py -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```

---

## 3. Configuration

### 3.1 RAG Configuration (`configs/rag.yaml`)

```yaml
vector_store:
  type: chroma
  chroma:
    persist_directory: "./.sparknet/chroma_db"
    collection_name: "sparknet_documents"
    distance_metric: cosine

embeddings:
  provider: ollama  # Local-first
  ollama:
    model: nomic-embed-text
    base_url: "http://localhost:11434"
  openai:
    enabled: false  # Disabled by default

generator:
  provider: ollama
  ollama:
    model: llama3.2
  abstain_on_low_confidence: true
  abstain_threshold: 0.3
```

### 3.2 Document Configuration (`config/document.yaml`)

```yaml
ocr:
  engine: paddleocr  # or tesseract
  languages: ["en"]
  confidence_threshold: 0.5

layout:
  enabled: true
  reading_order: true

chunking:
  min_chunk_chars: 10
  max_chunk_chars: 4000
  target_chunk_chars: 500
```

---

## 4. FAANG Best Practices Applied

### 4.1 Google-Inspired Patterns

- **DocAI Architecture**: Modular, vision-first document understanding
- **Structured Output**: Schema-driven extraction with validation
- **Abstention Policy**: Abstain instead of guessing; return "I don't know" when confidence is low

### 4.2 Meta-Inspired Patterns

- **FAISS Integration**: Fast similarity search (optional alongside ChromaDB)
- **RAG Pipeline**: Retrieve-then-generate with citations

### 4.3 Amazon-Inspired Patterns

- **Textract-like API**: Structured field extraction with confidence scores
- **Evidence Grounding**: Every output traceable to its source

### 4.4 Microsoft-Inspired Patterns

- **Form Recognizer Pattern**: Pre-built schemas for invoices and contracts
- **Confidence Thresholds**: Configurable abstention levels

### 4.5 Apple-Inspired Patterns

- **Privacy-First**: All processing local by default
- **Opt-In Cloud**: OpenAI and other cloud services disabled by default

---

## 5. Quick Start Commands

```bash
# === SETUP ===
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
ollama serve &  # Start in background

# === DEMO UI ===
streamlit run demo/app.py --server.port 8501

# === CLI USAGE ===
# Parse a document
python -m src.cli.main docint parse Dataset/IBM*.pdf -o result.json

# Index for RAG
python -m src.cli.main docint index Dataset/*.pdf

# Ask questions with RAG
python -m src.cli.main docint ask Dataset/IBM*.pdf "What is this document about?" --use-rag

# === PYTHON API ===
python -c "
from src.document_intelligence import DocumentParser
parser = DocumentParser()
result = parser.parse('Dataset/IBM N_A.pdf')
print(f'Parsed {len(result.chunks)} chunks')
"

# === RUN TESTS ===
pytest tests/unit/ -v
```

---

## 6. Troubleshooting

### Issue: Ollama not running

```bash
# Check status
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve

# If the port is already in use
pkill ollama && ollama serve
```

### Issue: Missing models

```bash
ollama list                    # See installed models
ollama pull nomic-embed-text   # Install embedding model
ollama pull llama3.2           # Install LLM
```

### Issue: ChromaDB errors

```bash
# Reset the vector store
rm -rf .sparknet/chroma_db
```

### Issue: Import errors

```bash
# Ensure you are in the correct directory
cd /home/mhamdan/SPARKNET

# Ensure the venv is activated
source sparknet/bin/activate

# Reinstall
pip install -e .
```

---

## 7. Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                        SPARKNET Platform                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Streamlit  │  │   FastAPI   │  │     CLI     │  Interfaces  │
│  │    Demo     │  │     API     │  │  Commands   │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
├─────────┴────────────────┴────────────────┴─────────────────────┤
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                       Agent Layer                        │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │   │
│  │  │ Document │  │ Executor │  │ Planner  │  │  Critic  │  │   │
│  │  │  Agent   │  │  Agent   │  │  Agent   │  │  Agent   │  │   │
│  │  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  │   │
│  └───────┴─────────────┴─────────────┴─────────────┴────────┘   │
│                                                                 │
│  ┌────────────────────┐  ┌──────────────────────────────────┐   │
│  │   Document Intel   │  │          RAG Subsystem           │   │
│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐ ┌──────────────────┐ │   │
│  │ │Parser │ │Extract│ │  │ │ Indexer │ │    Retriever     │ │   │
│  │ └───────┘ └───────┘ │  │ └─────────┘ └──────────────────┘ │   │
│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐ ┌──────────────────┐ │   │
│  │ │Ground │ │Valid  │ │  │ │Embedder │ │    Generator     │ │   │
│  │ └───────┘ └───────┘ │  │ └─────────┘ └──────────────────┘ │   │
│  └────────────────────┘  └──────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                      Infrastructure                      │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  │   │
│  │  │  Ollama  │  │ ChromaDB │  │   GPU    │  │  Cache   │  │   │
│  │  │  Client  │  │  Store   │  │ Manager  │  │  Layer   │  │   │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

---

## 8. Files Modified/Created in Recent Session

| File | Action | Description |
|------|--------|-------------|
| `src/rag/docint_bridge.py` | Created | Bridge between document_intelligence and RAG |
| `src/document_intelligence/tools/rag_tools.py` | Created | RAG tools for agents |
| `src/document_intelligence/tools/__init__.py` | Modified | Added RAG tool exports |
| `src/document_intelligence/tools/document_tools.py` | Modified | Enhanced AnswerQuestionTool with RAG |
| `src/cli/docint.py` | Modified | Added index, retrieve, delete-index commands |
| `src/rag/__init__.py` | Modified | Added bridge exports |
| `configs/rag.yaml` | Created | RAG configuration file |
| `tests/unit/test_rag_integration.py` | Created | RAG integration tests |
| `examples/document_rag_end_to_end.py` | Created | End-to-end RAG example |

---

**Report Complete**

For questions or issues, refer to the troubleshooting section (Section 6) or check the test files for usage examples.
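The confidence-based abstention and citation behavior described for the RAG subsystem (`abstain_on_low_confidence`, `abstain_threshold: 0.3` in `configs/rag.yaml`) can be illustrated with a minimal, self-contained sketch. It is not the code in `src/rag/generator.py`. The `RetrievedChunk` type, the `build_cited_context` function, and the rule of dropping chunks whose relevance score falls below the threshold are all hypothetical. Scores are also assumed to be similarities where higher means more relevant; a cosine *distance* from the vector store would need converting first.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical retrieved chunk carrying its grounding evidence."""

    text: str
    page: int
    score: float  # assumed similarity: higher means more relevant


def build_cited_context(
    chunks: list[RetrievedChunk],
    abstain_threshold: float = 0.3,
) -> str:
    """Return numbered, citable context, or abstain when evidence is weak.

    Hypothetical sketch: a real generator would pass this context to the LLM
    and instruct it to cite sources as [1], [2], ...
    """
    # Drop chunks whose relevance score is below the abstention threshold.
    supported = [c for c in chunks if c.score >= abstain_threshold]
    if not supported:
        return "I don't know"

    # Rank the remaining chunks and number them for [n]-style citations.
    ranked = sorted(supported, key=lambda c: c.score, reverse=True)
    return "\n".join(
        f"[{i}] (page {c.page}) {c.text}" for i, c in enumerate(ranked, start=1)
    )


if __name__ == "__main__":
    chunks = [
        RetrievedChunk("Payment is due within 30 days.", page=2, score=0.82),
        RetrievedChunk("Late fees accrue at 1% per month.", page=3, score=0.55),
        RetrievedChunk("Page footer text.", page=5, score=0.12),
    ]
    print(build_cited_context(chunks))
    print(build_cited_context([RetrievedChunk("Noise", page=1, score=0.05)]))
```

Returning a sentinel string keeps the sketch short. A fuller design would more likely return a structured result (answer, citations, confidence), so that callers such as the `/api/rag/query` endpoint can tell an abstention apart from an answer.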
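The chunking keys in `config/document.yaml` (`min_chunk_chars: 10`, `max_chunk_chars: 4000`, `target_chunk_chars: 500`) can be read as a size policy. Under that reading, text is packed up to roughly the target size, no chunk ever exceeds the maximum, and fragments below the minimum are dropped. The sketch below illustrates this interpretation with greedy paragraph packing. It is an assumption about how the three values interact, not the `SemanticChunker` in `parsing/chunking.py`, and the `chunk_text` function is hypothetical.

```python
def chunk_text(
    text: str,
    min_chars: int = 10,
    max_chars: int = 4000,
    target_chars: int = 500,
) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into size-bounded chunks.

    Hypothetical sketch of the min/max/target policy; not SemanticChunker.
    """
    # Split on blank lines, then hard-split any paragraph longer than max_chars.
    pieces: list[str] = []
    for para in (p.strip() for p in text.split("\n\n")):
        for start in range(0, len(para), max_chars):
            pieces.append(para[start:start + max_chars])

    # Merge pieces while the merged chunk stays within the target size.
    # A single piece may exceed the target, but never max_chars.
    limit = min(target_chars, max_chars)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if current and len(candidate) > limit:
            chunks.append(current)  # close the current chunk
            current = piece  # start a new chunk with this piece
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Discard fragments below the minimum size.
    return [c for c in chunks if len(c) >= min_chars]


if __name__ == "__main__":
    doc = "A" * 300 + "\n\n" + "B" * 300 + "\n\n" + "C" * 5
    print([len(c) for c in chunk_text(doc)])  # [300, 307]
```

Greedy packing is the simplest policy that honors all three bounds. A semantic chunker may additionally prefer to split at layout or heading boundaries rather than only at blank lines.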