# SPARKNET Implementation Report

## Agentic Document Intelligence Platform

**Report Date:** January 2025

**Version:** 0.1.0

---

## Executive Summary

SPARKNET is an enterprise-grade **Agentic Document Intelligence Platform** that follows FAANG best practices for:

- **Modular Architecture**: Clean separation of concerns with well-defined interfaces
- **Local-First Privacy**: All processing happens locally via Ollama
- **Evidence Grounding**: Every extraction includes verifiable source references
- **Production-Ready**: Type-safe, tested, configurable, and scalable

---

## 1. What Has Been Implemented

### 1.1 Core Subsystems

| Subsystem | Location | Status | Description |
|-----------|----------|--------|-------------|
| **Document Intelligence** | `src/document_intelligence/` | Complete | Vision-first document understanding |
| **Legacy Document Pipeline** | `src/document/` | Complete | OCR, layout, chunking pipeline |
| **RAG Subsystem** | `src/rag/` | Complete | Vector search with grounded retrieval |
| **Multi-Agent System** | `src/agents/` | Complete | ReAct-style agents with tools |
| **LLM Integration** | `src/llm/` | Complete | Ollama client with routing |
| **CLI** | `src/cli/` | Complete | Full command-line interface |
| **API** | `api/` | Complete | FastAPI REST endpoints |
| **Demo UI** | `demo/` | Complete | Streamlit dashboard |

### 1.2 Document Intelligence Module (`src/document_intelligence/`)

**Architecture (FAANG-inspired: Google DocAI pattern):**

```
src/document_intelligence/
├── chunks/                  # Core data models (BoundingBox, DocumentChunk, TableChunk)
│   ├── models.py            # Pydantic models with full type safety
│   └── __init__.py
├── io/                      # Document loading with caching
│   ├── base.py              # Abstract interfaces
│   ├── pdf.py               # PyMuPDF-based PDF loading
│   ├── image.py             # PIL image loading
│   └── cache.py             # LRU page caching
├── models/                  # ML model interfaces
│   ├── base.py              # BaseModel, BatchableModel
│   ├── ocr.py               # OCRModel interface
│   ├── layout.py            # LayoutModel interface
│   ├── table.py             # TableModel interface
│   └── vlm.py               # VisionLanguageModel interface
├── parsing/                 # Document parsing pipeline
│   ├── parser.py            # DocumentParser orchestrator
│   └── chunking.py          # SemanticChunker
├── grounding/               # Visual evidence
│   ├── evidence.py          # EvidenceBuilder, EvidenceTracker
│   └── crops.py             # Image cropping utilities
├── extraction/              # Field extraction
│   ├── schema.py            # ExtractionSchema, FieldSpec
│   ├── extractor.py         # FieldExtractor
│   └── validator.py         # ExtractionValidator
├── tools/                   # Agent tools
│   ├── document_tools.py    # ParseDocumentTool, ExtractFieldsTool, etc.
│   └── rag_tools.py         # IndexDocumentTool, RetrieveChunksTool, RAGAnswerTool
└── agent_adapter.py         # EnhancedDocumentAgent integration
```

**Key Features:**

- **Zero-Shot Capability**: Works across document formats without training
- **Schema-Driven Extraction**: Define fields using JSON Schema or Pydantic
- **Abstention Policy**: Never guesses - abstains when confidence is low
- **Visual Grounding**: Every extraction includes page, bbox, snippet, confidence

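
The grounding and abstention features above can be illustrated with a minimal sketch. It uses plain dataclasses rather than the project's Pydantic models, and the names `Evidence`, `FieldResult`, `ground_field`, and the `min_confidence` default of 0.5 are hypothetical, chosen for illustration only; they are not taken from `src/document_intelligence/`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Where a value was found (hypothetical shape, not SPARKNET's own model)."""
    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    snippet: str
    confidence: float


@dataclass(frozen=True)
class FieldResult:
    name: str
    value: Optional[str]           # None means the extractor abstained
    evidence: Optional[Evidence]   # None when abstaining


def ground_field(name: str, value: str, evidence: Evidence,
                 min_confidence: float = 0.5) -> FieldResult:
    """Return the value with its evidence, or abstain if confidence is too low."""
    if evidence.confidence < min_confidence:
        return FieldResult(name=name, value=None, evidence=None)
    return FieldResult(name=name, value=value, evidence=evidence)


# A confident extraction keeps its evidence; a doubtful one is dropped.
total = ground_field(
    "total", "1,250.00",
    Evidence(page=1, bbox=(72.0, 640.0, 180.0, 655.0),
             snippet="Total: 1,250.00", confidence=0.92),
)
vendor = ground_field(
    "vendor", "ACME",
    Evidence(page=1, bbox=(72.0, 90.0, 140.0, 105.0),
             snippet="AC?E", confidence=0.21),
)
```

Returning `None` instead of a low-confidence guess keeps the abstention decision explicit for downstream consumers.
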
### 1.3 RAG Subsystem (`src/rag/`)

**Architecture (FAANG-inspired: Meta FAISS + Google Vertex AI pattern):**

```
src/rag/
├── store.py           # VectorStore interface + ChromaVectorStore
├── embeddings.py      # OllamaEmbedding + OpenAIEmbedding (feature-flagged)
├── indexer.py         # DocumentIndexer for chunked documents
├── retriever.py       # DocumentRetriever with evidence support
├── generator.py       # GroundedGenerator with citations
├── docint_bridge.py   # Bridge to document_intelligence subsystem
└── __init__.py        # Clean exports
```

**Key Features:**

- **Local-First Embeddings**: Ollama `nomic-embed-text` by default
- **Cloud Opt-In**: OpenAI embeddings disabled by default, feature-flagged
- **Metadata Filtering**: Filter by document_id, chunk_type, page_range
- **Citation Generation**: Answers include `[1]`, `[2]` references
- **Confidence-Based Abstention**: Returns "I don't know" when uncertain

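
As a rough sketch of how metadata filtering, citation markers, and abstention could fit together, the example below works on already-scored hits. `Hit`, `filter_hits`, and `answer_with_citations` are hypothetical names, not the API of `src/rag/`; the 0.3 default mirrors the `abstain_threshold` value shown in the configuration section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One retrieved chunk (hypothetical shape for illustration)."""
    document_id: str
    page: int
    text: str
    score: float  # similarity in [0, 1]; higher is better


def filter_hits(hits: List[Hit], document_id: Optional[str] = None,
                min_page: int = 0, max_page: int = 10**9) -> List[Hit]:
    """Keep hits matching the document and page range, best score first."""
    kept = [h for h in hits
            if (document_id is None or h.document_id == document_id)
            and min_page <= h.page <= max_page]
    return sorted(kept, key=lambda h: h.score, reverse=True)


def answer_with_citations(answer: str, hits: List[Hit],
                          abstain_threshold: float = 0.3) -> str:
    """Append [n] markers and a source list, or abstain when evidence is weak."""
    if not hits or max(h.score for h in hits) < abstain_threshold:
        return "I don't know"
    markers = "".join(f"[{i}]" for i in range(1, len(hits) + 1))
    sources = "\n".join(f"[{i}] {h.document_id}, p.{h.page}"
                        for i, h in enumerate(hits, start=1))
    return f"{answer} {markers}\n{sources}"


hits = [
    Hit("inv-1", 2, "Payment terms: Net 30", 0.81),
    Hit("inv-2", 1, "Payment terms: Net 60", 0.90),
    Hit("inv-1", 7, "Late fees apply", 0.40),
]
top = filter_hits(hits, document_id="inv-1", max_page=5)
reply = answer_with_citations("Payment is due in 30 days.", top)
```

In a real pipeline the answer text would come from the generator; here it is passed in so the citation and abstention logic can be read in isolation.
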
### 1.4 Multi-Agent System (`src/agents/`)

**Agents Implemented:**

| Agent | Purpose | Model |
|-------|---------|-------|
| `ExecutorAgent` | Task execution with tools | llama3.1:8b |
| `DocumentAgent` | ReAct-style document analysis | llama3.1:8b |
| `PlannerAgent` | Task decomposition | mistral |
| `CriticAgent` | Output validation | phi3 |
| `MemoryAgent` | Context management | llama3.2 |
| `VisionOCRAgent` | Vision-based OCR | llava (optional) |

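
The agent-to-model assignment in the table can be pictured as a simple lookup with a fallback. This is only a sketch: `AGENT_MODELS`, `pick_model`, and the choice of `llama3.2` as the fallback are assumptions made for illustration, and the actual routing in `src/llm/` may work differently.

```python
from typing import Dict, Iterable

# Mapping copied from the table above.
AGENT_MODELS: Dict[str, str] = {
    "ExecutorAgent": "llama3.1:8b",
    "DocumentAgent": "llama3.1:8b",
    "PlannerAgent": "mistral",
    "CriticAgent": "phi3",
    "MemoryAgent": "llama3.2",
    "VisionOCRAgent": "llava",
}


def pick_model(agent: str, installed: Iterable[str],
               fallback: str = "llama3.2") -> str:
    """Return the agent's preferred model if installed, else the fallback."""
    wanted = AGENT_MODELS.get(agent, fallback)
    names = set(installed)
    # Ollama lists models with a tag (e.g. "mistral:latest"), so an untagged
    # preference is also matched against the part before the colon.
    bases = {name.split(":", 1)[0] for name in names}
    if wanted in names or wanted in bases:
        return wanted
    return fallback


installed = ["mistral:latest", "llama3.2:latest", "llama3.1:8b"]
planner_model = pick_model("PlannerAgent", installed)
critic_model = pick_model("CriticAgent", installed)
```
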
### 1.5 CLI Commands

```bash
# Document Intelligence
sparknet docint parse document.pdf -o result.json
sparknet docint extract invoice.pdf --preset invoice
sparknet docint ask document.pdf "What is the total?"
sparknet docint classify document.pdf

# RAG Operations
sparknet docint index document.pdf               # Index into vector store
sparknet docint index-stats                      # Show index statistics
sparknet docint retrieve "payment terms" -k 10   # Semantic search
sparknet docint ask doc.pdf "question" --use-rag # RAG-powered Q&A

# Legacy Document Commands
sparknet document parse invoice.pdf
sparknet document extract contract.pdf -f "party_name"
sparknet rag index *.pdf --collection my_docs
sparknet rag search "query" --top 10
```

---

## 2. How to Execute SPARKNET

### 2.1 Prerequisites

```bash
# 1. System Requirements
#    - Python 3.10+
#    - NVIDIA GPU with CUDA 12.0+ (optional but recommended)
#    - 16GB+ RAM
#    - 50GB+ disk space

# 2. Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh

# 3. Start Ollama server
ollama serve
```

### 2.2 Installation

```bash
cd /home/mhamdan/SPARKNET

# Option A: Use existing virtual environment
source sparknet/bin/activate

# Option B: Create new environment
python3 -m venv sparknet
source sparknet/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -r demo/requirements.txt

# Install SPARKNET in development mode
pip install -e .
```

### 2.3 Download Required Models

```bash
# Embedding model (required for RAG)
ollama pull nomic-embed-text:latest

# LLM models (at least one required)
ollama pull llama3.2:latest   # Fast, 2GB
ollama pull llama3.1:8b       # General purpose, 5GB
ollama pull mistral:latest    # Good reasoning, 4GB

# Optional: Larger models for complex tasks
ollama pull qwen2.5:14b       # Complex reasoning, 9GB
```

### 2.4 Running the Demo UI

**Method 1: Using the launcher script**

```bash
cd /home/mhamdan/SPARKNET
./run_demo.sh 8501
```

**Method 2: Direct Streamlit command**

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
streamlit run demo/app.py --server.port 8501
```

**Method 3: Bind to specific IP (for remote access)**

```bash
streamlit run demo/app.py \
    --server.address 172.24.50.21 \
    --server.port 8501 \
    --server.headless true
```

**Access at:** http://172.24.50.21:8501 or http://localhost:8501

### 2.5 Running the API Server

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

**API Endpoints:**

- `GET /health` - Health check
- `POST /api/documents/parse` - Parse document
- `POST /api/documents/extract` - Extract fields
- `POST /api/rag/index` - Index document
- `POST /api/rag/query` - Query RAG

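
A quick way to confirm the server is up is to call `GET /health` from Python. The sketch below uses only the standard library and assumes the server listens on `http://localhost:8000`, as in the `uvicorn` command above; `health_request` and `is_healthy` are illustrative helper names. The request bodies of the `POST` endpoints are not described in this report, so they are deliberately left out.

```python
import urllib.error
import urllib.request


def health_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the GET /health request without sending it."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/health", method="GET")


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if the server answers /health with HTTP 200."""
    try:
        with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as unhealthy.
        return False


# Usage, with the API server from this section running:
#   is_healthy("http://localhost:8000")
```
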
### 2.6 Running Examples

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Document Intelligence Demo
python examples/document_intelligence_demo.py

# RAG End-to-End Pipeline
python examples/document_rag_end_to_end.py

# Simple Agent Task
python examples/simple_task.py

# Document Agent
python examples/document_agent.py
```

### 2.7 Running Tests

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/unit/test_document_intelligence.py -v
pytest tests/unit/test_rag_integration.py -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```

---

## 3. Configuration

### 3.1 RAG Configuration (`configs/rag.yaml`)

```yaml
vector_store:
  type: chroma
  chroma:
    persist_directory: "./.sparknet/chroma_db"
    collection_name: "sparknet_documents"
    distance_metric: cosine

embeddings:
  provider: ollama  # Local-first
  ollama:
    model: nomic-embed-text
    base_url: "http://localhost:11434"
  openai:
    enabled: false  # Disabled by default

generator:
  provider: ollama
  ollama:
    model: llama3.2
  abstain_on_low_confidence: true
  abstain_threshold: 0.3
```

### 3.2 Document Configuration (`config/document.yaml`)

```yaml
ocr:
  engine: paddleocr  # or tesseract
  languages: ["en"]
  confidence_threshold: 0.5

layout:
  enabled: true
  reading_order: true

chunking:
  min_chunk_chars: 10
  max_chunk_chars: 4000
  target_chunk_chars: 500
```
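
To make the three chunking limits concrete, here is a minimal sketch of how they might interact. It is a plain greedy paragraph packer, not the project's `SemanticChunker`; `chunk_text` is a hypothetical helper, and its defaults simply reuse the values from the `chunking` block above.

```python
from typing import List


def chunk_text(text: str, min_chars: int = 10, max_chars: int = 4000,
               target_chars: int = 500) -> List[str]:
    """Greedy paragraph packing.

    Paragraphs are merged while the chunk stays within target_chars,
    anything longer than max_chars is hard-split, and leftovers shorter
    than min_chars are dropped.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split oversized paragraphs so no chunk exceeds max_chars.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= target_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return [c for c in chunks if len(c) >= min_chars]


sample = "A" * 300 + "\n\n" + "B" * 300 + "\n\n" + "ok" + "\n\n" + "C" * 9000
sizes = [len(c) for c in chunk_text(sample)]
```

Note that `target_chars` is a soft goal (a single long paragraph may exceed it), while `max_chars` is a hard ceiling.
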

---

## 4. FAANG Best Practices Applied

### 4.1 Google-Inspired Patterns

- **DocAI Architecture**: Modular vision-first document understanding
- **Structured Output**: Schema-driven extraction with validation
- **Abstention Policy**: Never hallucinate, return "I don't know"

### 4.2 Meta-Inspired Patterns

- **FAISS Integration**: Fast similarity search (optional alongside ChromaDB)
- **RAG Pipeline**: Retrieve-then-generate with citations

### 4.3 Amazon-Inspired Patterns

- **Textract-like API**: Structured field extraction with confidence scores
- **Evidence Grounding**: Every output traceable to source

### 4.4 Microsoft-Inspired Patterns

- **Form Recognizer Pattern**: Pre-built schemas for invoices, contracts
- **Confidence Thresholds**: Configurable abstention levels

### 4.5 Apple-Inspired Patterns

- **Privacy-First**: All processing local by default
- **Opt-In Cloud**: OpenAI and cloud services disabled by default

---

## 5. Quick Start Commands

```bash
# === SETUP ===
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
ollama serve &  # Start in background

# === DEMO UI ===
streamlit run demo/app.py --server.port 8501

# === CLI USAGE ===
# Parse a document
python -m src.cli.main docint parse Dataset/IBM*.pdf -o result.json

# Index for RAG
python -m src.cli.main docint index Dataset/*.pdf

# Ask questions with RAG
python -m src.cli.main docint ask Dataset/IBM*.pdf "What is this document about?" --use-rag

# === PYTHON API ===
python -c "
from src.document_intelligence import DocumentParser
parser = DocumentParser()
result = parser.parse('Dataset/IBM N_A.pdf')
print(f'Parsed {len(result.chunks)} chunks')
"

# === RUN TESTS ===
pytest tests/unit/ -v
```

---

## 6. Troubleshooting

### Issue: Ollama not running

```bash
# Check status
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve

# If the port is already in use, stop the running instance and restart
pkill ollama && ollama serve
```

### Issue: Missing models

```bash
ollama list                    # See installed models
ollama pull nomic-embed-text   # Install embedding model
ollama pull llama3.2           # Install LLM
```
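
The same two checks (is Ollama reachable, and which models are installed) can be scripted against the `/api/tags` endpoint used in the `curl` command above. This is a standard-library sketch; `parse_model_names` and `installed_models` are illustrative names, and the base URL is the default from `configs/rag.yaml`.

```python
import json
import urllib.error
import urllib.request
from typing import List


def parse_model_names(payload: str) -> List[str]:
    """Extract model names from an /api/tags JSON response body."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]


def installed_models(base_url: str = "http://localhost:11434",
                     timeout: float = 3.0) -> List[str]:
    """Return installed model names, or an empty list if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Server down, timeout, or an unexpected non-JSON body.
        return []


# Usage, with `ollama serve` running:
#   "nomic-embed-text:latest" in installed_models()
```

An empty list means either that Ollama is not running or that no models have been pulled yet; `ollama list` distinguishes the two.
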

### Issue: ChromaDB errors

```bash
# Reset the vector store (this deletes all indexed documents)
rm -rf .sparknet/chroma_db
```

### Issue: Import errors

```bash
# Make sure you are in the project directory
cd /home/mhamdan/SPARKNET

# Make sure the virtual environment is activated
source sparknet/bin/activate

# Reinstall
pip install -e .
```

---

## 7. Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                        SPARKNET Platform                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Streamlit  │  │   FastAPI   │  │     CLI     │  Interfaces  │
│  │    Demo     │  │     API     │  │  Commands   │              │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
├─────────┴────────────────┴────────────────┴─────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                        Agent Layer                        │  │
│  │ ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐ │  │
│  │ │ Document │   │ Executor │   │ Planner  │   │  Critic  │ │  │
│  │ │  Agent   │   │  Agent   │   │  Agent   │   │  Agent   │ │  │
│  │ └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘ │  │
│  └──────┴──────────────┴──────────────┴──────────────┴───────┘  │
│                                                                 │
│  ┌─────────────────────┐  ┌──────────────────────────────────┐  │
│  │   Document Intel    │  │          RAG Subsystem           │  │
│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐  ┌─────────────────┐ │  │
│  │ │Parser │ │Extract│ │  │ │ Indexer │  │    Retriever    │ │  │
│  │ └───────┘ └───────┘ │  │ └─────────┘  └─────────────────┘ │  │
│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐  ┌─────────────────┐ │  │
│  │ │Ground │ │ Valid │ │  │ │Embedder │  │    Generator    │ │  │
│  │ └───────┘ └───────┘ │  │ └─────────┘  └─────────────────┘ │  │
│  └─────────────────────┘  └──────────────────────────────────┘  │
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      Infrastructure                       │  │
│  │ ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐ │  │
│  │ │  Ollama  │   │ ChromaDB │   │   GPU    │   │  Cache   │ │  │
│  │ │  Client  │   │  Store   │   │ Manager  │   │  Layer   │ │  │
│  │ └──────────┘   └──────────┘   └──────────┘   └──────────┘ │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

---

## 8. Files Modified/Created in Recent Session

| File | Action | Description |
|------|--------|-------------|
| `src/rag/docint_bridge.py` | Created | Bridge between document_intelligence and RAG |
| `src/document_intelligence/tools/rag_tools.py` | Created | RAG tools for agents |
| `src/document_intelligence/tools/__init__.py` | Modified | Added RAG tool exports |
| `src/document_intelligence/tools/document_tools.py` | Modified | Enhanced AnswerQuestionTool with RAG |
| `src/cli/docint.py` | Modified | Added index, retrieve, delete-index commands |
| `src/rag/__init__.py` | Modified | Added bridge exports |
| `configs/rag.yaml` | Created | RAG configuration file |
| `tests/unit/test_rag_integration.py` | Created | RAG integration tests |
| `examples/document_rag_end_to_end.py` | Created | End-to-end RAG example |

---

**Report Complete**

For questions or issues, refer to the troubleshooting section above or check the test files for usage examples.