	# SPARKNET Implementation Report
	## Agentic Document Intelligence Platform

	Report Date: January 2025
	Version: 0.1.0

	---

	## Executive Summary

	SPARKNET is an enterprise-grade Agentic Document Intelligence Platform. Its design follows four practices common at FAANG companies:
	- Modular Architecture: Clean separation of concerns with well-defined interfaces
	- Local-First Privacy: All processing happens locally via Ollama
	- Evidence Grounding: Every extraction includes verifiable source references
	- Production-Ready: Type-safe, tested, configurable, and scalable

	---

	## 1. What Has Been Implemented

	### 1.1 Core Subsystems

	| Subsystem | Location | Status | Description |
	|-----------|----------|--------|-------------|
	| Document Intelligence | `src/document_intelligence/` | Complete | Vision-first document understanding |
	| Legacy Document Pipeline | `src/document/` | Complete | OCR, layout, chunking pipeline |
	| RAG Subsystem | `src/rag/` | Complete | Vector search with grounded retrieval |
	| Multi-Agent System | `src/agents/` | Complete | ReAct-style agents with tools |
	| LLM Integration | `src/llm/` | Complete | Ollama client with routing |
	| CLI | `src/cli/` | Complete | Full command-line interface |
	| API | `api/` | Complete | FastAPI REST endpoints |
	| Demo UI | `demo/` | Complete | Streamlit dashboard |

	### 1.2 Document Intelligence Module (`src/document_intelligence/`)

	Architecture (FAANG-inspired: Google DocAI pattern):

	```
	src/document_intelligence/
	├── chunks/                # Core data models (BoundingBox, DocumentChunk, TableChunk)
	│   ├── models.py          # Pydantic models with full type safety
	│   └── __init__.py
	├── io/                    # Document loading with caching
	│   ├── base.py            # Abstract interfaces
	│   ├── pdf.py             # PyMuPDF-based PDF loading
	│   ├── image.py           # PIL image loading
	│   └── cache.py           # LRU page caching
	├── models/                # ML model interfaces
	│   ├── base.py            # BaseModel, BatchableModel
	│   ├── ocr.py             # OCRModel interface
	│   ├── layout.py          # LayoutModel interface
	│   ├── table.py           # TableModel interface
	│   └── vlm.py             # VisionLanguageModel interface
	├── parsing/               # Document parsing pipeline
	│   ├── parser.py          # DocumentParser orchestrator
	│   └── chunking.py        # SemanticChunker
	├── grounding/             # Visual evidence
	│   ├── evidence.py        # EvidenceBuilder, EvidenceTracker
	│   └── crops.py           # Image cropping utilities
	├── extraction/            # Field extraction
	│   ├── schema.py          # ExtractionSchema, FieldSpec
	│   ├── extractor.py       # FieldExtractor
	│   └── validator.py       # ExtractionValidator
	├── tools/                 # Agent tools
	│   ├── document_tools.py  # ParseDocumentTool, ExtractFieldsTool, etc.
	│   └── rag_tools.py       # IndexDocumentTool, RetrieveChunksTool, RAGAnswerTool
	└── agent_adapter.py       # EnhancedDocumentAgent integration
	```

	Key Features:
	- Zero-Shot Capability: Works across document formats without training
	- Schema-Driven Extraction: Define fields using JSON Schema or Pydantic
	- Abstention Policy: Never guesses; abstains when confidence is low
	- Visual Grounding: Every extraction includes page, bbox, snippet, confidence
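
The grounding and abstention rules above can be sketched as a small data model. This is a minimal illustration, not SPARKNET's actual `chunks/models.py`: the names `Evidence`, `FieldResult`, and `finalize`, the `(x0, y0, x1, y1)` bbox layout, and the 0.5 cutoff are all assumptions made for the example, and plain dataclasses stand in for the Pydantic models so the snippet has no dependencies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Where a value was found: page, bounding box, and text snippet."""
    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1); assumed layout
    snippet: str


@dataclass(frozen=True)
class FieldResult:
    """One extracted field; value is None when the extractor abstains."""
    name: str
    value: Optional[str]
    confidence: float
    evidence: Optional[Evidence] = None

    @property
    def abstained(self) -> bool:
        return self.value is None


def finalize(name, value, confidence, evidence, min_confidence=0.5):
    """Keep a value only if it is grounded and confident enough.

    min_confidence is a hypothetical default chosen for this sketch.
    """
    if evidence is None or confidence < min_confidence:
        # Abstain: report the confidence but no value and no evidence.
        return FieldResult(name=name, value=None, confidence=confidence)
    return FieldResult(name, value, confidence, evidence)


total = finalize("total", "1,250.00", 0.92,
                 Evidence(2, (410.0, 655.0, 520.0, 672.0), "Total: 1,250.00"))
vendor = finalize("vendor", "ACME", 0.21,
                  Evidence(1, (40.0, 50.0, 120.0, 66.0), "ACME"))
print(total.abstained, vendor.abstained)  # → False True
```

Returning an explicit "abstained" result, rather than raising or returning a low-confidence guess, lets callers distinguish "field not found" from "field found with evidence" without inspecting scores themselves.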

	### 1.3 RAG Subsystem (`src/rag/`)

	Architecture (FAANG-inspired: Meta FAISS + Google Vertex AI pattern):

	```
	src/rag/
	├── store.py # VectorStore interface + ChromaVectorStore
	├── embeddings.py # OllamaEmbedding + OpenAIEmbedding (feature-flagged)
	├── indexer.py # DocumentIndexer for chunked documents
	├── retriever.py # DocumentRetriever with evidence support
	├── generator.py # GroundedGenerator with citations
	├── docint_bridge.py # Bridge to document_intelligence subsystem
	└── __init__.py # Clean exports
	```

	Key Features:
	- Local-First Embeddings: Ollama `nomic-embed-text` by default
	- Cloud Opt-In: OpenAI embeddings disabled by default, feature-flagged
	- Metadata Filtering: Filter by document_id, chunk_type, page_range
	- Citation Generation: Answers include `[1]`, `[2]` references
	- Confidence-Based Abstention: Returns "I don't know" when uncertain
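
Metadata filtering and citation numbering can be illustrated with a few lines of plain Python. The sketch below is not the `DocumentRetriever` or `GroundedGenerator` code: `Chunk`, `filter_chunks`, and `cite` are hypothetical names, and the reference format (`[n] document_id, p. page`) is an assumption about what a citation might look like.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    document_id: str
    chunk_type: str
    page: int
    text: str


def filter_chunks(chunks: List[Chunk],
                  document_id: Optional[str] = None,
                  chunk_type: Optional[str] = None,
                  page_range: Optional[Tuple[int, int]] = None) -> List[Chunk]:
    """Apply the three metadata filters; None means 'do not filter'."""
    selected = []
    for c in chunks:
        if document_id is not None and c.document_id != document_id:
            continue
        if chunk_type is not None and c.chunk_type != chunk_type:
            continue
        if page_range is not None and not (page_range[0] <= c.page <= page_range[1]):
            continue
        selected.append(c)
    return selected


def cite(answer: str, sources: List[Chunk]) -> str:
    """Append numbered references so each [n] maps to a document and page."""
    refs = [f"[{i}] {c.document_id}, p. {c.page}"
            for i, c in enumerate(sources, start=1)]
    return answer + "\n" + "\n".join(refs)


chunks = [
    Chunk("inv-001", "text", 1, "Payment is due within 30 days."),
    Chunk("inv-001", "table", 2, "Total | 1,250.00"),
    Chunk("ctr-007", "text", 4, "Either party may terminate."),
]
hits = filter_chunks(chunks, document_id="inv-001", page_range=(1, 1))
print(cite("Payment is due within 30 days [1].", hits))
```

In a real deployment the filtering would normally be pushed down into the vector store query rather than applied in Python after retrieval.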

	### 1.4 Multi-Agent System (`src/agents/`)

	Agents Implemented:
	| Agent | Purpose | Model |
	|-------|---------|-------|
	| `ExecutorAgent` | Task execution with tools | llama3.1:8b |
	| `DocumentAgent` | ReAct-style document analysis | llama3.1:8b |
	| `PlannerAgent` | Task decomposition | mistral |
	| `CriticAgent` | Output validation | phi3 |
	| `MemoryAgent` | Context management | llama3.2 |
	| `VisionOCRAgent` | Vision-based OCR | llava (optional) |
	### 1.5 CLI Commands

	```bash
	# Document Intelligence
	sparknet docint parse document.pdf -o result.json
	sparknet docint extract invoice.pdf --preset invoice
	sparknet docint ask document.pdf "What is the total?"
	sparknet docint classify document.pdf

	# RAG Operations
	sparknet docint index document.pdf # Index into vector store
	sparknet docint index-stats # Show index statistics
	sparknet docint retrieve "payment terms" -k 10 # Semantic search
	sparknet docint ask doc.pdf "question" --use-rag # RAG-powered Q&A

	# Legacy Document Commands
	sparknet document parse invoice.pdf
	sparknet document extract contract.pdf -f "party_name"
	sparknet rag index *.pdf --collection my_docs
	sparknet rag search "query" --top 10
	```

	---

	## 2. How to Execute SPARKNET

	### 2.1 Prerequisites

	```bash
	# 1. System Requirements
	# - Python 3.10+
	# - NVIDIA GPU with CUDA 12.0+ (optional but recommended)
	# - 16GB+ RAM
	# - 50GB+ disk space

	# 2. Install Ollama (if not installed)
	curl -fsSL https://ollama.com/install.sh | sh

	# 3. Start Ollama server
	ollama serve
	```

	### 2.2 Installation

	```bash
	cd /home/mhamdan/SPARKNET

	# Option A: Use existing virtual environment
	source sparknet/bin/activate

	# Option B: Create new environment
	python3 -m venv sparknet
	source sparknet/bin/activate

	# Install dependencies
	pip install -r requirements.txt
	pip install -r demo/requirements.txt

	# Install SPARKNET in development mode
	pip install -e .
	```

	### 2.3 Download Required Models

	```bash
	# Embedding model (required for RAG)
	ollama pull nomic-embed-text:latest

	# LLM models (at least one required)
	ollama pull llama3.2:latest # Fast, 2GB
	ollama pull llama3.1:8b # General purpose, 5GB
	ollama pull mistral:latest # Good reasoning, 4GB

	# Optional: Larger models for complex tasks
	ollama pull qwen2.5:14b # Complex reasoning, 9GB
	```

	### 2.4 Running the Demo UI

	Method 1: Using the launcher script
	```bash
	cd /home/mhamdan/SPARKNET
	./run_demo.sh 8501
	```

	Method 2: Direct Streamlit command
	```bash
	cd /home/mhamdan/SPARKNET
	source sparknet/bin/activate
	streamlit run demo/app.py --server.port 8501
	```

	Method 3: Bind to specific IP (for remote access)
	```bash
	streamlit run demo/app.py \
	--server.address 172.24.50.21 \
	--server.port 8501 \
	--server.headless true
	```

	Access at: http://172.24.50.21:8501 or http://localhost:8501

	### 2.5 Running the API Server

	```bash
	cd /home/mhamdan/SPARKNET
	source sparknet/bin/activate
	uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
	```

	API Endpoints:
	- `GET /health` - Health check
	- `POST /api/documents/parse` - Parse document
	- `POST /api/documents/extract` - Extract fields
	- `POST /api/rag/index` - Index document
	- `POST /api/rag/query` - Query RAG
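
A client for these endpoints might look like the sketch below, which uses only the standard library. The paths and port come from this section; everything else is an assumption, in particular the JSON body `{"question": ...}` for `/api/rag/query` and the expectation that responses are JSON. Check the FastAPI schema (typically served at `/docs`) for the real request models.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # port from the uvicorn command above


def build_request(method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the API endpoints."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


health = build_request("GET", "/health")
# Hypothetical payload shape; the real field names are not documented here.
query = build_request("POST", "/api/rag/query",
                      {"question": "What are the payment terms?"})
print(health.get_method(), health.full_url)  # → GET http://localhost:8000/health
```

With the API server running, `call(health)` would perform the health check; building and sending are kept separate so requests can be inspected or logged first.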

	### 2.6 Running Examples

	```bash
	cd /home/mhamdan/SPARKNET
	source sparknet/bin/activate

	# Document Intelligence Demo
	python examples/document_intelligence_demo.py

	# RAG End-to-End Pipeline
	python examples/document_rag_end_to_end.py

	# Simple Agent Task
	python examples/simple_task.py

	# Document Agent
	python examples/document_agent.py
	```

	### 2.7 Running Tests

	```bash
	cd /home/mhamdan/SPARKNET
	source sparknet/bin/activate

	# Run all tests
	pytest tests/ -v

	# Run specific test suites
	pytest tests/unit/test_document_intelligence.py -v
	pytest tests/unit/test_rag_integration.py -v

	# Run with coverage
	pytest tests/ --cov=src --cov-report=html
	```

	---

	## 3. Configuration

	### 3.1 RAG Configuration (`configs/rag.yaml`)

	```yaml
	vector_store:
	  type: chroma
	  chroma:
	    persist_directory: "./.sparknet/chroma_db"
	    collection_name: "sparknet_documents"
	    distance_metric: cosine

	embeddings:
	  provider: ollama # Local-first
	  ollama:
	    model: nomic-embed-text
	    base_url: "http://localhost:11434"
	  openai:
	    enabled: false # Disabled by default

	generator:
	  provider: ollama
	  ollama:
	    model: llama3.2
	  abstain_on_low_confidence: true
	  abstain_threshold: 0.3
	```
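
To make `distance_metric: cosine` and `abstain_threshold: 0.3` concrete, the sketch below ranks candidate chunks by cosine similarity and abstains when the best score falls below the threshold. Treating the top similarity score as the confidence value is an assumption made for illustration; this report does not say how SPARKNET derives confidence. `cosine_similarity` and `best_match` are hypothetical helpers, and the two-dimensional vectors are toy data.

```python
import math
from typing import List, Optional, Sequence, Tuple

ABSTAIN_THRESHOLD = 0.3  # generator.abstain_threshold in configs/rag.yaml


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec: Sequence[float],
               candidates: List[Tuple[str, Sequence[float]]],
               threshold: float = ABSTAIN_THRESHOLD) -> Optional[Tuple[str, float]]:
    """Return (text, score) for the closest chunk, or None to abstain."""
    if not candidates:
        return None
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in candidates]
    text, score = max(scored, key=lambda pair: pair[1])
    return (text, score) if score >= threshold else None


docs = [
    ("Payment is due in 30 days.", [1.0, 0.0]),
    ("Office hours are 9 to 5.", [0.0, 1.0]),
]
print(best_match([0.9, 0.1], docs))    # closest chunk clears the threshold
print(best_match([-1.0, -1.0], docs))  # → None
```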

	### 3.2 Document Configuration (`config/document.yaml`)

	```yaml
	ocr:
	  engine: paddleocr # or tesseract
	  languages: ["en"]
	  confidence_threshold: 0.5

	layout:
	  enabled: true
	  reading_order: true

	chunking:
	  min_chunk_chars: 10
	  max_chunk_chars: 4000
	  target_chunk_chars: 500
	```
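
One way the three chunking limits could interact is shown below. This is a simple greedy paragraph packer written for illustration, not the `SemanticChunker` from `parsing/chunking.py`: it merges paragraphs until a chunk reaches the target size, never exceeds the maximum, and drops fragments below the minimum. The function name `chunk_paragraphs` is hypothetical.

```python
from typing import List


def chunk_paragraphs(text: str, min_chars: int = 10, max_chars: int = 4000,
                     target_chars: int = 500) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any paragraph that exceeds the maximum on its own.
        for start in range(0, len(para), max_chars):
            piece = para[start:start + max_chars]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) > max_chars:
                # Adding this piece would overflow: close the current chunk.
                chunks.append(current)
                candidate = piece
            current = candidate
            if len(current) >= target_chars:
                chunks.append(current)
                current = ""
    if current:
        chunks.append(current)
    # Discard fragments too short to be useful for retrieval.
    return [c for c in chunks if len(c) >= min_chars]


sample = "A" * 300 + "\n\n" + "B" * 300 + "\n\n" + "tiny"
print([len(c) for c in chunk_paragraphs(sample)])  # → [602]
```

The two 300-character paragraphs merge into one chunk (300 + 2 + 300 characters) because the first alone is under the target, and the trailing 4-character fragment is dropped by the minimum.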

	---

	## 4. FAANG Best Practices Applied

	### 4.1 Google-Inspired Patterns
	- DocAI Architecture: Modular vision-first document understanding
	- Structured Output: Schema-driven extraction with validation
	- Abstention Policy: Never hallucinate, return "I don't know"

	### 4.2 Meta-Inspired Patterns
	- FAISS Integration: Fast similarity search (optional alongside ChromaDB)
	- RAG Pipeline: Retrieve-then-generate with citations

	### 4.3 Amazon-Inspired Patterns
	- Textract-like API: Structured field extraction with confidence scores
	- Evidence Grounding: Every output traceable to source

	### 4.4 Microsoft-Inspired Patterns
	- Form Recognizer Pattern: Pre-built schemas for invoices, contracts
	- Confidence Thresholds: Configurable abstention levels

	### 4.5 Apple-Inspired Patterns
	- Privacy-First: All processing local by default
	- Opt-In Cloud: OpenAI and cloud services disabled by default

	---

	## 5. Quick Start Commands

	```bash
	# === SETUP ===
	cd /home/mhamdan/SPARKNET
	source sparknet/bin/activate
	ollama serve & # Start in background

	# === DEMO UI ===
	streamlit run demo/app.py --server.port 8501

	# === CLI USAGE ===
	# Parse a document
	python -m src.cli.main docint parse Dataset/IBM*.pdf -o result.json

	# Index for RAG
	python -m src.cli.main docint index Dataset/*.pdf

	# Ask questions with RAG
	python -m src.cli.main docint ask Dataset/IBM*.pdf "What is this document about?" --use-rag

	# === PYTHON API ===
	python -c "
	from src.document_intelligence import DocumentParser
	parser = DocumentParser()
	result = parser.parse('Dataset/IBM N_A.pdf')
	print(f'Parsed {len(result.chunks)} chunks')
	"

	# === RUN TESTS ===
	pytest tests/unit/ -v
	```

	---

	## 6. Troubleshooting

	### Issue: Ollama not running
	```bash
	# Check status
	curl http://localhost:11434/api/tags

	# Start Ollama
	ollama serve

	# If port in use
	pkill ollama && ollama serve
	```
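
The same check can be scripted from Python, for example at application startup. The sketch below queries the `/api/tags` endpoint used in the `curl` command above and reports which models are missing. `list_ollama_models` and `missing_models` are hypothetical helper names, and the code assumes the response is a JSON object with a `models` list whose entries carry a `name` field.

```python
import json
import urllib.request
from typing import Iterable, List, Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts (URLError is a
        # subclass); ValueError covers a non-JSON response body.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def missing_models(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Required models with no installed match; a bare name matches any tag."""
    installed = list(installed)
    have = set(installed) | {name.split(":")[0] for name in installed}
    return [m for m in required if m not in have]


installed = list_ollama_models()
if installed is None:
    print("Ollama is not reachable; start it with `ollama serve`.")
else:
    for model in missing_models(["nomic-embed-text", "llama3.2"], installed):
        print(f"Missing model: run `ollama pull {model}`")
```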

	### Issue: Missing models
	```bash
	ollama list # See installed models
	ollama pull nomic-embed-text # Install embedding model
	ollama pull llama3.2 # Install LLM
	```

	### Issue: ChromaDB errors
	```bash
	# Reset vector store
	rm -rf .sparknet/chroma_db
	```

	### Issue: Import errors
	```bash
	# Ensure in correct directory
	cd /home/mhamdan/SPARKNET

	# Ensure venv activated
	source sparknet/bin/activate

	# Reinstall
	pip install -e .
	```

	---

	## 7. Architecture Diagram

	```
	┌─────────────────────────────────────────────────────────────────┐
	│                        SPARKNET Platform                        │
	├─────────────────────────────────────────────────────────────────┤
	│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
	│  │  Streamlit  │  │   FastAPI   │  │     CLI     │  Interfaces  │
	│  │    Demo     │  │     API     │  │  Commands   │              │
	│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘              │
	├─────────┴────────────────┴────────────────┴─────────────────────┤
	│                                                                 │
	│  ┌──────────────────────────────────────────────────────────┐   │
	│  │                       Agent Layer                        │   │
	│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐     │   │
	│  │  │ Document │ │ Executor │ │ Planner  │ │  Critic  │     │   │
	│  │  │  Agent   │ │  Agent   │ │  Agent   │ │  Agent   │     │   │
	│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘     │   │
	│  └───────┴────────────┴────────────┴────────────┴───────────┘   │
	│                                                                 │
	│  ┌─────────────────────┐  ┌─────────────────────────────────┐   │
	│  │   Document Intel    │  │          RAG Subsystem          │   │
	│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐ ┌─────────────────┐ │   │
	│  │ │Parser │ │Extract│ │  │ │ Indexer │ │    Retriever    │ │   │
	│  │ └───────┘ └───────┘ │  │ └─────────┘ └─────────────────┘ │   │
	│  │ ┌───────┐ ┌───────┐ │  │ ┌─────────┐ ┌─────────────────┐ │   │
	│  │ │Ground │ │ Valid │ │  │ │Embedder │ │    Generator    │ │   │
	│  │ └───────┘ └───────┘ │  │ └─────────┘ └─────────────────┘ │   │
	│  └─────────────────────┘  └─────────────────────────────────┘   │
	│                                                                 │
	│  ┌──────────────────────────────────────────────────────────┐   │
	│  │                      Infrastructure                      │   │
	│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐     │   │
	│  │  │  Ollama  │ │ ChromaDB │ │   GPU    │ │  Cache   │     │   │
	│  │  │  Client  │ │  Store   │ │ Manager  │ │  Layer   │     │   │
	│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘     │   │
	│  └──────────────────────────────────────────────────────────┘   │
	└─────────────────────────────────────────────────────────────────┘
	```

	---

	## 8. Files Modified/Created in Recent Session

	| File | Action | Description |
	|------|--------|-------------|
	| `src/rag/docint_bridge.py` | Created | Bridge between document_intelligence and RAG |
	| `src/document_intelligence/tools/rag_tools.py` | Created | RAG tools for agents |
	| `src/document_intelligence/tools/__init__.py` | Modified | Added RAG tool exports |
	| `src/document_intelligence/tools/document_tools.py` | Modified | Enhanced AnswerQuestionTool with RAG |
	| `src/cli/docint.py` | Modified | Added index, retrieve, delete-index commands |
	| `src/rag/__init__.py` | Modified | Added bridge exports |
	| `configs/rag.yaml` | Created | RAG configuration file |
	| `tests/unit/test_rag_integration.py` | Created | RAG integration tests |
	| `examples/document_rag_end_to_end.py` | Created | End-to-end RAG example |

	---

	Report Complete

	For questions or issues, refer to the troubleshooting section above or check the test files for usage examples.