# SPARKNET Implementation Report
## Agentic Document Intelligence Platform

**Report Date:** January 2025
**Version:** 0.1.0

---

## Executive Summary

SPARKNET is an **Agentic Document Intelligence Platform** built around four principles drawn from FAANG engineering practice:
- **Modular Architecture**: Clean separation of concerns with well-defined interfaces
- **Local-First Privacy**: All processing happens locally via Ollama
- **Evidence Grounding**: Every extraction includes verifiable source references
- **Production-Ready**: Type-safe, tested, configurable, and scalable

---

## 1. What Has Been Implemented

### 1.1 Core Subsystems

| Subsystem | Location | Status | Description |
|-----------|----------|--------|-------------|
| **Document Intelligence** | `src/document_intelligence/` | Complete | Vision-first document understanding |
| **Legacy Document Pipeline** | `src/document/` | Complete | OCR, layout, chunking pipeline |
| **RAG Subsystem** | `src/rag/` | Complete | Vector search with grounded retrieval |
| **Multi-Agent System** | `src/agents/` | Complete | ReAct-style agents with tools |
| **LLM Integration** | `src/llm/` | Complete | Ollama client with routing |
| **CLI** | `src/cli/` | Complete | Full command-line interface |
| **API** | `api/` | Complete | FastAPI REST endpoints |
| **Demo UI** | `demo/` | Complete | Streamlit dashboard |

### 1.2 Document Intelligence Module (`src/document_intelligence/`)

**Architecture (FAANG-inspired: Google DocAI pattern):**

```
src/document_intelligence/
├── chunks/           # Core data models (BoundingBox, DocumentChunk, TableChunk)
│   ├── models.py     # Pydantic models with full type safety
│   └── __init__.py
├── io/               # Document loading with caching
│   ├── base.py       # Abstract interfaces
│   ├── pdf.py        # PyMuPDF-based PDF loading
│   ├── image.py      # PIL image loading
│   └── cache.py      # LRU page caching
├── models/           # ML model interfaces
│   ├── base.py       # BaseModel, BatchableModel
│   ├── ocr.py        # OCRModel interface
│   ├── layout.py     # LayoutModel interface
│   ├── table.py      # TableModel interface
│   └── vlm.py        # VisionLanguageModel interface
├── parsing/          # Document parsing pipeline
│   ├── parser.py     # DocumentParser orchestrator
│   └── chunking.py   # SemanticChunker
├── grounding/        # Visual evidence
│   ├── evidence.py   # EvidenceBuilder, EvidenceTracker
│   └── crops.py      # Image cropping utilities
├── extraction/       # Field extraction
│   ├── schema.py     # ExtractionSchema, FieldSpec
│   ├── extractor.py  # FieldExtractor
│   └── validator.py  # ExtractionValidator
├── tools/            # Agent tools
│   ├── document_tools.py  # ParseDocumentTool, ExtractFieldsTool, etc.
│   └── rag_tools.py       # IndexDocumentTool, RetrieveChunksTool, RAGAnswerTool
└── agent_adapter.py  # EnhancedDocumentAgent integration
```
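The tree lists `io/cache.py` as LRU page caching. As a rough illustration of that idea only, here is a minimal sketch of a least-recently-used cache for rendered pages. The class name `LRUPageCache`, the `get_or_load` method, and the `max_pages` parameter are hypothetical and are not taken from the SPARKNET source.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class LRUPageCache(Generic[V]):
    """Hypothetical LRU cache: keeps at most `max_pages` recently used pages."""

    def __init__(self, max_pages: int = 32) -> None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        self._max_pages = max_pages
        self._pages: "OrderedDict[Hashable, V]" = OrderedDict()

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        """Return the cached page, or call `loader` once and cache its result."""
        if key in self._pages:
            # Mark as most recently used.
            self._pages.move_to_end(key)
            return self._pages[key]
        page = loader()
        self._pages[key] = page
        if len(self._pages) > self._max_pages:
            # Evict the least recently used entry (front of the ordering).
            self._pages.popitem(last=False)
        return page

    def __contains__(self, key: Hashable) -> bool:
        return key in self._pages

    def __len__(self) -> int:
        return len(self._pages)
```

A key such as `(path, page_number)` would let repeated requests for the same page skip re-rendering; the real module may key and size its cache differently.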

**Key Features:**
- **Zero-Shot Capability**: Works across document formats without training
- **Schema-Driven Extraction**: Define fields using JSON Schema or Pydantic
- **Abstention Policy**: Abstains rather than guessing when confidence is low
- **Visual Grounding**: Every extraction includes page, bbox, snippet, confidence
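To make the grounding and abstention features above concrete, the following is a minimal sketch using standard-library dataclasses. The names `Evidence`, `ExtractedField`, and `ground_or_abstain`, the bounding-box layout, and the default threshold are assumptions for illustration; the actual Pydantic models in `chunks/models.py` and `grounding/evidence.py` may look different.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    """Where a value was found: page, bounding box, snippet, confidence."""
    page: int
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout
    snippet: str
    confidence: float


@dataclass(frozen=True)
class ExtractedField:
    """A field value plus its evidence; value is None when abstaining."""
    name: str
    value: Optional[str]
    evidence: Optional[Evidence]

    @property
    def abstained(self) -> bool:
        return self.value is None


def ground_or_abstain(name: str, value: str, evidence: Evidence,
                      min_confidence: float = 0.5) -> ExtractedField:
    """Keep the value only if its evidence clears the confidence bar."""
    if evidence.confidence < min_confidence:
        # Abstain: report no value instead of a low-confidence guess.
        return ExtractedField(name=name, value=None, evidence=None)
    return ExtractedField(name=name, value=value, evidence=evidence)
```

In this sketch a caller checks `field.abstained` before using `field.value`, so a low-confidence extraction is never mistaken for a real one.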

### 1.3 RAG Subsystem (`src/rag/`)

**Architecture (FAANG-inspired: Meta FAISS + Google Vertex AI pattern):**

```
src/rag/
├── store.py          # VectorStore interface + ChromaVectorStore
├── embeddings.py     # OllamaEmbedding + OpenAIEmbedding (feature-flagged)
├── indexer.py        # DocumentIndexer for chunked documents
├── retriever.py      # DocumentRetriever with evidence support
├── generator.py      # GroundedGenerator with citations
├── docint_bridge.py  # Bridge to document_intelligence subsystem
└── __init__.py       # Clean exports
```

**Key Features:**
- **Local-First Embeddings**: Ollama `nomic-embed-text` by default
- **Cloud Opt-In**: OpenAI embeddings disabled by default, feature-flagged
- **Metadata Filtering**: Filter by document_id, chunk_type, page_range
- **Citation Generation**: Answers include `[1]`, `[2]` references
- **Confidence-Based Abstention**: Returns "I don't know" when uncertain
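The metadata filtering and citation features listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in `retriever.py` or `generator.py`: `RetrievedChunk`, `filter_chunks`, and `format_citations` are hypothetical names, and in practice filtering would more likely be pushed down into the vector store query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RetrievedChunk:
    document_id: str
    chunk_type: str
    page: int
    text: str


def filter_chunks(chunks: Iterable[RetrievedChunk],
                  document_id: Optional[str] = None,
                  chunk_type: Optional[str] = None,
                  page_range: Optional[Tuple[int, int]] = None) -> List[RetrievedChunk]:
    """Keep chunks matching every filter that was supplied (None = no filter)."""
    kept = []
    for chunk in chunks:
        if document_id is not None and chunk.document_id != document_id:
            continue
        if chunk_type is not None and chunk.chunk_type != chunk_type:
            continue
        if page_range is not None and not (page_range[0] <= chunk.page <= page_range[1]):
            continue
        kept.append(chunk)
    return kept


def format_citations(chunks: List[RetrievedChunk]) -> str:
    """Number sources as [1], [2], ... so an answer can refer to them."""
    return "\n".join(
        f"[{i}] {c.document_id}, p. {c.page}: {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
```

The numbered list produced by `format_citations` is what a `[1]`-style reference in an answer would point back to.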

### 1.4 Multi-Agent System (`src/agents/`)

**Agents Implemented:**

| Agent | Purpose | Model |
|-------|---------|-------|
| `ExecutorAgent` | Task execution with tools | llama3.1:8b |
| `DocumentAgent` | ReAct-style document analysis | llama3.1:8b |
| `PlannerAgent` | Task decomposition | mistral |
| `CriticAgent` | Output validation | phi3 |
| `MemoryAgent` | Context management | llama3.2 |
| `VisionOCRAgent` | Vision-based OCR | llava (optional) |

### 1.5 CLI Commands

```bash
# Document Intelligence
sparknet docint parse document.pdf -o result.json
sparknet docint extract invoice.pdf --preset invoice
sparknet docint ask document.pdf "What is the total?"
sparknet docint classify document.pdf

# RAG Operations
sparknet docint index document.pdf              # Index into vector store
sparknet docint index-stats                     # Show index statistics
sparknet docint retrieve "payment terms" -k 10  # Semantic search
sparknet docint ask doc.pdf "question" --use-rag  # RAG-powered Q&A

# Legacy Document Commands
sparknet document parse invoice.pdf
sparknet document extract contract.pdf -f "party_name"
sparknet rag index *.pdf --collection my_docs
sparknet rag search "query" --top 10
```

---

## 2. How to Execute SPARKNET

### 2.1 Prerequisites

```bash
# 1. System Requirements
# - Python 3.10+
# - NVIDIA GPU with CUDA 12.0+ (optional but recommended)
# - 16GB+ RAM
# - 50GB+ disk space

# 2. Install Ollama (if not installed)
curl -fsSL https://ollama.com/install.sh | sh

# 3. Start Ollama server
ollama serve
```

### 2.2 Installation

```bash
cd /home/mhamdan/SPARKNET

# Option A: Use existing virtual environment
source sparknet/bin/activate

# Option B: Create new environment
python3 -m venv sparknet
source sparknet/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -r demo/requirements.txt

# Install SPARKNET in development mode
pip install -e .
```

### 2.3 Download Required Models

```bash
# Embedding model (required for RAG)
ollama pull nomic-embed-text:latest

# LLM models (at least one required)
ollama pull llama3.2:latest     # Fast, 2GB
ollama pull llama3.1:8b         # General purpose, 5GB
ollama pull mistral:latest      # Good reasoning, 4GB

# Optional: Larger models for complex tasks
ollama pull qwen2.5:14b         # Complex reasoning, 9GB
```

### 2.4 Running the Demo UI

**Method 1: Using the launcher script**
```bash
cd /home/mhamdan/SPARKNET
./run_demo.sh 8501
```

**Method 2: Direct Streamlit command**
```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
streamlit run demo/app.py --server.port 8501
```

**Method 3: Bind to specific IP (for remote access)**
```bash
streamlit run demo/app.py \
  --server.address 172.24.50.21 \
  --server.port 8501 \
  --server.headless true
```

**Access at:** http://localhost:8501 locally, or http://172.24.50.21:8501 when bound to that address (substitute your own host IP).

### 2.5 Running the API Server

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

**API Endpoints:**
- `GET /health` - Health check
- `POST /api/documents/parse` - Parse document
- `POST /api/documents/extract` - Extract fields
- `POST /api/rag/index` - Index document
- `POST /api/rag/query` - Query RAG
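As a sketch of calling the query endpoint from Python with only the standard library, the snippet below builds a JSON `POST` request. The endpoint path comes from the list above, but the request body fields `question` and `top_k` are assumptions; check the FastAPI schema (typically served at `/docs`) for the real field names.

```python
import json
import urllib.request


def build_rag_query(base_url: str, question: str, top_k: int = 5) -> urllib.request.Request:
    """Build a POST request for /api/rag/query (body fields are assumed)."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/rag/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires the API server from section 2.5 to be running):
#   result = send(build_rag_query("http://localhost:8000", "What are the payment terms?"))
```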

### 2.6 Running Examples

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Document Intelligence Demo
python examples/document_intelligence_demo.py

# RAG End-to-End Pipeline
python examples/document_rag_end_to_end.py

# Simple Agent Task
python examples/simple_task.py

# Document Agent
python examples/document_agent.py
```

### 2.7 Running Tests

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/unit/test_document_intelligence.py -v
pytest tests/unit/test_rag_integration.py -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```

---

## 3. Configuration

### 3.1 RAG Configuration (`configs/rag.yaml`)

```yaml
vector_store:
  type: chroma
  chroma:
    persist_directory: "./.sparknet/chroma_db"
    collection_name: "sparknet_documents"
    distance_metric: cosine

embeddings:
  provider: ollama  # Local-first
  ollama:
    model: nomic-embed-text
    base_url: "http://localhost:11434"
  openai:
    enabled: false  # Disabled by default

generator:
  provider: ollama
  ollama:
    model: llama3.2
  abstain_on_low_confidence: true
  abstain_threshold: 0.3
```

### 3.2 Document Configuration (`config/document.yaml`)

```yaml
ocr:
  engine: paddleocr  # or tesseract
  languages: ["en"]
  confidence_threshold: 0.5

layout:
  enabled: true
  reading_order: true

chunking:
  min_chunk_chars: 10
  max_chunk_chars: 4000
  target_chunk_chars: 500
```

---

## 4. FAANG Best Practices Applied

### 4.1 Google-Inspired Patterns
- **DocAI Architecture**: Modular vision-first document understanding
- **Structured Output**: Schema-driven extraction with validation
- **Abstention Policy**: Return "I don't know" rather than hallucinate when confidence is low

### 4.2 Meta-Inspired Patterns
- **FAISS Integration**: Fast similarity search (optional alongside ChromaDB)
- **RAG Pipeline**: Retrieve-then-generate with citations

### 4.3 Amazon-Inspired Patterns
- **Textract-like API**: Structured field extraction with confidence scores
- **Evidence Grounding**: Every output traceable to source

### 4.4 Microsoft-Inspired Patterns
- **Form Recognizer Pattern**: Pre-built schemas for invoices, contracts
- **Confidence Thresholds**: Configurable abstention levels

### 4.5 Apple-Inspired Patterns
- **Privacy-First**: All processing local by default
- **Opt-In Cloud**: OpenAI and cloud services disabled by default

---

## 5. Quick Start Commands

```bash
# === SETUP ===
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
ollama serve &  # Start in background

# === DEMO UI ===
streamlit run demo/app.py --server.port 8501

# === CLI USAGE ===
# Parse a document
python -m src.cli.main docint parse Dataset/IBM*.pdf -o result.json

# Index for RAG
python -m src.cli.main docint index Dataset/*.pdf

# Ask questions with RAG
python -m src.cli.main docint ask Dataset/IBM*.pdf "What is this document about?" --use-rag

# === PYTHON API ===
python -c "
from src.document_intelligence import DocumentParser
parser = DocumentParser()
result = parser.parse('Dataset/IBM N_A.pdf')
print(f'Parsed {len(result.chunks)} chunks')
"

# === RUN TESTS ===
pytest tests/unit/ -v
```

---

## 6. Troubleshooting

### Issue: Ollama not running
```bash
# Check status
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve

# If port in use
pkill ollama && ollama serve
```

### Issue: Missing models
```bash
ollama list  # See installed models
ollama pull nomic-embed-text  # Install embedding model
ollama pull llama3.2  # Install LLM
```
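The two checks above (server reachable, models installed) can also be scripted. The sketch below queries Ollama's `/api/tags` endpoint, the same one used in the `curl` check, and reports which required models are absent. The helper names are hypothetical, and treating an untagged name as `:latest` is an assumption about how the names should be compared.

```python
import json
import urllib.request
from typing import Iterable, List


def parse_model_names(tags_payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Return installed model names, or raise if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
            return parse_model_names(json.load(response))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        raise RuntimeError(f"Ollama is not reachable at {base_url}: {exc}") from exc


def _with_tag(name: str) -> str:
    # Assumption: a name without an explicit tag refers to ":latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(installed: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required models that are not in the installed list."""
    have = {_with_tag(name) for name in installed}
    return [name for name in required if _with_tag(name) not in have]


# Usage (requires a running Ollama server):
#   print(missing_models(list_ollama_models(), ["nomic-embed-text", "llama3.2"]))
```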

### Issue: ChromaDB errors
```bash
# Reset vector store
rm -rf .sparknet/chroma_db
```

### Issue: Import errors
```bash
# Ensure in correct directory
cd /home/mhamdan/SPARKNET

# Ensure venv activated
source sparknet/bin/activate

# Reinstall
pip install -e .
```

---

## 7. Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                        SPARKNET Platform                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐             │
│  │  Streamlit  │  │  FastAPI    │  │    CLI      │  Interfaces │
│  │    Demo     │  │    API      │  │  Commands   │             │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘             │
├─────────┴────────────────┴────────────────┴─────────────────────┤
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   Agent Layer                            │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │  │
│  │  │ Document │ │ Executor │ │ Planner  │ │  Critic  │    │  │
│  │  │  Agent   │ │  Agent   │ │  Agent   │ │  Agent   │    │  │
│  │  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘    │  │
│  └───────┴────────────┴────────────┴────────────┴───────────┘  │
│                                                                 │
│  ┌────────────────────┐  ┌─────────────────────────────────┐   │
│  │ Document Intel     │  │         RAG Subsystem           │   │
│  │ ┌──────┐ ┌───────┐ │  │ ┌─────────┐ ┌─────────────────┐ │   │
│  │ │Parser│ │Extract│ │  │ │Indexer  │ │   Retriever     │ │   │
│  │ └──────┘ └───────┘ │  │ └─────────┘ └─────────────────┘ │   │
│  │ ┌──────┐ ┌───────┐ │  │ ┌─────────┐ ┌─────────────────┐ │   │
│  │ │Ground│ │Valid  │ │  │ │Embedder │ │   Generator     │ │   │
│  │ └──────┘ └───────┘ │  │ └─────────┘ └─────────────────┘ │   │
│  └────────────────────┘  └─────────────────────────────────┘   │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                   Infrastructure                         │   │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐   │   │
│  │  │  Ollama  │ │ ChromaDB │ │   GPU    │ │  Cache   │   │   │
│  │  │  Client  │ │  Store   │ │ Manager  │ │  Layer   │   │   │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘   │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

---

## 8. Files Modified/Created in Recent Session

| File | Action | Description |
|------|--------|-------------|
| `src/rag/docint_bridge.py` | Created | Bridge between document_intelligence and RAG |
| `src/document_intelligence/tools/rag_tools.py` | Created | RAG tools for agents |
| `src/document_intelligence/tools/__init__.py` | Modified | Added RAG tool exports |
| `src/document_intelligence/tools/document_tools.py` | Modified | Enhanced AnswerQuestionTool with RAG |
| `src/cli/docint.py` | Modified | Added index, retrieve, delete-index commands |
| `src/rag/__init__.py` | Modified | Added bridge exports |
| `configs/rag.yaml` | Created | RAG configuration file |
| `tests/unit/test_rag_integration.py` | Created | RAG integration tests |
| `examples/document_rag_end_to_end.py` | Created | End-to-end RAG example |

---

**Report Complete**

For questions or issues, refer to the troubleshooting section above or check the test files for usage examples.