Digi-Biz Documentation
Agentic Business Digitization Framework
Version: 1.0.0
Last Updated: March 17, 2026
π Table of Contents
Overview
Digi-Biz is an AI-powered agentic framework that automatically converts unstructured business documents into structured digital business profiles.
What It Does
- Accepts ZIP files containing mixed business documents (PDF, DOCX, Excel, images, videos)
- Intelligently extracts and structures information using multi-agent workflows
- Generates comprehensive digital business profiles with product/service inventories
- Provides dynamic UI for viewing and editing results
Key Features
β
Multi-Agent Pipeline - 5 specialized agents working together
β
Vectorless RAG - Fast document retrieval without embeddings
β
Groq Vision - Image analysis with Llama-4-Scout (17B)
β
Production-Ready - Error handling, validation, logging
β
Streamlit UI - Interactive web interface
Architecture
High-Level Overview
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β User Interface (Streamlit) β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
β β ZIP Upload β β Results View β β Vision Tab β β
β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Agent Pipeline β
β 1. File Discovery β 2. Document Parsing β 3. Table Extract β
β 4. Media Extraction β 5. Vision (Groq) β 6. Indexing (RAG) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Data Layer β
β File Storage (FileSystem) β’ Index (In-Memory) β’ Profiles β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Technology Stack
| Component | Technology |
|---|---|
| Backend | Python 3.10+ |
| Document Parsing | pdfplumber, python-docx, openpyxl |
| Image Processing | Pillow, pdf2image |
| Vision AI | Groq API (Llama-4-Scout-17B) |
| LLM (Text) | Groq API (gpt-oss-120b) |
| Validation | Pydantic |
| Frontend | Streamlit |
| Storage | Local Filesystem |
Agents
1. File Discovery Agent
Purpose: Extract ZIP files and classify all contained files
Input:
FileDiscoveryInput(
zip_file_path="/path/to/upload.zip",
job_id="job_123",
max_file_size=524288000, # 500MB
max_files=100
)
Output:
FileDiscoveryOutput(
job_id="job_123",
success=True,
documents=[...], # PDFs, DOCX
spreadsheets=[...], # XLSX, CSV
images=[...], # JPG, PNG
videos=[...], # MP4, AVI
total_files=10,
extraction_dir="/storage/extracted/job_123"
)
Features:
- ZIP bomb detection (1000:1 ratio limit)
- Path traversal prevention
- File type classification (3-strategy approach)
- Directory structure preservation
File: backend/agents/file_discovery.py
2. Document Parsing Agent
Purpose: Extract text and structure from PDF/DOCX files
Input:
DocumentParsingInput(
documents=[...], # From File Discovery
job_id="job_123",
enable_ocr=True
)
Output:
DocumentParsingOutput(
job_id="job_123",
success=True,
parsed_documents=[...],
total_pages=56,
processing_time=2.5
)
Features:
- PDF parsing (pdfplumber primary, PyPDF2 fallback, OCR final)
- DOCX parsing with structure preservation
- Table extraction
- Embedded image extraction
File: backend/agents/document_parsing.py
3. Table Extraction Agent
Purpose: Detect and classify tables from parsed documents
Input:
TableExtractionInput(
parsed_documents=[...],
job_id="job_123"
)
Output:
TableExtractionOutput(
job_id="job_123",
success=True,
tables=[...],
total_tables=42,
tables_by_type={
"itinerary": 33,
"pricing": 6,
"general": 3
}
)
Table Types:
| Type | Detection Criteria |
|---|---|
| PRICING | Headers: price/cost/rate; Currency: $, β¬, βΉ |
| ITINERARY | Headers: day/time/date; Patterns: "Day 1", "9:00 AM" |
| SPECIFICATIONS | Headers: spec/feature/dimension/weight |
| MENU | Headers: menu/dish/food/meal |
| INVENTORY | Headers: stock/quantity/available |
| GENERAL | Fallback |
File: backend/agents/table_extraction.py
4. Media Extraction Agent
Purpose: Extract embedded and standalone media
Input:
MediaExtractionInput(
parsed_documents=[...],
standalone_files=[...],
job_id="job_123"
)
Output:
MediaExtractionOutput(
job_id="job_123",
success=True,
media=MediaCollection(
images=[...],
total_count=15,
extraction_summary={...}
),
duplicates_removed=3
)
Features:
- PDF embedded image extraction (xref method)
- DOCX embedded image extraction (ZIP method)
- Perceptual hashing for deduplication
- Quality assessment
File: backend/agents/media_extraction.py
5. Vision Agent (Groq)
Purpose: Analyze images using Groq Vision API
Input:
VisionAnalysisInput(
image=ExtractedImage(...),
context="Restaurant menu with burgers",
job_id="job_123"
)
Output:
ImageAnalysis(
image_id="img_001",
description="A delicious burger with lettuce...",
category=ImageCategory.FOOD,
tags=["burger", "food", "restaurant"],
is_product=False,
is_service_related=True,
confidence=0.92,
metadata={
'provider': 'groq',
'model': 'llama-4-scout-17b',
'processing_time': 1.85
}
)
Features:
- Groq API integration (Llama-4-Scout-17B)
- Ollama fallback
- Context-aware prompts
- JSON response parsing
- Batch processing
- Automatic image resizing (<4MB)
File: backend/agents/vision_agent.py
6. Indexing Agent (Vectorless RAG)
Purpose: Build inverted index for fast document retrieval
Input:
IndexingInput(
parsed_documents=[...],
tables=[...],
images=[...],
job_id="job_123"
)
Output:
IndexingOutput(
job_id="job_123",
success=True,
page_index=PageIndex(
documents={...},
page_index={
"burger": [PageReference(...)],
"price": [PageReference(...)]
},
table_index={...},
media_index={...}
),
total_keywords=1250
)
Features:
- Keyword extraction (tokenization, N-grams, entities)
- Inverted index creation
- Query expansion with synonyms
- Context-aware retrieval
- Relevance scoring
File: backend/agents/indexing.py
Installation
Prerequisites
- Python 3.10+
- Git (for cloning)
- Groq API account (free at https://console.groq.com)
Step 1: Clone Repository
cd D:\Viswam_Projects\digi-biz
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Configure Environment
Create .env file:
# Groq API (required for vision and text LLM)
GROQ_API_KEY=gsk_your_actual_key_here
GROQ_MODEL=gpt-oss-120b
GROQ_VISION_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
# Optional: Ollama for local fallback
OLLAMA_HOST=http://localhost:11434
OLLAMA_VISION_MODEL=qwen3.5:0.8b
# Application settings
APP_ENV=development
LOG_LEVEL=INFO
MAX_FILE_SIZE=524288000 # 500MB
MAX_FILES_PER_ZIP=100
# Storage
STORAGE_BASE=./storage
Step 4: Get Groq API Key
- Visit https://console.groq.com
- Sign up / Log in
- Go to "API Keys"
- Create new key
- Copy to
.envfile
Step 5: Verify Installation
# Test Groq connection
python test_groq_vision.py
# Run tests
pytest tests/ -v
# Start Streamlit app
streamlit run app.py
Usage
Quick Start
Start the app:
streamlit run app.pyOpen browser: http://localhost:8501
Upload ZIP containing:
- Business documents (PDF, DOCX)
- Spreadsheets (XLSX, CSV)
- Images (JPG, PNG)
- Videos (MP4, AVI)
Click "Start Processing"
View results in tabs:
- Results (documents, tables)
- Vision Analysis (image descriptions)
Command Line Usage
from backend.agents.file_discovery import FileDiscoveryAgent, FileDiscoveryInput
# Initialize agent
agent = FileDiscoveryAgent()
# Create input
input_data = FileDiscoveryInput(
zip_file_path="business_docs.zip",
job_id="job_001"
)
# Run discovery
output = agent.discover(input_data)
print(f"Discovered {output.total_files} files")
Batch Processing
from backend.agents.vision_agent import VisionAgent
# Initialize with Groq
agent = VisionAgent(provider="groq")
# Analyze multiple images
analyses = agent.analyze_batch(images, context="Product catalog")
for analysis in analyses:
print(f"{analysis.category.value}: {analysis.description}")
API Reference
File Discovery Agent
class FileDiscoveryAgent:
def discover(self, input: FileDiscoveryInput) -> FileDiscoveryOutput:
"""Extract ZIP and classify files"""
pass
Document Parsing Agent
class DocumentParsingAgent:
def parse(self, input: DocumentParsingInput) -> DocumentParsingOutput:
"""Parse documents and extract text/tables/images"""
pass
Vision Agent
class VisionAgent:
def analyze(self, input: VisionAnalysisInput) -> ImageAnalysis:
"""Analyze single image"""
pass
def analyze_batch(self, images: List[ExtractedImage], context: str) -> List[ImageAnalysis]:
"""Analyze multiple images"""
pass
Indexing Agent
class IndexingAgent:
def build_index(self, input: IndexingInput) -> PageIndex:
"""Build inverted index"""
pass
def retrieve_context(self, query: str, page_index: PageIndex, max_pages: int) -> Dict:
"""Retrieve relevant context"""
pass
Troubleshooting
Groq API Issues
Error: Groq API Key Missing
Solution:
# Check .env file
cat .env | grep GROQ_API_KEY
# Should show your actual key, not placeholder
GROQ_API_KEY=gsk_xxxxx
Error: Request Entity Too Large (413)
Solution: Images are automatically resized. If still failing, compress images before uploading.
Ollama Issues
Error: Cannot connect to Ollama
Solution:
# Start Ollama server
ollama serve
# Verify running
ollama list
Memory Issues
Error: Out of memory
Solution:
# Reduce concurrent processing
# In .env:
MAX_CONCURRENT_PARSING=3
MAX_CONCURRENT_VISION=2
Performance Issues
Slow processing:
- Check internet connection (Groq API requires internet)
- Reduce image sizes before upload
- Process fewer files at once
- Check Groq API status: https://status.groq.com
Testing
Run All Tests
pytest tests/ -v
Run Specific Agent Tests
# File Discovery
pytest tests/agents/test_file_discovery.py -v
# Document Parsing
pytest tests/agents/test_document_parsing.py -v
# Vision Agent
pytest tests/agents/test_vision_agent.py -v
# Indexing Agent
pytest tests/agents/test_indexing.py -v # (to be created)
Test Coverage
pytest tests/ --cov=backend --cov-report=html
start htmlcov/index.html # Windows
open htmlcov/index.html # macOS/Linux
Project Structure
digi-biz/
βββ backend/
β βββ agents/
β β βββ file_discovery.py β
Complete
β β βββ document_parsing.py β
Complete
β β βββ table_extraction.py β
Complete
β β βββ media_extraction.py β
Complete
β β βββ vision_agent.py β
Complete
β β βββ indexing.py β
Complete
β βββ models/
β β βββ schemas.py β
Complete
β β βββ enums.py β
Complete
β βββ utils/
β βββ storage_manager.py
β βββ file_classifier.py
β βββ logger.py
β βββ groq_vision_client.py
βββ tests/
β βββ agents/
β βββ test_file_discovery.py
β βββ test_document_parsing.py
β βββ test_table_extraction.py
β βββ test_media_extraction.py
β βββ test_vision_agent.py
βββ app.py β
Streamlit App
βββ requirements.txt
βββ .env.example
βββ docs/
βββ DOCUMENTATION.md β
This file
Performance Benchmarks
| Agent | Processing Time | Test Data |
|---|---|---|
| File Discovery | ~1-2s | 10 files ZIP |
| Document Parsing | ~50ms/doc | PDF 10 pages |
| Table Extraction | ~100ms/doc | 5 tables |
| Media Extraction | ~200ms/image | 5 images |
| Vision Analysis | ~2s/image | Groq API |
| Indexing | ~500ms | 50 pages |
End-to-End: <2 minutes for typical business folder (10 documents, 5 images)
License
MIT License - See LICENSE file for details
Support
- GitHub Issues: Report bugs and feature requests
- Documentation: This file + inline code comments
- Email: [Your contact here]
Last Updated: March 17, 2026
Version: 1.0.0
Status: Production Ready β