Spaces:

Divs0910
/

Digi-Biz

Sleeping

App Files Files Community

Digi-Biz / docs /README.md

Deployment Bot

Automated deployment to Hugging Face

255cbd1 16 days ago

preview code

raw

history blame contribute delete

4.28 kB

Digi-Biz 📄

Agentic Business Digitization Framework

Transform unstructured business documents into structured digital profiles using AI agents.

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Groq API

Get your free API key at https://console.groq.com

Create .env file:

GROQ_API_KEY=gsk_your_key_here
GROQ_VISION_MODEL=meta-llama/llama-4-scout-17b-16e-instruct

3. Run the App

streamlit run app.py

Open http://localhost:8501

✨ Features

✅ Multi-Agent Pipeline - 6 specialized agents
✅ Groq Vision - Image analysis with Llama-4-Scout (17B)
✅ Vectorless RAG - Fast document retrieval
✅ Production-Ready - Error handling, validation, logging
✅ Interactive UI - Streamlit web interface

📊 What It Does

Upload ZIP with business documents
AI Agents Process:
- File Discovery → Classify files
- Document Parsing → Extract text/tables
- Table Extraction → Detect & classify
- Media Extraction → Extract images
- Vision Analysis → Describe images (Groq)
- Indexing → Build search index (RAG)
View Results in interactive UI

🎯 Example Use Cases

Restaurant Digitization

Upload: Menu PDFs, food photos, price lists
Output: Digital menu with prices, food descriptions, categories

Travel Agency

Upload: Tour brochures, itinerary PDFs, destination photos
Output: Tour packages with itineraries, pricing, descriptions

Retail Store

Upload: Product catalogs, inventory spreadsheets, product photos
Output: Product inventory with descriptions, prices, categories

📁 Project Structure

digi-biz/
├── backend/agents/        # 6 AI agents
├── backend/models/        # Data schemas
├── backend/utils/         # Utilities
├── tests/agents/          # Test suites
├── app.py                 # Streamlit app
├── requirements.txt       # Dependencies
└── docs/                  # Documentation

🧪 Testing

All agents are thoroughly tested:

# Run all tests
pytest tests/ -v

# Test coverage
pytest tests/ --cov=backend

Test Results: 66/66 tests passing ✅

📖 Documentation

Full Documentation - Complete guide
Agent Details - Agent specifications
Streamlit App - App usage guide

🔧 Configuration

Environment Variables (.env)

# Groq API (required)
GROQ_API_KEY=gsk_xxxxx
GROQ_MODEL=gpt-oss-120b
GROQ_VISION_MODEL=meta-llama/llama-4-scout-17b-16e-instruct

# Optional: Ollama fallback
OLLAMA_HOST=http://localhost:11434

# Processing limits
MAX_FILE_SIZE=524288000    # 500MB
MAX_FILES_PER_ZIP=100

🎓 Agents

#	Agent	Purpose	Status
1	File Discovery	Extract & classify ZIP files	✅
2	Document Parsing	Parse PDF/DOCX	✅
3	Table Extraction	Detect & classify tables	✅
4	Media Extraction	Extract images/videos	✅
5	Vision Agent	Analyze images (Groq)	✅
6	Indexing Agent	Build RAG index	✅

📊 Performance

Task	Time
File Discovery (10 files)	~1-2s
Document Parsing (10 pages)	~0.5s
Table Extraction (5 tables)	~0.5s
Vision Analysis (1 image)	~2s
Total (typical folder)	<2 min

🛠️ Tech Stack

Backend: Python 3.10+, Pydantic, asyncio
Document Parsing: pdfplumber, python-docx, openpyxl
Vision AI: Groq API (Llama-4-Scout-17B)
Frontend: Streamlit
Testing: pytest

📝 License

MIT License - See LICENSE for details

🤝 Contributing

Fork the repo
Create feature branch
Add tests
Submit PR

📞 Support

Issues: GitHub Issues
Docs: docs/DOCUMENTATION.md

Made with ❤️ using AI Agents