Digi-Biz - Clean Project Structure

✅ CLEANED UP!

All documentation has been moved to docs/, and unused files have been removed.


📁 Final Structure

digi-biz/
├── 📄 Core Files
│   ├── app.py                      # Streamlit app (MAIN)
│   ├── api.py                      # FastAPI backend (alternative)
│   ├── requirements.txt            # Python dependencies
│   ├── .env                        # Environment variables
│   └── .env.example                # Example env file
│
├── 🤖 Backend (Python)
│   ├── backend/
│   │   ├── api/main.py            # FastAPI server
│   │   ├── agents/                # 8 AI agents
│   │   │   ├── file_discovery.py
│   │   │   ├── document_parsing.py
│   │   │   ├── table_extraction.py
│   │   │   ├── media_extraction.py
│   │   │   ├── vision_agent.py
│   │   │   ├── indexing.py
│   │   │   ├── schema_mapping_v2.py  # NEW - Generic extraction
│   │   │   └── validation_agent.py
│   │   ├── models/
│   │   │   ├── schemas.py         # Data models
│   │   │   └── enums.py
│   │   ├── parsers/               # Document parsers
│   │   └── utils/                 # Utilities
│
├── 🌐 Frontend (Next.js - Optional)
│   ├── frontend/
│   │   ├── src/app/              # Next.js pages
│   │   ├── src/lib/api.ts        # API client
│   │   └── package.json
│
├── 📚 Documentation
│   ├── docs/                     # ALL .md files moved here
│   │   ├── README.md             # Project overview
│   │   ├── HACKATHON_QUICKSTART.md
│   │   ├── CURRENT_STATUS.md
│   │   └── [20+ more docs]
│   └── README.md                 # Main README (root)
│
├── 💾 Storage
│   └── storage/
│       ├── profiles/             # Generated profiles (JSON)
│       └── extracted/            # Extracted media
│
└── 🧪 Tests
    └── tests/
        └── agents/               # Agent tests
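
A layout like this can be checked automatically. The sketch below is illustrative only and not part of the project: the helper `missing_paths` and the `EXPECTED_PATHS` list are hypothetical names, and the list is just a subset of the paths shown in the tree.

```python
from pathlib import Path

# A subset of the paths shown in the tree above. This list is an
# assumption for illustration; extend or trim it to match your checkout.
EXPECTED_PATHS = [
    "app.py",
    "requirements.txt",
    ".env.example",
    "backend/agents",
    "backend/models/schemas.py",
    "docs",
    "storage/profiles",
    "tests/agents",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout matches the documented structure.")
```

Run from the repository root, this prints any documented paths that are absent, which may help catch a file that was removed by mistake during cleanup.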

🎯 What's Kept

Essential Files:

  • ✅ app.py - Streamlit app (primary interface)
  • ✅ backend/ - All Python backend code
  • ✅ requirements.txt - Dependencies
  • ✅ .env - Configuration

Documentation:

  • ✅ All .md files → docs/ folder
  • ✅ README.md - Clean, hackathon-ready

Optional:

  • ⚠️ frontend/ - Next.js (can be removed if not using)
  • ⚠️ tests/ - Unit tests (keep for development)

🗑️ What Was Removed

  • ❌ test_*.py files (root level)
  • ❌ debug_*.py files
  • ❌ resume.py
  • ❌ Old agent versions (schema_mapping.py, schema_mapping_simple.py)
  • ❌ Duplicate/unused files
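
To confirm that none of the removed files have crept back in, a small scan along these lines could be used. It is a sketch under assumptions: the function name `find_leftovers` is hypothetical, `debug_*.py` and `resume.py` are assumed to have lived at the repository root, and the old agent versions are assumed to have lived in backend/agents/.

```python
from pathlib import Path

# Patterns named in the removal list above. Searching only the root
# level for debug_*.py and resume.py is an assumption.
ROOT_PATTERNS = ["test_*.py", "debug_*.py", "resume.py"]

# Old agent versions named in the list; their location under
# backend/agents/ is an assumption.
OLD_AGENTS = ["schema_mapping.py", "schema_mapping_simple.py"]


def find_leftovers(root):
    """Return sorted relative paths of files that should have been removed."""
    root = Path(root)
    found = []
    for pattern in ROOT_PATTERNS:
        # Path.glob without "**" is non-recursive, so tests/ is not touched.
        found.extend(root.glob(pattern))
    agents_dir = root / "backend" / "agents"
    for name in OLD_AGENTS:
        candidate = agents_dir / name
        if candidate.exists():
            found.append(candidate)
    return sorted(p.relative_to(root).as_posix() for p in found)


if __name__ == "__main__":
    for path in find_leftovers("."):
        print("Leftover:", path)
```

Because the glob is non-recursive, test files under tests/agents/ are left alone, and schema_mapping_v2.py is not flagged since only the two old names are checked.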

🚀 Quick Start (Clean)

# 1. Install
pip install -r requirements.txt

# 2. Configure
cp .env.example .env
# Edit .env with your Groq API key

# 3. Run
streamlit run app.py
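
Before step 3, it can be useful to confirm that step 2 actually left a key in .env. The following is a minimal sketch, not the project's own loader: the variable name `GROQ_API_KEY` is an assumption (the steps above only mention "your Groq API key"), and `read_env` and `has_key` are hypothetical helpers that handle only simple KEY=VALUE lines.

```python
from pathlib import Path

# Hypothetical variable name; check .env.example for the real one.
KEY_NAME = "GROQ_API_KEY"


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_key(path=".env", name=KEY_NAME):
    """Return True if the env file exists and sets a non-empty value for name."""
    env_file = Path(path)
    if not env_file.is_file():
        return False
    return bool(read_env(env_file).get(name))


if __name__ == "__main__":
    if has_key():
        print("Key found; ready for: streamlit run app.py")
    else:
        print(f"{KEY_NAME} is missing or empty in .env")
```

This parser deliberately ignores features such as `export` prefixes and multi-line values, so treat it as a quick sanity check rather than a replacement for a full dotenv library.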

📊 File Count

| Category | Count |
|---|---|
| Core Files | 5 |
| Backend Agents | 8 |
| Backend Utils | 6 |
| Documentation | 26 (in docs/) |
| Tests | 5 |
| Total Python Files | ~30 |
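
Counts like these drift as files are added, so it may help to recompute them rather than maintain them by hand. The sketch below is one possible way to do that; `count_files` is a hypothetical helper, and the choice to skip `__init__.py` when counting agents is an assumption about how the figure of 8 was reached.

```python
from pathlib import Path


def count_files(root):
    """Recount a few of the categories from the table for the given checkout."""
    root = Path(root)
    agents_dir = root / "backend" / "agents"
    return {
        # Assumption: package marker files are not counted as agents.
        "Backend Agents": len(
            [p for p in agents_dir.glob("*.py") if p.name != "__init__.py"]
        ),
        # Only top-level Markdown files in docs/ are counted.
        "Documentation": len(list((root / "docs").glob("*.md"))),
        # Recursive count; this also picks up any virtualenv inside root.
        "Total Python Files": len(list(root.rglob("*.py"))),
    }


if __name__ == "__main__":
    for category, count in count_files(".").items():
        print(f"{category}: {count}")
```

If a virtual environment lives inside the repository, the recursive total will be inflated, so run it on a clean checkout or filter those directories out.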

Clean, organized, and hackathon-ready! πŸŽ‰