
Rooting Future - Spec-Driven Refactoring Plan

v6.1: Modularization & Simplification

Goal: Reduce the codebase from 41k to ~20k LOC without losing features, while improving maintainability


📊 Current Analysis

Total Lines:    41,238 LOC
Core Files:
  - app.py                5,056 LOC (96 functions) ⚠️ MONOLITH
  - app_backup.py         1,691 LOC (duplicate)
  - knowledge_store.py    1,395 LOC (50 functions)
  - agents.py             1,390 LOC (27 functions)
  - executive_report.py   1,343 LOC
  - structured_renderer.py 1,083 LOC

Identified Problems:

  1. ❌ app.py is a monolith (96 functions mixing routes and business logic)
  2. ❌ Duplication: app_backup.py (1,691 redundant LOC)
  3. ❌ Export layer: 6 separate files despite BaseExporter
  4. ❌ Rendering: structured_renderer.py + executive_report.py + methodology_section.py = duplicated logic
  5. ❌ Data pipeline: data_sourcing.py + data_ingestor.py + data_estimator.py = overlap

🎯 Target Architecture (Spec-Driven)

rooting_future/
├── core/
│   ├── __init__.py
│   ├── models.py              # Data models (consolidate data_models.py)
│   └── config.py              # Configuration management
│
├── api/
│   ├── __init__.py
│   ├── routes_plans.py        # /api/generate, /api/generate-from-docx
│   ├── routes_upload.py       # /api/upload-docx, /api/check-analysis
│   ├── routes_export.py       # /api/export/*, /download/*
│   ├── routes_editor.py       # /api/edit/*, /api/approve/*
│   └── routes_system.py       # /api/system/*, /api/stream/*
│
├── domain/
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── orchestrator.py    # MultiAgentOrchestrator
│   │   ├── async_client.py    # AsyncGeminiClient
│   │   └── specialized.py     # STW agents
│   │
│   ├── export/
│   │   ├── __init__.py
│   │   ├── base.py            # BaseExporter (already done in OPT-003)
│   │   └── exporters.py       # All exporters in one file
│   │
│   ├── ingestion/
│   │   ├── __init__.py
│   │   └── pipeline.py        # Unify data_sourcing + data_ingestor + data_estimator
│   │
│   └── rendering/
│       ├── __init__.py
│       └── renderer.py        # Unify structured_renderer + executive_report + methodology
│
├── storage/
│   ├── __init__.py
│   ├── knowledge.py           # KnowledgeStore
│   ├── file_search.py         # FileSearchManager
│   └── cache.py               # Web research cache
│
├── web/
│   ├── __init__.py
│   ├── app.py                 # Flask app init + middleware (< 200 LOC)
│   └── views.py               # Frontend routes (render templates only)
│
├── utils/
│   ├── __init__.py
│   ├── colors.py              # club_identity.py
│   ├── stw.py                 # stw_matrix.py + stw_analyzer.py
│   └── web.py                 # web_research.py
│
└── app.py (NEW)               # Entry point (< 50 LOC)
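
The package skeleton above could be created up front with a small script. This is only a sketch: the directory list is copied from the tree (the packages shown with an `__init__.py`), while the `scaffold` function and its `root` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Package directories copied from the target tree (those listed with __init__.py).
PACKAGES = [
    "core",
    "api",
    "domain/agents",
    "domain/export",
    "domain/ingestion",
    "domain/rendering",
    "storage",
    "web",
    "utils",
]


def scaffold(root: Path) -> list[Path]:
    """Create each package directory with an empty __init__.py.

    Returns only the __init__.py files that were newly created, so running
    the function a second time is a no-op and returns an empty list.
    """
    created = []
    for package in PACKAGES:
        init_file = root / package / "__init__.py"
        init_file.parent.mkdir(parents=True, exist_ok=True)
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

Calling `scaffold(Path("rooting_future"))` leaves existing files untouched, so it should be safe to run before any module is moved.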

📋 Refactoring Tasks (Spec-Driven)

PHASE 1: Quick Wins (2-3 hours)

REF-001: Delete Dead Code

Priority: HIGH | Impact: -1,691 LOC | Risk: ZERO

# Spec:
- Delete app_backup.py (obsolete)
- Delete temp_*.py files
- Delete *.pyc and __pycache__
- Delete unused imports (run autoflake)

Verification:

  • python app.py starts without errors
  • All tests pass (if any)
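
The file deletions in REF-001 can be scripted with a dry-run default, so the candidate list can be reviewed before anything is removed. A minimal sketch, assuming `temp_*.py` and `*.pyc` should be matched recursively (the spec does not say); the function names are hypothetical, and unused imports are still left to autoflake.

```python
import shutil
from pathlib import Path


def find_dead_files(root: Path) -> list[Path]:
    """Collect the REF-001 deletion candidates under root."""
    candidates = [root / "app_backup.py"]
    candidates += root.rglob("temp_*.py")   # assumption: search recursively
    candidates += root.rglob("*.pyc")
    candidates += root.rglob("__pycache__")
    return sorted({p for p in candidates if p.exists()})


def delete_dead_files(root: Path, dry_run: bool = True) -> list[Path]:
    """Delete the candidates; with dry_run=True only report them."""
    targets = find_dead_files(root)
    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path, ignore_errors=True)
            elif path.exists():  # may already be gone with its __pycache__
                path.unlink()
    return targets
```

Running it once with `dry_run=True`, committing, and then running it with `dry_run=False` keeps the deletion inside a single revertable commit.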

REF-002: Consolidate Export Layer

Priority: HIGH | Impact: -800 LOC | Risk: LOW

# Spec: Merge all exporters into domain/export/exporters.py

# File: domain/export/exporters.py
from .base import BaseExporter

class PDFExporter(BaseExporter):
    """Server-side PDF via WeasyPrint"""
    # Move from export_pdf_server.py

class HTMLExporter(BaseExporter):
    """HTML export"""
    # Move from export_html.py

class DOCXExporter(BaseExporter):
    """DOCX export"""
    # Move from export_docx.py

class PagedHTMLExporter(BaseExporter):
    """Paged HTML export"""
    # Move from export_paged.py

class OnePagerExporter(BaseExporter):
    """One-page summary"""
    # Move from export_onepager.py

# Single import point:
from domain.export.exporters import (
    PDFExporter, HTMLExporter, DOCXExporter,
    PagedHTMLExporter, OnePagerExporter
)

Verification:

  • All export formats work
  • PDF generation completes in < 10s
  • Backward compatibility maintained
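
Once all exporters live in one module, callers can pick one by format name instead of importing each class. The sketch below illustrates that idea only: the real `BaseExporter` interface is not shown in this spec, so the `export(plan) -> bytes` method, the `format_name` attribute, the `EXPORTERS` registry and `export_plan` are all assumptions.

```python
from abc import ABC, abstractmethod
from html import escape


class BaseExporter(ABC):
    """Hypothetical stand-in for domain/export/base.py (real interface not shown)."""

    format_name: str = ""

    @abstractmethod
    def export(self, plan: dict) -> bytes:
        """Serialize a plan into the target format."""


class HTMLExporter(BaseExporter):
    format_name = "html"

    def export(self, plan: dict) -> bytes:
        title = escape(plan.get("title", "Untitled"))
        return f"<html><body><h1>{title}</h1></body></html>".encode("utf-8")


class OnePagerExporter(BaseExporter):
    format_name = "onepager"

    def export(self, plan: dict) -> bytes:
        return f"{plan.get('title', 'Untitled')}: summary".encode("utf-8")


# Registry built from the classes defined in this one module.
EXPORTERS = {cls.format_name: cls for cls in (HTMLExporter, OnePagerExporter)}


def export_plan(plan: dict, fmt: str) -> bytes:
    """Look up the exporter for fmt and run it; unknown formats raise ValueError."""
    try:
        exporter_cls = EXPORTERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported export format: {fmt!r}") from None
    return exporter_cls().export(plan)
```

A route handler would then call something like `export_plan(plan, "html")` and never needs to know which class handles the format.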

REF-003: Consolidate Rendering Layer

Priority: MEDIUM | Impact: -1,500 LOC | Risk: MEDIUM

# Spec: Merge structured_renderer + executive_report + methodology_section

# File: domain/rendering/renderer.py
class PlanRenderer:
    """Unified plan rendering"""

    def render_structured(self, plan, metadata):
        """Structured 7-section output"""
        pass

    def render_executive(self, plan, metadata):
        """Executive summary"""
        pass

    def add_methodology(self, content):
        """Append methodology section"""
        pass

    def render_stw_matrix(self, stw_data):
        """Render STW matrix"""
        pass

# Single import:
from domain.rendering import PlanRenderer
renderer = PlanRenderer()

Verification:

  • Generated plans have all 7 sections
  • Executive summary renders correctly
  • Methodology section present
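
The "all 7 sections" check can be automated with a simple presence test on the rendered output. This is a sketch under stated assumptions: the spec does not name the seven sections, so `EXPECTED_TITLES` holds placeholder titles, and `missing_sections` is a hypothetical helper that only does a case-insensitive substring match.

```python
def missing_sections(rendered: str, expected_titles: list[str]) -> list[str]:
    """Return the expected section titles that do not appear in the rendered text."""
    lowered = rendered.lower()
    return [title for title in expected_titles if title.lower() not in lowered]


# Placeholder titles: the spec says there are 7 sections but does not name them.
EXPECTED_TITLES = [f"Section {n}" for n in range(1, 8)]
```

Replacing the placeholders with the real section titles (plus the methodology heading) would turn this into a regression check to run before and after REF-003.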

PHASE 2: Modularization (1-2 days)

REF-004: Split app.py into API Routes

Priority: HIGH | Impact: -4,000 LOC | Risk: MEDIUM

# Spec: Decompose app.py monolith

# Current:
app.py (5,056 LOC, 96 functions) → TOO BIG!

# Target:
web/app.py              (150 LOC) - Flask init + middleware
web/views.py            (200 LOC) - Frontend routes
api/routes_plans.py     (400 LOC) - Plan generation
api/routes_upload.py    (300 LOC) - Upload & analysis
api/routes_export.py    (300 LOC) - Export endpoints
api/routes_editor.py    (400 LOC) - Editor endpoints
api/routes_system.py    (150 LOC) - System & SSE

# Main entry point:
# File: app.py (NEW)
from web.app import create_app

app = create_app()

if __name__ == '__main__':
    app.run()

Migration Strategy:

  1. Create api/ folder
  2. Move routes one module at a time
  3. Test after each module
  4. Update imports incrementally

Verification:

  • All existing routes still work
  • /api/generate → 200 OK
  • /api/upload-docx → 200 OK
  • Dashboard loads correctly

REF-005: Unify Data Ingestion Pipeline

Priority: MEDIUM | Impact: -1,200 LOC | Risk: MEDIUM

# Spec: Merge data_sourcing + data_ingestor + data_estimator

# File: domain/ingestion/pipeline.py
class DataIngestionPipeline:
    """Unified data ingestion"""

    def __init__(self):
        self.docx_ingestor = DOCXIngestor()
        self.estimator = DataEstimator()
        self.web_sourcer = WebResearchSourcer()

    def ingest_stakeholder_docs(self, files):
        """Full pipeline: DOCX → extract → estimate → merge"""
        pass

    def synthesize_inputs(self, stakeholder_data):
        """Conflict resolution + alignment score"""
        pass

    def enrich_with_web_data(self, club_name, context):
        """Web research augmentation"""
        pass

# Single import:
from domain.ingestion import DataIngestionPipeline
pipeline = DataIngestionPipeline()

Verification:

  • Stakeholder upload works
  • Alignment dashboard appears
  • Conflict detection active
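
The spec does not define how conflict resolution or the alignment score work, so the following is purely an illustrative assumption: merge each field by majority vote, flag any field with more than one distinct answer as a conflict, and score alignment as the share of fields without conflict. The input shape and the standalone `synthesize_inputs` function are hypothetical.

```python
from collections import Counter


def synthesize_inputs(stakeholder_data: dict[str, dict[str, str]]) -> dict:
    """Merge per-stakeholder answers by majority vote and score their agreement.

    Assumed input shape: {stakeholder_name: {field: value}}.
    """
    fields = sorted({f for answers in stakeholder_data.values() for f in answers})
    merged, conflicts = {}, {}
    for field in fields:
        values = [a[field] for a in stakeholder_data.values() if field in a]
        counts = Counter(values)
        merged[field] = counts.most_common(1)[0][0]  # majority value wins
        if len(counts) > 1:
            conflicts[field] = dict(counts)          # keep the vote breakdown
    agreed = len(fields) - len(conflicts)
    score = agreed / len(fields) if fields else 1.0
    return {"merged": merged, "conflicts": conflicts, "alignment_score": score}
```

Whatever rule the real pipeline uses, returning the conflicts alongside the merged values keeps the alignment dashboard able to show why a score is low.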

PHASE 3: Storage Abstraction (1 day)

REF-006: Split Storage Layer

Priority: LOW | Impact: -500 LOC | Risk: LOW

# Spec: Separate concerns in storage/

# File: storage/knowledge.py
class KnowledgeStore:
    """Plans and documents"""
    pass

# File: storage/cache.py
class CacheStore:
    """Web research + API cache"""
    pass

# File: storage/file_search.py
class FileSearchManager:
    """Gemini File Search integration"""
    pass

Verification:

  • Plan saves/loads work
  • Cache hit rate > 80%
  • File search uploads succeed
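
Checking "Cache hit rate > 80%" requires the cache to count its own hits and misses. A minimal in-memory sketch, assuming a TTL-based cache; the constructor parameters, the `hit_rate` property and the injectable `clock` are hypothetical and not taken from the existing code.

```python
import time


class CacheStore:
    """In-memory TTL cache that tracks its own hit rate (illustrative only)."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self._entries.pop(key, None)  # drop expired entry, if any
        self.misses += 1
        return default

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exposing `hit_rate` through a system endpoint or a log line would make the 80% target observable rather than assumed.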

📈 Expected Results

Metric            Before     After     Improvement
Total LOC         41,238     ~20,000   -51%
app.py LOC        5,056      ~150      -97%
Export files      6 files    1 file    -83%
Rendering files   3 files    1 file    -66%
Ingestion files   3 files    1 file    -66%
Maintainability   Low        High      ✅
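
The Improvement column appears to be the relative change from Before to After, truncated toward zero rather than rounded. A short sketch that reproduces the figures under that assumption (the `percent_change` helper is hypothetical):

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from before to after, truncated toward zero."""
    return int((after - before) / before * 100)


METRICS = {
    "Total LOC": (41_238, 20_000),
    "app.py LOC": (5_056, 150),
    "Export files": (6, 1),
    "Rendering files": (3, 1),
    "Ingestion files": (3, 1),
}

for name, (before, after) in METRICS.items():
    print(f"{name}: {percent_change(before, after)}%")
```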

🚀 Implementation Order

Week 1: Quick Wins

  1. ✅ REF-001 - Delete dead code (30 min)
  2. ✅ REF-002 - Consolidate exports (3 hours)
  3. ✅ REF-003 - Consolidate rendering (5 hours)

Week 2: Core Refactor

  1. REF-004 - Split app.py (2 days)
  2. REF-005 - Unify ingestion (1 day)

Week 3: Polish

  1. REF-006 - Split storage (1 day)
  2. Testing & Validation (2 days)

✅ Success Criteria

  • Total LOC < 22,000
  • app.py < 200 LOC
  • All features work identically
  • Performance unchanged or better
  • Zero regressions
  • Clean git history (1 commit per REF-XXX)
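
The two LOC thresholds can be checked mechanically after each REF task. A minimal sketch, assuming LOC means physical lines in `*.py` files (how the 41,238 figure was measured is not stated) and that the "app.py < 200 LOC" criterion refers to `web/app.py`; `count_loc` and `check_loc_budget` are hypothetical helpers.

```python
from pathlib import Path


def count_loc(path: Path) -> int:
    """Count physical lines in one file (blank lines and comments included)."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def check_loc_budget(root: Path, total_limit: int = 22_000, app_limit: int = 200) -> dict:
    """Compare the tree under root against the LOC thresholds."""
    per_file = {p: count_loc(p) for p in root.rglob("*.py")}
    total = sum(per_file.values())
    # Assumption: the app.py criterion refers to web/app.py in the target layout.
    app_loc = per_file.get(root / "web" / "app.py", 0)
    return {
        "total_loc": total,
        "app_loc": app_loc,
        "total_ok": total < total_limit,
        "app_ok": app_loc < app_limit,
    }
```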

🛡️ Safety Rules

  1. Never delete functionality - only reorganize
  2. Test after each step - don't batch refactors
  3. Keep backups - git commit before each REF-XXX
  4. Backward compatibility - old imports still work
  5. Document changes - update this spec after each task
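
Rule 4 is usually satisfied by leaving each old module in place as a thin shim that re-exports from the new location and emits a `DeprecationWarning`. The demo below is a self-contained sketch of that pattern: it writes two throwaway modules to a temporary directory, and `exporters_new` is a flat, hypothetical stand-in for `domain/export/exporters.py`.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

# New home of the class (stand-in for domain/export/exporters.py).
NEW_MODULE = '''
class HTMLExporter:
    """HTML export"""
'''

# Old module kept as a thin shim: re-export, then warn on import.
OLD_SHIM = '''
import warnings
from exporters_new import HTMLExporter  # noqa: F401  (re-export)

warnings.warn(
    "export_html is deprecated; import HTMLExporter from exporters_new",
    DeprecationWarning,
    stacklevel=2,
)
'''


def demo_shim() -> bool:
    """Write both modules to a temp dir and confirm the old import still resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "exporters_new.py").write_text(NEW_MODULE)
        Path(tmp, "export_html.py").write_text(OLD_SHIM)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                old = importlib.import_module("export_html")
                new = importlib.import_module("exporters_new")
            warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
            # Same class object under both names, and the old path warned.
            return old.HTMLExporter is new.HTMLExporter and warned
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("export_html", None)
            sys.modules.pop("exporters_new", None)
```

The shims can be deleted in a later cleanup once no caller triggers the warning.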

📝 Notes

  • This is a living document - update it after each task
  • Use the git commit -m "refactor(REF-XXX): description" format
  • If something breaks, run git revert and reassess
  • Priority: working code > clean code
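
The commit message convention can be enforced with a small check, for instance from a commit-msg hook. A sketch, assuming the XXX in REF-XXX is always three digits (as in REF-001 to REF-006); `COMMIT_PATTERN` and `is_valid_commit_message` are hypothetical names.

```python
import re

# Matches e.g. "refactor(REF-004): split app.py into route modules".
COMMIT_PATTERN = re.compile(r"^refactor\(REF-\d{3}\): \S.*$")


def is_valid_commit_message(message: str) -> bool:
    """Check the first line of a commit message against the REF-XXX convention."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_PATTERN.match(lines[0]) is not None
```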