This guide outlines a lean, high-velocity technical implementation for your Tender-Winning Engine, pivoting from a directory to an AI-powered execution platform. The goal is to deploy a functional "Pipeline" (Upload → Extract → Generate → Charge) within 30 days.

1. High-Level System Architecture

The architecture follows a common pattern for AI SaaS MVPs, prioritizing speed of delivery over complex infrastructure.

Frontend: Next.js 15. Server-Side Rendering (SSR) keeps initial page loads fast on the weak mobile data connections common in Kenyan county offices.
Backend: A single Node.js or Python (FastAPI) backend in a monorepo for simplicity.
Database & Auth: Supabase (PostgreSQL + pgvector). This handles relational data, authentication, and vector storage for RAG (Retrieval-Augmented Generation) in one place.
File Storage: AWS S3 or Supabase Storage for storing original RFP PDFs and generated proposal drafts.
AI Engine: Claude 3.5 Sonnet API. Claude is specifically recommended for its superior ability to reason over long, complex legal clauses and tables found in Kenyan government RFPs.

2. Document Processing Pipeline (The "Brain")

This is your primary technical moat. You must handle the "scanned garbage" PDFs often found on government portals.

Extraction Layer:
- Use pdfplumber for clean, digital-first PDFs.
- Use Tesseract OCR as a fallback for scanned documents.
Preprocessing: Normalize text by removing headers/footers and splitting documents into logical sections (Preliminary, Technical, Financial).
Structure Detection: Prompt the LLM to output STRICT JSON that identifies mandatory documents (Bid bonds, Tax certificates, NCA levels), evaluation criteria, and deadlines. Reject any reply that does not parse.
RAG Layer: Store extracted requirements in pgvector. This allows the AI to anchor the generated proposal text to the specific RFP content, which reduces hallucinations.
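
A minimal sketch of two steps from the pipeline above: falling back to OCR when the digital extractor returns too little text, and validating the model's STRICT JSON reply. The function names, the 200-characters-per-page threshold, and the three JSON key names are assumptions made for illustration. The two extractors are passed in as plain callables so the logic stays independent of pdfplumber and Tesseract.

```python
import json
from typing import Any, Callable

# Hypothetical threshold: below this average, treat the PDF as a scan.
MIN_CHARS_PER_PAGE = 200

# Hypothetical key names for the structured output described above.
REQUIRED_KEYS = {"mandatory_documents", "evaluation_criteria", "deadlines"}

Extractor = Callable[[str], list[str]]  # takes a file path, returns text per page


def extract_pages(path: str, digital: Extractor, ocr: Extractor) -> list[str]:
    """Try the digital extractor first; fall back to OCR when text is sparse."""
    pages = digital(path)
    total_chars = sum(len(page.strip()) for page in pages)
    if not pages or total_chars < MIN_CHARS_PER_PAGE * len(pages):
        return ocr(path)
    return pages


def parse_requirements(raw: str) -> dict[str, Any]:
    """Parse the model's reply as JSON and reject incomplete output."""
    data = json.loads(raw)  # raises a ValueError subclass on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In a real build, `digital` would wrap pdfplumber and `ocr` would wrap Tesseract. A `ValueError` from `parse_requirements` is a natural point to retry the prompt or flag the job for manual review.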

3. Database Schema (PostgreSQL)

Keep the initial schema minimal to support the core "Job" workflow.

Users Table: id, email, plan_type (Free, Pro, White-label), company_profile (JSON for AGPO status, NCA category, etc.).
Jobs Table: id, user_id, file_url, status (Processing, Completed), created_at.
Results Table: job_id, structured_output (JSON of requirements), proposal_draft_text, risk_score.
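
One way to mirror these three tables in backend code is with plain dataclasses, sketched below. The enum values follow the lists above. The Python types (string ids, a float `risk_score`, a UTC `created_at` default) are assumptions, since the schema does not specify column types.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class PlanType(str, Enum):
    FREE = "free"
    PRO = "pro"
    WHITE_LABEL = "white_label"


class JobStatus(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"


@dataclass
class User:
    id: str
    email: str
    plan_type: PlanType = PlanType.FREE
    # Free-form JSON: AGPO status, NCA category, etc.
    company_profile: dict[str, Any] = field(default_factory=dict)


@dataclass
class Job:
    id: str
    user_id: str
    file_url: str
    status: JobStatus = JobStatus.PROCESSING
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Result:
    job_id: str
    structured_output: dict[str, Any]  # JSON of extracted requirements
    proposal_draft_text: str
    risk_score: float  # assumed numeric; the scale is not defined here
```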

4. Payment Integration (Revenue Reality)

For the Kenyan market, integrating the M-Pesa Daraja 3.0 API is non-negotiable.

Workflow: User uploads PDF → AI extracts basic metadata → User hits "Unlock Full Analysis/Draft" → M-Pesa STK Push (KES 999 or KES 4,999) → Webhook updates the user's plan_type or the job's status.
Conversion: Native M-Pesa integration typically converts significantly better in Kenya than card-only systems like Stripe.
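
The webhook step could look roughly like the sketch below. It assumes the callback body has the nested `Body.stkCallback` shape with a `ResultCode` of 0 on success and a `CallbackMetadata.Item` list of name/value pairs; check this against the current Daraja documentation before relying on it. The helper names and the "locked"/"unlocked" status values are hypothetical and are not part of the schema above.

```python
from typing import Any

# The two price points named in the workflow, in KES.
ACCEPTED_AMOUNTS = {999, 4999}


def parse_stk_callback(payload: dict[str, Any]) -> dict[str, Any]:
    """Flatten an STK Push callback into the fields the backend needs."""
    callback = payload["Body"]["stkCallback"]
    # Failed or cancelled payments are assumed to carry no CallbackMetadata.
    items = callback.get("CallbackMetadata", {}).get("Item", [])
    meta = {item["Name"]: item.get("Value") for item in items}
    return {
        "checkout_request_id": callback["CheckoutRequestID"],
        "paid": callback["ResultCode"] == 0,
        "amount": meta.get("Amount"),
        "receipt": meta.get("MpesaReceiptNumber"),
    }


def next_job_status(current: str, payment: dict[str, Any]) -> str:
    """Unlock the job only for a successful payment of an expected amount."""
    if payment["paid"] and payment["amount"] in ACCEPTED_AMOUNTS:
        return "unlocked"  # hypothetical status value
    return current
```

Matching on `checkout_request_id` lets the backend tie the callback to the STK Push request it initiated, and lets it ignore a callback that is delivered more than once.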

5. 30-Day Build & Deployment Plan

Deploy in days, not weeks, so you can validate demand manually before scaling.

Week 1 (Infrastructure): Set up Next.js + Supabase; build the PDF upload UI and S3 storage.
Week 2 (AI Core): Implement the PDF-to-JSON extraction pipeline using Claude 3.5 Sonnet. Refine system prompts for Kenyan procurement law (PPADA 2015).
Week 3 (Output): Build the Bid/No-Bid Scorer and the proposal drafting engine. Integrate M-Pesa callbacks.
Week 4 (Launch): Deploy to Vercel (Frontend) and Hugging Face Spaces (Backend). Hand-sell to 10–20 contractors via WhatsApp groups.

6. Critical Technical Risks to Mitigate

Garbage In, Garbage Out: Kenyan tender documents are often of poor quality (scanned, skewed, or inconsistently formatted). Your prompt engineering must be robust, and OCR fallbacks are mandatory from day one.
Data Scraper Hell: Do not start by building scrapers for dozens of portals. Let users upload the PDFs they already have; focus engineering on processing the document, not finding it.
Security: If you pursue the White-label tier for consultants, multi-tenancy and strict data isolation between different consulting firms are required to prevent data leaks.
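
As a rough illustration of the data-isolation point, the sketch below makes the tenant part of every lookup key, so one firm's job ID never resolves under another firm's tenant. `TenantScopedStore` is a hypothetical in-memory stand-in; in a real deployment the same rule would be enforced in the database layer rather than in application memory.

```python
from typing import Any


class TenantScopedStore:
    """In-memory stand-in for a table where every row is keyed by tenant."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], dict[str, Any]] = {}

    def put(self, tenant_id: str, job_id: str, data: dict[str, Any]) -> None:
        self._rows[(tenant_id, job_id)] = data

    def get(self, tenant_id: str, job_id: str) -> dict[str, Any]:
        # The tenant is part of the key, so another firm's job_id never resolves.
        try:
            return self._rows[(tenant_id, job_id)]
        except KeyError:
            raise LookupError(f"job {job_id!r} not found for this tenant") from None
```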