This guide outlines a lean, high-velocity technical implementation for your **Tender-Winning Engine**, pivoting from a directory to an AI-powered execution platform. The goal is to deploy a functional "Pipeline" (Upload → Extract → Generate → Charge) within **30 days**.
### **1. High-Level System Architecture**
The architecture follows a common pattern for **AI SaaS MVPs**, prioritizing speed of delivery over complex infrastructure.
* **Frontend:** **Next.js 15**. It provides Server-Side Rendering (SSR) for fast performance on weak mobile data connections common in Kenyan county offices.
* **Backend:** A single **Node.js or Python (FastAPI)** backend in a monorepo for simplicity.
* **Database & Auth:** **Supabase (PostgreSQL + pgvector)**. This handles relational data, authentication, and vector storage for RAG (Retrieval-Augmented Generation) in one place.
* **File Storage:** **AWS S3** or Supabase Storage for storing original RFP PDFs and generated proposal drafts.
* **AI Engine:** **Claude 3.5 Sonnet API**. Claude is specifically recommended for its superior ability to reason over long, complex legal clauses and tables found in Kenyan government RFPs.
---
### **2. Document Processing Pipeline (The "Brain")**
This is your primary technical moat. You must handle the "scanned garbage" PDFs often found on government portals.
1. **Extraction Layer:**
   * Use **pdfplumber** for clean, digital-first PDFs.
   * Use **Tesseract OCR** as a fallback for scanned documents.
2. **Preprocessing:** Normalize text by removing headers/footers and splitting documents into logical sections (Preliminary, Technical, Financial).
3. **Structure Detection:** Prompt the LLM to output **STRICT JSON**. Identify mandatory documents (Bid bonds, Tax certificates, NCA levels), evaluation criteria, and deadlines.
4. **RAG Layer:** Store extracted requirements in **pgvector**. This lets the AI anchor the generated proposal text to the specific RFP content, which reduces the risk of hallucinated requirements.
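The extraction fallback and the strict-JSON step above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `MIN_CHARS_PER_PAGE` threshold, the function names, and the `REQUIRED_KEYS` set are assumptions made for the example. The only library calls used are pdfplumber's `open`, `extract_text` and `to_image`, and pytesseract's `image_to_string`.

```python
import json

# Assumed threshold: if pages average fewer characters than this,
# treat the PDF as scanned and fall back to OCR.
MIN_CHARS_PER_PAGE = 50

# Hypothetical top-level keys we ask the LLM to return.
REQUIRED_KEYS = {"mandatory_documents", "evaluation_criteria", "deadlines"}


def extract_digital(path: str) -> list[str]:
    """Per-page text via pdfplumber (suits digital-first PDFs)."""
    import pdfplumber  # lazy import: only needed when a real PDF is processed

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages with no text layer
        return [page.extract_text() or "" for page in pdf.pages]


def extract_ocr(path: str, resolution: int = 300) -> list[str]:
    """Per-page text via Tesseract, rendering each page to an image first."""
    import pdfplumber
    import pytesseract

    with pdfplumber.open(path) as pdf:
        return [
            pytesseract.image_to_string(page.to_image(resolution=resolution).original)
            for page in pdf.pages
        ]


def needs_ocr(pages: list[str], min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """True when the average extracted text per page is too thin to trust."""
    if not pages:
        return True
    return sum(len(p.strip()) for p in pages) / len(pages) < min_chars


def extract_text(path: str, digital=extract_digital, ocr=extract_ocr) -> list[str]:
    """Try the digital extractor first; use OCR only when it yields too little text."""
    pages = digital(path)
    return ocr(path) if needs_ocr(pages) else pages


def parse_structured_output(raw: str) -> dict:
    """Parse the model reply as JSON and reject replies missing a required key."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Rejecting malformed replies with an exception gives the caller one obvious place to retry the prompt or flag the job for manual review, rather than storing partial output.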
---
### **3. Database Schema (PostgreSQL)**
Keep the initial schema minimal to support the core "Job" workflow.
* **Users Table:** `id`, `email`, `plan_type` (Free, Pro, White-label), `company_profile` (JSON for AGPO status, NCA category, etc.).
* **Jobs Table:** `id`, `user_id`, `file_url`, `status` (Processing, Completed), `created_at`.
* **Results Table:** `job_id`, `structured_output` (JSON of requirements), `proposal_draft_text`, `risk_score`.
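As a rough illustration, the three tables could be mirrored in the backend with plain dataclasses. The field names come from the lists above. The UUID primary keys, the `float` type for `risk_score`, and the lowercase enum values are assumptions for this sketch, not part of the schema as stated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any
from uuid import UUID, uuid4


class PlanType(str, Enum):
    FREE = "free"
    PRO = "pro"
    WHITE_LABEL = "white_label"


class JobStatus(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"


@dataclass
class User:
    email: str
    plan_type: PlanType = PlanType.FREE
    # Free-form JSON: AGPO status, NCA category, etc.
    company_profile: dict[str, Any] = field(default_factory=dict)
    id: UUID = field(default_factory=uuid4)  # assumed UUID key


@dataclass
class Job:
    user_id: UUID
    file_url: str
    status: JobStatus = JobStatus.PROCESSING
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    id: UUID = field(default_factory=uuid4)  # assumed UUID key


@dataclass
class Result:
    job_id: UUID
    structured_output: dict[str, Any]  # JSON of extracted requirements
    proposal_draft_text: str
    risk_score: float  # assumed numeric; the scale is not specified here
```

A new `Job` defaults to `JobStatus.PROCESSING`, so the upload handler only needs to supply `user_id` and `file_url`.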
---
### **4. Payment Integration (Revenue Reality)**
For the Kenyan market, **M-Pesa Daraja 3.0 API** is non-negotiable.
* **Workflow:** User uploads PDF → AI extracts basic metadata → User hits "Unlock Full Analysis/Draft" → M-Pesa STK Push (KES 999 or KES 4,999) → Webhook updates `plan_type` or `job_status`.
* **Conversion:** Because mobile money is the default way to pay in Kenya, expect native M-Pesa checkout to convert noticeably better than card-only systems like Stripe.
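The webhook step in the workflow above might look something like the following sketch. It assumes the STK Push callback body has the shape documented for M-Pesa Express (`Body.stkCallback` with `ResultCode`, `CheckoutRequestID`, and a `CallbackMetadata.Item` list). The `StkResult` type, the `settle` helper, and the `pending` lookup are hypothetical names introduced here, and which price unlocks which action is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StkResult:
    checkout_request_id: str
    success: bool
    amount: Optional[float] = None
    receipt: Optional[str] = None


def parse_stk_callback(payload: dict[str, Any]) -> StkResult:
    """Flatten an STK Push callback body into a small result object."""
    cb = payload["Body"]["stkCallback"]
    success = cb.get("ResultCode") == 0  # 0 means the customer completed payment
    # CallbackMetadata is only present on successful payments
    items = cb.get("CallbackMetadata", {}).get("Item", []) if success else []
    meta = {item["Name"]: item.get("Value") for item in items}
    return StkResult(
        checkout_request_id=cb["CheckoutRequestID"],
        success=success,
        amount=meta.get("Amount"),
        receipt=meta.get("MpesaReceiptNumber"),
    )


def settle(result: StkResult, pending: dict[str, tuple[str, int]]) -> Optional[str]:
    """Return the action recorded when the push was sent, if the payment covers it.

    `pending` maps CheckoutRequestID -> (action, price_kes); in practice it would
    be a database table written at the time the STK Push is initiated.
    """
    expected = pending.get(result.checkout_request_id)
    if expected is None or not result.success or result.amount is None:
        return None
    action, price_kes = expected
    return action if int(result.amount) >= price_kes else None
```

Keying the unlock on the `CheckoutRequestID` saved at push time, rather than on the paid amount alone, means an unknown or replayed callback cannot change `plan_type` or `job_status`.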
---
### **5. 30-Day Build & Deployment Plan**
Ship each milestone in **days, not weeks**, and validate demand manually before scaling.
* **Week 1 (Infrastructure):** Setup Next.js + Supabase; build the PDF upload UI and S3 storage.
* **Week 2 (AI Core):** Implement the PDF-to-JSON extraction pipeline using Claude 3.5 Sonnet. Refine system prompts for Kenyan procurement law (the Public Procurement and Asset Disposal Act, 2015, or PPADA).
* **Week 3 (Output):** Build the **Bid/No-Bid Scorer** and the proposal drafting engine. Integrate M-Pesa callbacks.
* **Week 4 (Launch):** Deploy to **Vercel** (Frontend) and **HF Spaces** (Backend). Hand-sell to 10–20 contractors via WhatsApp groups.
---
### **6. Critical Technical Risks to Mitigate**
* **Garbage In, Garbage Out:** Kenyan tender documents are often low-quality scans. Your prompt engineering must be robust, and OCR fallbacks are mandatory from day one.
* **Data Scraper Hell:** Do **not** start by building scrapers for dozens of portals. Let users upload the PDFs they already have; focus engineering on *processing* the document, not finding it.
* **Security:** If you pursue the **White-label tier** for consultants, multi-tenancy and strict data isolation between different consulting firms are required to prevent data leaks.