---
title: Chatsmith App
emoji: π¦
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# ChatSMITH - Website to Chatbot Generator
[Hugging Face Space](https://huggingface.co/spaces/umer6016/ChatSmith_3)
An intelligent AI system that automatically generates chatbots from any website URL using smart web scraping, gap detection, and multi-agent orchestration.
## ✨ Features (current stack)
- **Smart Website Scraping** - Directly extracts content from websites (PRIMARY SOURCE)
- **Intelligent Gap Detection** - Only runs web searches when necessary
- **JSON Knowledge Caching** - Instant load for previously processed websites
- **Polite Scraping** - Respects robots.txt, rate limiting, retry logic
- **React UI + FastAPI** - Auth, progress, and chat
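
The polite-scraping behavior listed above can be pictured with a short standard-library sketch. This is not the project's scraper: the function names (`is_allowed`, `backoff_delays`), the bot name, the sample robots.txt, and the delay parameters are all assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list[float]:
    """Seconds to wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** attempt for attempt in range(retries)]


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    print(is_allowed(EXAMPLE_ROBOTS, "ExampleBot", "https://example.com/about"))
    print(is_allowed(EXAMPLE_ROBOTS, "ExampleBot", "https://example.com/admin/users"))
    print(backoff_delays(3))
```

A real scraper would fetch each site's robots.txt once, sleep for the next delay after a failed request, and give up once the delays are exhausted.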
## Architecture
### Multi-Agent System
1. **Smart Website Scraper (PRIMARY SOURCE)**
   - Parallel page discovery and fetching
   - Respects robots.txt and rate limits
   - Retry logic with exponential backoff
   - Extracts and cleans HTML content
2. **Gap Detection Agent**
   - Analyzes extracted content completeness
   - Only triggers web search when confidence < 7/10
   - Recommends specific search queries
3. **Web Search Agent (SECONDARY SOURCE)**
   - Runs only when gaps are detected
   - Maximum 5 targeted searches (reduced from 15)
   - Results marked as secondary source
4. **Knowledge Storage System**
   - JSON files saved to `knowledge_files/`
   - URL-based caching (instant reload)
   - Source attribution (primary vs secondary)
5. **Chatbot Generator**
   - GPT-4o-mini powered responses
   - Priority: Homepage > Key pages > Blog > Web search
   - Context-aware answers
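
The gating and prioritization rules in the list above can be sketched in a few lines. The thresholds (confidence below 7/10, at most 5 searches) and the priority order come from the list; every function name, the dictionary shape, and the source labels are hypothetical and may not match the actual agents.

```python
# Values taken from the architecture list; all names here are hypothetical.
CONFIDENCE_THRESHOLD = 7  # web search triggers only when confidence < 7/10
MAX_SEARCHES = 5          # cap on targeted searches

# Lower rank means higher priority when assembling chatbot context.
SOURCE_PRIORITY = {"homepage": 0, "key_page": 1, "blog": 2, "web_search": 3}


def needs_web_search(confidence: int) -> bool:
    """Gap detection gate: search only when scraped content looks incomplete."""
    return confidence < CONFIDENCE_THRESHOLD


def plan_searches(confidence: int, recommended_queries: list[str]) -> list[str]:
    """Return the queries to run, or an empty list when no gap was detected."""
    if not needs_web_search(confidence):
        return []
    return recommended_queries[:MAX_SEARCHES]


def rank_sources(chunks: list[dict]) -> list[dict]:
    """Order content chunks as Homepage > Key pages > Blog > Web search."""
    return sorted(chunks, key=lambda chunk: SOURCE_PRIORITY[chunk["source"]])


if __name__ == "__main__":
    queries = [f"query {i}" for i in range(8)]
    print(plan_searches(9, queries))       # confident: no searches planned
    print(len(plan_searches(4, queries)))  # gap detected: capped at MAX_SEARCHES
```

Because `sorted` is stable, chunks with the same source label keep their original relative order.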
### Workflow
```
URL → Check Cache → [If cached: Load instantly]
                  → [If not cached:]
                      → Scrape Website (PRIMARY)
                      → Analyze Gaps
                      → Optional Web Search (SECONDARY)
                      → Save to JSON Cache
                      → Generate Chatbot
```
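
The cache branch of this workflow might look roughly like the sketch below. It assumes one JSON file per URL, named by a hash of the URL; the hashing scheme, the `load_or_build` helper, and the `build` callback (standing in for scraping, gap analysis, and optional web search) are illustrative assumptions, not the project's actual code.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a URL to a stable JSON filename (the hashing scheme is an assumption)."""
    digest = hashlib.sha256(url.strip().encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.json"


def load_or_build(
    url: str,
    cache_dir: Path,
    build: Callable[[str], dict],
    force_refresh: bool = False,
) -> tuple[dict, bool]:
    """Return (knowledge, cache_hit); build and save only on a miss or forced refresh."""
    path = cache_path(url, cache_dir)
    if path.exists() and not force_refresh:
        return json.loads(path.read_text(encoding="utf-8")), True
    knowledge = build(url)  # hypothetical: scrape, analyze gaps, optional search
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(knowledge, indent=2), encoding="utf-8")
    return knowledge, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp) / "knowledge_files"
        stub = lambda url: {"url": url, "pages": []}  # stand-in for the real pipeline
        print(load_or_build("https://example.com", cache_dir, stub)[1])  # miss, then saved
        print(load_or_build("https://example.com", cache_dir, stub)[1])  # served from cache
```

The `force_refresh` flag mirrors the Force refresh option in the UI: it skips the cached file and overwrites it with a fresh build.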
## Quick Start (current stack)
### Backend (FastAPI)
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export OPENAI_API_KEY=your_openai_api_key_here
export SUPABASE_URL=https://your-project-id.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
export CORS_ALLOW_ORIGINS=http://localhost:5173,http://127.0.0.1:5173
uvicorn backend.app.main:app --reload --port 8000
```
### Frontend (Vite React)
```bash
cd frontend
cat > .env <<'EOF'
VITE_SUPABASE_URL=https://your-project-id.supabase.co
VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
VITE_API_BASE_URL=http://127.0.0.1:8000/api
EOF
npm install
npm run dev # opens on http://localhost:5173
```
### Optional metrics (feature-flagged)
- Set `ENABLE_METRICS_LOGGING=true` in your environment to capture Time-to-Chatbot-Ready (TCR), cache hit flags, and chat Q/A JSONL logs (`metrics_logs/chat_answers.jsonl`). Disabled by default to avoid any impact on existing flows.
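
As a rough illustration of how such a feature flag can gate JSONL logging, the sketch below appends one JSON object per line only when the environment variable is set to `true`. The function names and record fields (`ts`, `question`, `answer`) are assumptions; only the variable name and the log path come from the note above.

```python
import json
import os
import time
from pathlib import Path

DEFAULT_LOG_PATH = Path("metrics_logs/chat_answers.jsonl")


def metrics_enabled() -> bool:
    """Logging is off unless ENABLE_METRICS_LOGGING is explicitly 'true'."""
    return os.environ.get("ENABLE_METRICS_LOGGING", "").strip().lower() == "true"


def log_chat_answer(question: str, answer: str, log_path: Path = DEFAULT_LOG_PATH) -> bool:
    """Append one Q/A record as a JSON line; return False when the flag is off."""
    if not metrics_enabled():
        return False  # disabled by default, so existing flows are untouched
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Record fields are hypothetical; the real log schema may differ.
    record = {"ts": time.time(), "question": question, "answer": answer}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return True
```

Appending one self-contained JSON object per line keeps the file readable line by line, without parsing it as a whole.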
### Usage
- Sign up (first name, last name, email, password) → OTP → auto-login.
- Generate a chatbot: paste a URL, optionally enable Force refresh, then Run. A brief summary (pages scraped, web searches run) is shown, then the chatbot appears.
- Forgot password: email → OTP → new password (separate steps).
## Project Structure
```
backend/ # FastAPI app and pipeline copy
frontend/ # Vite React UI (auth, run, chat)
knowledge_files/ # Cached knowledge JSONs (used by pipeline)
requirements.txt # Backend dependencies
README.md # This file
```
## Authentication (Supabase)
- Use OTP (not magic links) in Supabase email settings for signup and password reset.
- The backend uses `SUPABASE_SERVICE_ROLE_KEY`; the frontend uses the anon key (`VITE_SUPABASE_ANON_KEY`).
- Reset flow: email → OTP → new password.
## License
MIT License - See LICENSE file for details.
## Contributing
Contributions welcome! Please see IMPROVEMENT_PLAN.md for planned enhancements.