---
title: Chatsmith App
emoji: 🦀
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# ChatSMITH - Website to Chatbot Generator

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/umer6016/ChatSmith_3)

ChatSMITH generates a chatbot from any website URL. It scrapes the site directly, checks the extracted content for gaps, and fills those gaps with a small number of targeted web searches, all coordinated by a multi-agent pipeline.

## ✨ Features (current stack)

- **Smart Website Scraping** - Directly extracts content from websites (PRIMARY SOURCE)
- **Intelligent Gap Detection** - Only runs web searches when necessary
- **JSON Knowledge Caching** - Instant load for previously processed websites
- **Polite Scraping** - Respects robots.txt, rate limiting, retry logic
- **React UI + FastAPI** - Auth, progress, and chat
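
The "Polite Scraping" feature can be illustrated with a minimal sketch. It is not the project's actual scraper: the function names `is_allowed` and `backoff_delays` and the default delay values are assumptions chosen for illustration. Only the standard-library `urllib.robotparser` is used.

```python
import urllib.robotparser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap.

    The base and cap defaults are hypothetical, not values from this project.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]
```

A scraper built this way would call `is_allowed` before each request and sleep for the next value from `backoff_delays` after each failed attempt.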

## 🏗️ Architecture

### Multi-Agent System

1. **Smart Website Scraper (PRIMARY SOURCE)**
   - Parallel page discovery and fetching
   - Respects robots.txt and rate limits
   - Retry logic with exponential backoff
   - Extracts and cleans HTML content

2. **Gap Detection Agent**
   - Analyzes extracted content completeness
   - Only triggers web search when confidence < 7/10
   - Recommends specific search queries

3. **Web Search Agent (SECONDARY SOURCE)**
   - Runs only when gaps are detected
   - Maximum 5 targeted searches (reduced from 15)
   - Results marked as secondary source

4. **Knowledge Storage System**
   - JSON files saved to `knowledge_files/`
   - URL-based caching (instant reload)
   - Source attribution (primary vs secondary)

5. **Chatbot Generator**
   - GPT-4o-mini powered responses
   - Priority: Homepage > Key pages > Blog > Web search
   - Context-aware answers
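
The gap-detection rule above (search only when confidence is below 7/10, with at most 5 searches) could be expressed roughly as follows. This is a sketch under assumptions: `plan_searches` and the constant names are hypothetical, and how the real agent computes its confidence score is not shown here.

```python
CONFIDENCE_THRESHOLD = 7  # from the README: search only when confidence < 7/10
MAX_SEARCHES = 5          # from the README: at most 5 targeted searches


def plan_searches(confidence: int, recommended_queries: list[str]) -> list[str]:
    """Return the web-search queries to run for a given confidence score.

    An empty list means the scraped content is considered complete enough
    and the Web Search Agent is skipped.
    """
    if confidence >= CONFIDENCE_THRESHOLD:
        return []
    return recommended_queries[:MAX_SEARCHES]
```

For example, a score of 8 yields no searches, while a score of 4 with eight recommended queries yields only the first five.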

### Workflow

```
URL → Check Cache → [If cached: Load instantly]
                  → [If not cached:]
                     → Scrape Website (PRIMARY)
                     → Analyze Gaps
                     → Optional Web Search (SECONDARY)
                     → Save to JSON Cache
                     → Generate Chatbot
```
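
The cache branch of this workflow can be sketched as below. The README does not say how cache files are named, so hashing the URL with SHA-256 is an assumption, as are the names `cache_path` and `load_or_build`. The `build` callback stands in for the scrape, gap analysis, and optional search steps.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a URL to a JSON file name (hash-based naming is an assumption)."""
    digest = hashlib.sha256(url.strip().encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}.json"


def load_or_build(
    url: str,
    cache_dir: Path,
    build: Callable[[str], dict],
    force_refresh: bool = False,
) -> tuple[dict, bool]:
    """Return (knowledge, cache_hit) for url, building and saving on a miss."""
    path = cache_path(url, cache_dir)
    if path.exists() and not force_refresh:
        return json.loads(path.read_text(encoding="utf-8")), True
    knowledge = build(url)  # scrape, analyze gaps, optionally search
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(knowledge, ensure_ascii=False, indent=2), encoding="utf-8")
    return knowledge, False
```

Passing `force_refresh=True` mirrors the "Force refresh" option in the UI: the cached file is ignored and overwritten.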

## 🚀 Quick Start (current stack)

### Backend (FastAPI)
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

export OPENAI_API_KEY=your_openai_api_key_here
export SUPABASE_URL=https://your-project-id.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
export CORS_ALLOW_ORIGINS=http://localhost:5173,http://127.0.0.1:5173

uvicorn backend.app.main:app --reload --port 8000
```
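
`CORS_ALLOW_ORIGINS` is a comma-separated list. A minimal sketch of how such a value might be parsed on the backend is shown below; `parse_cors_origins` is a hypothetical helper, not necessarily what `backend/app/main.py` does.

```python
import os


def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and stray spaces."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# Read the variable set in the shell; an unset variable yields an empty list.
origins = parse_cors_origins(os.environ.get("CORS_ALLOW_ORIGINS", ""))
```

The resulting list is in the shape that the `allow_origins` parameter of FastAPI's `CORSMiddleware` accepts.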

### Frontend (Vite React)
```bash
cd frontend
cat > .env <<'EOF'
VITE_SUPABASE_URL=https://your-project-id.supabase.co
VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
VITE_API_BASE_URL=http://127.0.0.1:8000/api
EOF
npm install
npm run dev   # opens on http://localhost:5173
```

### Optional metrics (feature-flagged)
- Set `ENABLE_METRICS_LOGGING=true` in your environment to capture Time-to-Chatbot-Ready (TCR), cache hit flags, and chat Q/A JSONL logs (`metrics_logs/chat_answers.jsonl`). Disabled by default to avoid any impact on existing flows.
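
As a rough illustration of the feature flag, the sketch below appends one JSON object per line to `metrics_logs/chat_answers.jsonl` only when the flag is set. The function names and the record fields (`ts`, `question`, `answer`) are assumptions; the actual log schema may differ.

```python
import json
import os
import time
from pathlib import Path


def metrics_enabled() -> bool:
    """True only when ENABLE_METRICS_LOGGING is set to 'true' (case-insensitive)."""
    return os.environ.get("ENABLE_METRICS_LOGGING", "").strip().lower() == "true"


def log_chat_answer(question: str, answer: str, log_dir: Path = Path("metrics_logs")) -> bool:
    """Append one Q/A record as a JSON line; do nothing when the flag is off."""
    if not metrics_enabled():
        return False
    log_dir.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "question": question, "answer": answer}  # hypothetical fields
    with (log_dir / "chat_answers.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True
```

Because the check happens on every call, leaving the variable unset keeps the existing flow free of any file I/O.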

### Usage
- Sign up (first/last/email/password) → OTP → auto-login.
- Generate chatbot: paste URL, optional Force refresh → Run. A brief summary (pages scraped, web searches) shows, then the chatbot appears.
- Forgot password: email → OTP → new password (separate steps).

## 📁 Project Structure

```
backend/            # FastAPI app and pipeline copy
frontend/           # Vite React UI (auth, run, chat)
knowledge_files/    # Cached knowledge JSONs (used by pipeline)
requirements.txt    # Backend dependencies
README.md           # This file
```

## 🔒 Authentication (Supabase)

- In the Supabase email settings, use OTP codes (not magic links) for signup and password reset.
- Backend uses `SUPABASE_SERVICE_ROLE_KEY`; frontend uses `VITE_SUPABASE_ANON_KEY`.
- Reset flow: email → OTP → new password.

## 📝 License

MIT License - See LICENSE file for details.

## 🤝 Contributing

Contributions welcome! Please see IMPROVEMENT_PLAN.md for planned enhancements.