---
title: Hanbridge Clare Assistant (Product UI)
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
license: mit
---
# Hanbridge Clare Assistant – Product Version
This Space hosts **Clare**, an AI-powered personalized learning assistant for Hanbridge University.
## How to Run (Recommended: Product Web UI)
**Use the React product UI (Hanbridge dashboard style: Ask / Review / Quiz, sidebar, SmartReview, etc.):**
```bash
# 1. Install Python dependencies (from the project root)
pip install -r requirements.txt
# 2. Configure .env (set at least OPENAI_API_KEY)
# 3. One-step launch (builds the web app and starts the backend; open http://localhost:8000 in a browser)
chmod +x run_web.sh && ./run_web.sh
```
Or run the steps individually:
```bash
cd web && npm install && npm run build
cd .. && uvicorn api.server:app --host 0.0.0.0 --port 8000
```
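See **web/使用说明.md** for more details.
**Optional: Gradio UI** (run `python app.py` from the project root; port 7860). It is suited to quick demos or the Gradio version of the Hugging Face Space; for product deployment, the Web UI above is recommended.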
## Architecture Overview
- **Frontend**: React + Vite (exported from Figma design)
- **Backend**: FastAPI (Python)
- **LLM Orchestration**: OpenAI + LangChain
- **RAG**: Vector database (FAISS) + OpenAI embeddings (text-embedding-3-small)
- **PDF Parsing**: unstructured.io (priority) + pypdf (fallback)
- **Observability**: LangSmith
- **Deployment**: Hugging Face Docker Space
### Optional: Text-to-Speech & Podcast
- **TTS**: Uses the same **OpenAI API key** (no extra secrets). Right panel: “Listen (TTS)” converts the current export/summary text to speech.
- **Podcast**: “Podcast (summary)” or “Podcast (chat)” generates an MP3 from the session summary or full conversation.
- **Hugging Face**: Set `OPENAI_API_KEY` in the Space **Settings → Secrets**. No extra env vars are needed. For long podcasts, make sure the Space's request timeout is long enough; by default the backend allows up to 2 minutes for `/api/podcast`.
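
As a rough illustration of the 2-minute limit, a client calling `/api/podcast` would want a timeout at least that long. The sketch below is only an assumption about how such a call might look: the POST method, the JSON request body (including the `source` field), and the idea that the response body is the MP3 itself are all hypothetical and are not documented in this README.

```python
import json
import urllib.request

# From this README: the backend allows up to 2 minutes for /api/podcast.
PODCAST_TIMEOUT_S = 120


def build_podcast_request(base_url, payload):
    """Build a request for /api/podcast (method and body shape are assumptions)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/podcast",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_podcast(base_url, payload, out_path="podcast.mp3"):
    """Send the request and save the response body, assumed to be MP3 bytes."""
    req = build_podcast_request(base_url, payload)
    # Wait slightly longer than the backend limit so the server times out first.
    with urllib.request.urlopen(req, timeout=PODCAST_TIMEOUT_S + 10) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

For example, `fetch_podcast("http://localhost:8000", {"source": "summary"})` would target the local backend started by `run_web.sh`; check `api/server.py` for the actual request schema before relying on it.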
```
📦 project/
├── app.py
├── api/
│   ├── server.py
│   ├── clare_core.py
│   ├── rag_engine.py ← RAG with vector DB (FAISS) + embeddings
│   └── tts_podcast.py ← TTS & podcast (OpenAI TTS)
├── web/ ← React frontend
└── requirements.txt
```
### RAG with Vector Database
- **Embeddings**: OpenAI `text-embedding-3-small` (1536 dimensions)
- **Vector Storage**: FAISS (in-memory, L2 distance)
- **Retrieval Strategy**: Vector similarity search + token overlap rerank
- **PDF Parsing**:
  - Primary: `unstructured.io` (better quality, handles complex layouts)
  - Fallback: `pypdf` (if unstructured fails)
- **Backward Compatible**: Falls back to token-based retrieval if embeddings are unavailable
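
The retrieval strategy above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the code in `api/rag_engine.py`: it swaps FAISS for a brute-force L2 scan, takes precomputed vectors instead of calling the OpenAI embeddings API, and the names `retrieve`, `tokenize`, `top_k`, and `candidates` are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens, used for the overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, query_vec, chunks, top_k=3, candidates=10):
    """Return up to top_k chunk texts for a query.

    chunks is a list of (text, vector) pairs; a vector may be None when
    embeddings are unavailable, in which case only token overlap is used.
    """
    q_tokens = tokenize(query)

    def overlap(text):
        return len(q_tokens & tokenize(text))

    have_vectors = query_vec is not None and all(v is not None for _, v in chunks)
    if have_vectors:
        # Stage 1: shortlist the nearest chunks by L2 distance.
        shortlist = sorted(chunks, key=lambda c: l2_distance(query_vec, c[1]))
        shortlist = shortlist[:candidates]
        # Stage 2: rerank the shortlist by token overlap. The sort is stable,
        # so ties keep their distance order.
        ranked = sorted(shortlist, key=lambda c: -overlap(c[0]))
    else:
        # Fallback: token-based retrieval only.
        ranked = sorted(chunks, key=lambda c: -overlap(c[0]))
    return [text for text, _ in ranked[:top_k]]
```

Passing `query_vec=None` exercises the token-only fallback, which mirrors the backward-compatible behavior described in the last bullet.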
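### Optional: GenAICoursesDB Vector Knowledge Base (Option 3)
Clare can call the **GenAICoursesDB** Space on Hugging Face to retrieve GENAI course search results. Set `GENAI_COURSES_SPACE=claudqunwang/GenAICoursesDB` to enable it; Clare will then automatically add the course knowledge base's retrieval results to the RAG context on every conversation turn.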
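
One way to picture this opt-in behavior is a small gate on the environment variable. The sketch below is an assumption about the control flow only: `build_context` and the `fetch_course_chunks` callable are hypothetical names, the actual call to the Space is left out, and ignoring remote errors is a design choice made here for illustration rather than confirmed behavior.

```python
import os


def build_context(local_chunks, fetch_course_chunks=None, env=None):
    """Merge local RAG chunks with course chunks when the Space is configured.

    fetch_course_chunks is a hypothetical callable that takes the Space id
    and returns a list of text chunks.
    """
    env = os.environ if env is None else env
    space_id = env.get("GENAI_COURSES_SPACE", "").strip()
    context = list(local_chunks)
    if space_id and fetch_course_chunks is not None:
        try:
            context.extend(fetch_course_chunks(space_id))
        except Exception:
            # Assumption: a failing remote Space should not break the chat turn.
            pass
    return context
```

With the variable unset, the function returns only the local chunks, so the feature stays off by default.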