---
title: Hanbridge Clare Assistant (Product UI)
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: docker
pinned: false
license: mit
---

# Hanbridge Clare Assistant – Product Version

This Space hosts **Clare**, an AI-powered personalized learning assistant for Hanbridge University.

## How to Run (Recommended: Product Web UI)

**Use the React product UI (Hanbridge dashboard style: Ask / Review / Quiz, sidebar, SmartReview, etc.):**

```bash
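# 1. Install Python dependencies (from the project root)
pip install -r requirements.txt

# 2. Configure .env (at minimum, set OPENAI_API_KEY)

# 3. One-step launch (builds the web frontend and starts the backend; then open http://localhost:8000 in a browser)
chmod +x run_web.sh && ./run_web.sh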
```
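Step 2 does not spell out the `.env` format. The following is a minimal pre-flight check, assuming `.env` sits in the project root and uses plain `KEY=value` lines. The helper names `parse_env` and `missing_keys` are illustrative and not part of the project.

```python
from pathlib import Path

REQUIRED_KEYS = ("OPENAI_API_KEY",)  # the only key this README marks as required


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("missing:", missing_keys(env) or "none")
```

Running this before `./run_web.sh` surfaces a missing key early instead of at the first LLM call.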

Or run the steps individually:

```bash
cd web && npm install && npm run build
cd .. && uvicorn api.server:app --host 0.0.0.0 --port 8000
```

For more details, see **web/使用说明.md**.

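**Optional: Gradio UI** (run `python app.py` from the project root, port 7860) is suited to quick demos or the Gradio version of the Hugging Face Space; for product deployment, the Web UI above is recommended.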

## Architecture Overview

- **Frontend**: React + Vite (exported from Figma design)
- **Backend**: FastAPI (Python)
- **LLM Orchestration**: OpenAI + LangChain
- **RAG**: Vector database (FAISS) + OpenAI embeddings (text-embedding-3-small)
- **PDF Parsing**: unstructured.io (priority) + pypdf (fallback)
- **Observability**: LangSmith
- **Deployment**: Hugging Face Docker Space
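
The PDF parsing priority listed above (unstructured.io first, pypdf as fallback) can be sketched as a generic try-in-order helper. This is an illustration rather than the project's code: `parse_with_fallback` and both stand-in parsers are hypothetical. In a real setup the callables would presumably wrap the `unstructured` and `pypdf` libraries.

```python
from typing import Callable, Sequence

Parser = Callable[[str], str]  # takes a file path, returns extracted text


def parse_with_fallback(path: str, parsers: Sequence[tuple[str, Parser]]) -> tuple[str, str]:
    """Try each (name, parser) in priority order; return (name, text) from the first success."""
    errors = []
    for name, parser in parsers:
        try:
            text = parser(path)
            if text.strip():  # treat empty output as a failure and move on
                return name, text
            errors.append(f"{name}: empty output")
        except Exception as exc:  # broad on purpose: any parser error triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))


# Stand-in parsers for demonstration only (hypothetical).
def failing_primary(path: str) -> str:
    raise ValueError("layout model unavailable")


def simple_fallback(path: str) -> str:
    return f"text extracted from {path}"


used, text = parse_with_fallback(
    "lecture.pdf",
    [("unstructured", failing_primary), ("pypdf", simple_fallback)],
)
print(used)
```

Collecting the per-parser errors makes it easier to see why the primary parser was skipped when the fallback is used.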

### Optional: Text-to-Speech & Podcast

- **TTS**: Uses the same **OpenAI API key** (no extra secrets). Right panel: “Listen (TTS)” converts the current export/summary text to speech.
- **Podcast**: “Podcast (summary)” or “Podcast (chat)” generates an MP3 from the session summary or full conversation.
- **Hugging Face**: Set `OPENAI_API_KEY` in the Space **Settings → Secrets**. No extra env vars are needed. For long podcasts, the Space may need a longer timeout; by default, the backend allows up to 2 minutes for `/api/podcast`.
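
Because podcast generation can be slow, a client should wait at least as long as the backend does. Here is a minimal sketch, assuming `/api/podcast` accepts a JSON POST and returns audio bytes. The payload field (`mode`) and the base URL are guesses for illustration and are not confirmed by this README.

```python
import json
import urllib.request

PODCAST_TIMEOUT_S = 120  # mirrors the 2-minute backend limit mentioned above


def build_podcast_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /api/podcast."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/podcast",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_podcast(base_url: str, payload: dict) -> bytes:
    """Send the request and return the raw body (assumed here to be MP3 bytes)."""
    req = build_podcast_request(base_url, payload)
    with urllib.request.urlopen(req, timeout=PODCAST_TIMEOUT_S) as resp:
        return resp.read()


# Hypothetical payload; the real request schema may differ.
req = build_podcast_request("http://localhost:8000", {"mode": "summary"})
print(req.get_method(), req.full_url)
```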

```
📦 project/
 ├── app.py
 ├── api/
 │   ├── server.py
 │   ├── clare_core.py
 │   ├── rag_engine.py   ← RAG with vector DB (FAISS) + embeddings
 │   └── tts_podcast.py  ← TTS & podcast (OpenAI TTS)
 ├── web/                ← React frontend
 └── requirements.txt

```

### RAG with Vector Database

- **Embeddings**: OpenAI `text-embedding-3-small` (1536 dimensions)
- **Vector Storage**: FAISS (in-memory, L2 distance)
- **Retrieval Strategy**: Vector similarity search + token overlap rerank
- **PDF Parsing**: 
  - Primary: `unstructured.io` (better quality, handles complex layouts)
  - Fallback: `pypdf` (if unstructured fails)
- **Backward Compatible**: Falls back to token-based retrieval if embeddings unavailable
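
The retrieval strategy above can be sketched in a few lines. To stay dependency-free, this sketch swaps FAISS for a brute-force L2 search and uses toy 3-dimensional vectors in place of `text-embedding-3-small` embeddings. All function names are illustrative, and the project's actual scoring may differ.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def l2_distance(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def token_overlap(query: str, chunk: str) -> float:
    """Fraction of query tokens that also appear in the chunk."""
    q = tokenize(query)
    return len(q & tokenize(chunk)) / len(q) if q else 0.0


def retrieve(query, query_vec, chunks, chunk_vecs, k=2, candidates=3):
    """Vector search first, then rerank the candidates by token overlap.

    If no vectors are available, fall back to token overlap alone.
    """
    if query_vec is None or chunk_vecs is None:
        pool = list(range(len(chunks)))
    else:
        by_distance = sorted(
            range(len(chunks)), key=lambda i: l2_distance(query_vec, chunk_vecs[i])
        )
        pool = by_distance[:candidates]
    # sorted() is stable, so ties keep their vector-distance order.
    reranked = sorted(pool, key=lambda i: token_overlap(query, chunks[i]), reverse=True)
    return [chunks[i] for i in reranked[:k]]


chunks = [
    "FAISS stores dense vectors for similarity search",
    "Embeddings map text to dense vectors",
    "The cafeteria opens at nine",
    "Token overlap can rerank search results",
]
chunk_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.8, 0.2, 0.1]]  # toy vectors
top = retrieve("rerank search results", [0.9, 0.1, 0.05], chunks, chunk_vecs)
print(top)
```

Passing `None` for the vectors exercises the token-only path, which corresponds to the backward-compatible fallback in the last bullet.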

### Optional: GenAICoursesDB Vector Knowledge Base (Option 3)

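Clare can call the **GenAICoursesDB** Space on Hugging Face to fetch retrieval results for GENAI courses. Set `GENAI_COURSES_SPACE=claudqunwang/GenAICoursesDB` to enable it; Clare then automatically adds the course knowledge base's retrieval results to the RAG context on every conversation turn.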
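
A minimal sketch of how an environment-variable toggle like this could gate the extra retrieval step. `build_context` and `fake_fetch` are hypothetical names, the remote call to the Space is replaced by a stand-in, and swallowing remote errors is an assumption made here rather than documented behavior.

```python
import os
from typing import Callable, Optional

ENV_VAR = "GENAI_COURSES_SPACE"


def build_context(
    question: str,
    local_chunks: list[str],
    fetch_course_chunks: Callable[[str, str], list[str]],
    env: Optional[dict] = None,
) -> list[str]:
    """Return local RAG chunks, plus course-DB chunks when the env var is set."""
    env = os.environ if env is None else env
    space_id = env.get(ENV_VAR, "").strip()
    if not space_id:
        return list(local_chunks)  # feature disabled: local context only
    try:
        extra = fetch_course_chunks(space_id, question)
    except Exception:
        extra = []  # assumption: a remote failure should not break the chat turn
    return list(local_chunks) + extra


# Hypothetical stand-in for the remote call to the Space.
def fake_fetch(space_id: str, question: str) -> list[str]:
    return [f"[{space_id}] course note for: {question}"]


ctx = build_context(
    "What is RAG?",
    ["local chunk"],
    fake_fetch,
    env={ENV_VAR: "claudqunwang/GenAICoursesDB"},
)
print(ctx)
```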