
	---
	title: Hanbridge Clare Assistant (Product UI)
	emoji: 💬
	colorFrom: yellow
	colorTo: purple
	sdk: docker
	pinned: false
	license: mit
	---

	# Hanbridge Clare Assistant – Product Version

	This Space hosts Clare, an AI-powered personalized learning assistant for Hanbridge University.

	## How to Run (Recommended: Product Web UI)

	Use the React product interface (Hanbridge dashboard style: Ask / Review / Quiz, sidebar, SmartReview, etc.):

	```bash
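	# 1. Install the Python dependencies (from the project root)
	pip install -r requirements.txt

	# 2. Configure .env (set at least OPENAI_API_KEY)

	# 3. One-step launch (builds the web app, starts the backend; then open http://localhost:8000 in a browser)
	chmod +x run_web.sh && ./run_web.sh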
	```
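Step 2 leaves the `.env` format implicit. As a minimal sketch, assuming a plain `KEY=value` file in the project root, the following checks that a required setting such as `OPENAI_API_KEY` is present before the server is started. The helper names `load_env_file` and `missing_keys` are hypothetical and not part of this repo.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; skip blank lines and '#' comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required: list[str], env_file: str = ".env") -> list[str]:
    """Return the required keys that are empty or absent in both the
    process environment and the env file."""
    file_values = load_env_file(env_file)
    return [k for k in required if not (os.environ.get(k) or file_values.get(k))]
```

Calling `missing_keys(["OPENAI_API_KEY"])` before launching `run_web.sh` returns an empty list when the key is available and `["OPENAI_API_KEY"]` otherwise.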

	Or run the steps individually:

	```bash
	cd web && npm install && npm run build
	cd .. && uvicorn api.server:app --host 0.0.0.0 --port 8000
	```

	For more details, see web/使用说明.md.

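	Optional: the Gradio interface (`python app.py` from the project root, port 7860) is suited to quick demos or the Gradio version of the Hugging Face Space; for product deployment, the Web UI above is recommended.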

	## Architecture Overview

	- Frontend: React + Vite (exported from Figma design)
	- Backend: FastAPI (Python)
	- LLM Orchestration: OpenAI + LangChain
	- RAG: Vector database (FAISS) + OpenAI embeddings (text-embedding-3-small)
	- PDF Parsing: unstructured.io (priority) + pypdf (fallback)
	- Observability: LangSmith
	- Deployment: Hugging Face Docker Space
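The "priority + fallback" PDF parsing order listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `api/rag_engine.py`: the function names are hypothetical, and treating empty output as a failure is a choice made for this sketch. The library imports are done lazily, so a missing `unstructured` install also triggers the fallback.

```python
from typing import Callable, Sequence


def parse_with_unstructured(path: str) -> str:
    # Lazy import: if the package is not installed, the ImportError
    # is handled like any other parser failure.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=path)
    return "\n".join(str(el) for el in elements)


def parse_with_pypdf(path: str) -> str:
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def parse_pdf(
    path: str,
    parsers: Sequence[Callable[[str], str]] = (parse_with_unstructured, parse_with_pypdf),
) -> str:
    """Try each parser in order and return the first non-empty result."""
    errors = []
    for parser in parsers:
        try:
            text = parser(path)
            if text.strip():
                return text
            errors.append(f"{parser.__name__}: empty output")
        except Exception as exc:  # any failure moves on to the next parser
            errors.append(f"{parser.__name__}: {exc}")
    raise RuntimeError("All PDF parsers failed: " + "; ".join(errors))
```

Passing the parsers as a parameter keeps the ordering explicit and makes the fallback path easy to exercise with stand-in functions.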

	### Optional: Text-to-Speech & Podcast

	- TTS: Uses the same OpenAI API key (no extra secrets). Right panel: “Listen (TTS)” converts the current export/summary text to speech.
	- Podcast: “Podcast (summary)” or “Podcast (chat)” generates an MP3 from the session summary or full conversation.
	- Hugging Face: Set `OPENAI_API_KEY` in the Space Settings → Secrets. No extra env vars are needed. Long podcasts may need a longer request timeout on the Space (by default, the backend allows up to 2 minutes for `/api/podcast`).

	```
	📦 project/
	├── app.py
	├── api/
	│   ├── server.py
	│   ├── clare_core.py
	│   ├── rag_engine.py    ← RAG with vector DB (FAISS) + embeddings
	│   └── tts_podcast.py   ← TTS & podcast (OpenAI TTS)
	├── web/                 ← React frontend
	└── requirements.txt
	```

	### RAG with Vector Database

	- Embeddings: OpenAI `text-embedding-3-small` (1536 dimensions)
	- Vector Storage: FAISS (in-memory, L2 distance)
	- Retrieval Strategy: Vector similarity search + token overlap rerank
	- PDF Parsing:
	  - Primary: `unstructured.io` (better quality, handles complex layouts)
	  - Fallback: `pypdf` (if unstructured fails)
	- Backward Compatible: Falls back to token-based retrieval if embeddings unavailable
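The two-stage retrieval strategy above (vector similarity search, then a token-overlap rerank) can be illustrated with a small pure-Python sketch. It is not the project's implementation: plain lists stand in for FAISS and for real `text-embedding-3-small` vectors, and the function names, the overlap score, and the `candidates` parameter are assumptions made for illustration.

```python
import math
import re


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def token_overlap(query: str, chunk: str) -> float:
    """Fraction of query tokens that also appear in the chunk."""
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(chunk)) / len(q)


def retrieve(query, query_vec, chunks, chunk_vecs, k=3, candidates=10):
    # Stage 1: keep the nearest candidates by L2 distance.
    nearest = sorted(
        range(len(chunks)),
        key=lambda i: l2_distance(query_vec, chunk_vecs[i]),
    )[:candidates]
    # Stage 2: rerank by token overlap. sorted() is stable, so ties
    # keep their distance order from stage 1.
    reranked = sorted(nearest, key=lambda i: -token_overlap(query, chunks[i]))
    return [chunks[i] for i in reranked[:k]]


# Toy 2-dimensional vectors in place of 1536-dimensional embeddings.
chunks = [
    "FAISS stores vectors in memory",
    "Quizzes review course material",
    "Embeddings map text to vectors",
]
chunk_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
top = retrieve("text embeddings vectors", [1.0, 0.0], chunks, chunk_vecs, k=2, candidates=2)
print(top)
```

In this toy run, the first chunk is closest by distance, but the rerank moves the third chunk to the front because it shares all three query tokens.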

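	### Optional: GenAICoursesDB Vector Knowledge Base (Option 3)

	Clare can call the GenAICoursesDB Space on Hugging Face to fetch retrieval results for GENAI courses. Set `GENAI_COURSES_SPACE=claudqunwang/GenAICoursesDB` to enable it; on every conversation turn, Clare automatically appends the retrieval results from the course knowledge base to the RAG context.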
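As a rough sketch of this opt-in behavior, the function below adds remote course results to the local RAG context only when `GENAI_COURSES_SPACE` is set. Everything except the environment variable name is an assumption: `build_rag_context` and the injected `fetch_course_chunks` callable are hypothetical, the Space's actual endpoint is not specified here, and ignoring remote errors is a choice made for this sketch.

```python
import os
from typing import Callable, Mapping, Optional, Sequence


def build_rag_context(
    question: str,
    local_chunks: Sequence[str],
    fetch_course_chunks: Callable[[str, str], Sequence[str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Join local chunks with course chunks when the Space is configured."""
    env = os.environ if env is None else env
    parts = list(local_chunks)
    space_id = env.get("GENAI_COURSES_SPACE", "").strip()
    if space_id:
        try:
            # Hypothetical fetcher: takes (space_id, question), returns text chunks.
            parts.extend(fetch_course_chunks(space_id, question))
        except Exception:
            # Assumption: a remote failure should not break the chat turn,
            # so only the local chunks are used.
            pass
    return "\n\n".join(parts)
```

Injecting the fetcher keeps the sketch independent of how the Space is actually called and lets the enabled and disabled paths be checked without network access.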