| # Backend Scripts | |
| ## Reprocess All Documents | |
| Reprocesses all uploaded documents using the new chunking parameters. | |
| ### Usage | |
| ```bash | |
| # Make sure the backend API is running | |
| python backend/scripts/reprocess_all_docs.py | |
| ``` | |
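| ### Features | |
| - Automatically fetches every document in the database | |
| - Calls the `/api/ingest/reprocess/{doc_id}` endpoint for each document | |
| - Re-splits the text with the new parameters (chunk_size=400, overlap=80) | |
| - Regenerates embeddings | |
| - Displays detailed progress and statistics | |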
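
To make the effect of `chunk_size=400` and `overlap=80` concrete, here is a minimal sketch of overlapping fixed-size splitting. It assumes the sizes are measured in characters and that chunks are plain sliding windows; the backend's actual splitter may count tokens or respect sentence boundaries, and `split_text` is a hypothetical name used only for illustration.

```python
def split_text(text: str, chunk_size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Assumption: sizes are in characters. The real backend splitter
    may use tokens or sentence-aware boundaries instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With these defaults each new window starts 320 characters after the previous one, so a smaller `chunk_size` yields more chunks per document, which is consistent with the higher chunk counts reported after reprocessing.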
| ### Sample Output | |
| ``` | |
| 🔄 Reprocessing All Documents with New Chunk Parameters | |
| ================================================================================ | |
| New chunk_size: 400 | |
| New overlap: 80 | |
| ================================================================================ | |
| 📋 Fetching document list... | |
| ✅ Found 3 documents | |
| [1/3] Processing: 職涯諮詢師初階培訓簡報.pdf (ID: 1) | |
| ✅ Success! | |
| Old chunks: 9 | |
| New chunks: 28 | |
| Embeddings: 28 | |
| [2/3] Processing: 職涯諮詢師進階培訓簡報.pdf (ID: 2) | |
| ✅ Success! | |
| Old chunks: 4 | |
| New chunks: 15 | |
| Embeddings: 15 | |
| ================================================================================ | |
| 📊 SUMMARY | |
| ================================================================================ | |
| ✅ Successful: 2 | |
| ❌ Failed: 0 | |
| 📈 Total old chunks deleted: 13 | |
| 📈 Total new chunks created: 43 | |
| 📈 Chunk increase: +30 (+230.8%) | |
| ``` | |
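
The `Chunk increase` line in the summary is the difference between the two totals plus the relative change. A small sketch of that arithmetic follows; `chunk_change` is a hypothetical helper, not a function confirmed to exist in the script.

```python
def chunk_change(old_total: int, new_total: int) -> str:
    """Format the change in chunk count as 'delta (percent)'.

    Hypothetical helper that mirrors the summary line in the sample output.
    """
    delta = new_total - old_total
    if old_total == 0:
        # A relative change is undefined when there were no old chunks.
        return f"{delta:+d} (n/a)"
    pct = delta / old_total * 100
    return f"{delta:+d} ({pct:+.1f}%)"


print(chunk_change(13, 43))  # prints: +30 (+230.8%)
```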
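| ### Notes | |
| 1. **Requires a running API**: make sure the backend API is running at `http://localhost:8000` | |
| 2. **Takes a while**: an embedding is generated for every chunk, so a run may take several minutes | |
| 3. **OpenAI API**: the run consumes OpenAI API credits | |
| 4. **Irreversible**: the old chunks are deleted, so confirm before running | |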
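
Because the old chunks cannot be restored, one option is to ask for explicit confirmation before starting. The sketch below is an assumption about how such a guard could look; `confirm_reprocess` is a hypothetical helper and is not known to be part of `reprocess_all_docs.py`.

```python
def confirm_reprocess(doc_count: int, ask=input) -> bool:
    """Return True only if the user explicitly types 'yes'.

    Hypothetical guard; `ask` is injectable so the prompt can be tested
    without a terminal.
    """
    answer = ask(
        f"This will delete and rebuild the chunks of {doc_count} documents. "
        "Type 'yes' to continue: "
    )
    return answer.strip().lower() == "yes"
```

A script could call `confirm_reprocess(len(documents))` after fetching the document list and exit early when it returns `False`.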
| ### Troubleshooting | |
| If you hit a pgbouncer error: | |
| 1. Wait a few minutes for the Supabase cache to clear | |
| 2. Confirm that `DATABASE_URL` in `.env` is correct | |
| 3. Restart the backend API | |
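
When checking `DATABASE_URL`, it can help to print its components without exposing the password. The following sketch uses only the standard library; `describe_database_url` is a hypothetical helper, and the URL in the comment is a made-up example rather than a real connection string.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Break a database URL into its parts, hiding the password itself."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "user": parts.username,
        "has_password": parts.password is not None,
    }


# Made-up example URL, for illustration only:
# describe_database_url("postgresql://app_user:secret@db.example.com:6543/postgres")
```

Comparing the reported host and port with the values shown in the Supabase dashboard is a quick way to spot a URL that points at the wrong pooler or database.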
| If you hit a timeout: | |
| 1. Increase the timeout value passed to `httpx.AsyncClient(timeout=300.0)` | |
| 2. Confirm that the network connection is stable | |
| 3. Check that the OpenAI API key is valid | |