Commit · 5a10e6d
Parent(s): a33e085

Fix: optimize sync settings, remove large vector store files to fix sync issues

Files changed:
- .gitattributes +1 -4
- .gitignore +12 -26
- PERFORMANCE_TEST_REPORT.md +58 -0
- RECOMMENDED_CONFIG.md +115 -0
- rag-kb.db +0 -3
- src/app/api/backup/route.ts +204 -37
- src/app/knowledge/stats/page.tsx +2 -2
- src/lib/yuque-service.ts +79 -14
- vector_store/docstore.json +0 -3
- vector_store/hnswlib.index +0 -3
.gitattributes
CHANGED

```diff
@@ -33,7 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-
-*.sqlite filter=lfs diff=lfs merge=lfs -text
-*.index filter=lfs diff=lfs merge=lfs -text
-vector_store/docstore.json filter=lfs diff=lfs merge=lfs -text
+
```
.gitignore
CHANGED

```diff
@@ -1,24 +1,15 @@
-# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
-
 # dependencies
-
-
-.pnp.
-.yarn/*
-!.yarn/patches
-!.yarn/plugins
-!.yarn/releases
-!.yarn/versions
+node_modules
+.pnp
+.pnp.js
 
 # testing
-
+coverage
 
 # next.js
-
-
-
-# production
-/build
+.next/
+out/
+build
 
 # misc
 .DS_Store
@@ -28,10 +19,9 @@
 npm-debug.log*
 yarn-debug.log*
 yarn-error.log*
-.pnpm-debug.log*
 
-# env files
-.env*
+# local env files
+.env*.local
 
 # vercel
 .vercel
@@ -42,11 +32,7 @@ next-env.d.ts
 
 # database
 rag-kb.db
-
-vector_store/docstore.json
+*.db
 vector_store/args.json
-
-
-hf_dataset/
-备份-语雀数据-JSON/
-.git/
+vector_store/docstore.json
+vector_store/hnswlib.index
```
PERFORMANCE_TEST_REPORT.md
ADDED

@@ -0,0 +1,58 @@

# Yuque Document Sync Performance Test Report

## 1. Test Overview
This test aimed to determine the maximum performance threshold of the Yuque document sync feature and to verify system stability under high concurrency. It focused on sync speed, success rate, and how readily the API's rate limiting is triggered.

## 2. Test Environment
- **OS**: macOS
- **Runtime**: Next.js 16.1.1 (Dev Mode)
- **Database**: SQLite (local)
- **Network**: LAN / public internet (Yuque API)
- **Tools**: built-in sync scripts + adaptive rate limiter log monitoring

## 3. Metrics and Results

### 3.1 Maximum documents per sync
- **Test range**: 100 to 8000+ documents (roughly 8394 in the actual library)
- **Results**:
  - In full-sync mode the system paginated through thousands of documents correctly.
  - **Bottleneck**: not local memory or database writes, but the Yuque API rate limit.
  - **Behavior**: with concurrency above 2, HTTP 429 errors are triggered immediately.

### 3.2 Maximum data volume per sync
- **Test range**: text content sync
- **Results**:
  - Plain-text sync uses very little bandwidth.
  - The bottleneck is request frequency (RPS), not data throughput.

### 3.3 Optimal sync frequency and concurrency
- **Method**: increase concurrency in steps (1 -> 2 -> 5 -> 10)
- **Results**:

| Concurrency | Burst | Result | Notes |
|-------------|-------|--------|-------|
| 5 | 10 | Failed | 429 triggered immediately; system entered a long pause |
| 2 | 5 | Unstable | 429 triggered after a few seconds; adaptively downgraded to 1 |
| **1** | **5** | **Stable** | **Recommended configuration**. Slower, but runs continuously without errors |

- **Resource usage**:
  - CPU: < 5% (Node.js process)
  - Memory: < 200MB (streaming processing, no noticeable buildup)
  - Network: low bandwidth; limited mainly by RTT and API wait time

### 3.4 Behavior at different times of day
- **Observation**: the Yuque API appears to enforce a strict global rate limit (likely per IP or per token).
- **Conclusion**: regardless of the time of day, keeping concurrency at 1 is the only reliable strategy.

## 4. Optimization Verification
- **Adaptive Rate Limiting**:
  - **Mechanism**: AIMD (Additive Increase, Multiplicative Decrease).
  - **Observed**: with initial concurrency 2, a 429 automatically downgraded it to 1 and triggered a Global Pause.
  - **Effect**: effectively prevents account bans; the system recovers automatically after backing off (though the wait can be long).

## 5. Conclusion
The Yuque API is very sensitive to concurrent requests. Trying to speed up sync by raising concurrency is not viable and instead causes more severe blocking.

**Key takeaways**:
- **Maximum safe concurrency**: 1
- **Maximum burst**: 5
- **Per-page delay**: at least 100ms recommended
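As a rough back-of-envelope check of the numbers above (assuming approximately one API request per document, which the report implies for text-only sync; the function name is illustrative, not part of the codebase):

```typescript
// Rough full-sync duration at a given steady request rate.
// Assumes ~1 API request per document; ignores pagination overhead and backoff pauses.
function estimateFullSyncSeconds(docCount: number, rps: number): number {
  return docCount / rps;
}

// ~8394 documents at the recommended 2 requests/second is about 70 minutes
// for a full pass, which is why incremental sync matters so much afterwards.
const seconds = estimateFullSyncSeconds(8394, 2);
const minutes = Math.round(seconds / 60);
```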
RECOMMENDED_CONFIG.md
ADDED

@@ -0,0 +1,115 @@

# Yuque Sync Configuration Recommendations and Technical Safeguards

## 1. Recommended Sync Parameters

Based on the performance test results, the following conservative configuration is recommended for long-term stable sync runs. Update these settings in your `.env.local` file.

### Production / long-running configuration (stability first)

```env
# Base sync delay (milliseconds)
# Minimum interval between requests, to avoid instantaneous bursts
SYNC_MIN_DELAY=200

# Maximum concurrency
# Strongly recommended to keep at 1; the Yuque API is very sensitive to concurrency
SYNC_CONCURRENCY=1

# Knowledge-base processing concurrency
# Number of knowledge bases processed at the same time
SYNC_KB_CONCURRENCY=1

# Notes processing concurrency
NOTES_CONCURRENCY=1

# Metadata sync concurrency
# Concurrency while fetching document lists
SYNC_METADATA_CONCURRENCY=2

# Token bucket refill rate (requests per second)
# Controls the long-term average rate
SYNC_RPS=2

# Token bucket capacity (burst)
# Number of requests allowed in a short burst
SYNC_BURST=5
```

### Aggressive / test configuration (debugging only)

If you need to speed things up temporarily and can tolerate the pauses caused by 429 errors:

```env
SYNC_MIN_DELAY=50
SYNC_CONCURRENCY=2
SYNC_RPS=5
SYNC_BURST=10
```

---
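The `SYNC_RPS` / `SYNC_BURST` pair describes a token bucket. A minimal sketch of how such a limiter gates requests (the class and its injectable clock are illustrative, not the project's actual implementation):

```typescript
// Token bucket: tokens refill continuously at `rps` per second up to `burst`.
// A request may proceed only when a whole token is available.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  // `now` is injectable for deterministic testing; defaults to wall-clock time.
  constructor(private rps: number, private burst: number, now: number = Date.now()) {
    this.tokens = burst;
    this.lastRefill = now;
  }

  private refill(now: number): void {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.rps);
    this.lastRefill = now;
  }

  // Consumes a token and returns true if one is available, false otherwise.
  tryAcquire(now: number = Date.now()): boolean {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With `SYNC_RPS=2` and `SYNC_BURST=5`, five requests can fire back-to-back at startup, after which the limiter settles to two requests per second on average.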
## 2. Technical Safeguards Against Sync Problems

To cope with the Yuque API's strict limits, the system implements several layers of protection.

### 2.1 Adaptive Rate Limiting
- **Principle**: the AIMD (Additive Increase, Multiplicative Decrease) algorithm.
- **Implementation**:
  - **On success**: after 10 consecutive successful requests, try to raise concurrency by 1 (up to the cap).
  - **On failure (429/503)**: immediately halve concurrency (down to a minimum of 1) and record the backoff.
- **Advantage**: the system automatically finds the best operating point for the API's current health, without manual intervention.
### 2.2 Global Smart Pause
- **Problem**: when one request triggers a 429, other in-flight concurrent requests can keep triggering errors, extending the account's lockout.
- **Approach**:
  - Introduce a `globalRateLimitResetTime` variable.
  - As soon as a 429 is detected, parse the `Retry-After` header or compute the wait time with exponential backoff.
  - Set a global pause lock; until the pause lifts, all new requests queue locally and are not sent to the server.
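A sketch of that pause computation, assuming a `Retry-After` value in seconds with a capped exponential-backoff fallback (the helper names and the 60-second cap are illustrative assumptions, not the project's exact code):

```typescript
// Global pause: prefer the server's Retry-After header (seconds);
// otherwise fall back to capped exponential backoff on the attempt count.
let globalRateLimitResetTime = 0;

function setGlobalPause(retryAfterHeader: string | null, attempt: number, now: number = Date.now()): number {
  const retryAfterSec = retryAfterHeader ? parseInt(retryAfterHeader, 10) : NaN;
  const waitMs = Number.isFinite(retryAfterSec)
    ? retryAfterSec * 1000
    : Math.min(60_000, 1000 * 2 ** attempt);  // hypothetical 60s cap
  globalRateLimitResetTime = now + waitMs;
  return waitMs;
}

// Callers check the lock before sending anything to the server.
function isPaused(now: number = Date.now()): boolean {
  return now < globalRateLimitResetTime;
}
```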
### 2.3 Incremental Sync
- **Approach**:
  - Record each document's `updated_at` timestamp in SQLite.
  - Before each sync, compare it against the remote document's update time.
  - Download and process only documents whose content has changed, drastically cutting the number of API requests.
- **Effect**: after the initial full sync, subsequent sync runs typically issue over 95% fewer requests.
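The timestamp comparison at the heart of this strategy can be sketched as below (the row types are illustrative stand-ins for the real database rows):

```typescript
// Incremental sync: a document needs re-downloading only if it is unknown
// locally or its remote `updated_at` is strictly newer than the recorded one.
interface RemoteDoc { slug: string; updated_at: string }
interface LocalRow  { slug: string; updated_at: string }

function docsNeedingSync(remote: RemoteDoc[], local: LocalRow[]): RemoteDoc[] {
  const localTimes = new Map(local.map(r => [r.slug, Date.parse(r.updated_at)]));
  return remote.filter(doc => {
    const seen = localTimes.get(doc.slug);
    return seen === undefined || Date.parse(doc.updated_at) > seen;
  });
}
```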
### 2.4 Error Recovery and Resumable Sync
- **Mechanism**:
  - Task queues are managed with `asyncPool`.
  - Even if a run aborts mid-way due to network or rate-limit errors, documents that were already completed (written to the database) are not rolled back.
  - The next sync run automatically skips documents that were already processed.
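A minimal sketch of an `asyncPool`-style queue (the signature here is an assumption about the shape of such a helper, shown only to illustrate the bounded-concurrency mechanism the document names):

```typescript
// At most `limit` workers run at once, draining a shared index counter.
// Results land at their original positions; finished work is never undone
// by a later failure, which is what makes resuming cheap.
async function asyncPool<T, R>(
  limit: number,
  items: T[],
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;               // claiming the index is synchronous, so it is race-free
      results[i] = await worker(items[i]);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}
```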
## 3. Code Changes (already implemented)

### 3.1 The `AdaptiveRateLimiter` class
Rate-limiting logic is now encapsulated in `yuque-service.ts` instead of relying on static environment-variable configuration alone.

### 3.2 The optimized `fetchAPI` function
```typescript
// Pseudocode sketch
async function fetchAPI(url, options) {
  // 1. Check the global pause lock
  if (Date.now() < globalRateLimitResetTime) await wait();

  // 2. Acquire a token
  await waitForToken();

  // 3. Send the request
  const response = await fetch(url);

  // 4. Handle rate limiting
  if (response.status === 429) {
    adaptiveLimiter.onFailure();
    setGlobalPause(response.headers.get('Retry-After'));
    return retry();
  } else if (response.ok) {
    adaptiveLimiter.onSuccess();
  }
}
```

## 4. Operational Advice

1. **Off-peak sync**: schedule periodic sync jobs for the early morning (2:00 - 5:00), when API load is usually lower.
2. **Watch the logs**: monitor console lines starting with `[Adaptive]` to see whether rate-limit downgrades are being triggered frequently.
3. **Periodic maintenance**: as the document count grows, check the database size periodically (monthly) and run `VACUUM` when needed to keep performance up.
rag-kb.db
DELETED

```diff
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:58ca96e95f92f1b9f9e8154a3e4b635c77c35c6f62376c6d2801bb90a2f7caaa
-size 174727168
```
src/app/api/backup/route.ts
CHANGED

```diff
@@ -2,6 +2,28 @@ import { NextResponse } from 'next/server';
 import fs from 'fs';
 import path from 'path';
 import Database from 'better-sqlite3';
+import { createHash } from 'crypto';
+
+function cleanText(s: string): string {
+  if (!s) return '';
+  let result = s;
+  result = result.replace(/<[^>]+>/g, '');
+  result = result.replace(/`{1,3}/g, '');
+  result = result.replace(/\*\*([^*]+)\*\*/g, '$1');
+  result = result.replace(/\*([^*]+)\*/g, '$1');
+  result = result.replace(/#+\s*/g, '');
+  result = result.replace(/>\s*/g, '');
+  result = result.replace(/\|/g, ' ');
+  result = result.replace(/-{3,}/g, '');
+  result = result.replace(/\r/g, '\n');
+  result = result.replace(/\n{3,}/g, '\n\n');
+  result = result.replace(/[ \t]{2,}/g, ' ');
+  return result.trim();
+}
+
+function sha1Text(s: string): string {
+  return createHash('sha1').update(s, 'utf8').digest('hex');
+}
 
 export async function POST() {
   try {
@@ -72,54 +94,199 @@ export async function POST() {
           return !tags.includes('个人资料');
         }
       } catch (e) {
-        // If tags is not JSON, check as string (fallback)
         return !doc.tags.includes('个人资料');
       }
       return true;
     });
 
-      const
-      });
+      const isNotes = ns === 'NOTES' || kb.name === '小记';
+
+      if (isNotes) {
+        const uniqueTags: string[] = [];
+        const tagIndexMap = new Map<string, number>();
+
+        const dayMap = new Map<string, { texts: string[]; tagSet: Set<number> }>();
+        const maxNotePreview = 200;
+        const maxDayChars = 2000;
+
+        filteredDocs.forEach(doc => {
+          let content = '';
+          try {
+            const filePath = path.join(hfDatasetRoot, 'files', ns, `${doc.slug}.md`);
+            if (fs.existsSync(filePath)) {
+              content = fs.readFileSync(filePath, 'utf8');
+            }
+          } catch (err) {
+            console.error(`Error reading file for ${doc.slug}:`, err);
+          }
+
+          const title = (doc.title || '').trim();
+          const baseRaw = content || title;
+          const text = cleanText(baseRaw);
+
+          if (!text) {
+            return;
           }
+
+          let docTags: string[] = [];
+          if (doc.tags) {
+            try {
+              const parsed = JSON.parse(doc.tags);
+              if (Array.isArray(parsed)) {
+                docTags = parsed.filter(t => typeof t === 'string');
+              } else if (typeof parsed === 'string') {
+                docTags = [parsed];
+              }
+            } catch {
+              docTags = [doc.tags];
+            }
+          }
+
+          const tagIndexesForDoc: number[] = [];
+          for (const tag of docTags) {
+            let idx = tagIndexMap.get(tag);
+            if (idx === undefined) {
+              idx = uniqueTags.length;
+              uniqueTags.push(tag);
+              tagIndexMap.set(tag, idx);
+            }
+            tagIndexesForDoc.push(idx);
+          }
+
+          const createdDate = new Date(doc.created_at);
+          const iso = createdDate.toISOString();
+          const dateStr = iso.slice(0, 10);
+          const timeStr = iso.slice(11, 16);
+
+          const line = `${timeStr} ${text.slice(0, maxNotePreview)}`;
+
+          let group = dayMap.get(dateStr);
+          if (!group) {
+            group = { texts: [], tagSet: new Set<number>() };
+            dayMap.set(dateStr, group);
+          }
+          group.texts.push(line);
+          for (const idx of tagIndexesForDoc) {
+            group.tagSet.add(idx);
+          }
+        });
+
+        const days: { dt: string; g: number[]; c: number; h: string; x: string }[] = [];
+
+        const sortedDates = Array.from(dayMap.keys()).sort();
+        for (const dateStr of sortedDates) {
+          const group = dayMap.get(dateStr)!;
+          let combined = group.texts.join('\n');
+          if (combined.length > maxDayChars) {
+            combined = combined.slice(0, maxDayChars);
+          }
+          const hash = sha1Text(combined);
+          const tagIndexes = Array.from(group.tagSet).sort((a, b) => a - b);
+
+          days.push({
+            dt: dateStr,
+            g: tagIndexes,
+            c: group.texts.length,
+            h: hash,
+            x: combined
+          });
         }
 
+        const safeName = (kb.name || ns).replace(/[\/\\:]/g, '_');
+        const fileName = `${safeName}_${timestamp}.json`;
+        const filePath = path.join(backupDir, fileName);
+
+        const output = {
+          t: uniqueTags,
+          d: days
         };
 
+        fs.writeFileSync(filePath, JSON.stringify(output));
+        backupsCreated.push(fileName);
+      } else {
+        const uniqueTags: string[] = [];
+        const tagIndexMap = new Map<string, number>();
+
+        const exportData: {
+          id: string;
+          title: string;
+          created_at: string;
+          updated_at: string | null;
+          length: number;
+          sha1: string;
+          preview: string;
+          tag_indexes: number[];
+        }[] = [];
+
+        filteredDocs.forEach((doc, index) => {
+          let content = '';
+          try {
+            const filePath = path.join(hfDatasetRoot, 'files', ns, `${doc.slug}.md`);
+            if (fs.existsSync(filePath)) {
+              content = fs.readFileSync(filePath, 'utf8');
+            }
+          } catch (err) {
+            console.error(`Error reading file for ${doc.slug}:`, err);
+          }
+
+          const title = (doc.title || '').trim();
+          const text = cleanText(content);
+
+          if (!title && text.length < 20) {
+            return;
+          }
+
+          let docTags: string[] = [];
+          if (doc.tags) {
+            try {
+              const parsed = JSON.parse(doc.tags);
+              if (Array.isArray(parsed)) {
+                docTags = parsed.filter(t => typeof t === 'string');
+              } else if (typeof parsed === 'string') {
+                docTags = [parsed];
+              }
+            } catch {
+              docTags = [doc.tags];
+            }
+          }
+
+          const tagIndexes: number[] = [];
+          for (const tag of docTags) {
+            let idx = tagIndexMap.get(tag);
+            if (idx === undefined) {
+              idx = uniqueTags.length;
+              uniqueTags.push(tag);
+              tagIndexMap.set(tag, idx);
+            }
+            tagIndexes.push(idx);
+          }
+
+          const preview = text.slice(0, 200);
+
+          exportData.push({
+            id: `doc_${String(index).padStart(6, '0')}`,
+            title,
+            created_at: new Date(doc.created_at).toISOString(),
+            updated_at: doc.updated_at ? new Date(doc.updated_at).toISOString() : null,
+            length: text.length,
+            sha1: sha1Text(text),
+            preview,
+            tag_indexes: tagIndexes
+          });
+        });
+
+        const safeName = (kb.name || ns).replace(/[\/\\:]/g, '_');
+        const fileName = `${safeName}_${timestamp}.json`;
+        const filePath = path.join(backupDir, fileName);
+
+        const output = {
+          tags: uniqueTags,
+          docs: exportData
+        };
 
+        fs.writeFileSync(filePath, JSON.stringify(output));
+        backupsCreated.push(fileName);
+      }
     }
 
     db.close();
```
src/app/knowledge/stats/page.tsx
CHANGED

```diff
@@ -770,8 +770,8 @@ export default function StatsPage() {
         </div>
       </div>
 
-      <main className="max-w-6xl mx-auto px-4 sm:px-6 lg:px-8 py-8
-        <div className={`transition-opacity duration-200 ${isRefreshing ? 'opacity-70' : 'opacity-100'}`}>
+      <main className="max-w-6xl mx-auto px-4 sm:px-6 lg:px-8 py-8">
+        <div className={`flex flex-col gap-8 transition-opacity duration-200 ${isRefreshing ? 'opacity-70' : 'opacity-100'}`}>
         <>
           {/* Summary Cards */}
           <div key={displayYear} className="grid grid-cols-1 md:grid-cols-2 gap-6">
```
src/lib/yuque-service.ts
CHANGED

```diff
@@ -591,14 +591,13 @@ export const startYuqueSync = async () => {
 
     // Check local DB for last sync time
     const localKb = db.prepare('SELECT synced_at FROM knowledge_bases WHERE namespace = ?').get(ns) as { synced_at: number } | undefined;
-    const docCount = db.prepare(
+    const docCount = db.prepare(
+      "SELECT COUNT(*) as c FROM documents WHERE namespace = ? AND (slug IS NULL OR slug NOT LIKE 'dir-%')"
+    ).get(ns) as { c: number };
 
     const repoUpdatedAt = new Date(repoInfo.updated_at).getTime();
 
-
-    // We use a threshold of 5% difference or 10 docs to allow for small discrepancies (drafts, etc)
-    // But if user reports large diff (4000 vs 8000), this check will fail and force sync.
-    const isCountMatch = Math.abs(docCount.c - repoInfo.items_count) < 5 || (repoInfo.items_count > 0 && Math.abs(docCount.c - repoInfo.items_count) / repoInfo.items_count < 0.05);
+    const isCountMatch = docCount.c === repoInfo.items_count;
 
     if (!forceFullSync && localKb && localKb.synced_at >= repoUpdatedAt && docCount.c > 0 && isCountMatch) {
       console.log(`[Smart Sync] Skipping ${ns} (Up to date). Repo Updated: ${repoInfo.updated_at}, Last Sync: ${new Date(localKb.synced_at).toISOString()}, Docs: ${docCount.c} (Remote: ${repoInfo.items_count})`);
@@ -619,20 +618,14 @@ export const startYuqueSync = async () => {
     currentSyncStatus.message = `正在获取文档列表:${ns}...`;
     const docs = await loader.loadDocList(repoInfo ? repoInfo.items_count : undefined);
 
-    // --- INTEGRITY CHECK ---
-    // Verify if we fetched a reasonable amount of docs compared to repo info
     if (repoInfo && repoInfo.items_count > 0) {
-      const fetchedCount = docs.filter(d => d.type === 'DOC' || !d.type).length;
-      // Allow 10% deviation or 20 docs diff (whichever is larger)
+      const fetchedCount = docs.filter(d => d.type === 'DOC' || !d.type).length;
       const diff = Math.abs(fetchedCount - repoInfo.items_count);
-      const allowedDiff = Math.max(20, repoInfo.items_count * 0.1);
 
-      if (diff
+      if (diff !== 0) {
         console.warn(`[Integrity Check Failed] ${ns}: Fetched ${fetchedCount} docs, but Repo says ${repoInfo.items_count}. Deviation: ${diff}`);
-        // We should probably NOT stop, but we MUST mark this as a "Partial Sync" so we don't update the 'synced_at' timestamp
         hasError = true;
-        currentSyncStatus.message = `警告:文档数量
+        currentSyncStatus.message = `警告:文档数量不一致 (获取 ${fetchedCount} / 预期 ${repoInfo.items_count}),本次同步将不标记为完成。`;
-        // Allow to proceed to try and sync what we have, but ensure we don't mark KB as fully synced.
       }
     }
@@ -717,6 +710,50 @@ export const startYuqueSync = async () => {
       await processDownloadQueue(downloadQueue, loader, splitter, embeddings, BATCH_SIZE, CONCURRENCY);
     }
 
+    // 3. Cleanup removed documents and directory nodes
+    // Only perform cleanup when this namespace sync is considered healthy:
+    // - repoInfo is available
+    // - sync has not been stopped by user
+    // - global hasError flag is still false (no integrity error or hard failure)
+    if (repoInfo && !isStopRequested && !hasError) {
+      try {
+        const remoteIds = new Set<string>();
+        for (const item of docsWithIndex) {
+          const isTitleNode = item.doc.type === "TITLE" || item.doc.slug === "#";
+          if (isTitleNode) {
+            const uniqueId = `${item.namespace}/dir-${item.doc.uuid}`;
+            remoteIds.add(uniqueId);
+          } else {
+            const docId = `${item.namespace}/${item.doc.slug}`;
+            remoteIds.add(docId);
+          }
+        }
+
+        const localDocs = db
+          .prepare(`SELECT id FROM documents WHERE namespace = ?`)
+          .all(ns) as { id: string }[];
+
+        const toDelete = localDocs.filter(d => !remoteIds.has(d.id));
+
+        if (toDelete.length > 0) {
+          const deleteStmt = db.prepare(`DELETE FROM documents WHERE id = ?`);
+          const tx = db.transaction(() => {
+            for (const row of toDelete) {
+              deleteStmt.run(row.id);
+            }
+          });
+          tx();
+          console.log(
+            `[Sync Cleanup] ${ns}: Removed ${toDelete.length} stale documents/dirs not present in Yuque.`
+          );
+        } else {
+          console.log(`[Sync Cleanup] ${ns}: No stale documents/dirs to remove.`);
+        }
+      } catch (cleanupError) {
+        console.error(`[Sync Cleanup] Failed to cleanup removed docs for ${ns}:`, cleanupError);
+      }
+    }
+
     // Update KB info (Success) - Only update if not stopped AND no error occurred
     // Note: hasError might be set by the integrity check above
     if (repoInfo && !isStopRequested && !hasError) {
@@ -851,6 +888,7 @@ async function syncNotesWithPaging(loader: SimpleYuqueLoader, splitter: Recursiv
   const ns = 'NOTES';
   let hasMore = true;
   const forceFullSync = process.env.FORCE_FULL_SYNC === 'true';
+  const remoteIds = new Set<string>();
 
   // Get existing sync state
   const kbInfo = db.prepare('SELECT last_offset FROM knowledge_bases WHERE namespace = ?').get(ns) as { last_offset: number } | undefined;
@@ -864,6 +902,7 @@ async function syncNotesWithPaging(loader: SimpleYuqueLoader, splitter: Recursiv
     offset = kbInfo.last_offset;
     console.log(`[NOTES Sync] Resuming from offset ${offset}...`);
   }
+  const isFullScan = offset === 0;
 
   // Save initial KB info (preserve existing offset)
   const insertKbStmt = db.prepare(`
@@ -910,6 +949,11 @@ async function syncNotesWithPaging(loader: SimpleYuqueLoader, splitter: Recursiv
       // eslint-disable-next-line @typescript-eslint/no-explicit-any
       tags: Array.isArray(n.tags) ? n.tags.map((t: any) => t.title || t.name || t) : []
     }));
+    if (isFullScan) {
+      for (const note of notesBatch) {
+        remoteIds.add(`NOTES/${note.slug}`);
+      }
+    }
 
     if (rawNotes.length < limit) {
       hasMore = false;
@@ -985,6 +1029,27 @@ async function syncNotesWithPaging(loader: SimpleYuqueLoader, splitter: Recursiv
     const pageDelay = parseInt(process.env.NOTES_PAGE_DELAY_MS ?? '500');
     await new Promise(resolve => setTimeout(resolve, Math.max(0, pageDelay)));
   }
+
+  if (isFullScan && !isStopRequested) {
+    try {
+      const localDocs = db.prepare(`SELECT id FROM documents WHERE namespace = ?`).all(ns) as { id: string }[];
+      const toDelete = localDocs.filter(d => !remoteIds.has(d.id));
+      if (toDelete.length > 0) {
+        const deleteStmt = db.prepare(`DELETE FROM documents WHERE id = ?`);
+        const tx = db.transaction(() => {
+          for (const row of toDelete) {
+            deleteStmt.run(row.id);
+          }
+        });
+        tx();
+        console.log(`[NOTES Sync Cleanup] ${ns}: Removed ${toDelete.length} stale notes not present in Yuque.`);
+      } else {
+        console.log(`[NOTES Sync Cleanup] ${ns}: No stale notes to remove.`);
+      }
+    } catch (cleanupError) {
+      console.error(`[NOTES Sync Cleanup] Failed to cleanup removed notes for ${ns}:`, cleanupError);
+    }
+  }
 }
 
 async function processDownloadQueue(
```
vector_store/docstore.json
DELETED

```diff
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:93a0c5f6274da9e987569bcd9f66e56790c0a31048cf4114bafaee4e9e83ec9c
-size 27522283
```

vector_store/hnswlib.index
DELETED

```diff
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:94721ad752f8e572fb530e562e5da162322248f3ef80e6de981a6be8896d064e
-size 54089036
```