scvcoder committed on
Commit ca5c4c2 · verified · 1 Parent(s): 57094bd

Hybrid RAG: BM25+Dense (sqlite-vec/BGE-M3) + cross-encoder reranker (bge-reranker-v2-m3)

Files changed (1)
  1. src/kpaa/embeddings/__init__.py +15 -0
src/kpaa/embeddings/__init__.py ADDED
@@ -0,0 +1,15 @@
+ """KPAA hybrid retrieval: dense embeddings + cross-encoder reranker.
+
+ The dense index lives in `data/embeddings.sqlite` (sqlite-vec), separate from the
+ BM25 (FTS5) indexes (`kpaa.guides.index`, `kpaa.cases.index`). At the retriever stage,
+ BM25 and dense results are fused with RRF (reciprocal rank fusion), then reordered by the reranker.
+
+ Default models:
+ - Embedding: BAAI/bge-m3 (1024 dim, multilingual)
+ - Reranker: BAAI/bge-reranker-v2-m3
+
+ Environment variables:
+ - KPAA_EMBEDDER default BAAI/bge-m3
+ - KPAA_RERANKER default BAAI/bge-reranker-v2-m3; set to "off" to disable
+ - KPAA_EMBED_DEVICE "auto" (default) | "cuda" | "mps" | "cpu"
+ """
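The docstring says BM25 and dense results are combined with RRF before reranking, but the commit does not show that step. The following is a minimal sketch of reciprocal rank fusion, not the retriever's actual code: the function name `rrf_fuse`, the smoothing constant `k=60` (a commonly used value), the tie-breaking rule, and the sample document ids are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[str]:
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `k` is an assumed default, not a value taken
    from the KPAA code base.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the order is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


# Hypothetical hit lists, best match first.
bm25_hits = ["g12", "c3", "g7"]
dense_hits = ["c3", "g9", "g12"]
candidates = rrf_fuse([bm25_hits, dense_hits])
print(candidates)
```

Documents that appear in both lists (`c3`, `g12`) accumulate two reciprocal-rank terms and therefore move ahead of documents found by only one retriever. The fused list would then be handed to the cross-encoder reranker as its candidate set.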
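The environment variables listed in the docstring could be resolved roughly as follows. This is a sketch under stated assumptions: `RetrievalConfig` and `load_config` are hypothetical names that do not appear in the commit, and rejecting unknown device strings is an assumed behavior. Only the variable names, defaults, and the `"off"` switch come from the docstring.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Device values documented in the module docstring.
ALLOWED_DEVICES = ("auto", "cuda", "mps", "cpu")


@dataclass(frozen=True)
class RetrievalConfig:
    """Hypothetical container for the three documented settings."""

    embedder: str
    reranker: Optional[str]  # None means the reranker is disabled
    device: str


def load_config(env: Mapping[str, str] = os.environ) -> RetrievalConfig:
    """Read the KPAA_* variables, falling back to the documented defaults."""
    embedder = env.get("KPAA_EMBEDDER", "BAAI/bge-m3")
    reranker = env.get("KPAA_RERANKER", "BAAI/bge-reranker-v2-m3")
    device = env.get("KPAA_EMBED_DEVICE", "auto")
    # Assumption: an unknown device string is an error rather than a silent fallback.
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"KPAA_EMBED_DEVICE must be one of {ALLOWED_DEVICES}, got {device!r}")
    # The docstring states that the literal value "off" disables reranking.
    return RetrievalConfig(
        embedder=embedder,
        reranker=None if reranker == "off" else reranker,
        device=device,
    )


print(load_config({"KPAA_RERANKER": "off", "KPAA_EMBED_DEVICE": "cpu"}))
```

Passing the environment in as a mapping keeps the function easy to exercise without mutating `os.environ`. How `"auto"` is mapped to a concrete device is not shown in the commit and is left out here.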