System Architecture
Table of Contents
- High-Level Overview
- Module Dependency Graph
- Core Components
- Data Flow
- File Organization
- Design Patterns
- Quick Reference
High-Level Overview
The system implements a dual-path recommendation architecture:
flowchart TB
subgraph Entry
A[User Request]
end
A --> B{Has Query?}
B -->|Yes| RAG
B -->|No| Rec
subgraph RAG[RAG Path]
R1[Query Router]
R2[Hybrid Search]
R3[Reranking]
R4[LLM Gen]
R1 --> R2 --> R3 --> R4
end
subgraph Rec[RecSys Path]
C1[Multi-Channel Recall]
C2[Feature Eng]
C3[LGBMRanker]
C1 --> C2 --> C3
end
RAG --> Res[Top-K Results]
Rec --> Res
Key Insight: Users with explicit queries use RAG (semantic search + LLM); users without queries use collaborative filtering + ranking.
| Block | Responsibility |
|---|---|
| Query Router | Chooses strategy (EXACT/FAST/DEEP) and whether to rerank based on intent (ISBN, keywords, or natural language). |
| Hybrid Search | ChromaDB (semantic) + SQLite FTS5 (keyword), merged via RRF. |
| Reranking | Cross-Encoder reranks candidates to reduce semantic drift. |
| Multi-Channel Recall | 7 channels (ItemCF, SASRec, YoutubeDNN, etc.) fused with RRF. |
| Feature Eng + LGBMRanker | Feature engineering and LambdaRank-style ranking. |
Module Dependency Graph
Layer 1: Entry Points (API)
src/main.py (FastAPI app)
├── src/api/chat.py   # Chat API (single router)
└── main.py           # All other endpoints defined inline
Layer 2: Service Orchestration
Service Layer
├── src/services/chat_service.py
│   └── Orchestrates RAG pipeline for "Chat with Book"
│
├── src/services/recommend_service.py
│   └── Orchestrates RecSys pipeline (7-channel recall → LGBMRanker)
│
└── src/services/personal_recommend_handler.py
    └── Handles /api/recommend/personal (params, cold-start intent probing, enrichment)
Layer 3: Core Logic (RAG Path)
RAG Pipeline
├── src/core/router.py (QueryRouter)
│   ├── Classifies query intent (EXACT/FAST/DEEP)
│   └── Routes to appropriate retrieval strategy
│
├── src/vector_db.py (VectorDB - Singleton)
│   ├── ChromaDB for dense retrieval
│   ├── SQLite FTS5 for sparse retrieval
│   ├── Hybrid Search (RRF fusion)
│   └── Small-to-Big retrieval (sentence → book)
│
├── src/core/reranker.py (Reranker)
│   └── Cross-encoder (ms-marco-MiniLM) for top-K precision
│
├── src/core/temporal.py (TemporalRanker)
│   └── Recency boosting for "latest/new" queries
│
├── src/core/diversity_reranker.py (optional)
│   └── MMR + category constraint when ENABLE_RAG_DIVERSITY
│
└── src/core/llm.py (LLMFactory)
    └── OpenAI / Ollama / Groq integration for generation
Layer 4: Core Logic (RecSys Path)
RecSys Pipeline
├── src/recall/fusion.py (RecallFusion)
│   ├── Coordinates 7 recall channels via RRF
│   └── Channels:
│       ├── src/recall/itemcf.py (ItemCF - direction-weighted)
│       ├── src/recall/usercf.py (UserCF - user-user co-occurrence)
│       ├── src/recall/swing.py (Swing - user-pair overlap)
│       ├── src/recall/sasrec_recall.py (SASRec - Transformer)
│       ├── src/recall/item2vec.py (Item2Vec - Word2Vec)
│       ├── src/recall/embedding.py (YoutubeDNN - Two-tower)
│       └── src/recall/popularity.py (Popularity - cold-start)
│
├── src/ranking/features.py (FeatureEngineer)
│   └── 17 ranking features (user stats, CF scores, SASRec scores)
│
├── LGBMRanker (lgbm_ranker.txt, loaded in recommend_service.py)
│   └── LambdaRank optimization for NDCG
│
├── src/ranking/din.py (DINRanker - optional)
│   └── Deep Interest Network for sequence modeling
│
└── src/core/diversity_reranker.py (DiversityReranker)
    └── MMR + category constraints for diverse results
Layer 5: Data Access
Data Layer
├── src/core/metadata_store.py (MetadataStore)
│   ├── SQLite (books.db) for book metadata (Zero-RAM mode)
│   └── FTS5 (books_fts) for full-text search
│
├── src/core/online_books_store.py (OnlineBooksStore - Singleton)
│   └── SQLite (online_books.db) for web-discovered books (freshness_fallback)
│
├── src/user/profile_store.py (module)
│   └── JSON (user_profiles.json) for user favorites and reading history
│
└── src/data/repository.py (DataRepository - Singleton)
    └── Unified interface: metadata (MetadataStore) + user interaction history (recall_models.db)
Supporting Modules
Utilities & Extensions
├── src/core/context_compressor.py
│   └── Chat history compression (LLM summarization)
│
├── src/core/freshness_monitor.py
│   └── Detects when local data is stale (used to trigger web search)
│
├── src/core/web_search.py
│   └── Google Books API fallback for missing/fresh books
│
├── src/marketing/persona.py
│   └── User persona generation for chat context
│
└── src/config.py
    └── Global configuration (pydantic-settings, paths, hyperparameters)
Core Components
1. QueryRouter (src/core/router.py)
Responsibility: Intent classification for RAG queries.
Strategies:
- EXACT: ISBN/exact match → BM25 only (no reranking)
- FAST: 1-2 keywords → Hybrid search (no reranking)
- DEEP: Complex queries → Hybrid + Cross-encoder reranking
Key Logic:
import re

def classify(query: str) -> str:
    words = query.split()
    if re.match(r"^\d{10,13}$", query):
        return "EXACT"
    elif len(words) <= 2:
        return "FAST"
    else:
        return "DEEP"
Dependencies:
- src/core/intent_classifier.py (optional ML classifier)
- config/router.json (detail keywords: "twist", "ending", etc.)
2. VectorDB (src/vector_db.py)
Responsibility: Hybrid retrieval (BM25 + Dense embeddings).
Components:
- ChromaDB: Dense retrieval (all-MiniLM-L6-v2, 384-dim)
- BM25Okapi: Sparse retrieval for exact/keyword matches
- RRF: Reciprocal Rank Fusion for merging results
Key Method:
def hybrid_search(query, k=10, alpha=0.5):
    dense_results = chroma.search(query, k=50)
    sparse_results = bm25.get_top_n(query, k=50)
    merged = rrf_fusion(dense_results, sparse_results, alpha)
    return merged[:k]
Dependencies:
- rank_bm25 library
- chromadb library
- sentence-transformers
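One plausible shape for the RRF merge behind hybrid_search, sketched under the assumption that alpha weights the dense (semantic) ranking against the sparse (keyword) one; the function name matches the snippet above, but the exact weighting scheme is an assumption:

```python
def rrf_fusion(dense, sparse, alpha=0.5, k=60):
    """Merge two ranked lists of item IDs with Reciprocal Rank Fusion.

    dense/sparse: item IDs ordered best-first.
    alpha: weight of the dense list; (1 - alpha) goes to the sparse list.
    k: RRF smoothing constant (60 is the conventional default).
    """
    scores = {}
    for rank, item in enumerate(dense):
        scores[item] = scores.get(item, 0.0) + alpha / (k + rank + 1)
    for rank, item in enumerate(sparse):
        scores[item] = scores.get(item, 0.0) + (1 - alpha) / (k + rank + 1)
    # Items appearing in both lists accumulate both contributions,
    # so agreement between retrievers is rewarded.
    return sorted(scores, key=scores.get, reverse=True)
```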
3. RecallFusion (src/recall/fusion.py)
Responsibility: Multi-channel recall aggregation.
Algorithm: Reciprocal Rank Fusion (RRF)
for rank, (item, score) in enumerate(channel_results):
rrf_score = weight * (1.0 / (k + rank + 1))
candidates[item] += rrf_score
Channels (configurable):
- ItemCF (weight=1.0) → co-occurrence with direction weight
- SASRec (weight=1.0) → sequential Transformer
- YoutubeDNN (weight=1.0) → two-tower neural
- Item2Vec (weight=0.8) → Word2Vec on sequences
- UserCF (weight=1.0) → Jaccard similarity
- Swing (weight=1.0) → user-pair overlap weighting
- Popularity (weight=0.5) → cold-start fallback
Dependencies: All recall modules in src/recall/
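The fusion loop above can be expanded into a self-contained sketch over all channels; the function name and the input format (ranked ID lists per channel) are assumptions, while the score formula matches the snippet in this section:

```python
from collections import defaultdict

def fuse_channels(channel_results, weights, k=60):
    """Weighted RRF across recall channels (illustrative sketch).

    channel_results: {channel_name: [item_id, ...]} ranked best-first.
    weights: {channel_name: float} channel weight, defaulting to 1.0.
    """
    candidates = defaultdict(float)
    for name, items in channel_results.items():
        weight = weights.get(name, 1.0)
        for rank, item in enumerate(items):
            # Same formula as the fusion loop: weight * 1 / (k + rank + 1)
            candidates[item] += weight * (1.0 / (k + rank + 1))
    return sorted(candidates, key=candidates.get, reverse=True)
```

Items surfaced by several channels accumulate score from each, which is why a book ranked second everywhere can beat one ranked first in a single channel.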
4. FeatureEngineer (src/ranking/features.py)
Responsibility: Generate 17 ranking features from recall candidates.
Feature Groups:
- User Stats:
u_cnt,u_mean,u_std - Item Stats:
i_cnt,i_mean,i_std - Cross Features:
len_diff,u_auth_avg,u_auth_match,is_cat_hob - Sequence:
sasrec_score,sim_max,sim_min,sim_mean - CF Scores:
icf_sum,icf_max,ucf_sum
Dependencies:
- Recall models for CF scores
- SASRec embeddings
- data/rec/train.csv for statistics
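As a minimal sketch of the user-stats group, assuming the input is simply the user's list of numeric ratings (the real feature pipeline draws from train.csv and is richer than this):

```python
from statistics import fmean, pstdev

def user_stat_features(ratings):
    """Compute the u_cnt / u_mean / u_std feature group (sketch)."""
    if not ratings:
        # Cold-start user: no history, neutral statistics.
        return {"u_cnt": 0, "u_mean": 0.0, "u_std": 0.0}
    return {
        "u_cnt": len(ratings),       # interaction count
        "u_mean": fmean(ratings),    # average rating
        "u_std": pstdev(ratings),    # rating dispersion
    }
```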
5. MetadataStore (src/core/metadata_store.py)
Responsibility: Zero-RAM book metadata lookup.
Implementation:
- SQLite with FTS5 index
- Lazy connection (opens on first query)
- Singleton pattern for global access
Key Methods:
def get_book_metadata(isbn: str) -> dict
def search_books_fts(query: str) -> List[dict]
def get_books_batch(isbns: List[str]) -> List[dict]
Dependencies:
- data/books.db (main store)
- data/online_books.db (staging store for API fetches)
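A hedged sketch of the lazy-connection singleton described above; the table and column names here are assumptions, not taken from books.db:

```python
import sqlite3

class MetadataStore:
    """Zero-RAM metadata lookup: singleton + lazy SQLite connection (sketch)."""
    _instance = None

    def __new__(cls, db_path="books.db"):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.db_path = db_path
            cls._instance._conn = None   # nothing opened at startup
        return cls._instance

    @property
    def conn(self):
        if self._conn is None:           # lazy: open on first query
            self._conn = sqlite3.connect(self.db_path)
            self._conn.row_factory = sqlite3.Row
        return self._conn

    def get_book_metadata(self, isbn):
        row = self.conn.execute(
            "SELECT * FROM books WHERE isbn = ?", (isbn,)
        ).fetchone()
        return dict(row) if row else None
```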
Data Flow
RAG Flow (User Query → LLM Response)
1. User Query → QueryRouter
   └─ Classify intent: EXACT / FAST / DEEP
2. VectorDB.hybrid_search()
   ├─ BM25 sparse search (k=50)
   ├─ Chroma dense search (k=50)
   └─ RRF fusion → Top-50 candidates
3. [Optional] Reranker.rerank() (DEEP only)
   └─ Cross-encoder scoring → Top-10
4. [Optional] TemporalBooster.boost() (if "new"/"latest" detected)
   └─ Recency scoring
5. MetadataStore.get_books_batch()
   └─ Enrich with full metadata
6. LLM Generation (streaming)
   └─ SSE response to frontend
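The steps above can be sketched as one orchestration function. Every collaborator is passed in as a plain callable because the real interfaces are not shown here; the optional temporal boost (step 4) is omitted for brevity:

```python
def answer_query(query, router, db, reranker, store, llm, k=50):
    """Illustrative RAG orchestration (all collaborator shapes assumed)."""
    strategy = router(query)               # 1. EXACT / FAST / DEEP
    hits = db(query, k)                    # 2. hybrid search -> ranked ISBNs
    if strategy == "DEEP":
        hits = reranker(query, hits)[:10]  # 3. cross-encoder rerank, top-10
    books = store(hits)                    # 5. batch metadata enrichment
    return llm(query, books)               # 6. generation over the context
```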
RecSys Flow (User → Personalized Recommendations)
1. RecallFusion.get_recall_items(user_id)
   ├─ ItemCF.recommend() → RRF
   ├─ SASRec.recommend() → RRF
   ├─ YoutubeDNN.recommend() → RRF
   ├─ Popularity.recommend() → RRF
   └─ Merge → Top-100 candidates
2. FeatureEngineer.extract_features(user_id, candidates)
   └─ 17 features per candidate
3. LGBMRanker.predict(features)
   └─ Ranking scores
4. DiversityReranker.rerank() [P0]
   ├─ MMR for diversity (λ=0.75)
   ├─ Popularity penalty (γ=0.3)
   └─ Max 3 per category in top-10
5. MetadataStore.get_books_batch()
   └─ Return enriched results
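Step 4's MMR pass might look like the following sketch, where sim is any item-similarity callable and candidates map items to their ranking scores (both interfaces are assumptions; the popularity penalty and category cap are left out):

```python
def mmr_rerank(candidates, sim, lam=0.75, top_k=10):
    """Maximal Marginal Relevance: balance relevance vs. redundancy (sketch).

    candidates: {item: relevance_score}
    sim: callable (a, b) -> similarity in [0, 1]
    lam: trade-off; 1.0 = pure relevance, 0.0 = pure diversity.
    """
    selected = []
    pool = dict(candidates)
    while pool and len(selected) < top_k:
        # MMR score = lam * relevance - (1 - lam) * max similarity to picks
        best = max(
            pool,
            key=lambda c: lam * pool[c]
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        del pool[best]
    return selected
```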
File Organization
src/
├── main.py                   # FastAPI app entry point (669 lines)
├── config.py                 # Global configuration
├── utils.py                  # Logging, helpers
│
├── api/                      # REST API endpoints
│   └── chat.py
│
├── services/                 # Business logic orchestration
│   ├── chat_service.py       # RAG pipeline (157 lines)
│   └── recommend_service.py  # RecSys pipeline (313 lines)
│
├── core/                     # Domain logic
│   ├── router.py             # Query intent classification (177 lines)
│   ├── reranker.py           # Cross-encoder reranking
│   ├── temporal.py           # Recency boosting
│   ├── llm.py                # LLM factory (OpenAI/Ollama)
│   ├── metadata_store.py     # SQLite metadata access (288 lines)
│   ├── diversity_reranker.py # MMR diversity (204 lines)
│   ├── context_compressor.py # Chat history compression
│   ├── freshness_monitor.py  # Staleness detection (231 lines)
│   ├── web_search.py         # Google Books API (427 lines)
│   └── online_books_store.py # Staging DB (220 lines)
│
├── vector_db.py              # Hybrid search (368 lines)
│
├── recall/                   # Recall channels
│   ├── fusion.py             # RRF fusion (173 lines)
│   ├── itemcf.py             # Item CF (221 lines)
│   ├── usercf.py             # User CF (150 lines)
│   ├── swing.py              # Swing (163 lines)
│   ├── sasrec_recall.py      # SASRec (337 lines)
│   ├── item2vec.py           # Item2Vec (122 lines)
│   ├── embedding.py          # YoutubeDNN (333 lines)
│   └── popularity.py         # Popularity (59 lines)
│
├── ranking/                  # Ranking stage
│   ├── features.py           # Feature engineering (470 lines)
│   ├── din.py                # DIN ranker (220 lines)
│   └── explainer.py          # SHAP explainability
│
├── data/                     # Data access layer
│   └── repository.py         # Unified data interface
│
├── user/                     # User management
│   └── profile_store.py      # User profiles (246 lines)
│
└── model/                    # Model training
    └── sasrec.py             # SASRec PyTorch model
Design Patterns
1. Singleton Pattern
Where: VectorDB, MetadataStore, ChatService
Why:
- Share heavy resources (embeddings, DB connections)
- Avoid reloading models on each request
Example:
class VectorDB:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
2. Repository Pattern
Where: DataRepository (src/data/repository.py)
Why:
- Abstract data access (SQLite, ChromaDB, profile store)
- Single source of truth for book metadata
Example:
class DataRepository:
    def get_book_metadata(self, isbn: str) -> dict:
        # Check main store → online store → return None
        return metadata_store.get_book_metadata(isbn)
3. Strategy Pattern
Where: QueryRouter (RAG strategies), RecallFusion (channel selection)
Why:
- Different retrieval strategies based on query type
- Configurable channel weights
Example:
class QueryRouter:
    def route(self, query: str) -> dict:
        if self._is_isbn(query):
            return {"strategy": "EXACT", "alpha": 1.0, "rerank": False}
        elif self._is_keyword(query):
            return {"strategy": "FAST", "alpha": 0.5, "rerank": False}
        else:
            return {"strategy": "DEEP", "alpha": 0.3, "rerank": True}
4. Factory Pattern
Where: LLMFactory (src/core/llm.py)
Why:
- Support multiple LLM providers (OpenAI, Ollama)
- Runtime provider selection via API key
Example:
class LLMFactory:
    @staticmethod
    def create(provider: str, **kwargs):
        if provider == "openai":
            return ChatOpenAI(**kwargs)
        elif provider == "ollama":
            return ChatOllama(**kwargs)
        raise ValueError(f"Unsupported provider: {provider}")
5. Lazy Loading
Where: Recall models, rankers, embeddings
Why:
- Reduce startup time
- Load only when first request comes in
Example:
class RecallFusion:
    def load_models(self):
        if self.models_loaded:
            return
        # Load all recall models
        self.models_loaded = True
Quick Reference
Key Configuration Files
| File | Purpose |
|---|---|
| src/config.py | App config (pydantic-settings, paths, hyperparameters, env overrides) |
| config/router.json | Router keywords (detail, freshness, natural-language queries) |
| requirements.txt | Pip dependencies (single source of truth) |
| environment.yml | Conda env (installs from requirements.txt) |
| Makefile | Common commands (make run, make test) |
| Dockerfile | Production deployment |
Common Workflows
Start API Server:
make run
# or: uvicorn src.main:app --reload --port 6006
Run Tests:
make test
# or: pytest tests/
Rebuild Data Pipeline:
make data-pipeline
# or: python scripts/run_pipeline.py
Train Recall Models:
python scripts/model/build_recall_models.py
Evaluate RecSys:
python scripts/model/evaluate.py