
System Architecture

Table of Contents

  1. High-Level Overview
  2. Module Dependency Graph
  3. Core Components
  4. Data Flow
  5. File Organization
  6. Design Patterns
  7. Quick Reference

High-Level Overview

The system implements a dual-path recommendation architecture:

flowchart TB
    subgraph Entry
        A[User Request]
    end

    A --> B{Has Query?}
    B -->|Yes| RAG
    B -->|No| Rec

    subgraph RAG[RAG Path]
        R1[Query Router]
        R2[Hybrid Search]
        R3[Reranking]
        R4[LLM Gen]
        R1 --> R2 --> R3 --> R4
    end

    subgraph Rec[RecSys Path]
        C1[Multi-Channel Recall]
        C2[Feature Eng]
        C3[LGBMRanker]
        C1 --> C2 --> C3
    end

    RAG --> Res[Top-K Results]
    Rec --> Res

Key Insight: Users with explicit queries use RAG (semantic search + LLM); users without queries use collaborative filtering + ranking.

| Block | Responsibility |
| --- | --- |
| Query Router | Chooses strategy (EXACT/FAST/DEEP) and whether to rerank based on intent (ISBN, keywords, or natural language). |
| Hybrid Search | ChromaDB (semantic) + SQLite FTS5 (keyword), merged via RRF. |
| Reranking | Cross-Encoder reranks candidates to reduce semantic drift. |
| Multi-Channel Recall | 7 channels (ItemCF, SASRec, YoutubeDNN, etc.) fused with RRF. |
| Feature Eng + LGBMRanker | Feature engineering and LambdaRank-style ranking. |

Module Dependency Graph

Layer 1: Entry Points (API)

src/main.py (FastAPI app)
    ├── src/api/chat.py                  # Chat API (single router)
    └── (all other endpoints defined inline in main.py)

Layer 2: Service Orchestration

Service Layer
β”œβ”€β”€ src/services/chat_service.py
β”‚   └── Orchestrates RAG pipeline for "Chat with Book"
β”‚
β”œβ”€β”€ src/services/recommend_service.py
β”‚   └── Orchestrates RecSys pipeline (7-channel recall β†’ LGBMRanker)
β”‚
└── src/services/personal_recommend_handler.py
    └── Handles /api/recommend/personal (params, cold-start intent probing, enrichment)

Layer 3: Core Logic (RAG Path)

RAG Pipeline
β”œβ”€β”€ src/core/router.py (QueryRouter)
β”‚   β”œβ”€β”€ Classifies query intent (EXACT/FAST/DEEP)
β”‚   └── Routes to appropriate retrieval strategy
β”‚
β”œβ”€β”€ src/vector_db.py (VectorDB - Singleton)
β”‚   β”œβ”€β”€ ChromaDB for dense retrieval
β”‚   β”œβ”€β”€ SQLite FTS5 for sparse retrieval
β”‚   β”œβ”€β”€ Hybrid Search (RRF fusion)
β”‚   └── Small-to-Big retrieval (sentence β†’ book)
β”‚
β”œβ”€β”€ src/core/reranker.py (Reranker)
β”‚   └── Cross-encoder (ms-marco-MiniLM) for top-K precision
β”‚
β”œβ”€β”€ src/core/temporal.py (TemporalRanker)
β”‚   └── Recency boosting for "latest/new" queries
β”‚
β”œβ”€β”€ src/core/diversity_reranker.py (optional)
β”‚   └── MMR + category constraint when ENABLE_RAG_DIVERSITY
β”‚
└── src/core/llm.py (LLMFactory)
    └── OpenAI / Ollama / Groq integration for generation

Layer 4: Core Logic (RecSys Path)

RecSys Pipeline
β”œβ”€β”€ src/recall/fusion.py (RecallFusion)
β”‚   β”œβ”€β”€ Coordinates 7 recall channels via RRF
β”‚   └── Channels:
β”‚       β”œβ”€β”€ src/recall/itemcf.py (ItemCF - direction-weighted)
β”‚       β”œβ”€β”€ src/recall/usercf.py (UserCF - user-user co-occurrence)
β”‚       β”œβ”€β”€ src/recall/swing.py (Swing - user-pair overlap)
β”‚       β”œβ”€β”€ src/recall/sasrec_recall.py (SASRec - Transformer)
β”‚       β”œβ”€β”€ src/recall/item2vec.py (Item2Vec - Word2Vec)
β”‚       β”œβ”€β”€ src/recall/embedding.py (YoutubeDNN - Two-tower)
β”‚       └── src/recall/popularity.py (Popularity - cold-start)
β”‚
β”œβ”€β”€ src/ranking/features.py (FeatureEngineer)
β”‚   └── 17 ranking features (user stats, CF scores, SASRec scores)
β”‚
β”œβ”€β”€ LGBMRanker (lgbm_ranker.txt, loaded in recommend_service.py)
β”‚   └── LambdaRank optimization for NDCG
β”‚
β”œβ”€β”€ src/ranking/din.py (DINRanker - optional)
β”‚   └── Deep Interest Network for sequence modeling
β”‚
└── src/core/diversity_reranker.py (DiversityReranker)
    └── MMR + category constraints for diverse results

Layer 5: Data Access

Data Layer
β”œβ”€β”€ src/core/metadata_store.py (MetadataStore)
β”‚   β”œβ”€β”€ SQLite (books.db) for book metadata (Zero-RAM mode)
β”‚   └── FTS5 (books_fts) for full-text search
β”‚
β”œβ”€β”€ src/core/online_books_store.py (OnlineBooksStore - Singleton)
β”‚   └── SQLite (online_books.db) for web-discovered books (freshness_fallback)
β”‚
β”œβ”€β”€ src/user/profile_store.py (module)
β”‚   └── JSON (user_profiles.json) for user favorites and reading history
β”‚
└── src/data/repository.py (DataRepository - Singleton)
    └── Unified interface: metadata (MetadataStore) + user interaction history (recall_models.db)

Supporting Modules

Utilities & Extensions
β”œβ”€β”€ src/core/context_compressor.py
β”‚   └── Chat history compression (LLM summarization)
β”‚
β”œβ”€β”€ src/core/freshness_monitor.py
β”‚   └── Detects when local data is stale (used to trigger web search)
β”‚
β”œβ”€β”€ src/core/web_search.py
β”‚   └── Google Books API fallback for missing/fresh books
β”‚
β”œβ”€β”€ src/marketing/persona.py
β”‚   └── User persona generation for chat context
β”‚
└── src/config.py
    └── Global configuration (pydantic-settings, paths, hyperparameters)

Core Components

1. QueryRouter (src/core/router.py)

Responsibility: Intent classification for RAG queries.

Strategies:

  • EXACT: ISBN/exact match β†’ BM25 only (no reranking)
  • FAST: 1-2 keywords β†’ Hybrid search (no reranking)
  • DEEP: Complex queries β†’ Hybrid + Cross-encoder reranking

Key Logic:

words = query.split()
if re.match(r"^\d{10,13}$", query):   # 10-13 digit string -> ISBN lookup
    return "EXACT"
elif len(words) <= 2:                 # 1-2 keywords
    return "FAST"
else:                                 # natural-language query
    return "DEEP"

Dependencies:

  • src/core/intent_classifier.py (optional ML classifier)
  • config/router.json (detail keywords: "twist", "ending", etc.)

2. VectorDB (src/vector_db.py)

Responsibility: Hybrid retrieval (BM25 + Dense embeddings).

Components:

  • ChromaDB: Dense retrieval (all-MiniLM-L6-v2, 384-dim)
  • BM25Okapi: Sparse retrieval for exact/keyword matches
  • RRF: Reciprocal Rank Fusion for merging results

Key Method:

def hybrid_search(query, k=10, alpha=0.5):
    dense_results = chroma.search(query, k=50)
    sparse_results = bm25.get_top_n(query, k=50)
    merged = rrf_fusion(dense_results, sparse_results, alpha)
    return merged[:k]
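The `rrf_fusion` step referenced above is not shown in this doc; a minimal sketch, assuming each result list is ranked best-first and `alpha` weights the dense list against the sparse one, might look like:

```python
from collections import defaultdict

def rrf_fusion(dense_results, sparse_results, alpha=0.5, k=60):
    """Weighted Reciprocal Rank Fusion over two best-first ranked lists
    of doc ids. alpha weights the dense list, (1 - alpha) the sparse one."""
    scores = defaultdict(float)
    for rank, doc_id in enumerate(dense_results):
        scores[doc_id] += alpha * (1.0 / (k + rank + 1))
    for rank, doc_id in enumerate(sparse_results):
        scores[doc_id] += (1.0 - alpha) * (1.0 / (k + rank + 1))
    return sorted(scores, key=scores.get, reverse=True)
```

Documents appearing in both lists accumulate score from each, which is why they float to the top of the merged ranking.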

Dependencies:

  • rank_bm25 library
  • chromadb library
  • sentence-transformers

3. RecallFusion (src/recall/fusion.py)

Responsibility: Multi-channel recall aggregation.

Algorithm: Reciprocal Rank Fusion (RRF)

for rank, (item, score) in enumerate(channel_results):
    rrf_score = weight * (1.0 / (k + rank + 1))
    candidates[item] += rrf_score

Channels (configurable):

  • ItemCF (weight=1.0) β€” co-occurrence with direction weight
  • SASRec (weight=1.0) β€” sequential Transformer
  • YoutubeDNN (weight=1.0) β€” two-tower neural
  • Item2Vec (weight=0.8) β€” Word2Vec on sequences
  • UserCF (weight=1.0) β€” Jaccard similarity
  • Swing (weight=1.0) β€” user-pair overlap weighting
  • Popularity (weight=0.5) β€” cold-start fallback

Dependencies: All recall modules in src/recall/
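The accumulation loop above, combined with the per-channel weights, can be expanded into a runnable sketch (function name and data shapes are illustrative, not the repo's actual signatures):

```python
from collections import defaultdict

def fuse_channels(channel_results: dict, weights: dict, k: int = 60, top_n: int = 100):
    """channel_results: {channel: [(item, score), ...]} ranked best-first.
    Accumulates a weighted RRF contribution per item across all channels."""
    candidates = defaultdict(float)
    for channel, results in channel_results.items():
        weight = weights.get(channel, 1.0)
        for rank, (item, _score) in enumerate(results):
            candidates[item] += weight * (1.0 / (k + rank + 1))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]
```

Note that only each channel's rank matters, not its raw score, so heterogeneous channels (cosine similarities, popularity counts) fuse without score normalization.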


4. FeatureEngineer (src/ranking/features.py)

Responsibility: Generate 17 ranking features from recall candidates.

Feature Groups:

  1. User Stats: u_cnt, u_mean, u_std
  2. Item Stats: i_cnt, i_mean, i_std
  3. Cross Features: len_diff, u_auth_avg, u_auth_match, is_cat_hob
  4. Sequence: sasrec_score, sim_max, sim_min, sim_mean
  5. CF Scores: icf_sum, icf_max, ucf_sum
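A sketch of how such a feature row might be assembled per (user, candidate) pair. The feature names follow the groups above; the computations shown are illustrative guesses, not the repo's actual definitions:

```python
import statistics

def build_feature_row(user_ratings, item_ratings, icf_scores, sasrec_score):
    """Illustrative feature row for one (user, candidate) pair.
    user_ratings / item_ratings: rating lists; icf_scores: ItemCF
    similarities between the candidate and the user's history items."""
    return {
        "u_cnt": len(user_ratings),                        # user stats
        "u_mean": statistics.fmean(user_ratings),
        "u_std": statistics.pstdev(user_ratings),
        "i_cnt": len(item_ratings),                        # item stats
        "i_mean": statistics.fmean(item_ratings),
        "icf_sum": sum(icf_scores),                        # CF scores
        "icf_max": max(icf_scores, default=0.0),
        "sasrec_score": sasrec_score,                      # sequence score
    }
```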

Dependencies:

  • Recall models for CF scores
  • SASRec embeddings
  • data/rec/train.csv for statistics

5. MetadataStore (src/core/metadata_store.py)

Responsibility: Zero-RAM book metadata lookup.

Implementation:

  • SQLite with FTS5 index
  • Lazy connection (opens on first query)
  • Singleton pattern for global access

Key Methods:

def get_book_metadata(isbn: str) -> dict
def search_books_fts(query: str) -> List[dict]
def get_books_batch(isbns: List[str]) -> List[dict]
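The lazy-connection, parameterized-batch pattern described above can be sketched as follows (class name, schema, and table name are assumed for illustration; this is not the repo's actual implementation):

```python
import sqlite3

class MetadataStoreSketch:
    """Lazy SQLite access: the connection opens on the first query only."""

    def __init__(self, db_path: str = "data/books.db"):
        self.db_path = db_path
        self._conn = None

    @property
    def conn(self) -> sqlite3.Connection:
        if self._conn is None:                     # lazy open
            self._conn = sqlite3.connect(self.db_path)
            self._conn.row_factory = sqlite3.Row
        return self._conn

    def get_books_batch(self, isbns: list[str]) -> list[dict]:
        if not isbns:
            return []
        # Parameterized IN (...) clause: safe, and only the requested rows
        # are pulled into memory ("Zero-RAM" lookup).
        placeholders = ",".join("?" * len(isbns))
        rows = self.conn.execute(
            f"SELECT * FROM books WHERE isbn IN ({placeholders})", isbns
        ).fetchall()
        return [dict(r) for r in rows]
```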

Dependencies:

  • data/books.db (main store)
  • data/online_books.db (staging store for API fetches)

Data Flow

RAG Flow (User Query β†’ LLM Response)

1. User Query β†’ QueryRouter
   └─ Classify intent: EXACT / FAST / DEEP

2. VectorDB.hybrid_search()
   β”œβ”€ BM25 sparse search (k=50)
   β”œβ”€ Chroma dense search (k=50)
   └─ RRF fusion β†’ Top-50 candidates

3. [Optional] Reranker.rerank() (DEEP only)
   └─ Cross-encoder scoring β†’ Top-10

4. [Optional] TemporalRanker.boost() (if "new"/"latest" detected)
   └─ Recency scoring

5. MetadataStore.get_books_batch()
   └─ Enrich with full metadata

6. LLM Generation (streaming)
   └─ SSE response to frontend
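Step 6's SSE framing can be sketched as a generator; each token becomes a `data: ...` line terminated by a blank line, with a `[DONE]` sentinel at the end. In the app this generator would be wrapped in FastAPI's `StreamingResponse` with `media_type="text/event-stream"` (the function name and `[DONE]` convention here are illustrative):

```python
def sse_events(tokens):
    """Format a token stream as Server-Sent Events frames: each event is
    a 'data: ...' line followed by a blank line."""
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"   # end-of-stream sentinel for the frontend
```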

RecSys Flow (User β†’ Personalized Recommendations)

1. RecallFusion.get_recall_items(user_id)
   β”œβ”€ ItemCF.recommend() β†’ RRF
   β”œβ”€ SASRec.recommend() β†’ RRF
   β”œβ”€ YoutubeDNN.recommend() β†’ RRF
   β”œβ”€ Popularity.recommend() β†’ RRF
   └─ Merge β†’ Top-100 candidates

2. FeatureEngineer.extract_features(user_id, candidates)
   └─ 17 features per candidate

3. LGBMRanker.predict(features)
   └─ Ranking scores

4. DiversityReranker.rerank() [P0]
   β”œβ”€ MMR for diversity (Ξ»=0.75)
   β”œβ”€ Popularity penalty (Ξ³=0.3)
   └─ Max 3 per category in top-10

5. MetadataStore.get_books_batch()
   └─ Return enriched results
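Step 4's MMR trade-off (relevance vs. similarity to already-selected items, λ=0.75 above) can be sketched as a greedy loop; the similarity function and data shapes are illustrative assumptions:

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.75, top_n=10):
    """Greedy Maximal Marginal Relevance.
    candidates: list of item ids; relevance: {item: score};
    similarity(a, b) -> value in [0, 1]."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < top_n:
        def mmr_score(item):
            # Penalize items similar to anything already picked.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1.0 - lam) * max_sim
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With λ near 1 the output tracks pure relevance; lowering λ trades relevance for diversity, which is what pushes near-duplicate books out of the top-10.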

File Organization

src/
β”œβ”€β”€ main.py                    # FastAPI app entry point (669 lines)
β”œβ”€β”€ config.py                  # Global configuration
β”œβ”€β”€ utils.py                   # Logging, helpers
β”‚
β”œβ”€β”€ api/                       # REST API endpoints
β”‚   └── chat.py
β”‚
β”œβ”€β”€ services/                  # Business logic orchestration
β”‚   β”œβ”€β”€ chat_service.py        # RAG pipeline (157 lines)
β”‚   └── recommend_service.py   # RecSys pipeline (313 lines)
β”‚
β”œβ”€β”€ core/                      # Domain logic
β”‚   β”œβ”€β”€ router.py              # Query intent classification (177 lines)
β”‚   β”œβ”€β”€ reranker.py            # Cross-encoder reranking
β”‚   β”œβ”€β”€ temporal.py            # Recency boosting
β”‚   β”œβ”€β”€ llm.py                 # LLM factory (OpenAI/Ollama)
β”‚   β”œβ”€β”€ metadata_store.py      # SQLite metadata access (288 lines)
β”‚   β”œβ”€β”€ diversity_reranker.py  # MMR diversity (204 lines)
β”‚   β”œβ”€β”€ context_compressor.py  # Chat history compression
β”‚   β”œβ”€β”€ freshness_monitor.py   # Staleness detection (231 lines)
β”‚   β”œβ”€β”€ web_search.py          # Google Books API (427 lines)
β”‚   └── online_books_store.py  # Staging DB (220 lines)
β”‚
β”œβ”€β”€ vector_db.py               # Hybrid search (368 lines)
β”‚
β”œβ”€β”€ recall/                    # Recall channels
β”‚   β”œβ”€β”€ fusion.py              # RRF fusion (173 lines)
β”‚   β”œβ”€β”€ itemcf.py              # Item CF (221 lines)
β”‚   β”œβ”€β”€ usercf.py              # User CF (150 lines)
β”‚   β”œβ”€β”€ swing.py               # Swing (163 lines)
β”‚   β”œβ”€β”€ sasrec_recall.py       # SASRec (337 lines)
β”‚   β”œβ”€β”€ item2vec.py            # Item2Vec (122 lines)
β”‚   β”œβ”€β”€ embedding.py           # YoutubeDNN (333 lines)
β”‚   └── popularity.py          # Popularity (59 lines)
β”‚
β”œβ”€β”€ ranking/                   # Ranking stage
β”‚   β”œβ”€β”€ features.py            # Feature engineering (470 lines)
β”‚   β”œβ”€β”€ din.py                 # DIN ranker (220 lines)
β”‚   └── explainer.py           # SHAP explainability
β”‚
β”œβ”€β”€ data/                      # Data access layer
β”‚   └── repository.py          # Unified data interface
β”‚
β”œβ”€β”€ user/                      # User management
β”‚   └── profile_store.py       # User profiles (246 lines)
β”‚
└── model/                     # Model training
    └── sasrec.py              # SASRec PyTorch model

Design Patterns

1. Singleton Pattern

Where: VectorDB, MetadataStore, ChatService

Why:

  • Share heavy resources (embeddings, DB connections)
  • Avoid reloading models on each request

Example:

class VectorDB:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

2. Repository Pattern

Where: DataRepository (src/data/repository.py)

Why:

  • Abstract data access (SQLite, ChromaDB, profile store)
  • Single source of truth for book metadata

Example:

class DataRepository:
    def get_book_metadata(self, isbn: str) -> dict:
        # Check main store β†’ online store β†’ return None
        return metadata_store.get_book_metadata(isbn)

3. Strategy Pattern

Where: QueryRouter (RAG strategies), RecallFusion (channel selection)

Why:

  • Different retrieval strategies based on query type
  • Configurable channel weights

Example:

class QueryRouter:
    def route(self, query: str) -> dict:
        if self._is_isbn(query):
            return {"strategy": "EXACT", "alpha": 1.0, "rerank": False}
        elif self._is_keyword(query):
            return {"strategy": "FAST", "alpha": 0.5, "rerank": False}
        else:
            return {"strategy": "DEEP", "alpha": 0.3, "rerank": True}

4. Factory Pattern

Where: LLMFactory (src/core/llm.py)

Why:

  • Support multiple LLM providers (OpenAI, Ollama)
  • Runtime provider selection via API key

Example:

class LLMFactory:
    @staticmethod
    def create(provider: str, **kwargs):
        if provider == "openai":
            return ChatOpenAI(**kwargs)
        elif provider == "ollama":
            return ChatOllama(**kwargs)
        raise ValueError(f"Unknown LLM provider: {provider}")

5. Lazy Loading

Where: Recall models, rankers, embeddings

Why:

  • Reduce startup time
  • Load only when first request comes in

Example:

class RecallFusion:
    def load_models(self):
        if self.models_loaded:
            return
        # Load all recall models
        self.models_loaded = True

Key Configuration Files

| File | Purpose |
| --- | --- |
| src/config.py | App config (pydantic-settings, paths, hyperparameters, env overrides) |
| config/router.json | Router keywords (detail, freshness, natural-language queries) |
| requirements.txt | Pip dependencies (single source of truth) |
| environment.yml | Conda env (installs from requirements.txt) |
| Makefile | Common commands (make run, make test) |
| Dockerfile | Production deployment |

Common Workflows

Start API Server:

make run
# or: uvicorn src.main:app --reload --port 6006

Run Tests:

make test
# or: pytest tests/

Rebuild Data Pipeline:

make data-pipeline
# or: python scripts/run_pipeline.py

Train Recall Models:

python scripts/model/build_recall_models.py

Evaluate RecSys:

python scripts/model/evaluate.py