| --- |
| license: mit |
| datasets: |
| - hiiamkik/kuai-rec-data |
| language: |
| - en |
| pipeline_tag: tabular-classification |
| tags: |
| - recommender-system |
| - pytorch |
| - lightgcn |
| - mmoe |
| - faiss |
| - causal-inference |
| --- |
| # GraphRecSys |
|
|
| Production-style recommendation system that combines graph retrieval, causal debiasing, multi-objective ranking, calibrated probabilities, vector search, and low-latency serving. |
|
|
| This project is designed as an end-to-end recommender systems portfolio piece: it starts from raw KuaiRec interaction logs, trains a debiased LightGCN retrieval model, indexes item embeddings with FAISS, ranks candidates with an MMoE multi-task model, calibrates click probabilities, and serves personalized recommendations through FastAPI with Redis-backed embedding caching. |
|
|
| ## Why This Project Exists |
|
|
| Most recommender demos stop at model training. Real recommendation systems are pipelines: data quality, retrieval, ranking, calibration, serving latency, offline evaluation, and product trade-offs all matter at the same time. |
|
|
| GraphRec-MultiOpt demonstrates those production concerns in one coherent system: |
|
|
| - **Retrieval:** graph collaborative filtering with LightGCN. |
| - **Debiasing:** inverse propensity weighting to reduce exposure bias. |
| - **Ranking:** multi-task MMoE for click probability and expected value. |
| - **Calibration:** Platt scaling and reliability diagrams for trustworthy probabilities. |
| - **Serving:** FastAPI endpoint with FAISS candidate retrieval, Redis cache, scalarization, and diversity reranking. |
| - **Decision support:** mock A/B simulation and Pareto frontier analysis for engagement vs. value trade-offs. |
|
|
| ## System Architecture |
|
|
| ```mermaid |
| flowchart LR |
| raw["KuaiRec raw logs"] --> loader["Schema validation + labels"] |
| loader --> split["Temporal train/val/test split"] |
| split --> graph_data["PyG bipartite graph"] |
| split --> propensity["Item propensity estimates"] |
| |
| graph_data --> lightgcn["LightGCN retrieval"] |
| propensity --> lightgcn |
| lightgcn --> embeddings["User/item embeddings"] |
| |
| embeddings --> faiss["FAISS IVF-PQ index"] |
| embeddings --> features["Ranking feature builder"] |
| split --> features |
| features --> mmoe["MMoE ranker"] |
| mmoe --> calibration["Platt calibration"] |
| |
| faiss --> api["FastAPI /recommend"] |
| mmoe --> api |
| calibration --> api |
| redis["Redis embedding cache"] --> api |
| api --> response["Top-10 recommendations"] |
| |
| mmoe --> ab["Mock A/B simulation"] |
| ab --> pareto["Pareto frontier"] |
| ``` |
|
|
| ## Technical Highlights |
|
|
| | Area | Implementation | |
| |---|---| |
| | Dataset | KuaiRec dense multi-action logs | |
| | Retrieval | LightGCN with 3 graph propagation layers | |
| | Retrieval loss | BPR with optional inverse propensity weighting | |
| | Negative sampling | Uniform sampler with API reserved for hard negatives | |
| | Vector search | FAISS IVF-PQ, configurable `nprobe` | |
| | Ranking model | Multi-gate Mixture-of-Experts with click and value towers | |
| | Ranking targets | `label_click = watch_ratio >= 0.5`, `label_value = log1p(watch_ratio)` | |
| | Calibration | Platt scaling on validation logits | |
| | Diversity | Maximal Marginal Relevance reranking | |
| | Serving | Async FastAPI app with latency breakdown | |
| | Cache | Redis user embedding cache with TTL | |
| | Evaluation | Recall@K, NDCG@K, AUC, MSE/RMSE, ECE, latency, Pareto sweep | |
| | Tracking | MLflow metrics and artifacts | |
|
|
| ## Repository Layout |
|
|
| ```text |
| . |
| ├── data/ |
| │ ├── download.py |
| │ ├── raw/ |
| │ └── processed/ |
| ├── src/ |
| │ ├── data/ # loading, splitting, graph construction, propensity |
| │ ├── retrieval/ # LightGCN, BPR, negative sampling, retrieval eval |
| │ ├── indexing/ # FAISS index build/query/benchmark |
| │ ├── ranking/ # feature builder, MMoE, calibration, ranking eval |
| │ ├── serving/ # FastAPI, Redis cache, schemas, scoring |
| │ └── evaluation/ # A/B simulation, Pareto frontier, results report |
| ├── configs/ |
| ├── tests/ |
| ├── scripts/ |
| ├── outputs/ |
| ├── checkpoints/ |
| ├── Dockerfile |
| ├── implementation_plan.md |
| └── recsys_architecture.md |
| ``` |
|
|
| ## Modeling Approach |
|
|
| ### 1. Data And Labels |
|
|
| The data layer validates KuaiRec interaction logs, derives model targets, and creates train/validation/test splits. |
|
|
| ```python |
| label_click = (watch_ratio >= 0.5).astype(int) |
| label_value = np.log1p(watch_ratio) |
| ``` |
|
|
| The graph builder creates a PyTorch Geometric `HeteroData` bipartite graph: |
|
|
| - Node types: `user`, `item` |
| - Edge type: `("user", "interacts", "item")` |
| - Reverse edge type for message passing |
| - Edge weights from clipped watch ratio |
|
|
| ### 2. Debiased Retrieval |
|
|
| The retrieval stage trains LightGCN using Bayesian Personalized Ranking: |
|
|
| ```text |
| loss = -mean(IPS(item) * log sigmoid(score(user, positive) - score(user, negative))) |
| ``` |
|
|
| The IPS term upweights less frequently exposed items, reducing the tendency of the retrieval model to overfit historical exposure patterns. |
|
|
| ### 3. Multi-Objective Ranking |
|
|
| The ranking model uses MMoE to optimize two related objectives: |
|
|
| - **pClick tower:** calibrated probability that the user meaningfully engages. |
| - **E-value tower:** expected value proxy based on watch ratio. |
|
|
| Ranking features combine: |
|
|
| - user embedding |
| - item embedding |
| - time/session context |
| - item duration |
| - category representation |
|
|
| Total feature dimension: `1046`. |
|
|
| ### 4. Serving-Time Optimization |
|
|
| The serving endpoint follows the same shape used by production recommendation stacks: |
|
|
| 1. Fetch user embedding from Redis or local embedding table. |
| 2. Retrieve top-K candidates from FAISS. |
| 3. Build ranking features for candidates. |
| 4. Score candidates with MMoE. |
| 5. Apply Platt calibration. |
| 6. Scalarize engagement and value. |
| 7. Apply MMR diversity reranking. |
| 8. Return top-10 items with latency breakdown. |
|
|
| ## Quickstart |
|
|
| ### Install |
|
|
| ```bash |
| python -m venv .venv |
| source .venv/bin/activate |
| pip install -r requirements.txt |
| ``` |
|
|
| ### Run The Pipeline |
|
|
| ```bash |
| bash scripts/run_pipeline.sh |
| ``` |
|
|
| The pipeline follows the architecture sequence: |
|
|
| ```text |
| download -> preprocess -> graph -> propensity -> LightGCN -> FAISS -> ranking -> calibration -> evaluation -> serving |
| ``` |
|
|
| For raw data without timestamps, the split script can use a deterministic fallback: |
|
|
| ```bash |
| python -m src.data.splits --allow_no_timestamp |
| ``` |
|
|
| ### Run FAISS Benchmark |
|
|
| ```bash |
| bash scripts/run_benchmark.sh |
| ``` |
|
|
| Benchmark output is written to: |
|
|
| ```text |
| outputs/faiss_benchmark.csv |
| ``` |
|
|
| ## Serving API |
|
|
| Start the service: |
|
|
| ```bash |
| uvicorn src.serving.app:app --host 0.0.0.0 --port 8000 |
| ``` |
|
|
| Health check: |
|
|
| ```bash |
| curl http://localhost:8000/health |
| ``` |
|
|
| Recommendation request: |
|
|
| ```bash |
| curl http://localhost:8000/recommend/0 |
| ``` |
|
|
| Example response shape: |
|
|
| ```json |
| { |
| "user_id": 0, |
| "items": [ |
| { |
| "item_id": 123, |
| "p_click": 0.71, |
| "e_value": 1.42, |
| "final_score": 0.82 |
| } |
| ], |
| "retrieval_latency_ms": 6.4, |
| "ranking_latency_ms": 14.8, |
| "total_latency_ms": 23.1, |
| "cache_hit": true |
| } |
| ``` |
|
|
| Prometheus-compatible metrics: |
|
|
| ```bash |
| curl http://localhost:8000/metrics |
| ``` |
|
|
| Reload model artifacts: |
|
|
| ```bash |
| curl -X POST http://localhost:8000/reload |
| ``` |
|
|
| ## Evaluation |
|
|
| The project evaluates recommender quality at multiple layers. |
|
|
| | Layer | Metrics | |
| |---|---| |
| | Retrieval | Recall@10, Recall@20, Recall@50, Recall@500, NDCG@10 | |
| | Ranking | ROC-AUC, MSE, RMSE | |
| | Calibration | ECE before/after Platt scaling, reliability curve | |
| | Serving | p50, p95, p99 latency | |
| | Product trade-off | Simulated CTR, GMV proxy, diversity, Pareto frontier | |
|
|
| Generate the final results table: |
|
|
| ```bash |
| python -m src.evaluation.report |
| ``` |
|
|
| Outputs: |
|
|
| ```text |
| outputs/results_table.csv |
| outputs/results_table.md |
| outputs/calibration_curve.png |
| outputs/pareto_curve.png |
| ``` |
|
|
| ## Results |
|
|
| Metrics are generated after running the full pipeline. This table is intentionally artifact-driven so reported numbers come from reproducible runs rather than hand-edited README values. |
|
|
| | Metric | LightGCN + IPS | MMoE single-task | MMoE multi-task | |
| |:-----------------|:-----------------|:-------------------|:------------------| |
| | Recall@500 | 0.0011 | - | - | |
| | NDCG@10 | 0.0443 | - | - | |
| | AUC (pClick) | - | 0.8319 | 0.8223 | |
| | ECE (after cal.) | - | - | 0.0677 | |
| | MSE (E-value) | - | 0.1172 | 0.0787 | |
| | p50 latency ms | 0.04 | - | - | |
| | p99 latency ms | 0.13 | - | - | |
|
|
| ## Configuration |
|
|
| The system is config-driven: |
|
|
| - `configs/retrieval.yaml` |
| - `configs/ranking.yaml` |
| - `configs/serving.yaml` |
|
|
| Examples: |
|
|
| ```yaml |
| model: |
| emb_dim: 512 |
| num_layers: 3 |
| |
| training: |
| lr: 1.0e-3 |
| batch_size: 4096 |
| epochs: 100 |
| |
| ips: |
| clip_max: 10.0 |
| ``` |
|
|
| Serving trade-offs can be tuned without changing model code: |
|
|
| ```yaml |
| scoring: |
| w_engagement: 0.6 |
| w_revenue: 0.4 |
| lambda_diversity: 0.3 |
| top_n_serve: 10 |
| ``` |
|
|
| ## Docker |
|
|
| Build: |
|
|
| ```bash |
| docker build -t graphrec-multiopt . |
| ``` |
|
|
| Run: |
|
|
| ```bash |
| docker run -p 8000:8000 graphrec-multiopt |
| ``` |
|
|
| For real experiments, mount model artifacts and processed data as volumes: |
|
|
| ```bash |
| docker run \ |
| -p 8000:8000 \ |
| -v "$(pwd)/data:/app/data" \ |
| -v "$(pwd)/checkpoints:/app/checkpoints" \ |
| graphrec-multiopt |
| ``` |
|
|
| ## Engineering Notes |
|
|
| This repository is structured to show senior-level recommender systems judgment: |
|
|
| - Separates retrieval and ranking instead of forcing one model to do both. |
| - Includes causal debiasing through IPS rather than optimizing only observed engagement. |
| - Treats probability calibration as a first-class serving concern. |
| - Uses vector search and caching to reflect real serving constraints. |
| - Adds diversity reranking to avoid purely exploitative recommendations. |
| - Exposes business-level trade-offs through scalarization and Pareto analysis. |
| - Keeps training, serving, and evaluation configuration outside model code. |
|
|
| ## Known Limitations |
|
|
| - KuaiRec timestamp availability varies by source file; the splitter supports temporal mode when timestamps are present and an explicit deterministic fallback otherwise. |
| - The current hard-negative sampling interface is reserved, while uniform negative sampling is implemented. |
| - Full reported metrics require running the pipeline on the downloaded dataset. |
| - Redis is optional for local development but recommended for serving realism. |
| - FAISS IVF-PQ configuration may need scaling down for tiny smoke-test datasets. |
|
|
| ## Roadmap |
|
|
| - Add hard negative sampling from FAISS retrieval misses. |
| - Add popularity and matrix-factorization baselines. |
| - Add online feature store abstraction for serving-time context. |
| - Add load tests for concurrent recommendation traffic. |
| - [x] Add Docker Compose for API + Redis + MLflow. |
| - [x] Add CI workflow for unit tests, linting, and smoke-mode pipeline execution. |
|
|
| ## Resume Summary |
|
|
| Built an end-to-end production-style recommendation system using PyTorch, PyTorch Geometric, FAISS, Redis, FastAPI, and MLflow. Implemented LightGCN retrieval with IPS debiasing, MMoE multi-task ranking, Platt calibration, MMR diversity reranking, vector-search serving, offline A/B simulation, and Pareto frontier analysis for engagement/value trade-off optimization. |