book-rec-with-LLMs / docs /performance /memory_optimization.md
ymlin105's picture
chore: reorganize documentation structure and clean repository root
78cfff7
# Technical Report: No-Loss Memory Optimization for HF Spaces
## Objective
The primary goal was to resolve the **"Memory limit exceeded (16Gi)"** error on Hugging Face Spaces while maintaining the **full dataset capacity (221k books)** and **recommendation quality**.
## The RAM Bottleneck (The Problem)
The original research architecture relied on high-memory Python structures that were unsustainable for production deployment:
* **ItemCF Similarity Matrix**: A 1.4GB pickle file that expanded to **~7GB+ in RAM** when loaded as a nested Python dictionary.
* **Keyword Search (BM25)**: Required loading the entire tokenized corpus into memory, consuming **~4GB+ RAM**.
* **Metadata Overhead**: Pandas DataFrames and ISBN-to-Title maps added another **~250MB+**, pushing the system beyond the 16Gi limit at startup.
## The Zero-RAM Architecture (The Solution)
We transitioned from a "Load-All-at-Startup" model to a **"Query-on-Demand"** architecture using **SQLite**:
### 1. SQLite-Backed Recall Models
* **Action**: Migrated the 1.4GB `itemcf.pkl` into a dedicated `recall_models.db`.
* **Implementation**: Refactored `ItemCF` to use optimized SQL queries (`SUM/GROUP BY`) for candidate generation.
* **Impact**: Reduced RAM overhead from **7GB+ to 0.25MB** per model.
### 2. SQLite FTS5 for Keyword Search
* **Action**: Replaced the `rank_bm25` library with the native SQLite **FTS5** (Full Text Search) engine.
* **Implementation**: Built a virtual table for the full 221,998 book dataset.
* **Impact**: **Zero-RAM indexing**. Search relevance is identical (BM25-based) but index data stays on disk.
### 3. Metadata Store Refactor
* **Action**: Replaced the global `books_df` DataFrame with a disk-based lookup.
* **Implementation**: `MetadataStore.get_book_metadata()` fetches only what is needed for the current Top-K results.
* **Impact**: **Eliminated 250MB+** of baseline RAM usage.
## Verified Results (Metrics)
| Metric | Baseline (Original) | Final (SQLite/FTS5) | Savings |
| :--- | :--- | :--- | :--- |
| **Peak RAM Usage** | **~19.8 GB (Crash)** | **~750 MB** | **~19 GB (96%)** |
| **Dataset Size** | 221,998 books | **221,998 books** | **No Loss** |
| **Recommendation HR@10** | 0.81 | **0.81** | **No Loss** |
| **Search Relevancy** | BM25 | **BM25 (FTS5)** | **Parity** |
## Engineering Rationale (The "Why")
We chose **SQLite** and **FTS5** over other solutions (like pruning or external caches) for three reasons:
1. **Mathematical Parity**: SQL aggregations (`SUM`, `GROUP BY`) are mathematically identical to Python dictionary loops for Collaborative Filtering. No accuracy is sacrificed.
2. **Local Persistence**: SQLite is a serverless file-based DB, making it perfect for Hugging Face Spaces where you want to minimize external dependencies.
3. **Stability**: Disk-based lookups ensure that even if the dataset grows to 1M books, the memory footprint remains constant.
## Conclusion
This engineering overhaul transforms the Book Recommendation System into a **production-ready** application. It solves the OOM crisis and restores the full scientific capacity of the model—running more data on less hardware.