book-rec-with-LLMs / docs /performance /memory_optimization.md
ymlin105's picture
chore: reorganize documentation structure and clean repository root
78cfff7

Technical Report: No-Loss Memory Optimization for HF Spaces

Objective

The primary goal was to resolve the "Memory limit exceeded (16Gi)" error on Hugging Face Spaces while maintaining the full dataset capacity (221k books) and recommendation quality.

The RAM Bottleneck (The Problem)

The original research architecture relied on high-memory Python structures that were unsustainable for production deployment:

  • ItemCF Similarity Matrix: A 1.4GB pickle file that expanded to ~7GB+ in RAM when loaded as a nested Python dictionary.
  • Keyword Search (BM25): Required loading the entire tokenized corpus into memory, consuming ~4GB+ RAM.
  • Metadata Overhead: Pandas DataFrames and ISBN-to-Title maps added another ~250MB+, pushing the system beyond the 16Gi limit at startup.

The Zero-RAM Architecture (The Solution)

We transitioned from a "Load-All-at-Startup" model to a "Query-on-Demand" architecture using SQLite:

1. SQLite-Backed Recall Models

  • Action: Migrated the 1.4GB itemcf.pkl into a dedicated recall_models.db.
  • Implementation: Refactored ItemCF to use optimized SQL queries (SUM/GROUP BY) for candidate generation.
  • Impact: Reduced RAM overhead from 7GB+ to 0.25MB per model.

2. SQLite FTS5 for Keyword Search

  • Action: Replaced the rank_bm25 library with the native SQLite FTS5 (Full Text Search) engine.
  • Implementation: Built a virtual table for the full 221,998 book dataset.
  • Impact: Zero-RAM indexing. Search relevance is identical (BM25-based) but index data stays on disk.

3. Metadata Store Refactor

  • Action: Replaced the global books_df DataFrame with a disk-based lookup.
  • Implementation: MetadataStore.get_book_metadata() fetches only what is needed for the current Top-K results.
  • Impact: Eliminated 250MB+ of baseline RAM usage.

Verified Results (Metrics)

Metric Baseline (Original) Final (SQLite/FTS5) Savings
Peak RAM Usage ~19.8 GB (Crash) ~750 MB ~19 GB (96%)
Dataset Size 221,998 books 221,998 books No Loss
Recommendation HR@10 0.81 0.81 No Loss
Search Relevancy BM25 BM25 (FTS5) Parity

Engineering Rationale (The "Why")

We chose SQLite and FTS5 over other solutions (like pruning or external caches) for three reasons:

  1. Mathematical Parity: SQL aggregations (SUM, GROUP BY) are mathematically identical to Python dictionary loops for Collaborative Filtering. No accuracy is sacrificed.
  2. Local Persistence: SQLite is a serverless file-based DB, making it perfect for Hugging Face Spaces where you want to minimize external dependencies.
  3. Stability: Disk-based lookups ensure that even if the dataset grows to 1M books, the memory footprint remains constant.

Conclusion

This engineering overhaul transforms the Book Recommendation System into a production-ready application. It solves the OOM crisis and restores the full scientific capacity of the model—running more data on less hardware.