Spaces:

ymlin105
/

book-rec-with-LLMs

Sleeping

App Files Files Community

book-rec-with-LLMs / docs /performance /memory_optimization.md

ymlin105

chore: reorganize documentation structure and clean repository root

78cfff7 13 days ago

preview code

raw

history blame contribute delete

3.18 kB

	# Technical Report: No-Loss Memory Optimization for HF Spaces

	## Objective
	The primary goal was to resolve the "Memory limit exceeded (16Gi)" error on Hugging Face Spaces while maintaining the full dataset capacity (221k books) and recommendation quality.

	## The RAM Bottleneck (The Problem)
	The original research architecture relied on high-memory Python structures that were unsustainable for production deployment:
	* ItemCF Similarity Matrix: A 1.4GB pickle file that expanded to ~7GB+ in RAM when loaded as a nested Python dictionary.
	* Keyword Search (BM25): Required loading the entire tokenized corpus into memory, consuming ~4GB+ RAM.
	* Metadata Overhead: Pandas DataFrames and ISBN-to-Title maps added another ~250MB+, pushing the system beyond the 16Gi limit at startup.

	## The Zero-RAM Architecture (The Solution)
	We transitioned from a "Load-All-at-Startup" model to a "Query-on-Demand" architecture using SQLite:

	### 1. SQLite-Backed Recall Models
	* Action: Migrated the 1.4GB `itemcf.pkl` into a dedicated `recall_models.db`.
	* Implementation: Refactored `ItemCF` to use optimized SQL queries (`SUM/GROUP BY`) for candidate generation.
	* Impact: Reduced RAM overhead from 7GB+ to 0.25MB per model.

	### 2. SQLite FTS5 for Keyword Search
	* Action: Replaced the `rank_bm25` library with the native SQLite FTS5 (Full Text Search) engine.
	* Implementation: Built a virtual table for the full 221,998 book dataset.
	* Impact: Zero-RAM indexing. Search relevance is identical (BM25-based) but index data stays on disk.

	### 3. Metadata Store Refactor
	* Action: Replaced the global `books_df` DataFrame with a disk-based lookup.
	* Implementation: `MetadataStore.get_book_metadata()` fetches only what is needed for the current Top-K results.
	* Impact: Eliminated 250MB+ of baseline RAM usage.

	## Verified Results (Metrics)

	\| Metric \| Baseline (Original) \| Final (SQLite/FTS5) \| Savings \|
	\| :--- \| :--- \| :--- \| :--- \|
	\| Peak RAM Usage \| ~19.8 GB (Crash) \| ~750 MB \| ~19 GB (96%) \|
	\| Dataset Size \| 221,998 books \| 221,998 books \| No Loss \|
	\| Recommendation HR@10 \| 0.81 \| 0.81 \| No Loss \|
	\| Search Relevancy \| BM25 \| BM25 (FTS5) \| Parity \|

	## Engineering Rationale (The "Why")
	We chose SQLite and FTS5 over other solutions (like pruning or external caches) for three reasons:
	1. Mathematical Parity: SQL aggregations (`SUM`, `GROUP BY`) are mathematically identical to Python dictionary loops for Collaborative Filtering. No accuracy is sacrificed.
	2. Local Persistence: SQLite is a serverless file-based DB, making it perfect for Hugging Face Spaces where you want to minimize external dependencies.
	3. Stability: Disk-based lookups ensure that even if the dataset grows to 1M books, the memory footprint remains constant.

	## Conclusion
	This engineering overhaul transforms the Book Recommendation System into a production-ready application. It solves the OOM crisis and restores the full scientific capacity of the model—running more data on less hardware.

	# Technical Report: No-Loss Memory Optimization for HF Spaces

	## Objective
	The primary goal was to resolve the "Memory limit exceeded (16Gi)" error on Hugging Face Spaces while maintaining the full dataset capacity (221k books) and recommendation quality.

	## The RAM Bottleneck (The Problem)
	The original research architecture relied on high-memory Python structures that were unsustainable for production deployment:
	* ItemCF Similarity Matrix: A 1.4GB pickle file that expanded to ~7GB+ in RAM when loaded as a nested Python dictionary.
	* Keyword Search (BM25): Required loading the entire tokenized corpus into memory, consuming ~4GB+ RAM.
	* Metadata Overhead: Pandas DataFrames and ISBN-to-Title maps added another ~250MB+, pushing the system beyond the 16Gi limit at startup.

	## The Zero-RAM Architecture (The Solution)
	We transitioned from a "Load-All-at-Startup" model to a "Query-on-Demand" architecture using SQLite:

	### 1. SQLite-Backed Recall Models
	* Action: Migrated the 1.4GB `itemcf.pkl` into a dedicated `recall_models.db`.
	* Implementation: Refactored `ItemCF` to use optimized SQL queries (`SUM/GROUP BY`) for candidate generation.
	* Impact: Reduced RAM overhead from 7GB+ to 0.25MB per model.

	### 2. SQLite FTS5 for Keyword Search
	* Action: Replaced the `rank_bm25` library with the native SQLite FTS5 (Full Text Search) engine.
	* Implementation: Built a virtual table for the full 221,998 book dataset.
	* Impact: Zero-RAM indexing. Search relevance is identical (BM25-based) but index data stays on disk.

	### 3. Metadata Store Refactor
	* Action: Replaced the global `books_df` DataFrame with a disk-based lookup.
	* Implementation: `MetadataStore.get_book_metadata()` fetches only what is needed for the current Top-K results.
	* Impact: Eliminated 250MB+ of baseline RAM usage.

	## Verified Results (Metrics)

	\| Metric \| Baseline (Original) \| Final (SQLite/FTS5) \| Savings \|
	\| :--- \| :--- \| :--- \| :--- \|
	\| Peak RAM Usage \| ~19.8 GB (Crash) \| ~750 MB \| ~19 GB (96%) \|
	\| Dataset Size \| 221,998 books \| 221,998 books \| No Loss \|
	\| Recommendation HR@10 \| 0.81 \| 0.81 \| No Loss \|
	\| Search Relevancy \| BM25 \| BM25 (FTS5) \| Parity \|

	## Engineering Rationale (The "Why")
	We chose SQLite and FTS5 over other solutions (like pruning or external caches) for three reasons:
	1. Mathematical Parity: SQL aggregations (`SUM`, `GROUP BY`) are mathematically identical to Python dictionary loops for Collaborative Filtering. No accuracy is sacrificed.
	2. Local Persistence: SQLite is a serverless file-based DB, making it perfect for Hugging Face Spaces where you want to minimize external dependencies.
	3. Stability: Disk-based lookups ensure that even if the dataset grows to 1M books, the memory footprint remains constant.

	## Conclusion
	This engineering overhaul transforms the Book Recommendation System into a production-ready application. It solves the OOM crisis and restores the full scientific capacity of the model—running more data on less hardware.