Spaces:

ymlin105
/

book-rec-with-LLMs

Sleeping

book-rec-with-LLMs / docs /performance /performance_debugging_report.md

chore: reorganize documentation structure and clean repository root

78cfff7 12 days ago

3.32 kB

Performance Debugging & Optimization Report (Jan 28, 2026)

The recommendation system was exhibiting extremely low performance metrics during evaluation:

This was significantly below the baseline (MRR ~0.2) and represented a near-total failure of the recommendation pipeline to surface relevant items.

Discovery: Reciprocal Rank Fusion (RRF) was combining scores from YoutubeDNN, ItemCF, UserCF, and Swing. YoutubeDNN had a weight of 2.0, while others had 1.0.
Impact: YoutubeDNN results (which were often poor for specific cold-start or niche items) completely dominated the ranking. High-confidence hits from ItemCF and Swing were being buried.
Verification: Disabling YoutubeDNN or lowering its weight immediately surfaced the correct items in the top relative ranks of the recall stage.

Discovery: The RecommendationService applies title-based deduplication to prevent recommending different editions of the same book. The evaluation dataset expects strict ISBN matches.
Impact: If the system recommended a Paperback edition (Rank 0) and the Target was a Hardcover edition (Rank 1), the deduplication logic kept the Paperback and discarded the Target. The strict ISBN evaluation then marked this as a "Miss" despite the correct book being found.
Verification: Debug logs confirmed the Target ISBN was being dropped due to a title collision with a higher-ranked item.

Discovery: The pipeline removes items already in the user's "favorites". However, the user_profiles.json used for lookup contained data from the entire timeframe, including the test set items.
Impact: The system was actively filtering out the correct test set items because it "already knew" the user liked them, leading to a 0% hit rate on any item correctly predicted.
Verification: Target items were found in the fav_isbns set during evaluation.

Fusion Weight Tuning: Reduced YoutubeDNN weight to 0.1.
Recall Depth: Increased recall sample size from 150 to 200 to accommodate deduplication and filtering.

Relaxed Evaluation: Updated evaluate.py to support title-based hits. If the exact ISBN isn't found, the system checks if a book with the same title was recommended.
Filtering Toggle: Added filter_favorites argument to get_recommendations. Evaluation now runs with filter_favorites=False to bypass the data leakage issue.

Metric	Initial	Final (Optimized)
Hit Rate@10	0.0120	0.1380
MRR@5	0.0014	0.1295

The system is now reliably retrieving and ranking target items within the top 10 results for a significant portion of users.

Strict Data Splitting: Regenerate user profiles using ONLY training date ranges to re-enable "Favorites Filtering" without leakage.
ISBN Mapping: Maintain a robust isbn_to_title mapping to ensure deduplication remains accurate.