two_tower_recsys / README.md
minhajHP's picture
Updated README
0523b69
metadata
title: RecSys-HP
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 8000

RecSys-HP: Two-Tower Recommendation System

A production-ready recommendation system implementation using TensorFlow with an enhanced two-tower architecture. This system provides personalized item recommendations through collaborative filtering, content-based filtering, category-boosted recommendations, and hybrid approaches, featuring advanced training strategies.

πŸš€ Features

  • πŸ—οΈ Enhanced Two-Tower Architecture: 128D embeddings with temperature scaling and attention mechanisms
  • 🎯 Multiple Recommendation Engines:
    • Raw Two-Tower (Collaborative Filtering)
    • Content-Based Filtering
    • Hybrid Recommendations
    • Category-Boosted Recommendations
  • ⚑ Fast Inference: FAISS-powered similarity search with sub-100ms response times
  • 🎨 Interactive Frontend: React-based web interface with real-time recommendations
  • πŸ“Š Category Analysis: Intelligent category preference analysis and visualization
  • πŸ”„ Real User Profiles: Browse genuine user interaction histories
  • πŸŽͺ Category-Aware Similarity: 60/40 category split for balanced discovery

πŸ“ Project Structure

RecSys-HP/
β”œβ”€β”€ πŸ“Š datasets/                           # Training and validation data
β”‚   β”œβ”€β”€ items.csv                         # Product catalog (19K+ items)
β”‚   β”œβ”€β”€ users.csv                         # User profiles
β”‚   └── interactions.csv                  # User-item interaction logs
β”‚
β”œβ”€β”€ 🧠 src/                               # Core ML implementation
β”‚   β”œβ”€β”€ models/                           # Neural network architectures
β”‚   β”‚   β”œβ”€β”€ item_tower.py                # Original item embedding tower
β”‚   β”‚   β”œβ”€β”€ user_tower.py                # User embedding tower
β”‚   β”‚   └── improved_two_tower.py        # Advanced two-tower with improvements
β”‚   β”‚
β”‚   β”œβ”€β”€ preprocessing/                    # Data preparation pipeline
β”‚   β”‚   β”œβ”€β”€ data_loader.py               # Dataset loading and validation
β”‚   β”‚   └── user_data_preparation.py     # User feature engineering
β”‚   β”‚
β”‚   β”œβ”€β”€ training/                         # Model training pipeline
β”‚   β”‚   β”œβ”€β”€ item_pretraining.py          # Item tower pretraining
β”‚   β”‚   └── joint_training.py            # Joint user-item training
β”‚   β”‚
β”‚   β”œβ”€β”€ inference/                        # Recommendation engines
β”‚   β”‚   β”œβ”€β”€ recommendation_engine.py      # Main recommendation engine
β”‚   β”‚   └── faiss_index.py               # FAISS similarity search
β”‚   β”‚
β”‚   └── artifacts/                        # Trained models & indices
β”‚       β”œβ”€β”€ vocabularies.pkl              # Feature vocabularies
β”‚       β”œβ”€β”€ *_weights.*                   # Model weights
β”‚       └── faiss_*                       # FAISS index files
β”‚
β”œβ”€β”€ 🎨 frontend/                          # React web interface
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ App.js                       # Main React component
β”‚   β”‚   └── App.css                      # Styling
β”‚   └── build/                           # Production build
β”‚
β”œβ”€β”€ πŸ”— api/                              # FastAPI backend
β”‚   └── main.py                          # API server with static file serving
β”‚
└── πŸ“š Configuration
    β”œβ”€β”€ requirements.txt                  # Python dependencies
    β”œβ”€β”€ Dockerfile                       # Container configuration
    └── docker-build.md                 # Deployment guide

🎯 Recommendation Strategies

1. Raw Two-Tower (Collaborative Filtering) πŸ‘₯

  • Features: 128D embeddings, category boosting (1.6x), FAISS similarity search
  • Strengths: Superior personalization with behavioral signal focus
  • Algorithm: Two-tower neural collaborative filtering with category awareness

2. Content-Based Recommendations πŸ“‹

  • Method: Aggregated user history embedding with weighted mean pooling
  • Features: FAISS similarity search on aggregated user preferences
  • Benefits: Works for users with interaction history, fast inference

3. Hybrid Approach πŸ”—

  • Method: Weighted combination of collaborative and content-based
  • Features: Configurable weight mixing (default 70% collaborative, 30% content)
  • Benefits: Best of both approaches with balanced coverage
  • Algorithm: Score-based weighted combination

4. Category-Boosted Recommendations πŸŽͺ

  • Method: Intelligent category preference learning and boosting
  • Features: Dynamic category analysis from user interaction patterns
  • Benefits: Maintains user preferences while enabling discovery

πŸ”¬ Technical Deep Dive

Enhanced Training Process

Two-Phase Training Strategy

  1. Item Pretraining: Self-supervised learning on item features
  2. Joint Training: User-item interaction learning with contrastive loss

Architecture Improvements

  • User Tower: Demographics + 50-slot interaction history with attention
  • Item Tower: Optimized embeddings with smart dimensionality
  • Training: Contrastive learning with positive/negative pairs

πŸš€ Getting Started

The application runs automatically in this Hugging Face Space! The system includes:

  • Interactive Web Interface: Browse users, generate recommendations, analyze categories
  • Multiple Recommendation Types: Try different algorithms
  • Real User Data: Explore genuine user interaction patterns
  • Performance Monitoring: Real-time API response tracking

API Endpoints

Method Endpoint Description Features
GET / Web Interface Interactive React app
POST /recommendations Personalized recommendations Multi-strategy (collaborative/content/hybrid)
POST /item-similarity Category-balanced similar items 60% same category + ANN search
GET /real-users Browse real user profiles Genuine interaction histories
GET /health System health check API status monitoring

πŸ“Š Project Achievements

βœ… Enhanced Architecture: 128D embeddings, temperature scaling, contrastive learning
βœ… Category-Aware Recommendations: Intelligent personalization with diversity
βœ… Content-Based Filtering: Revolutionary user history aggregation approach
βœ… Enhanced Cold-Start Support: Improved new user handling
βœ… Production Ready: Scalable API with enhanced frontend features

πŸ”§ Advanced Configuration

Model Parameters

  • Embedding Dimension: 128 (upgraded from 64)
  • Hidden Layers: [256, 128] for both towers
  • Dropout Rate: 0.3 (increased for regularization)
  • Attention Heads: 8 (user tower), 4 (item tower)
  • Temperature Scaling: Learnable parameter (initial: 1.0)
  • Max History Length: 50 interactions per user

Training Configuration

  • Hard Negative Mining: Top-3 hardest negatives
  • Contrastive Weight: 0.3 (configurable)
  • Focal Loss: Alpha=0.25, Gamma=2.0

πŸš€ Production Deployment

Performance Optimizations

  • Two-Tower Architecture: Separates user and item processing for scalability
  • FAISS Integration: Sub-linear similarity search
  • Batch Inference: Vectorized computation for multiple users
  • Model Versioning: Support for A/B testing different model variants

πŸŽ‰ Ready to deliver next-generation personalized recommendations!