metadata
title: Book Recommendation System
emoji: 📚
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 3.37.0
app_file: app.py
pinned: false

📚 Content-Based Book Recommendation System 📖

This is a content-based book recommendation system that recommends books 📕📗 similar to an input book title, based on the similarity of their summaries. The system uses TF-IDF (📊 Term Frequency-Inverse Document Frequency) and Cosine Similarity 🧮 to compare books and find the most relevant recommendations. It provides a user-friendly interface built with Gradio 💻, where users can enter a book title and get recommendations.
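For reference, the two quantities named above can be written as follows. This is a sketch that assumes scikit-learn's default `TfidfVectorizer` settings (smoothed IDF, L2-normalized rows); the project may configure the vectorizer differently.

```latex
\[
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot\mathrm{idf}(t)
\]
\[
\cos(\mathbf{a}, \mathbf{b}) =
\frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert_2\,\lVert\mathbf{b}\rVert_2}
\]
```

Here n is the number of summaries, df(t) is the number of summaries containing term t, and tf(t, d) is the count of t in summary d. When each row is scaled to unit length, the cosine similarity of two summaries reduces to the dot product of their TF-IDF vectors.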

📂 Project Structure

.
├── app.py                         # 🚀 Main script that runs the app
├── utils.py                       # 🛠️ Helper functions (data loading, model loading)
├── data/
│   ├── books_summary.csv          # 📑 Actual dataset
│   └── cleaned_books_summary.csv  # 🧹 Preprocessed dataset
├── model/
│   ├── tfidf_vectorizer.pkl       # 🤖 Pre-trained TF-IDF vectorizer
│   └── tfidf_matrix.pkl           # 🗂️ Pre-calculated TF-IDF matrix
├── src/                           # 📦 Source code folder
│   ├── data_loader.py             # 📥 Module to load and preprocess data
│   ├── feature_engineering.py     # 🧬 Module to create TF-IDF/embedding vectors
│   ├── similarity_calculator.py   # 🧮 Module to calculate similarity matrix
│   ├── recommender.py             # 📚 Main logic to generate recommendations
│   └── utils.py                   # ⚙️ Utility functions (e.g., cleaning text)
├── requirements.txt               # 📜 List of Python dependencies
└── README.md                      # 📝 Project overview and setup instructions

🌟 Features

  • 📚 Book Recommendation: Enter a book title, and the system will recommend the top 5 similar books based on their summaries.
  • 🏷️ Categorization: Each recommended book displays its categories as clickable buttons for a better user experience.
  • 💻 Interactive UI: Simple and clean interface using Gradio.
  • 🔧 Modular Code: Functions for data loading, preprocessing, model training, and similarity calculation are separated into different files.
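One way the category buttons could be produced is by rendering each recommendation as a small HTML fragment. The sketch below is illustrative only: the function name `render_card`, the CSS class names, and the assumption that categories arrive as a comma-separated string are all hypothetical and not taken from the project's code.

```python
from html import escape

def render_card(title, summary, categories):
    """Build an HTML fragment: title, summary, and one button per category."""
    # Assumption: categories is a comma-separated string such as "fantasy, classic".
    buttons = "".join(
        f'<button class="category">{escape(name.strip())}</button>'
        for name in categories.split(",")
        if name.strip()
    )
    return (
        f'<div class="book"><h3>{escape(title)}</h3>'
        f"<p>{escape(summary)}</p>{buttons}</div>"
    )

# Made-up example values, not rows from the real dataset.
card = render_card("Dune", "Sand & spice", "science fiction, classic")
print(card)
```

Escaping the text with `html.escape` keeps characters such as `&` in summaries from breaking the markup. Wiring a button click to an action is left out of this sketch.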

💻 Technologies Used

  • 🐍 Python: Core language used to build the system.
  • 💻 Gradio: For creating a web-based user interface.
  • 📊 Scikit-learn: For TF-IDF Vectorization and Cosine Similarity calculation.
  • 🗂️ Pandas: For data manipulation and preprocessing.
  • 🔢 NumPy: For numerical operations.

⚙️ Setup Instructions

1. 🧬 Clone the Repository

git clone https://github.com/ajayansaroj17/book_title_recommender.git
cd book_title_recommender

2. 📦 Install Dependencies

Make sure you have Python 3.7+ installed. Then, create a virtual environment and install the required libraries:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt

3. 📚 Download or Prepare the Dataset

The dataset should contain columns for book_name, summaries, and categories. Store the preprocessed dataset as cleaned_books_summary.csv in the data/ folder.
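Before training, it can help to confirm that the file actually has the three expected columns. The following is a minimal sketch using only the standard library `csv` module (the project itself lists pandas for data handling); the function name `load_books` and the sample rows are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("book_name", "summaries", "categories")

def load_books(csv_file):
    """Read book rows from an open CSV file and check the expected columns."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"dataset is missing columns: {missing}")
    # Drop rows with an empty title or summary; they cannot be vectorized.
    return [
        row for row in reader
        if (row["book_name"] or "").strip() and (row["summaries"] or "").strip()
    ]

# Tiny in-memory stand-in for data/cleaned_books_summary.csv (made-up rows).
sample = io.StringIO(
    "book_name,summaries,categories\n"
    "Dune,A desert planet and a prophecy,science fiction\n"
    "Untitled,,fiction\n"
)
books = load_books(sample)
print([b["book_name"] for b in books])  # prints ['Dune']
```

With a real file, the same function can be called as `load_books(open(path, newline="", encoding="utf-8"))`, where `path` points at the cleaned CSV.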

4. 🏋️‍♂️ Pre-train the Model

Run the following script to train the TF-IDF model and create the TF-IDF matrix:

python train_tfidf_model.py

This will:

  • 🧠 Train the TF-IDF vectorizer on the summaries column.
  • 🗂️ Create the TF-IDF matrix for all books.
  • 💾 Save the trained model and matrix as model/tfidf_vectorizer.pkl and model/tfidf_matrix.pkl.
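The three steps above can be sketched in plain Python. This is not the project's training script: it swaps scikit-learn's `TfidfVectorizer` for a small hand-written version (smoothed IDF, L2-normalized rows, a simplified tokenizer that also keeps one-letter words) so the example runs with the standard library alone. The function names and sample summaries are hypothetical, and a temporary directory stands in for the model/ folder.

```python
import math
import pickle
import re
import tempfile
from collections import Counter
from pathlib import Path

def tokenize(text):
    """Lowercase and split on runs of letters/digits (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_tfidf(summaries):
    """Return (idf, matrix): smoothed IDF weights and L2-normalized sparse rows."""
    docs = [Counter(tokenize(s)) for s in summaries]
    n = len(docs)
    # Document frequency: each summary counts a term at most once.
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    matrix = []
    for doc in docs:
        row = {t: tf * idf[t] for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in row.values())) or 1.0
        matrix.append({t: w / norm for t, w in row.items()})
    return idf, matrix

# Made-up summaries standing in for the summaries column.
summaries = ["a wizard school adventure", "a space adventure", "a wizard war"]
idf, matrix = fit_tfidf(summaries)

# Persist both artifacts, mirroring the two .pkl files described above.
out_dir = Path(tempfile.mkdtemp())  # stand-in for the model/ folder
for name, obj in [("tfidf_vectorizer.pkl", idf), ("tfidf_matrix.pkl", matrix)]:
    with open(out_dir / name, "wb") as fh:
        pickle.dump(obj, fh)

# Reload the matrix the way the app would at startup.
with open(out_dir / "tfidf_matrix.pkl", "rb") as fh:
    reloaded = pickle.load(fh)
```

Saving the fitted weights and the matrix separately means the app can load both at startup instead of re-fitting on every launch.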

5. 🚀 Run the Application

Launch the Gradio-based web interface by running:

python app.py

Gradio prints a local URL to the console (http://127.0.0.1:7860 by default). Open it in your browser to enter a book title and receive recommendations.

🤔 How the System Works

  1. 👤 User Input: The user enters a book title in the input field.
  2. 🔍 Recommendation Logic:
    • The system searches for the input book in the dataset.
    • It obtains the TF-IDF vector of the input book's summary and compares it with the TF-IDF vectors of all other books' summaries using cosine similarity.
    • The top 5 books with the highest similarity scores are returned.
  3. 📊 Output: Recommendations are displayed, including book titles, summaries, and categories as clickable buttons.
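The recommendation logic above can be sketched as follows. This is an illustration under stated assumptions, not the code in recommender.py: rows are represented as sparse dicts that are already L2-normalized (so cosine similarity is a dot product), title matching is case-insensitive, and the names `cosine`, `recommend`, and the sample data are hypothetical.

```python
def cosine(a, b):
    """Dot product of two sparse, already L2-normalized vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def recommend(title, titles, matrix, top_n=5):
    """Return up to top_n (title, score) pairs most similar to the given title."""
    lookup = {t.lower(): i for i, t in enumerate(titles)}
    idx = lookup.get(title.strip().lower())
    if idx is None:
        return []  # title not found in the dataset
    scores = [
        (titles[j], cosine(matrix[idx], row))
        for j, row in enumerate(matrix)
        if j != idx  # never recommend the input book itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Made-up titles and hand-written unit vectors for illustration.
titles = ["Book A", "Book B", "Book C"]
matrix = [{"wizard": 0.6, "school": 0.8}, {"wizard": 1.0}, {"space": 1.0}]
print(recommend("book a", titles, matrix))
# prints [('Book B', 0.6), ('Book C', 0.0)]
```

Returning an empty list for an unknown title is one possible choice; an app might instead show a message or fall back to fuzzy matching.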

📄 File Descriptions

  • app.py: 🚀 Main script launching the Gradio UI and handling book recommendations.
  • utils.py: 🛠️ Helper functions for loading models, data preprocessing, and utilities.
  • feature_engineering.py: 🧬 Trains the TF-IDF model and creates the TF-IDF matrix.
  • data/cleaned_books_summary.csv: 📚 Cleaned dataset used for training.
  • model/tfidf_vectorizer.pkl and model/tfidf_matrix.pkl: 🤖 Pre-trained TF-IDF model and matrix.

📦 Dependencies

Install the required Python packages with:

pip install -r requirements.txt

The main dependencies are:

  • 💻 gradio: For the web interface.
  • 📊 scikit-learn (imported as sklearn): For TF-IDF and cosine similarity calculations.
  • 🗂️ pandas: For data manipulation.
  • 🔢 numpy: For numerical operations.

🚀 Potential Extensions and Improvements

  • 🏷️ Category-Based Filtering: Filter recommendations by specific categories.
  • 🤖 Advanced NLP Techniques: Use embeddings like Word2Vec, GloVe, or transformer-based models like BERT.
  • 👥 Personalization: Implement a user profiling system for personalized recommendations.
  • ⚡ Scalability: Use Approximate Nearest Neighbors (ANN) for faster similarity calculation on large datasets.
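As a rough idea of how the first extension might look, the sketch below filters a list of recommendations by category. It assumes each recommendation is a dict with a comma-separated categories string. That layout and the function name `filter_by_category` are illustrative assumptions, not part of the current code.

```python
def filter_by_category(recommendations, wanted):
    """Keep recommendations whose category list contains `wanted` (case-insensitive)."""
    wanted = wanted.strip().lower()
    return [
        rec for rec in recommendations
        if wanted in {c.strip().lower() for c in rec["categories"].split(",")}
    ]

# Made-up recommendations for illustration.
recs = [
    {"book_name": "Book A", "categories": "fantasy, young adult"},
    {"book_name": "Book B", "categories": "science fiction"},
]
print([r["book_name"] for r in filter_by_category(recs, "Fantasy")])
# prints ['Book A']
```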

🏁 Conclusion

This project demonstrates building a content-based book recommendation system using TF-IDF and cosine similarity. The modular design ensures easy maintenance and extension, while Gradio simplifies deployment and user interaction! 🚀