title: Book Recommendation System
emoji: ๐
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 3.37.0
app_file: app.py
pinned: false
Content-Based Book Recommendation System
This is a content-based book recommendation system that recommends books similar to an input book title based on the similarity of book summaries. The system uses TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity to compare books and find the most relevant recommendations. It provides a user-friendly interface built with Gradio, where users can enter a book title and get recommendations.
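To make the two techniques concrete, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It is an illustration only: the summaries are made up, tokenization is a plain whitespace split, and the formula is the textbook tf * log(N / df). Scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its scores will differ from these.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up summaries, used only for illustration.
summaries = [
    "a wizard boy attends a school of magic",
    "a young wizard learns magic at a hidden school",
    "a detective solves a murder in london",
]
vecs = tfidf_vectors(summaries)
sim_01 = cosine(vecs[0], vecs[1])  # the two books share "wizard", "school", "magic"
sim_02 = cosine(vecs[0], vecs[2])  # the only shared word is "a", which gets weight 0
```

Words that occur in every summary (here "a") receive an IDF of zero, so they contribute nothing to the similarity. That is why the first two summaries score higher against each other than against the third.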
Project Structure
.
├── app.py                          # Main script that runs the app
├── utils.py                        # Helper functions (data loading, model loading)
├── data/
│   ├── books_summary.csv           # Actual dataset
│   └── cleaned_books_summary.csv   # Preprocessed dataset
├── model/
│   ├── tfidf_vectorizer.pkl        # Pre-trained TF-IDF vectorizer
│   └── tfidf_matrix.pkl            # Pre-calculated TF-IDF matrix
├── src/                            # Source code folder
│   ├── data_loader.py              # Module to load and preprocess data
│   ├── feature_engineering.py      # Module to create TF-IDF/embedding vectors
│   ├── similarity_calculator.py    # Module to calculate similarity matrix
│   ├── recommender.py              # Main logic to generate recommendations
│   └── utils.py                    # Utility functions (e.g., cleaning text)
├── requirements.txt                # List of Python dependencies
└── README.md                       # Project overview and setup instructions
Features
- Book Recommendation: Enter a book title, and the system will recommend the top 5 similar books based on their summaries.
- Categorization: Each recommended book displays its categories as clickable buttons for a better user experience.
- Interactive UI: Simple and clean interface using Gradio.
- Modular Code: Functions for data loading, preprocessing, model training, and similarity calculation are separated into different files.
Technologies Used
- Python: Core language used to build the system.
- Gradio: For creating a web-based user interface.
- Scikit-learn: For TF-IDF vectorization and cosine similarity calculation.
- Pandas: For data manipulation and preprocessing.
- NumPy: For numerical operations.
Setup Instructions
1. Clone the Repository
git clone https://github.com/ajayansaroj17/book_title_recommender.git
cd book_title_recommender
2. Install Dependencies
Make sure you have Python 3.7+ installed. Then, create a virtual environment and install the required libraries:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
3. Download or Prepare the Dataset
The dataset should contain the columns book_name, summaries, and categories. Store the preprocessed dataset as cleaned_books_summary.csv in the data/ folder.
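Before training, it can help to confirm that the CSV actually has the three required columns. The following is a small sketch using only the standard library. The helper name `missing_columns` and the sample row are hypothetical and not part of the repository.

```python
import csv
import io

# Column names taken from the dataset description above.
REQUIRED_COLUMNS = {"book_name", "summaries", "categories"}


def missing_columns(csv_file):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)


# In-memory stand-in for data/cleaned_books_summary.csv (made-up row).
sample = io.StringIO(
    "book_name,summaries,categories\n"
    "Example Book,A short example summary.,fiction\n"
)
print(missing_columns(sample))  # prints []
```

With the real file, the same check would be run on a handle opened with `open("data/cleaned_books_summary.csv", newline="", encoding="utf-8")`.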
4. Pre-train the Model
Run the following script to train the TF-IDF model and create the TF-IDF matrix:
python train_tfidf_model.py
This will:
- Train the TF-IDF vectorizer on the summaries column.
- Create the TF-IDF matrix for all books.
- Save the trained model and matrix as model/tfidf_vectorizer.pkl and model/tfidf_matrix.pkl.
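The contents of train_tfidf_model.py are not shown here, so the following is only a sketch of what the three steps above could look like with scikit-learn and `pickle`. The function name `train_and_save`, the `stop_words="english"` setting, and the toy summaries are assumptions. The example writes to a temporary directory rather than model/ so it has no side effects.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer


def train_and_save(summaries, model_dir):
    """Fit TF-IDF on the summaries, then pickle the vectorizer and matrix."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: English stop words are removed; the real script may differ.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf_matrix = vectorizer.fit_transform(summaries)  # one row per book

    with open(model_dir / "tfidf_vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    with open(model_dir / "tfidf_matrix.pkl", "wb") as f:
        pickle.dump(tfidf_matrix, f)
    return tfidf_matrix.shape


# Made-up summaries standing in for the summaries column of the dataset.
toy_summaries = [
    "A young wizard attends a school of magic.",
    "A detective investigates a murder in London.",
    "Explorers travel to a distant desert planet.",
]

with tempfile.TemporaryDirectory() as tmp:
    shape = train_and_save(toy_summaries, tmp)
    saved = sorted(p.name for p in Path(tmp).iterdir())
```

Saving the fitted vectorizer alongside the matrix means the app can load both at startup instead of refitting on every launch.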
5. Run the Application
Launch the Gradio-based web interface by running:
python app.py
The application will open in your browser, allowing you to enter a book title and receive recommendations.
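For orientation, a Gradio app of this kind can be wired up roughly as follows. This is a hedged sketch, not the contents of app.py: `recommend_titles` is a hypothetical placeholder that returns a fixed message, and `build_demo` is an illustrative name. Only `gr.Interface` and `gr.Textbox` are Gradio's own API.

```python
def recommend_titles(title):
    """Placeholder recommender (hypothetical): returns a fixed message.

    A real app would look up `title` in the dataset and rank the other
    books by cosine similarity of their TF-IDF vectors.
    """
    title = title.strip()
    if not title:
        return "Please enter a book title."
    return f"Recommendations for '{title}' would be listed here."


def build_demo():
    """Wrap the placeholder recommender in a simple Gradio interface."""
    import gradio as gr  # imported lazily so this module loads without Gradio

    return gr.Interface(
        fn=recommend_titles,
        inputs=gr.Textbox(label="Book title"),
        outputs=gr.Textbox(label="Recommendations"),
        title="Book Recommendation System",
    )
```

Calling `build_demo().launch()` starts a local server and prints the URL to open in a browser.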
How the System Works
- User Input: The user enters a book title in the input field.
- Recommendation Logic:
  - The system searches for the input book in the dataset.
  - It computes the TF-IDF vector of the input book's summary and compares it with the TF-IDF vectors of all other books' summaries using cosine similarity.
  - The top 5 books with the highest similarity scores are returned.
- Output: Recommendations are displayed, including book titles, summaries, and categories as clickable buttons.
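The recommendation logic above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the code in src/recommender.py: the titles and summaries are made up, the vectorizer is fitted inline instead of being loaded from model/, and the title lookup is a simple case-insensitive exact match.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up toy data standing in for the book_name and summaries columns.
titles = ["Book A", "Book B", "Book C", "Book D"]
summaries = [
    "a young wizard attends a school of magic",
    "a wizard apprentice studies magic spells at school",
    "a detective investigates a murder in london",
    "a police detective hunts a murder suspect",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(summaries)  # one row per book


def recommend(title, top_n=5):
    """Return up to top_n (title, score) pairs most similar to `title`."""
    lowered = [t.lower() for t in titles]
    query = title.strip().lower()
    if query not in lowered:
        return []  # input book not found in the dataset
    idx = lowered.index(query)

    # Similarity of the input book's row against every row in the matrix.
    scores = cosine_similarity(tfidf_matrix[idx:idx + 1], tfidf_matrix).ravel()
    scores[idx] = -1.0  # push the input book itself to the end of the ranking

    order = np.argsort(scores)[::-1][:top_n]
    return [(titles[i], float(scores[i])) for i in order if i != idx]


top = recommend("book a", top_n=1)  # the closest match is "Book B"
```

An unknown title returns an empty list here. A real app might instead fall back to fuzzy matching or show an error message in the UI.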
File Descriptions
- app.py: Main script launching the Gradio UI and handling book recommendations.
- utils.py: Helper functions for loading models, data preprocessing, and utilities.
- feature_engineering.py: Trains the TF-IDF model and creates the TF-IDF matrix.
- data/cleaned_books_summary.csv: Cleaned dataset used for training.
- model/tfidf_vectorizer.pkl and model/tfidf_matrix.pkl: Pre-trained TF-IDF model and matrix.
Dependencies
Install the following Python packages using:
pip install -r requirements.txt
- gradio: For the web interface.
- scikit-learn (imported as sklearn): For TF-IDF and cosine similarity calculations.
- pandas: For data manipulation.
- numpy: For numerical operations.
Potential Extensions and Improvements
- Category-Based Filtering: Filter recommendations by specific categories.
- Advanced NLP Techniques: Use embeddings like Word2Vec, GloVe, or transformer-based models like BERT.
- Personalization: Implement a user profiling system for personalized recommendations.
- Scalability: Use Approximate Nearest Neighbors (ANN) for faster similarity calculation on large datasets.
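As one example, category-based filtering could be a thin post-processing step on the recommendation list. The sketch below is hypothetical: it assumes each recommendation is a dict whose categories field is already a list of strings, which may not match how the categories column is stored in the real dataset.

```python
def filter_by_category(recommendations, wanted):
    """Keep only recommendations whose category list contains `wanted`."""
    wanted = wanted.strip().lower()
    return [
        rec
        for rec in recommendations
        if wanted in (c.strip().lower() for c in rec["categories"])
    ]


# Made-up recommendations, used only for illustration.
recs = [
    {"book_name": "Example Book One", "categories": ["Fiction", "Fantasy"]},
    {"book_name": "Example Book Two", "categories": ["Science"]},
]
fantasy_only = filter_by_category(recs, "fantasy")
```

Filtering after ranking keeps the similarity computation unchanged. The trade-off is that fewer than 5 books may remain, so an implementation might rank more candidates before filtering.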
Conclusion
This project demonstrates building a content-based book recommendation system using TF-IDF and cosine similarity. The modular design ensures easy maintenance and extension, while Gradio simplifies deployment and user interaction.