Yew Chong committed on May 5, 2024

Commit

6276d23

1 Parent(s): c6cc938

update readme

Files changed (3)

README.md +44 -31
raw-data/Books_rating.csv +0 -3
raw-data/books_data.csv +0 -3

README.md CHANGED

@@ -17,36 +17,17 @@ Link to files: [https://huggingface.co/spaces/bt5153-books/README/tree/main](htt
 Hello, and welcome to our books recommendation project for BT5153!
 # Project Directory
-## Front-end UI
-### Book Recommendation Ensemble Model Interface
-This interface generates recommendations, but only for a list of randomly sampled test users from our dataset.
-This interface was created on Python version 3.11.4, with requirements listed in `requirements.txt`.
-There may be some requirements missed, please install as needed.
-All sub-models and the final ensemble classifier model were trained in advance. They are included inside the Data folder.
-All data used for live recommendation is in the Data folder. Since the Data folder is too large to be submitted, we will submit a representative subset of the data.
-### To start the UI:
-**NOTE: Please only run this with the full dataset from [this git repository](https://huggingface.co/spaces/bt5153-books/README/tree/main)!!** If not, there will not be any results...
-Start the interface with `python -m flask run`.
-If for some reason app does not start, try running `python app.py`.
-Server should be running on `127.0.0.1:5000`
 ## Source Code
-Codes are stored under `./Books` as `.ipynb` files, and named according to the order they should be run.
 ## Data
-Data used for the project is stored in `./Data`.
-Raw data, retrieved from the Goodreads dataset [here](https://mengtingwan.github.io/data/goodreads.html), can be found under `./raw-data`.
-For our submission, we have created a representative subset of our dataset to be included in the zip submission, and can be found in `./Data-sub`.
 # To run our project in Windows:
@@ -60,10 +41,10 @@ Run these commands:
 All python notebooks can be found in the subdirectory `./Books/`.
 ## Data preprocessing
-Run all cells in the file `1_data_split.ipynb`.
 ## Generating recommendations
-Run all cells in the following files:
 * `2.1_users_similarity.ipynb`
 * `2.2_reviews_LDA.ipynb`
 * `2.3_description_s2v.ipynb`
@@ -71,18 +52,50 @@ Run all cells in the following files:
 * `2.5_titles_bge_faiss.ipynb`
 * `2.6_book_clustering.ipynb`
-Then, run the following file to generate recommendation for users:
 `3_book_to_user_converer.ipynb`
 ## Ensemble model
-Run this file: `4_ensemble_final.ipynb`
 ----------------------------------------------------------------
-# Project Description
-In response to the overwhelming number of book choices online, which often leads to decision paralysis and wasted time, we propose the implementation of a Natural Language Processing (NLP) powered recommendation system to address this challenge.
-For full project description, see the report file in submission.
 ## Members:
 * Ang Kai En (A0221945E)

 Hello, and welcome to our books recommendation project for BT5153!
 # Project Directory
 ## Source Code
+Model codes are stored under `./Books` as `.ipynb` files, and named according to the order they should be run.
+User interface codes for Flask are stored in the root `./` directory, and the html files can be found under `./templates`.
 ## Data
+All data used for the project is stored in `./Data` in the [Huggingface repository](https://huggingface.co/spaces/bt5153-books/README/tree/main).
+***WARNING: This huggingface repository is over 18GB, ensure that you have sufficient space on the disk before cloning***
+For our submission, we have created a representative subset of our dataset to be included in the zip submission, and can be found in `data05.zip` in the accompanying files. These sample subset files can also be found in the [Github repository](https://github.com/lyncsghrk/BT5153-Books), under the directory `./Data/Books/final_dataset`.
 # To run our project in Windows:
 All python notebooks can be found in the subdirectory `./Books/`.
 ## Data preprocessing
+***Run all cells in the file `1_data_split.ipynb`.***
 ## Generating recommendations
+***Run all cells in the following files:***
 * `2.1_users_similarity.ipynb`
 * `2.2_reviews_LDA.ipynb`
 * `2.3_description_s2v.ipynb`
 * `2.5_titles_bge_faiss.ipynb`
 * `2.6_book_clustering.ipynb`
+Do note that some notebooks may take up to a few hours to complete.
+The recommendations have been saved and stored under the directory `./Data/Books/Recommend Storage`, in numpy arrays.
+***Then, run all cells in the following file to generate recommendation for users:***
 `3_book_to_user_converer.ipynb`
 ## Ensemble model
+Run all cells in this file: `4_ensemble_final.ipynb`
+# Front-end UI
+## Book Recommendation Ensemble Model Interface
+This interface generates recommendations, but only for a list of randomly sampled test users from our dataset.
+This interface was created on Python version 3.11.4, with requirements listed in `requirements.txt`.
+There may be some requirements missed, please install as needed.
+All sub-models and the final ensemble classifier model were trained in advance. They are included inside the Data folder.
+All data used for live recommendation is in the Data folder. Since the Data folder is too large to be submitted, we will submit a representative subset of the data.
+## To start the UI:
+**NOTE: Please only run this with the full dataset from [this git repository](https://huggingface.co/spaces/bt5153-books/README/tree/main)!!** If not, an error will occur and there will not be any results...
+Start the interface with `python -m flask run`.
+If for some reason app does not start, try running `python app.py`.
+Server should be running on `127.0.0.1:5000`
 ----------------------------------------------------------------
+# Project Abstract
+This report presents an enhanced book recommendation system to improve recommendation precision. By integrating unstructured text data and diverse data sources, the proposed system offers more robust recommendations tailored to individual users, ultimately improving user retention rates.
+Utilizing a Goodreads dataset from 2017, comprising book information, user-tagged genres, and user reviews, we built and trained six distinct models based on different data types. These models were then combined into an ensemble logistic regression model, outperforming individual models in precision and exhibiting higher F1 scores in binary classification for book recommendations.
+While the ensemble model requires more computational resources than the user similarity model, it effectively mitigates popularity bias, a common issue in naive recommendation systems. Finally, the system's user interface, developed with Flask, offers transparent recommendations with explainability graphs, enhancing user trust and experience.
+Overall, the enhanced book recommendation system shows promising results and has the potential to outperform the naive user similarity model with further data refinement and model training.
+*For full project description, see the report file in submission.*
 ## Members:
 * Ang Kai En (A0221945E)
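
The updated README warns that the Hugging Face repository is over 18GB and asks readers to confirm free disk space before cloning. A minimal sketch of such a pre-clone check is shown below. It is not part of the repository: the function name `has_room`, the 18 GB figure treated as an exact byte count, and the 20% headroom factor are all assumptions made for illustration.

```python
import shutil

# "Over 18GB" per the README; the exact size is not stated, so this is an assumption.
REQUIRED_BYTES = 18 * 10**9
# Hypothetical headroom factor so the clone does not fill the disk completely.
HEADROOM = 1.2


def has_room(path=".", required_bytes=REQUIRED_BYTES, headroom=HEADROOM):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * headroom


if __name__ == "__main__":
    if has_room("."):
        print("Enough free space to clone the full dataset.")
    else:
        print("Not enough free space; consider the sample subset instead.")
```

If the check fails, the sample subset mentioned in the README (`data05.zip`) is the smaller alternative, although the README notes that the UI only produces results with the full dataset.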

raw-data/Books_rating.csv DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8aa5e49915bafa73b8dca93e05e458d2f130dc53bebf8c9ef1fa111964df67da
-size 2859504349

raw-data/books_data.csv DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:5c6fabf86f31cb78aee2ff82839c1e8bf31b048df3ae7b159f3eb7842eb4ee6b
-size 181348853
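
The two deleted files are Git LFS pointer files, not the CSVs themselves: each holds a `version`, an `oid` (hash algorithm and digest), and a `size` in bytes. The sketch below parses the pointer contents shown in this diff (with the leading `-` diff markers removed) to show how much raw data the commit stops tracking, about 3 GB combined. The helper names `parse_lfs_pointer` and `total_bytes` are hypothetical and written only for this illustration.

```python
def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer contents copied from the diff above, without the "-" prefixes.
POINTERS = {
    "raw-data/Books_rating.csv": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:8aa5e49915bafa73b8dca93e05e458d2f130dc53bebf8c9ef1fa111964df67da\n"
        "size 2859504349\n"
    ),
    "raw-data/books_data.csv": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:5c6fabf86f31cb78aee2ff82839c1e8bf31b048df3ae7b159f3eb7842eb4ee6b\n"
        "size 181348853\n"
    ),
}


def total_bytes(pointers):
    """Sum the real file sizes recorded in a mapping of path -> pointer text."""
    return sum(parse_lfs_pointer(text)["size"] for text in pointers.values())


if __name__ == "__main__":
    for path, text in POINTERS.items():
        info = parse_lfs_pointer(text)
        print(f"{path}: {info['size'] / 10**9:.2f} GB ({info['hash_algo']})")
    print(f"total: {total_bytes(POINTERS) / 10**9:.2f} GB")
```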