Yew Chong committed on
Commit
6276d23
·
1 Parent(s): c6cc938

update readme

Files changed (3)
  1. README.md +44 -31
  2. raw-data/Books_rating.csv +0 -3
  3. raw-data/books_data.csv +0 -3
README.md CHANGED
@@ -17,36 +17,17 @@ Link to files: [https://huggingface.co/spaces/bt5153-books/README/tree/main](htt
17
  Hello, and welcome to our books recommendation project for BT5153!
18
 
19
  # Project Directory
20
- ## Front-end UI
21
- ### Book Recommendation Ensemble Model Interface
22
-
23
- This interface generates recommendations, but only for a list of randomly sampled test users from our dataset.
24
-
25
- This interface was created on Python version 3.11.4, with requirements listed in `requirements.txt`.
26
- Some requirements may be missing from the list; please install them as needed.
27
-
28
- All sub-models and the final ensemble classifier model were trained in advance. They are included inside the Data folder.
29
-
30
- All data used for live recommendation is in the Data folder. Since the Data folder is too large to be submitted, we will submit a representative subset of the data.
31
-
32
- ### To start the UI:
33
- **NOTE: Please only run this with the full dataset from [this git repository](https://huggingface.co/spaces/bt5153-books/README/tree/main)!!** If not, there will not be any results...
34
-
35
- Start the interface with `python -m flask run`.
36
-
37
- If for some reason the app does not start, try running `python app.py`.
38
-
39
- Server should be running on `127.0.0.1:5000`
40
 
41
  ## Source Code
42
- Codes are stored under `./Books` as `.ipynb` files, and named according to the order they should be run.
 
43
 
44
  ## Data
45
- Data used for the project is stored in `./Data`.
46
 
47
- Raw data, retrieved from the Goodreads dataset [here](https://mengtingwan.github.io/data/goodreads.html), can be found under `./raw-data`.
48
 
49
- For our submission, we have created a representative subset of our dataset to be included in the zip submission, and can be found in `./Data-sub`.
50
 
51
  # To run our project in Windows:
52
 
@@ -60,10 +41,10 @@ Run these commands:
60
  All python notebooks can be found in the subdirectory `./Books/`.
61
 
62
  ## Data preprocessing
63
- Run all cells in the file `1_data_split.ipynb`.
64
 
65
  ## Generating recommendations
66
- Run all cells in the following files:
67
  * `2.1_users_similarity.ipynb`
68
  * `2.2_reviews_LDA.ipynb`
69
  * `2.3_description_s2v.ipynb`
@@ -71,18 +52,50 @@ Run all cells in the following files:
71
  * `2.5_titles_bge_faiss.ipynb`
72
  * `2.6_book_clustering.ipynb`
73
 
74
- Then, run the following file to generate recommendation for users:
 
 
 
75
  `3_book_to_user_converer.ipynb`
76
 
77
  ## Ensemble model
78
- Run this file: `4_ensemble_final.ipynb`
79
 
80
  ----------------------------------------------------------------
81
 
82
- # Project Description
83
- In response to the overwhelming number of book choices online, which often leads to decision paralysis and wasted time, we propose the implementation of a Natural Language Processing (NLP) powered recommendation system to address this challenge.
84
 
85
- For full project description, see the report file in submission.
86
 
87
  ## Members:
88
  * Ang Kai En (A0221945E)
 
17
  Hello, and welcome to our books recommendation project for BT5153!
18
 
19
  # Project Directory
20
 
21
  ## Source Code
22
+ Model code is stored under `./Books` as `.ipynb` notebooks, named in the order they should be run.
23
+ The Flask user interface code is stored in the root `./` directory, and the HTML files can be found under `./templates`.
24
 
25
  ## Data
26
+ All data used for the project is stored in `./Data` in the [Hugging Face repository](https://huggingface.co/spaces/bt5153-books/README/tree/main).
27
 
28
+ ***WARNING: This Hugging Face repository is over 18GB. Ensure that you have sufficient disk space before cloning.***
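
The disk-space warning above can be checked before cloning. The following is a minimal sketch, not part of the project: the function name `has_enough_space` is hypothetical, and "over 18GB" is treated as a lower bound of 18 GiB, so leave extra headroom in practice.

```python
import shutil

# Assumption: "over 18GB" is read as a lower bound of 18 GiB.
REQUIRED_BYTES = 18 * 1024**3


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has at least
    `required_bytes` of free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free space to clone the repository.")
    else:
        print("Not enough free space; free up disk before cloning.")
```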
29
 
30
+ For our submission, we have created a representative subset of our dataset, which is included in the zip submission as `data05.zip` in the accompanying files. These sample subset files can also be found in the [GitHub repository](https://github.com/lyncsghrk/BT5153-Books), under the directory `./Data/Books/final_dataset`.
31
 
32
  # To run our project in Windows:
33
 
 
41
  All python notebooks can be found in the subdirectory `./Books/`.
42
 
43
  ## Data preprocessing
44
+ ***Run all cells in the file `1_data_split.ipynb`.***
45
 
46
  ## Generating recommendations
47
+ ***Run all cells in the following files:***
48
  * `2.1_users_similarity.ipynb`
49
  * `2.2_reviews_LDA.ipynb`
50
  * `2.3_description_s2v.ipynb`
 
52
  * `2.5_titles_bge_faiss.ipynb`
53
  * `2.6_book_clustering.ipynb`
54
 
55
+ Do note that some notebooks may take up to a few hours to complete.
56
+ The recommendations have already been saved as NumPy arrays under the directory `./Data/Books/Recommend Storage`.
57
+
58
+ ***Then, run all cells in the following file to generate recommendations for users:***
59
  `3_book_to_user_converer.ipynb`
60
 
61
  ## Ensemble model
62
+ Run all cells in this file: `4_ensemble_final.ipynb`
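
Since the notebooks are named according to the order they should be run, that order can be derived from the numeric prefixes. The sketch below is an illustration under assumptions, not the project's own tooling: the helper names are hypothetical, and the commands it builds rely on `jupyter nbconvert --to notebook --execute --inplace`, which requires Jupyter to be installed.

```python
import re
from pathlib import Path

# Matches a leading numeric prefix such as "1_" or "2.3_".
_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)_")


def notebook_order_key(name):
    """Turn '2.3_description_s2v.ipynb' into the sortable tuple (2, 3)."""
    match = _PREFIX.match(name)
    if match is None:
        raise ValueError(f"no numeric prefix in notebook name: {name!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def ordered_notebooks(names):
    """Sort notebook file names by their numeric prefix."""
    return sorted(names, key=notebook_order_key)


def execution_commands(directory="./Books"):
    """Build one nbconvert command per notebook found in `directory`,
    in run order. The commands are returned, not executed."""
    folder = Path(directory)
    names = [path.name for path in folder.glob("*.ipynb")]
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", str(folder / name)]
        for name in ordered_notebooks(names)
    ]


if __name__ == "__main__":
    for command in execution_commands():
        print(" ".join(command))
```

Each printed command can be passed to `subprocess.run(command, check=True)` to execute the notebooks one after another; keep in mind that some of them may take hours.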
63
+
64
+
65
+ # Front-end UI
66
+ ## Book Recommendation Ensemble Model Interface
67
+
68
+ This interface generates recommendations, but only for a list of randomly sampled test users from our dataset.
69
+
70
+ This interface was created on Python version 3.11.4, with requirements listed in `requirements.txt`.
71
+ Some requirements may be missing from the list; please install them as needed.
72
+
73
+ All sub-models and the final ensemble classifier model were trained in advance. They are included inside the Data folder.
74
+
75
+ All data used for live recommendation is in the Data folder. Since the Data folder is too large to be submitted, we will submit a representative subset of the data.
76
+
77
+ ## To start the UI:
78
+ **NOTE: Please only run this with the full dataset from [this git repository](https://huggingface.co/spaces/bt5153-books/README/tree/main)!!** Otherwise, an error will occur and no results will be shown.
79
+
80
+ Start the interface with `python -m flask run`.
81
+
82
+ If for some reason the app does not start, try running `python app.py`.
83
+
84
+ The server should be running on `127.0.0.1:5000`.
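
To confirm that the interface came up, a quick connectivity check against the address above may help. This is a minimal sketch using only the standard library; the function name `is_server_listening` is hypothetical, and it only verifies that something accepts TCP connections on the port, not that it is this Flask app.

```python
import socket


def is_server_listening(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_server_listening():
        print("Server is reachable at http://127.0.0.1:5000")
    else:
        print("Nothing is listening on 127.0.0.1:5000 yet.")
```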
85
+
86
 
87
  ----------------------------------------------------------------
88
 
89
+ # Project Abstract
90
+ This report presents an enhanced book recommendation system to improve recommendation precision. By integrating unstructured text data and diverse data sources, the proposed system offers more robust recommendations tailored to individual users, ultimately improving user retention rates.
91
+
92
+ Utilizing a Goodreads dataset from 2017, comprising book information, user-tagged genres, and user reviews, we built and trained six distinct models based on different data types. These models were then combined into an ensemble logistic regression model, outperforming individual models in precision and exhibiting higher F1 scores in binary classification for book recommendations.
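
As a rough illustration of how a logistic regression ensemble can combine six sub-model scores into one binary recommendation, consider the sketch below. It is not the trained model from `4_ensemble_final.ipynb`: the function names, weights, bias, and threshold are all hypothetical placeholders chosen for the example.

```python
import math


def ensemble_probability(scores, weights, bias=0.0):
    """Logistic regression over sub-model scores:
    p = sigmoid(bias + sum(w_i * s_i))."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recommend(scores, weights, bias=0.0, threshold=0.5):
    """Binary decision: recommend when the probability reaches the threshold."""
    return ensemble_probability(scores, weights, bias) >= threshold


if __name__ == "__main__":
    # Hypothetical scores from six sub-models for one (user, book) pair.
    scores = [0.8, 0.1, 0.4, 0.0, 0.6, 0.3]
    # Hypothetical weights; real values would come from training.
    weights = [1.5, 0.5, 1.0, 0.2, 0.8, 0.4]
    print(ensemble_probability(scores, weights, bias=-1.0))
    print(recommend(scores, weights, bias=-1.0))
```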
93
+
94
+ While the ensemble model requires more computational resources than the user similarity model, it effectively mitigates popularity bias, a common issue in naive recommendation systems. Finally, the system's user interface, developed with Flask, offers transparent recommendations with explainability graphs, enhancing user trust and experience.
95
+
96
+ Overall, the enhanced book recommendation system shows promising results and has the potential to outperform the naive user similarity model with further data refinement and model training.
97
 
98
+ *For full project description, see the report file in submission.*
99
 
100
  ## Members:
101
  * Ang Kai En (A0221945E)
raw-data/Books_rating.csv DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:8aa5e49915bafa73b8dca93e05e458d2f130dc53bebf8c9ef1fa111964df67da
3
- size 2859504349
raw-data/books_data.csv DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:5c6fabf86f31cb78aee2ff82839c1e8bf31b048df3ae7b159f3eb7842eb4ee6b
3
- size 181348853