---
title: Bangla Book Recommender
emoji: 📚
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: cc-by-nc-4.0
short_description: Cold-start recommendations from 127K Bangla books
tags:
- recommendation
- bangla
- lightgcn
- two-tower
---
# 📚 Bangla Book Recommender
A cold-start book recommendation interface for the **RokomariBG** dataset,
the first large-scale multi-entity heterogeneous graph dataset for Bangla book
recommendation, built from publicly scraped reviews and metadata on
[Rokomari.com](https://rokomari.com), Bangladesh's largest online bookstore.
The Space lets a visitor pick a few books they have enjoyed and receive
recommendations from either of two benchmarked models from the paper:
| Model | Notes |
|---|---|
| **Neural Two-Tower** | Best benchmarked model. Item tower fuses ID, content (title, summary, author, publisher), and metadata (price, rating, pages); user tower combines a user-ID embedding with pooled history. Strongest at cold-start. Covers all 127K books. |
| **LightGCN** | Pure graph collaborative filtering with 4 GCN layers and uniform layer averaging. Trained on the 16K-book subset with sufficient interaction history. |
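
For readers unfamiliar with LightGCN, the propagation rule behind "4 GCN layers and uniform layer averaging" can be sketched as below. This follows the standard LightGCN formulation (symmetric-normalised adjacency, no feature transforms, mean over layers 0 to K) on a dense toy matrix; the function name `lightgcn_embeddings` and its arguments are hypothetical, and the training code used for this Space may differ in details.

```python
import numpy as np


def lightgcn_embeddings(interactions, user_emb0, item_emb0, num_layers=4):
    """Propagate initial embeddings over the user-item graph, LightGCN style.

    interactions: (n_users, n_items) binary matrix (hypothetical toy input).
    Returns the layer-averaged user and item embeddings.
    """
    n_users, n_items = interactions.shape
    n = n_users + n_items

    # Bipartite adjacency: users link to items and items back to users.
    adj = np.zeros((n, n), dtype=np.float32)
    adj[:n_users, n_users:] = interactions
    adj[n_users:, :n_users] = interactions.T

    # Symmetric normalisation D^{-1/2} A D^{-1/2}; isolated nodes stay at zero.
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # Layer 0 is the raw ID embedding; each further layer is one propagation step.
    layer = np.concatenate([user_emb0, item_emb0], axis=0).astype(np.float32)
    layers = [layer]
    for _ in range(num_layers):
        layer = norm_adj @ layer
        layers.append(layer)

    # Uniform layer averaging: every layer (including layer 0) gets equal weight.
    final = np.mean(layers, axis=0)
    return final[:n_users], final[n_users:]
```

Only the item half of the output would be needed for the cold-start mode described below, since a new visitor has no row in the user half.
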
## How recommendation works (cold-start mode)
A new visitor has no `user_id` in the trained models, so instead of looking up
a stored user embedding, the Space derives one on the fly:
1. The visitor selects **N** books they have enjoyed (Bangla or English).
2. Each model has a precomputed **book embedding** for every book in its
training catalogue.
3. The visitor's **taste vector** is computed as the L2-normalised mean of the
selected books' embeddings.
4. **Cosine similarity** is computed between the taste vector and every book
embedding. The seed books are masked out, and the top-K nearest books are
returned.
5. Liked recommendations can be **promoted into the seed set**, after which
re-running yields progressively more personalised results.

The pipeline is pure NumPy at inference time: no model is loaded at runtime,
only the precomputed embeddings, so the Space is fast and cheap to host on
the free CPU tier.
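
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Space's actual `app.py`: the names `l2_normalise` and `recommend`, the toy catalogue, and the choice to average the raw seed embeddings before normalising are all assumptions.

```python
import numpy as np


def l2_normalise(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def recommend(seed_ids, book_emb, book_ids, top_k=10):
    """Return up to top_k (book_id, cosine_score) pairs for the given seeds."""
    index = {bid: row for row, bid in enumerate(book_ids.tolist())}
    seed_rows = [index[b] for b in seed_ids if b in index]
    if not seed_rows:
        return []  # none of the seeds exist in this model's catalogue

    # Step 3: taste vector = L2-normalised mean of the seed embeddings
    # (assumption: the mean is taken over the raw embeddings).
    taste = l2_normalise(book_emb[seed_rows].mean(axis=0))

    # Step 4: cosine similarity is a dot product between unit vectors.
    scores = l2_normalise(book_emb) @ taste
    scores[seed_rows] = -np.inf  # mask out the seed books

    k = min(top_k, len(scores) - len(set(seed_rows)))
    top = np.argsort(-scores)[:k]
    return [(str(book_ids[i]), float(scores[i])) for i in top]


# Toy catalogue (hypothetical data, not taken from RokomariBG).
book_ids = np.array(["a", "b", "c", "d", "e"])
book_emb = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]],
    dtype=np.float32,
)

seeds = ["a"]
first = recommend(seeds, book_emb, book_ids, top_k=2)

# Step 5: promote the top recommendation into the seed set and re-run.
seeds.append(first[0][0])
second = recommend(seeds, book_emb, book_ids, top_k=2)
```

A full sort with `np.argsort` is fine for a catalogue of this size; `np.argpartition` would be an option if ranking ever became the bottleneck.
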
## File layout
All artefacts live in the Space root:
```
.
├── app.py                    # Gradio app
├── requirements.txt
├── README.md                 # this file
├── books_metadata.parquet    # title, author, category, rating, summary, url
├── two_tower_book_emb.npy    # (N, 256) float32, L2-normalisable
├── two_tower_book_ids.npy    # (N,) book_id strings, row-aligned
├── lightgcn_book_emb.npy     # (16266, 64) float32, L2-normalisable
└── lightgcn_book_ids.npy     # (16266,) book_id strings, row-aligned
```
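
Because each `*_book_emb.npy` file is row-aligned with its `*_book_ids.npy` file, a loader can check that alignment once at startup. The sketch below is an assumption about how one might do this, not code from `app.py`; `load_model_artefacts` is a hypothetical helper, and it assumes the IDs were saved as a plain string array (an object array would need `allow_pickle=True`, which should only be used on trusted files).

```python
from pathlib import Path

import numpy as np


def load_model_artefacts(root, prefix):
    """Load `<prefix>_book_emb.npy` and `<prefix>_book_ids.npy` from `root`.

    Raises ValueError if the two arrays are not row-aligned.
    """
    root = Path(root)
    emb = np.load(root / f"{prefix}_book_emb.npy")
    ids = np.load(root / f"{prefix}_book_ids.npy")  # assumes a non-object dtype

    if emb.ndim != 2 or ids.shape != (emb.shape[0],):
        raise ValueError(
            f"{prefix}: embeddings {emb.shape} and ids {ids.shape} are not row-aligned"
        )
    return emb.astype(np.float32, copy=False), ids.astype(str)
```

For example, `load_model_artefacts(".", "two_tower")` would read the two-tower pair from the Space root.
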
## Citation
```bibtex
@misc{ahmed2026personalizedbanglabookrecommendation,
title = {Towards Personalized Bangla Book Recommendation:
A Large-Scale Multi-Entity Book Graph Dataset},
author = {Rahin Arefin Ahmed and Md. Anik Chowdhury and
Sakil Ahmed Sheikh Reza and Devnil Bhattacharjee and
Muhammad Abdullah Adnan and Nafis Sadeq},
year = {2026},
eprint = {2602.12129},
archivePrefix = {arXiv},
primaryClass = {cs.IR},
url = {https://arxiv.org/abs/2602.12129}
}
```
## Links
- 📄 Paper: <https://arxiv.org/abs/2602.12129>
- 💾 Dataset: <https://huggingface.co/datasets/DevnilMaster1/Bangla-Book-Recommendation-Dataset>
- 🐙 Code: <https://github.com/DevnilMaster/Bangla-Book-Recommendation-Dataset>

Licence: **CC BY-NC 4.0** (matches the dataset licence)