| --- |
| title: Bangla Book Recommender |
| emoji: 📚 |
| colorFrom: purple |
| colorTo: blue |
| sdk: gradio |
| sdk_version: 5.45.0 |
| app_file: app.py |
| pinned: false |
| license: cc-by-nc-4.0 |
| short_description: Cold-start recommendations from 127K Bangla books |
| tags: |
| - recommendation |
| - bangla |
| - lightgcn |
| - two-tower |
| --- |
| |
| # Bangla Book Recommender |
|
|
| A cold-start book recommendation interface for the **RokomariBG** dataset, |
| the first large-scale multi-entity heterogeneous graph dataset for Bangla book |
| recommendation, built from publicly scraped reviews and metadata on |
| [Rokomari.com](https://rokomari.com), Bangladesh's largest online bookstore. |
|
|
| The Space lets a visitor pick a few books they have enjoyed and receive |
| recommendations from either of two benchmarked models from the paper: |
|
|
| | Model | Notes | |
| |---|---| |
| | **Neural Two-Tower** | Best benchmarked model. Item tower fuses ID, content (title, summary, author, publisher), and metadata (price, rating, pages); user tower combines a user-ID embedding with pooled history. Strongest in the cold-start setting. Covers all 127K books. | |
| | **LightGCN** | Pure graph collaborative filtering with 4 GCN layers and uniform layer averaging. Trained on the 16K-book subset with sufficient interaction history. | |
|
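For readers unfamiliar with LightGCN, the propagation and uniform layer averaging mentioned in the table can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the training code behind the shipped embeddings: the function name `lightgcn_propagate`, the dense adjacency matrix, and the one-user, one-book graph are all hypothetical, and averaging over the initial embeddings plus every layer follows the standard LightGCN formulation rather than anything confirmed in this repository.

```python
import numpy as np


def lightgcn_propagate(interactions, user_emb0, item_emb0, num_layers=4):
    """Propagate embeddings over a user-item graph and average all layers uniformly."""
    n_users, n_items = interactions.shape

    # Symmetric adjacency of the bipartite graph: users first, then items.
    adj = np.zeros((n_users + n_items, n_users + n_items))
    adj[:n_users, n_users:] = interactions
    adj[n_users:, :n_users] = interactions.T

    # Symmetric normalisation D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows.
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    layer = np.vstack([user_emb0, item_emb0])
    layers = [layer]  # layer 0 is the initial embedding table
    for _ in range(num_layers):
        layer = norm_adj @ layer  # no weight matrices, no non-linearity
        layers.append(layer)

    final = np.mean(layers, axis=0)  # uniform weight 1 / (num_layers + 1)
    return final[:n_users], final[n_users:]


# Hypothetical toy graph: one user who interacted with one book.
interactions = np.array([[1.0]])
users, items = lightgcn_propagate(
    interactions,
    user_emb0=np.array([[1.0, 0.0]]),
    item_emb0=np.array([[0.0, 1.0]]),
)
print(users, items)  # users ≈ [[0.6, 0.4]], items ≈ [[0.4, 0.6]]
```

At inference time the Space does not run this propagation; it only reads the final book embeddings that such a procedure produces.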
|
| ## How recommendation works (cold-start mode) |
|
|
| A new visitor has no `user_id` in the trained models, so instead of looking up |
| a stored user embedding, the Space derives one on the fly: |
|
|
| 1. The visitor selects **N** books they have enjoyed (Bangla or English). |
| 2. Each model has a precomputed **book embedding** for every book in its |
| training catalogue. |
| 3. The visitor's **taste vector** is computed as the L2-normalised mean of the |
| selected books' embeddings. |
| 4. **Cosine similarity** is computed between the taste vector and every book |
| embedding. The seed books are masked out, and the top-K nearest books are |
| returned. |
| 5. Liked recommendations can be **promoted into the seed set**, after which |
| re-running yields progressively more personalised results. |
|
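The steps above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function names `l2_normalise` and `recommend`, the toy 2-D embeddings, and the book IDs are hypothetical, and it assumes each book embedding is L2-normalised before the seed mean is taken.

```python
import numpy as np


def l2_normalise(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm; eps guards against division by zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def recommend(book_emb, book_ids, seed_ids, top_k=10):
    """Return the top_k (book_id, score) pairs closest to the seed books."""
    emb = l2_normalise(np.asarray(book_emb, dtype=np.float32))
    id_to_row = {bid: i for i, bid in enumerate(book_ids)}
    # Ignore seeds that are missing from this model's catalogue; drop duplicates.
    seed_rows = sorted({id_to_row[b] for b in seed_ids if b in id_to_row})
    if not seed_rows:
        raise ValueError("none of the seed books are in this model's catalogue")

    # Step 3: taste vector = L2-normalised mean of the seed embeddings.
    taste = l2_normalise(emb[seed_rows].mean(axis=0))

    # Step 4: with unit-norm rows, a dot product equals cosine similarity.
    scores = emb @ taste
    scores[seed_rows] = -np.inf  # mask out the seed books

    k = min(top_k, len(book_ids) - len(seed_rows))
    top = np.argsort(-scores)[:k]
    return [(book_ids[i], float(scores[i])) for i in top]


# Hypothetical toy catalogue of five books with 2-D embeddings.
book_ids = ["b0", "b1", "b2", "b3", "b4"]
book_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]])

first = recommend(book_emb, book_ids, seed_ids=["b0"], top_k=2)
# Step 5: promote the top recommendation into the seed set and re-run.
second = recommend(book_emb, book_ids, seed_ids=["b0", first[0][0]], top_k=2)
```

Promoting a liked recommendation amounts to calling `recommend` again with the extended seed list, as the last line shows.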
|
| The pipeline is pure NumPy at inference time: no model is loaded at runtime, |
| only the precomputed embeddings, so the Space is fast and cheap to host on |
| the free CPU tier. |
|
|
| ## File layout |
|
|
| All artefacts live in the Space root: |
|
|
| ``` |
| . |
| ├── app.py                  # Gradio app |
| ├── requirements.txt |
| ├── README.md               # this file |
| ├── books_metadata.parquet  # title, author, category, rating, summary, url |
| ├── two_tower_book_emb.npy  # (N, 256) float32, L2-normalisable |
| ├── two_tower_book_ids.npy  # (N,) book_id strings, row-aligned |
| ├── lightgcn_book_emb.npy   # (16266, 64) float32, L2-normalisable |
| └── lightgcn_book_ids.npy   # (16266,) book_id strings, row-aligned |
| ``` |
|
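A loader for these artefacts might look like the sketch below. The helper `load_artefacts` is hypothetical, and it assumes the ID arrays were saved with a fixed-width string dtype so that `np.load` can read them without `allow_pickle=True`. To stay runnable on its own, the example writes small random stand-in files to a temporary directory instead of reading the real ones.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_artefacts(root, prefix):
    """Load `<prefix>_book_emb.npy` and `<prefix>_book_ids.npy`, checking row alignment."""
    root = Path(root)
    emb = np.load(root / f"{prefix}_book_emb.npy")
    # Assumption: IDs are stored as a plain string array, not pickled objects.
    ids = np.load(root / f"{prefix}_book_ids.npy")
    if emb.ndim != 2 or ids.shape != (emb.shape[0],):
        raise ValueError(
            f"{prefix}: embeddings {emb.shape} and ids {ids.shape} are not row-aligned"
        )
    return emb.astype(np.float32, copy=False), ids.astype(str)


# Stand-in files with the same naming scheme as the layout above.
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    np.save(Path(tmp) / "lightgcn_book_emb.npy", rng.normal(size=(5, 64)).astype(np.float32))
    np.save(Path(tmp) / "lightgcn_book_ids.npy", np.array([f"book_{i}" for i in range(5)]))
    emb, ids = load_artefacts(tmp, "lightgcn")

print(emb.shape, ids[:2])
```

Checking alignment once at start-up is cheap and catches a mismatched pair of files before any similarity scores are computed.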
|
| ## Citation |
|
|
| ```bibtex |
| @misc{ahmed2026personalizedbanglabookrecommendation, |
| title = {Towards Personalized Bangla Book Recommendation: |
| A Large-Scale Multi-Entity Book Graph Dataset}, |
| author = {Rahin Arefin Ahmed and Md. Anik Chowdhury and |
| Sakil Ahmed Sheikh Reza and Devnil Bhattacharjee and |
| Muhammad Abdullah Adnan and Nafis Sadeq}, |
| year = {2026}, |
| eprint = {2602.12129}, |
| archivePrefix = {arXiv}, |
| primaryClass = {cs.IR}, |
| url = {https://arxiv.org/abs/2602.12129} |
| } |
| ``` |
|
|
| ## Links |
|
|
| - Paper: <https://arxiv.org/abs/2602.12129> |
| - Dataset: <https://huggingface.co/datasets/DevnilMaster1/Bangla-Book-Recommendation-Dataset> |
| - Code: <https://github.com/DevnilMaster/Bangla-Book-Recommendation-Dataset> |
|
|
| Licence: **CC BY-NC 4.0** (matches the dataset licence) |