---
title: Bangla Book Recommender
emoji: 📚
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: cc-by-nc-4.0
short_description: Cold-start recommendations from 127K Bangla books
tags:
  - recommendation
  - bangla
  - lightgcn
  - two-tower
---

# 📚 Bangla Book Recommender

A cold-start book recommendation interface for the **RokomariBG** dataset, the first large-scale multi-entity heterogeneous graph dataset for Bangla book recommendation. It is built from publicly scraped reviews and metadata on [Rokomari.com](https://rokomari.com), Bangladesh's largest online bookstore.

The Space lets a visitor pick a few books they have enjoyed and receive recommendations from either of two models benchmarked in the paper:

| Model | Notes |
|---|---|
| **Neural Two-Tower** | Best benchmarked model. The item tower fuses ID, content (title, summary, author, publisher), and metadata (price, rating, pages); the user tower combines a user-ID embedding with pooled history. Strongest at cold-start. Covers all 127K books. |
| **LightGCN** | Pure graph collaborative filtering with 4 GCN layers and uniform layer averaging. Trained on the 16K-book subset with sufficient interaction history. |

## How recommendation works (cold-start mode)

A new visitor has no `user_id` in the trained models, so instead of looking up a stored user embedding, the Space derives one on the fly:

1. The visitor selects **N** books they have enjoyed (Bangla or English).
2. Each model has a precomputed **book embedding** for every book in its training catalogue.
3. The visitor's **taste vector** is computed as the L2-normalised mean of the selected books' embeddings.
4. **Cosine similarity** is computed between the taste vector and every book embedding. The seed books are masked out, and the top-K nearest books are returned.
5. Liked recommendations can be **promoted into the seed set**. Re-running then recomputes the taste vector from the enlarged set, so the results become progressively more personalised.
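Steps 3 and 4 fit in a few lines of NumPy. The sketch below is a minimal illustration, not the Space's actual `app.py`: the helpers `l2_normalise` and `recommend` are hypothetical names, and it assumes each book embedding is L2-normalised *before* the seeds are averaged (the steps above do not say whether the mean is taken over raw or normalised rows).

```python
import numpy as np


def l2_normalise(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit L2 norm; eps guards against division by zero."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def recommend(
    emb: np.ndarray,       # (num_books, d) book embeddings, row-aligned with book_ids
    book_ids: np.ndarray,  # (num_books,) book_id strings
    seed_ids: list[str],   # book_ids the visitor has enjoyed
    k: int = 10,
) -> list[tuple[str, float]]:
    """Hypothetical helper: top-k (book_id, cosine similarity) pairs for a cold-start visitor."""
    # Map seed book_ids to unique row indices; ids missing from this catalogue are skipped.
    index = {str(bid): i for i, bid in enumerate(book_ids)}
    seed_rows = sorted({index[s] for s in seed_ids if s in index})
    if not seed_rows:
        raise ValueError("none of the seed books are in this model's catalogue")

    # Assumption: normalise every row so that a dot product equals cosine similarity.
    unit = l2_normalise(emb.astype(np.float32))

    # Step 3: taste vector = L2-normalised mean of the seed embeddings.
    taste = l2_normalise(unit[seed_rows].mean(axis=0))

    # Step 4: cosine similarity against every book, with the seed books masked out.
    scores = unit @ taste
    scores[seed_rows] = -np.inf

    # Never return more books than remain after masking.
    k = min(k, len(scores) - len(seed_rows))
    if k <= 0:
        return []

    # argpartition finds the k best in O(n); only those k are then fully sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(str(book_ids[i]), float(scores[i])) for i in top]


if __name__ == "__main__":
    # Toy catalogue of four books with 2-d embeddings.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]], dtype=np.float32)
    ids = np.array(["a", "b", "c", "d"])
    print([bid for bid, _ in recommend(emb, ids, ["a"], k=2)])  # -> ['b', 'd']
```

Promoting a liked result (step 5) amounts to appending its `book_id` to `seed_ids` and calling `recommend` again. To try this on the real artefacts, load a model's `*_book_emb.npy` and `*_book_ids.npy` pair with `np.load`; if the id array was saved with an object dtype, `np.load` needs `allow_pickle=True`. Using `np.argpartition` rather than a full `np.argsort` avoids sorting all 127K scores when only K results are needed.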
The pipeline is pure NumPy at inference time: no model is loaded at runtime, only the precomputed embeddings, so the Space is fast and cheap to host on the free CPU tier.

## File layout

All artefacts live in the Space root:

```
.
├── app.py                    # Gradio app
├── requirements.txt
├── README.md                 # this file
├── books_metadata.parquet    # title, author, category, rating, summary, url
├── two_tower_book_emb.npy    # (N, 256) float32, L2-normalisable
├── two_tower_book_ids.npy    # (N,) book_id strings, row-aligned
├── lightgcn_book_emb.npy     # (16266, 64) float32, L2-normalisable
└── lightgcn_book_ids.npy     # (16266,) book_id strings, row-aligned
```

## Citation

```bibtex
@misc{ahmed2026personalizedbanglabookrecommendation,
  title         = {Towards Personalized Bangla Book Recommendation: A Large-Scale Multi-Entity Book Graph Dataset},
  author        = {Rahin Arefin Ahmed and Md. Anik Chowdhury and Sakil Ahmed Sheikh Reza and Devnil Bhattacharjee and Muhammad Abdullah Adnan and Nafis Sadeq},
  year          = {2026},
  eprint        = {2602.12129},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2602.12129}
}
```

## Links

- 📄 Paper:
- 💾 Dataset:
- 🐙 Code:

License: **CC BY-NC 4.0** (matches the dataset licence)