---
title: Bangla Book Recommender
emoji: 📚
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: cc-by-nc-4.0
short_description: Cold-start recommendations from 127K Bangla books
tags:
- recommendation
- bangla
- lightgcn
- two-tower
---

# 📚 Bangla Book Recommender

A cold-start book recommendation interface for the RokomariBG dataset, the
first large-scale multi-entity heterogeneous graph dataset for Bangla book
recommendation, built from publicly scraped reviews and metadata on
[Rokomari.com](https://rokomari.com), Bangladesh's largest online bookstore.

The Space lets a visitor pick a few books they have enjoyed and receive
recommendations from either of two models benchmarked in the paper:

| Model | Notes |
|---|---|
| Neural Two-Tower | Best benchmarked model. The item tower fuses ID, content (title, summary, author, publisher), and metadata (price, rating, pages) features; the user tower combines a user-ID embedding with pooled history. Strongest in the cold-start setting. Covers all 127K books. |
| LightGCN | Pure graph collaborative filtering with 4 GCN layers and uniform layer averaging. Trained on the 16K-book subset with sufficient interaction history. |
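For readers unfamiliar with LightGCN, the "4 GCN layers and uniform layer averaging" in the table can be sketched in NumPy as below. This is a minimal dense-matrix sketch of the standard LightGCN propagation rule, assuming a binary user-by-book interaction matrix; the function name `lightgcn_propagate` is hypothetical, and a real 16K-book graph would use sparse matrices. The Space itself never runs this step, since it ships only the final book embeddings.

```python
import numpy as np


def lightgcn_propagate(interactions, user_emb0, item_emb0, num_layers=4):
    """Hypothetical sketch of LightGCN propagation with uniform layer averaging.

    interactions: (n_users, n_items) binary matrix R
    user_emb0, item_emb0: layer-0 (trainable) embeddings, shapes (n_users, d), (n_items, d)
    """
    r = np.asarray(interactions, dtype=np.float64)
    n_users, n_items = r.shape
    n = n_users + n_items

    # Bipartite adjacency A = [[0, R], [R^T, 0]]
    adj = np.zeros((n, n), dtype=np.float64)
    adj[:n_users, n_users:] = r
    adj[n_users:, :n_users] = r.T

    # Symmetric normalisation D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # No feature transforms or non-linearities: each layer is one multiplication
    emb = np.vstack([user_emb0, item_emb0]).astype(np.float64)
    layers = [emb]
    for _ in range(num_layers):
        emb = norm_adj @ emb
        layers.append(emb)

    # Uniform averaging over layers 0..K (weight 1 / (K + 1) each)
    final = np.mean(layers, axis=0)
    return final[:n_users], final[n_users:]
```

The returned item half corresponds to what a file such as `lightgcn_book_emb.npy` would hold, one row per book.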

## How recommendation works (cold-start mode)

A new visitor has no `user_id` in the trained models, so instead of looking up
a stored user embedding, the Space derives one on the fly:

1. The visitor selects N books they have enjoyed (Bangla or English).
2. Each model has a precomputed book embedding for every book in its
   training catalogue.
3. The visitor's taste vector is computed as the L2-normalised mean of the
   selected books' embeddings.
4. Cosine similarity is computed between the taste vector and every book
   embedding. The seed books are masked out, and the top-K nearest books are
   returned.
5. Recommendations the visitor likes can be promoted into the seed set;
   re-running then yields progressively more personalised results.
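Steps 2 to 4 can be sketched in NumPy as follows. This is a hypothetical helper, not the code in `app.py`: it assumes each book row is L2-normalised before averaging, and it silently drops seed books that are missing from the model's catalogue (which matters for LightGCN, since it covers only the 16K-book subset).

```python
import numpy as np


def l2_normalise(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; eps guards against all-zero rows."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def recommend(book_emb, book_ids, seed_ids, k=10):
    """Hypothetical cold-start recommender over precomputed book embeddings.

    book_emb: (N, d) array, rows aligned with book_ids
    book_ids: (N,) array of book_id strings
    seed_ids: book_ids the visitor has enjoyed
    Returns a list of (book_id, cosine_score), best first.
    """
    index = {bid: row for row, bid in enumerate(book_ids)}
    # Step 2: look up seed rows; skip seeds outside this model's catalogue
    rows = [index[b] for b in seed_ids if b in index]
    if not rows:
        return []

    # Assumption: normalise every book row so dot products are cosines
    emb = l2_normalise(np.asarray(book_emb, dtype=np.float32))

    # Step 3: taste vector = L2-normalised mean of the seed embeddings
    taste = l2_normalise(emb[rows].mean(axis=0))

    # Step 4: cosine similarity against every book, seeds masked out
    scores = emb @ taste
    scores[rows] = -np.inf

    k = min(k, len(book_ids) - len(set(rows)))
    if k <= 0:
        return []
    top = np.argpartition(-scores, k - 1)[:k]  # k best, unordered
    top = top[np.argsort(-scores[top])]        # order them best first
    return [(str(book_ids[i]), float(scores[i])) for i in top]
```

Under this sketch, step 5 amounts to appending a liked `book_id` to `seed_ids` and calling `recommend` again; `argpartition` keeps each query at roughly linear cost in the catalogue size rather than sorting all 127K scores.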

The pipeline is pure NumPy at inference time: no model is loaded at runtime,
only the precomputed embeddings, so the Space is fast and cheap to host on
the free CPU tier.
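The "fast and cheap" claim can be checked with back-of-the-envelope arithmetic from the array shapes listed in the file layout. This sketch assumes float32 storage (4 bytes per value) and treats "127K" as a rounded 127,000 rows; both helper names are hypothetical.

```python
def embedding_footprint_bytes(n_rows: int, dim: int, itemsize: int = 4) -> int:
    """RAM needed to hold an (n_rows, dim) float32 embedding matrix."""
    return n_rows * dim * itemsize


def per_query_multiply_adds(n_rows: int, dim: int) -> int:
    """One matrix-vector product: the taste vector against every book row."""
    return n_rows * dim


print(embedding_footprint_bytes(16_266, 64))    # LightGCN: 4164096 bytes, about 4.2 MB
print(embedding_footprint_bytes(127_000, 256))  # Two-Tower (rounded): 130048000 bytes, about 130 MB
print(per_query_multiply_adds(127_000, 256))    # 32512000 multiply-adds per query
```

Roughly 33 million multiply-adds is a single BLAS matrix-vector call, which a CPU finishes in milliseconds, and about 130 MB of embeddings fits comfortably in free-tier RAM.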

## File layout

All artefacts live in the Space root:

```
.
├── app.py                   # Gradio app
├── requirements.txt
├── README.md                # this file
├── books_metadata.parquet   # title, author, category, rating, summary, url
├── two_tower_book_emb.npy   # (N, 256) float32, L2-normalisable
├── two_tower_book_ids.npy   # (N,) book_id strings, row-aligned
├── lightgcn_book_emb.npy    # (16266, 64) float32, L2-normalisable
└── lightgcn_book_ids.npy    # (16266,) book_id strings, row-aligned
```

## Citation

```bibtex
@misc{ahmed2026personalizedbanglabookrecommendation,
  title         = {Towards Personalized Bangla Book Recommendation:
                   A Large-Scale Multi-Entity Book Graph Dataset},
  author        = {Rahin Arefin Ahmed and Md. Anik Chowdhury and
                   Sakil Ahmed Sheikh Reza and Devnil Bhattacharjee and
                   Muhammad Abdullah Adnan and Nafis Sadeq},
  year          = {2026},
  eprint        = {2602.12129},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2602.12129}
}
```

## Links

- 📄 Paper: <https://arxiv.org/abs/2602.12129>
- 💾 Dataset: <https://huggingface.co/datasets/DevnilMaster1/Bangla-Book-Recommendation-Dataset>
- 🐙 Code: <https://github.com/DevnilMaster/Bangla-Book-Recommendation-Dataset>

Licence: CC BY-NC 4.0 (matching the dataset licence).