🕊️ TagoreX – A Bengali Text Generator Inspired by Tagore
Model name: SwastikGuhaRoy/TagoreX
Base model: GPT-2 with LoRA adapters (based on AddaGPT2.0)
Language: Bengali
Author: Swastik Guha Roy (@SwastikGuhaRoy)
License: MIT
Model size: ~124M parameters
Trained on: Curated (but imperfect) corpus of Rabindranath Tagore’s writings
Intended use: Poetic and philosophical Bengali text generation
Demo app: TagoreX + Gemini Streamlit App
📘 Model Description
TagoreX is a fine-tuned version of AddaGPT2.0 — a small GPT-2 model adapted for Bengali using LoRA (Low-Rank Adaptation).
This model was trained on literary works of Rabindranath Tagore as a tribute.
The model continues a given Bengali prompt in a Tagore-like poetic tone. It generates up to 256 new tokens, which a downstream application can optionally refine with Gemini AI.
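The generate-then-refine flow described above can be sketched as a small two-stage function. This is only an illustration under stated assumptions: `continue_and_refine`, `fake_generate`, and `fake_refine` are hypothetical names, and the stand-in callables replace the real model and the Gemini call, whose actual integration is not shown in this card.

```python
from typing import Callable, Optional


def continue_and_refine(
    prompt: str,
    generate_fn: Callable[[str], str],
    refine_fn: Optional[Callable[[str], str]] = None,
) -> str:
    """Run the first-stage generator, then the optional second-stage refiner."""
    draft = generate_fn(prompt)
    if refine_fn is None:
        # Refinement is optional: return the raw draft unchanged.
        return draft
    return refine_fn(draft)


# Hypothetical stand-ins so the sketch runs without a model or an API key.
def fake_generate(prompt: str) -> str:
    return prompt + "  [continuation]  "


def fake_refine(draft: str) -> str:
    # Stand-in "refinement": collapse runs of whitespace.
    return " ".join(draft.split())


print(continue_and_refine("তুমি রবে নীরবে", fake_generate, fake_refine))
```

In a real application, `generate_fn` would wrap the TagoreX model and `refine_fn` would wrap whichever refinement service is in use.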
🔧 Technical Details
- Architecture: GPT-2 (~124M parameters)
- Training strategy: Full fine-tuning
- Epochs: 22 (symbolically referencing “২২শে শ্রাবণ”)
- Max sequence length: 256 tokens
- Tokenizer: AutoTokenizer from the base model
- Framework: PyTorch + Transformers
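To make the 256-token sequence length concrete, here is a minimal sketch of splitting a tokenized corpus into blocks of at most 256 ids. The card does not say how the corpus was actually split, so `chunk_token_ids` is a hypothetical helper and keeping the final short block is an assumption.

```python
from typing import List

MAX_LEN = 256  # max sequence length stated in the technical details


def chunk_token_ids(token_ids: List[int], block_size: int = MAX_LEN) -> List[List[int]]:
    """Split a flat list of token ids into consecutive blocks of at most block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [token_ids[i:i + block_size] for i in range(0, len(token_ids), block_size)]


# 600 dummy ids stand in for a tokenized text.
blocks = chunk_token_ids(list(range(600)))
print([len(b) for b in blocks])  # [256, 256, 88]
```

Some training setups drop the final short block or pad it instead; which of these was used here is not documented.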
📂 Training Data
The dataset includes poems, prose, and other publicly available works by Rabindranath Tagore. It can be accessed in a consolidated .txt format from here
⚠️ Note: The data does contain:
- Typos, formatting errors
- OCR issues
- Incomplete or duplicated lines
This model is not a scholarly curation, but an experimental artistic rendering.
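For readers who want to tidy the raw text before experimenting, the sketch below shows one possible light cleanup pass. It is not the preprocessing used for this model: `clean_lines` is a hypothetical helper that only normalizes Unicode and whitespace and drops blank or exactly duplicated lines. It does not repair typos or OCR errors.

```python
import unicodedata
from typing import Iterable, List


def clean_lines(lines: Iterable[str]) -> List[str]:
    """Normalize Unicode form and whitespace, drop blank and exact duplicate lines."""
    seen = set()
    cleaned = []
    for raw in lines:
        # NFC gives visually identical Bengali strings the same code points.
        line = unicodedata.normalize("NFC", raw)
        # Collapse internal whitespace runs and trim both ends.
        line = " ".join(line.split())
        if not line or line in seen:
            continue
        seen.add(line)
        cleaned.append(line)
    return cleaned


sample = ["তুমি  রবে নীরবে ", "", "তুমি রবে নীরবে", "আমার সোনার বাংলা"]
print(clean_lines(sample))
```

Note that removing duplicates across the whole file also removes legitimate refrains in songs and poems, so a gentler variant might only drop consecutive duplicates.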
🎯 Intended Use
You can use this model to:
- Experiment with Bengali poetic text generation
- Create creative writing prompts in Bengali
- Explore Indic LLM capabilities in low-resource settings
This model is not suitable for:
- Any commercial or sensitive deployment
- Factual or linguistic accuracy tasks
- Scholarly representation of Tagore’s works
💬 How to Prompt
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/TagoreX")
model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/TagoreX")

# Bengali prompt for the model to continue
prompt = "তুমি রবে নীরবে"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 256 new tokens, then decode the full sequence (prompt + continuation)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
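If the default settings above give repetitive text, other decoding options may be worth trying. The dictionary below is a sketch: every key is a standard `generate` argument in Hugging Face Transformers, but the values for `top_p`, `repetition_penalty`, and `no_repeat_ngram_size` are illustrative assumptions, not settings recommended by the author.

```python
# Keyword arguments for `model.generate`. The first three values mirror the
# example above; the remaining three are assumptions chosen for illustration.
GEN_KWARGS = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,               # assumption: nucleus sampling cutoff
    "repetition_penalty": 1.2,  # assumption: mild penalty on already-used tokens
    "no_repeat_ngram_size": 3,  # assumption: never repeat the same 3-gram
}

# Usage with the objects from the example above:
# outputs = model.generate(**inputs, **GEN_KWARGS)
print(sorted(GEN_KWARGS))
```

Lower temperatures tend to give safer but flatter text, while higher ones give more varied but less coherent lines, so these values are best treated as starting points.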
🚫 Limitations & Disclaimer
- Not aligned, filtered, or safety-trained.
- Outputs are often incoherent, repetitive, or nonsensical.
- This is not meant to reproduce or replace Tagore's literary work.
- The generation reflects training data and randomness — not any human author.
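Because repetition is a known limitation, a small post-processing step can make generations easier to read. The following is a minimal sketch, not part of the model or its demo app: `collapse_repeats` is a hypothetical helper that trims runs of identical consecutive lines.

```python
def collapse_repeats(text: str, max_repeats: int = 1) -> str:
    """Keep at most max_repeats consecutive copies of each line."""
    kept = []
    previous = None
    run = 0
    for line in text.splitlines():
        key = line.strip()
        if key == previous:
            run += 1
        else:
            previous = key
            run = 1
        if run <= max_repeats:
            kept.append(line)
    return "\n".join(kept)


print(collapse_repeats("a\na\na\nb\na"))  # prints a, b, a on separate lines
```

This only removes exact line-level repeats. Near-duplicates and repeated phrases inside a line are left untouched.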
🌏 Why It Matters
TagoreX demonstrates how even small-scale, open models can express poetic and cultural essence in Indic languages — using limited compute and a lot of curiosity.
It aims to inspire communities to build Indic LLMs, especially in low-resource and rural settings.
"AI doesn’t have to be massive. It can be local, soulful, and deeply human."
📫 Contact
📧 Email: swastikguharoy@googlemail.com
💬 Feedback, bugs, or nice generations? I'd love to hear from you!