# TagoreX: A Bengali Text Generator Inspired by Tagore

- Model name: SwastikGuhaRoy/TagoreX
- Base model: AddaGPT2.0 (GPT-2 adapted to Bengali with LoRA adapters)
- Language: Bengali
- Author: Swastik Guha Roy (@SwastikGuhaRoy)
- License: MIT
- Model size: ~124M parameters
- Trained on: a curated (but imperfect) corpus of Rabindranath Tagore's writings
- Intended use: poetic and philosophical Bengali text generation
- Demo app: TagoreX + Gemini Streamlit App
## Model Description

TagoreX is a fine-tuned version of AddaGPT2.0, a small GPT-2 model adapted for Bengali using LoRA (Low-Rank Adaptation). It was trained on the literary works of Rabindranath Tagore as a tribute.

Given a Bengali prompt, the model continues it in a Tagore-like poetic tone. It generates up to ~256 tokens, which can then be optionally refined by Gemini in a downstream application.
## Technical Details

- Architecture: GPT-2 small (~124M parameters)
- Training strategy: full fine-tuning
- Epochs: 22 (symbolically referencing "২২শে শ্রাবণ", the date of Tagore's death in the Bengali calendar)
- Max sequence length: 256 tokens
- Tokenizer: AutoTokenizer from the base model
- Framework: PyTorch + Transformers
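For reference, the two figures commonly quoted for GPT-2 small (117M and ~124M parameters) describe the same architecture; the full count follows from the standard GPT-2 small dimensions (12 layers, 768-dimensional hidden states, 50,257-token vocabulary, 1,024 positions). A sketch of that arithmetic, using the published architecture rather than values read from the TagoreX checkpoint itself:

```python
# Parameter count for GPT-2 small from its standard dimensions.
V, P, d, L = 50257, 1024, 768, 12  # vocab, positions, hidden size, layers

embeddings = V * d + P * d                    # token + position embeddings
attn = (d * 3 * d + 3 * d) + (d * d + d)      # QKV projection + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up- and down-projection
layer_norms = 2 * d + 2 * d                   # two LayerNorms per block (weight + bias)
block = attn + mlp + layer_norms
total = embeddings + L * block + 2 * d        # plus the final LayerNorm

print(f"{total:,}")  # 124,439,808
```

The LM head shares weights with the token embedding, so it adds no extra parameters.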
## Training Data

The dataset includes poems, prose, and other works by Rabindranath Tagore that are publicly available. The dataset can be accessed in a consolidated .txt format from here.

⚠️ Note: the data may (and does) contain:

- Typos and formatting errors
- OCR issues
- Incomplete or duplicated lines

This model is not a scholarly curation but an experimental artistic rendering.
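Issues like these can be reduced with a very light normalization pass before training. The sketch below is illustrative only, not the preprocessing actually used for TagoreX: it collapses runs of whitespace and drops exact duplicate lines.

```python
def clean_corpus(text: str) -> str:
    """Light cleanup: normalize whitespace and drop exact duplicate lines.

    Illustrative only -- not the actual TagoreX preprocessing.
    """
    seen = set()
    cleaned = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse runs of spaces/tabs
        if not line or line in seen:   # skip empty and duplicated lines
            continue
        seen.add(line)
        cleaned.append(line)
    return "\n".join(cleaned)

raw = "আমার  সোনার বাংলা\nআমার  সোনার বাংলা\n\nআমি তোমায় ভালোবাসি"
print(clean_corpus(raw))
```

Note that poetry repeats refrains deliberately, so global de-duplication is a blunt instrument here; it is shown only to illustrate the kind of cleanup the corpus would benefit from.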
## Intended Use
You can use this model to:
- Experiment with Bengali poetic text generation
- Create creative writing prompts in Bengali
- Explore Indic LLM capabilities in low-resource settings
This model is not suitable for:
- Any commercial or sensitive deployment
- Factual or linguistic accuracy tasks
- Scholarly representation of Tagoreโs works
## How to Prompt

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/TagoreX")
model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/TagoreX")

# A line from a well-known Tagore song
prompt = "তুমি রবে নীরবে"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 256 new tokens with moderate randomness
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
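Because generation stops after a fixed token budget, the output often ends mid-sentence. A small, hypothetical post-processing helper (not part of the model or the Streamlit app) can trim the text back to the last complete Bengali sentence, marked by the danda "।", before handing it to a downstream refiner:

```python
def trim_to_last_sentence(text: str, end_marks: str = "।!?") -> str:
    """Cut generated text at the last sentence-ending mark.

    Bengali prose ends sentences with the danda "।"; "!" and "?" are
    also treated as terminators. Hypothetical helper, not from the app.
    """
    last = max(text.rfind(m) for m in end_marks)
    return text[: last + 1] if last != -1 else text

print(trim_to_last_sentence("তুমি রবে নীরবে হৃদয়ে মম। নিবিড় নিভৃত"))
```

If no terminator is found at all, the text is returned unchanged rather than discarded.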
## Limitations & Disclaimer

- The model is not aligned, filtered, or safety-trained.
- Outputs may be incoherent, repetitive, or nonsensical.
- It is not meant to reproduce or replace Tagore's literary work.
- Generations reflect the training data and sampling randomness, not any human author.
## Why It Matters

TagoreX demonstrates how even small-scale, open models can express poetic and cultural essence in Indic languages, using limited compute and a lot of curiosity. It aims to inspire communities to build Indic LLMs, especially in low-resource and rural settings.

> "AI doesn't have to be massive. It can be local, soulful, and deeply human."
## Contact

- Email: swastikguharoy@googlemail.com
- Feedback, bugs, or nice generations? I'd love to hear from you!