## 🕊️ TagoreX – A Bengali Text Generator Inspired by Tagore

* **Model name:** `SwastikGuhaRoy/TagoreX`
* **Base model:** `GPT-2` with LoRA adapters [(based on `AddaGPT2.0`)](https://huggingface.co/SwastikGuhaRoy/AddaGPT2.0)
* **Language:** Bengali
* **Author:** Swastik Guha Roy (`@SwastikGuhaRoy`)
* **License:** MIT
* **Model size:** ~124M parameters
* **Trained on:** Curated (but imperfect) corpus of Rabindranath Tagore’s writings
* **Intended use:** Poetic and philosophical Bengali text generation
* **Demo app:** [TagoreX + Gemini Streamlit App](https://tagorexgemini.streamlit.app)

---

### 📘 Model Description

**TagoreX** is a fine-tuned version of `AddaGPT2.0`, a small GPT-2 model adapted for Bengali using LoRA (Low-Rank Adaptation). It was trained on the literary works of Rabindranath Tagore as a tribute.

The model continues a given Bengali prompt in a Tagore-like poetic tone. It generates ~256 tokens, which a downstream application can optionally refine with Gemini AI.

---

### 🔧 Technical Details

* **Architecture**: GPT-2 small (~124M parameters)
* **Training strategy**: Full fine-tuning of `AddaGPT2.0`
* **Epochs**: 22 (symbolically referencing “২২শে শ্রাবণ”)
* **Max sequence length**: 256 tokens
* **Tokenizer**: AutoTokenizer from the base model
* **Framework**: PyTorch + Transformers

---

### 📂 Training Data

The dataset includes poems, prose, and other works by Rabindranath Tagore, all of which are [publicly available](https://archive.org/details/RABINDRARACHANABALI/). A consolidated `.txt` version of the dataset is [available here](https://huggingface.co/datasets/SwastikGuhaRoy/WorksofTagore).

⚠️ **Note**: The data does contain:

* Typos and formatting errors
* OCR issues
* Incomplete or duplicated lines

This model is not a scholarly curation, but an experimental artistic rendering.

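
Given the issues listed above, a light cleanup pass over the consolidated text file may help before further experiments. The sketch below is illustrative only: the helper name `clean_corpus` is hypothetical, and nothing here is confirmed to be the preprocessing used to train TagoreX. It normalizes Unicode to NFC, collapses stray whitespace, and drops blank lines and immediately repeated lines.

```python
import unicodedata


def clean_corpus(text: str) -> str:
    """Normalize and de-duplicate the lines of a raw text corpus.

    Hypothetical helper for illustration; not the author's pipeline.
    """
    cleaned = []
    previous = None
    for raw_line in text.splitlines():
        # NFC gives visually identical Bengali strings one canonical encoding.
        line = unicodedata.normalize("NFC", raw_line)
        # Collapse runs of whitespace left behind by OCR or copy-paste.
        line = " ".join(line.split())
        # Skip blank lines and immediate repeats of the previous kept line.
        if not line or line == previous:
            continue
        cleaned.append(line)
        previous = line
    return "\n".join(cleaned)


sample = "তুমি  রবে নীরবে\nতুমি রবে নীরবে\n\n  আমার মনে  \n"
print(clean_corpus(sample))
```

This pass only removes structural noise. It does not fix typos or OCR errors, which would need manual review or a dedicated tool.
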

---

### 🎯 Intended Use

**You can use this model to:**

* Experiment with Bengali poetic text generation
* Generate creative writing prompts in Bengali
* Explore Indic LLM capabilities in low-resource settings

This model is **not suitable** for:

* Any commercial or sensitive deployment
* Tasks that require factual or linguistic accuracy
* Scholarly representation of Tagore’s works

---

### 💬 How to Prompt

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/TagoreX")
model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/TagoreX")

# Any short Bengali phrase works as a prompt.
prompt = "তুমি রবে নীরবে"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 256 new tokens; a lower temperature gives more conservative text.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

### 🚫 Limitations & Disclaimer

* Not aligned, filtered, or safety-trained.
* Outputs are often incoherent, repetitive, or nonsensical.
* This is **not** meant to reproduce or replace Tagore's literary work.
* The generated text reflects the training data and random sampling, not any human author.

---

### 🌏 Why It Matters

TagoreX demonstrates how even small-scale, open models can express poetic and cultural essence in Indic languages with limited compute and a lot of curiosity.

It aims to inspire communities to build **Indic LLMs**, especially in low-resource and rural settings.

> *"AI doesn’t have to be massive. It can be local, soulful, and deeply human."*

---

### 📫 Contact

📧 Email: `swastikguharoy@googlemail.com`

💬 Feedback, bugs, or nice generations? I'd love to hear from you!

---