SwastikGuhaRoy/TagoreX


SwastikGuhaRoy committed on Jul 12, 2025

Commit 2ddb225 (verified) · 1 parent: d8eca4a

Update README.md

Files changed (1): README.md +112 -1

@@ -3,4 +3,115 @@ license: apache-2.0
 language:
 - bn
 pipeline_tag: text-generation
----

+---
+-------
+## 🕊️ TagoreX – A Bengali Text Generator Inspired by Tagore
+**Model name:** `SwastikGuhaRoy/TagoreX`
+**Base model:** `GPT-2` with LoRA adapters [(based on `AddaGPT2.0`)](https://huggingface.co/SwastikGuhaRoy/AddaGPT2.0)
+**Language:** Bengali
+**Author:** Swastik Guha Roy (`@SwastikGuhaRoy`)
+**License:** MIT
+**Model size:** \~124M parameters
+**Trained on:** Curated (but imperfect) corpus of Rabindranath Tagore’s writings
+**Intended use:** Poetic and philosophical Bengali text generation
+**Demo app:** [TagoreX + Gemini Streamlit App](https://tagorexgemini.streamlit.app)
+---
+### 📘 Model Description
+**TagoreX** is a fine-tuned version of `AddaGPT2.0` — a small GPT-2 model adapted for Bengali using LoRA (Low-Rank Adaptation).
+This model was trained on the literary works of Rabindranath Tagore as a tribute to the poet.
+The model continues a given Bengali prompt in a Tagore-like poetic tone. It generates \~256 tokens, which are then optionally refined by Gemini AI in a downstream application.
+---
+### 🔧 Technical Details
+* **Architecture**: GPT-2 small (\~124M parameters, originally reported as 117M)
+* **Training strategy**: Full fine-tuning
+* **Epochs**: 22 (a symbolic reference to “২২শে শ্রাবণ”, the 22nd of Shraban, Tagore's death anniversary)
+* **Max sequence length**: 256 tokens
+* **Tokenizer**: AutoTokenizer from the base model
+* **Framework**: PyTorch + Transformers
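The card does not say how the corpus was split into 256-token training sequences. A common approach for causal-LM fine-tuning is to tokenize the whole text, concatenate it, and cut it into fixed-size blocks. The Hugging Face `run_clm` example uses this method. The sketch below assumes that approach. `group_into_blocks` is a hypothetical helper and is not part of the TagoreX code.

```python
from typing import Dict, List

BLOCK_SIZE = 256  # assumption: equal to the "Max sequence length" listed in the card


def group_into_blocks(token_ids: List[int], block_size: int = BLOCK_SIZE) -> Dict[str, List[List[int]]]:
    """Split one long stream of token ids into fixed-size training blocks.

    Any trailing remainder shorter than block_size is dropped. For causal LM
    training, labels are a copy of input_ids; GPT-2 shifts them internally.
    """
    total = (len(token_ids) // block_size) * block_size
    blocks = [token_ids[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [list(b) for b in blocks]}


# Toy example: 600 fake token ids give two full blocks of 256; the last 88 are dropped.
example = group_into_blocks(list(range(600)))
print(len(example["input_ids"]), [len(b) for b in example["input_ids"]])
```

Dropping the remainder keeps every block the same length, so no padding is needed. The cost is that a few tokens at the end of the corpus are never seen.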
+---
+### 📂 Training Data
+The dataset includes poems, prose, and other works by Rabindranath Tagore, which are [publicly available](https://archive.org/details/RABINDRARACHANABALI/). A consolidated `.txt` version of the dataset is [available here](https://huggingface.co/datasets/SwastikGuhaRoy/WorksofTagore).
+⚠️ **Note**: The data may contain:
+* Typos, formatting errors
+* OCR issues
+* Incomplete or duplicated lines
+This model is not a scholarly curation, but an experimental artistic rendering.
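The card notes typos, OCR errors, and duplicated lines but does not describe any preprocessing. If you retrain on this dataset, a light cleanup pass may help. The sketch below is a hypothetical example: the `clean_corpus_lines` helper is not part of the TagoreX repo. It normalizes Unicode, collapses whitespace, and drops empty lines and consecutive duplicates. It does not correct OCR misspellings.

```python
import re
import unicodedata
from typing import List


def clean_corpus_lines(lines: List[str]) -> List[str]:
    """Illustrative cleanup for an OCR'd Bengali text dump (hypothetical helper).

    - NFC-normalize each line so equivalent character sequences compare equal
    - collapse runs of spaces/tabs and strip leading/trailing whitespace
    - drop empty lines
    - drop a line that exactly repeats the previously kept line
    """
    cleaned: List[str] = []
    for raw in lines:
        line = unicodedata.normalize("NFC", raw)
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line:
            continue
        if cleaned and cleaned[-1] == line:
            continue  # consecutive duplicate, likely an extraction artifact
        cleaned.append(line)
    return cleaned


sample = ["তুমি রবে  নীরবে", "তুমি রবে নীরবে", "", "  প্রাণের মাঝে  "]
print(clean_corpus_lines(sample))
```

Only consecutive duplicates are removed. A refrain that legitimately recurs later in a poem is kept.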
+---
+### 🎯 Intended Use
+**You can use this model to:**
+* Experiment with Bengali poetic text generation
+* Create creative writing prompts in Bengali
+* Explore Indic LLM capabilities in low-resource settings
+This model is **not suitable** for:
+* Any commercial or sensitive deployment
+* Factual or linguistic accuracy tasks
+* Scholarly representation of Tagore’s works
+---
+### 💬 How to Prompt
+```python
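+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load the tokenizer and fine-tuned weights from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("SwastikGuhaRoy/TagoreX")
+model = AutoModelForCausalLM.from_pretrained("SwastikGuhaRoy/TagoreX")
+
+prompt = "তুমি রবে নীরবে"  # "You will remain in silence", opening of a Tagore song
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Sample up to 256 new tokens; GPT-2 tokenizers usually lack a pad token, so reuse EOS
+outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, pad_token_id=tokenizer.eos_token_id)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))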
+```
+---
+### 🚫 Limitations & Disclaimer
+* Not aligned, filtered, or safety-trained.
+* Many outputs may be incoherent, repetitive, or nonsensical.
+* This is **not** meant to reproduce or replace Tagore's literary work.
+* Generated text reflects the training data and sampling randomness, not any human author.
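Outputs can loop, so it may help to flag repetitive generations before keeping or refining them. The sketch below is a simple heuristic. Neither the model nor the demo app is known to use it. `repeated_ngram_ratio` is a hypothetical helper that scores the share of word n-grams that repeat an earlier one.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that repeat an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 mean the text is mostly
    looping. Splitting on whitespace is only a rough word proxy for Bengali.
    """
    words = text.split()
    ngrams = [tuple(words[i : i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # too short to contain any n-gram
    counts = Counter(ngrams)
    repeats = sum(c - 1 for c in counts.values())  # extra occurrences beyond the first
    return repeats / len(ngrams)


print(repeated_ngram_ratio("তুমি রবে নীরবে তুমি রবে নীরবে"))
```

Any cutoff for discarding samples is an arbitrary choice, so read some outputs before picking one. At decode time, `model.generate` also accepts `no_repeat_ngram_size` and `repetition_penalty`, which may reduce looping. How well they suit this model is untested here.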
+---
+### 🌏 Why It Matters
+TagoreX is an attempt to show that even small, open models, built with limited compute and a lot of curiosity, can capture something of the poetic and cultural character of Indic languages.
+It aims to inspire communities to build **Indic LLMs**, especially in low-resource and rural settings.
+> *"AI doesn’t have to be massive. It can be local, soulful, and deeply human."*
+---
+### 📫 Contact
+📧 Email: `swastikguharoy@googlemail.com`
+💬 Feedback, bugs, or nice generations? I'd love to hear from you!
+---