---
library_name: transformers
tags: []
---

# HinDiffusionLM: Diffusion Language Model for Hindi

This project turns a BERT-based model into an instruction-tuned, LLaDA-style diffusion LLM for Hindi. The model is trained on Hindi instruction data with a masked language modeling objective and generates text diffusion-style: it learns to iteratively denoise masked tokens until it produces a coherent response in Hindi. Training was done on Kaggle with two T4 GPUs.

## Experiments

### Models Evaluated

| Model | Performance |
|-------|-------------|
| `google/muril-base-cased` | **Best** |
| `google/muril-large-cased` | Poor |
| `ai4bharat/indic-bert` | Moderate |

### Datasets Tested

| Dataset | Subset | Status | Notes |
|---------|--------|--------|-------|
| `ai4bharat/indic-instruct-data-v0.1` | `anudesh` | **Used** | Primary dataset for demonstration |
| `ai4bharat/indic-instruct-data-v0.1` | `lm_sys` | Skipped | Too time-intensive to train under the GPU constraints |
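
The iterative denoising mentioned in the introduction can be illustrated with a small, dependency-free sketch. It starts from a fully masked response and, in each round, keeps only the most confident predictions while leaving the rest masked. This is an assumption about how such a sampler might look, not the sampling code used for this model: `diffusion_decode`, `toy_predict`, `TARGET`, and the linear unmasking schedule are all hypothetical, and `toy_predict` is a stand-in for a forward pass of the fine-tuned masked LM.

```python
import math
from typing import Callable, List, Tuple

MASK = "[MASK]"

# A predictor maps the current sequence to one (token, confidence) guess per
# position. In a real setup it would wrap the fine-tuned masked LM.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def diffusion_decode(prompt: List[str], response_len: int,
                     predict: Predictor, steps: int) -> List[str]:
    """Start from a fully masked response and unmask it over `steps` rounds."""
    seq = prompt + [MASK] * response_len
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        guesses = predict(seq)
        # Reveal an equal share of the remaining masks each round, so the
        # final round reveals everything that is left.
        k = math.ceil(len(masked) / (steps - step))
        # Keep the k most confident guesses; the other positions stay masked.
        keep = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in keep:
            seq[i] = guesses[i][0]
    return seq[len(prompt):]


# Hypothetical fixed answer used by the toy predictor below.
TARGET = ["मैं", "ठीक", "हूँ", "।"]


def toy_predict(seq: List[str]) -> List[Tuple[str, float]]:
    """Stand-in for the model: always proposes TARGET for the last
    len(TARGET) positions, with earlier positions more confident."""
    offset = len(seq) - len(TARGET)
    out = []
    for i, tok in enumerate(seq):
        if i < offset:
            out.append((tok, 1.0))  # prompt tokens are left unchanged
        else:
            j = i - offset
            out.append((TARGET[j], 1.0 / (j + 1)))
    return out


if __name__ == "__main__":
    prompt = ["आप", "कैसे", "हैं", "?"]
    print(diffusion_decode(prompt, len(TARGET), toy_predict, steps=2))
```

With a real model, `predict` would typically take the argmax token and its softmax probability at each position. Fewer steps mean faster generation, at the cost of committing to more tokens per round.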