---
pipeline_tag: fill-mask
---
# Model Card for Telugu BERT

This model is a BERT-based language model trained for Masked Language Modeling (MLM) in Telugu. It is designed to understand Telugu text and predict masked words effectively.
## Model Details

### Model Description

- **Developed by:** MATHI
- **Model type:** Transformer-based Masked Language Model (MLM)
- **Language(s) (NLP):** Telugu
- **License:** MIT
### Model Sources

- **Repository:** Hugging Face Model Repo
- **Demo:** Colab Notebook
## Uses

### Direct Use

This model can be used for:

- Text completion in Telugu
- Fill-mask prediction (predicting missing words in a sentence)
- Pretraining or fine-tuning for Telugu NLP tasks
### Downstream Use

Fine-tuned versions of this model can be used for:

- Named Entity Recognition (NER)
- Sentiment Analysis
- Machine Translation
- Text Summarization
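As an illustration of this downstream path, here is a minimal fine-tuning sketch for sentiment analysis. It is a sketch only: the two-label setup and the example sentence are hypothetical and not part of this release.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the pretrained Telugu BERT encoder with a freshly initialized
# classification head (num_labels=2 is a hypothetical choice).
model_name = "Mathiarasi/TMod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Forward pass on a placeholder sentence ("This movie is very good.").
inputs = tokenizer("ఈ సినిమా చాలా బాగుంది.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2); meaningful only after fine-tuning
```

The same pattern applies to the other tasks in the list, e.g. `AutoModelForTokenClassification` for NER.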
### Out-of-Scope Use

- Not suitable for real-time dialogue generation
- Not trained on code-mixed (Telugu + English) text
## Bias, Risks, and Limitations

- The model may reflect biases present in the training data.
- Accuracy may vary across dialectal variations of Telugu.
- It may generate incorrect or misleading predictions.
### Recommendations

Users should verify the model's outputs before relying on them for critical applications.
## How to Get Started with the Model

Use the code below to get started:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_name = "Mathiarasi/TMod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("మక్దూంపల్లి పేరుతో చాలా [MASK] ఉన్నాయి."))
```
## Training Details

### Training Data

The model was trained on a Telugu corpus containing diverse text sources. Data preprocessing included text normalization, cleaning, and tokenization.
### Training Procedure

#### Preprocessing

A WordPiece tokenizer with a vocabulary of 30,000 tokens was used.
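For reference, a WordPiece tokenizer with this vocabulary size could be trained with the Hugging Face `tokenizers` library roughly as follows; the corpus file name is a hypothetical placeholder, not the actual training data.

```python
from tokenizers import BertWordPieceTokenizer

# Train a cased WordPiece tokenizer on a raw Telugu text file.
tokenizer = BertWordPieceTokenizer(lowercase=False)
tokenizer.train(
    files=["telugu_corpus.txt"],  # hypothetical corpus path
    vocab_size=30000,             # matches the 30,000-token vocabulary above
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model("telugu-tokenizer")  # writes vocab.txt
```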
#### Training Hyperparameters

- **Batch Size:** 16
- **Learning Rate:** 5e-5
- **Epochs:** 3
- **Optimizer:** AdamW
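These values map directly onto a standard `transformers` Trainer configuration. The sketch below shows that mapping under the assumption that training used the Trainer API; the one-sentence corpus is a placeholder, not the actual training script or data.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("Mathiarasi/TMod")
model = AutoModelForMaskedLM.from_pretrained("Mathiarasi/TMod")

# Placeholder corpus; the real training data is described above.
ds = Dataset.from_dict({"text": ["తెలుగు ఒక ద్రావిడ భాష."]})
tokenized = ds.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Dynamic token masking implements the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="telugu-bert-mlm",
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=5e-5,              # Learning Rate: 5e-5
    num_train_epochs=3,              # Epochs: 3
)                                    # Trainer defaults to an AdamW optimizer

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```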
## Evaluation

### Testing Data

The model was evaluated on a held-out dataset of Telugu text.
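Continuing the Trainer sketch above, held-out MLM loss can be turned into a rough perplexity figure; the evaluation split here reuses the placeholder dataset and is illustrative only.

```python
import math

# `trainer` and `tokenized` come from the training sketch above; a real
# evaluation would pass a separate held-out Dataset instead.
metrics = trainer.evaluate(eval_dataset=tokenized)
print(f"eval loss: {metrics['eval_loss']:.3f}  "
      f"perplexity: {math.exp(metrics['eval_loss']):.1f}")
```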
## Technical Specifications

### Model Architecture and Objective

- **Model Type:** BERT (Bidirectional Encoder Representations from Transformers)
- **Training Objective:** Masked Language Modeling (MLM)
- **Dataset library:** datasets
## Citation

If you use this model, please cite:

```bibtex
@article{Mathiarasi2025,
  title={Telugu BERT: A Transformer-Based Language Model for Telugu},
  author={Mathiarasi},
  journal={Hugging Face Models},
  year={2025}
}
```
## Model Card Authors

MATHIARASI

## Model Card Contact

For questions, contact mathiarasie1710@gmail.com.