Mathiarasi committed · Commit 2913f77 (verified) · Parent(s): 76b5267

Update README.md (1 file changed: README.md, +8 −61)
---

# Model Card for Telugu BERT Model

This model is a BERT-based language model trained for Masked Language Modeling (MLM) in Telugu. It is designed to understand Telugu text and predict masked words effectively.
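The MLM objective can be illustrated with a minimal sketch. It assumes the standard BERT masking recipe (mask roughly 15% of tokens; of those, 80% become `[MASK]`, 10% a random vocabulary token, 10% stay unchanged), which this card does not state explicitly:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption: select ~mask_prob of the positions;
    replace 80% of those with [MASK], 10% with a random vocab token,
    and leave 10% unchanged. Returns (corrupted tokens, labels),
    where labels holds the original token at masked positions."""
    rng = rng or random.Random(0)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = "[MASK]"
            elif roll < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token (but it still counts as a target)
    return corrupted, labels

tokens = "మక్దూంపల్లి పేరుతో చాలా గ్రామాలు ఉన్నాయి".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, rng=random.Random(42))
print(corrupted)
print(labels)
```

The model is then trained to predict the original token at every position where `labels` is not `None`.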
## Model Details

### Model Description

- Developed by: MATHI
- Model type: Transformer-based Masked Language Model (MLM)
- Language(s) (NLP): Telugu
- License: MIT

### Model Sources

- Repository: Hugging Face Model Repo
- Demo: Colab Notebook
## Uses

### Direct Use

This model can be used for:

- Text completion in Telugu
- Fill-mask prediction (predicting missing words in a sentence)
- Pretraining or fine-tuning for Telugu NLP tasks
### Downstream Use

Fine-tuned versions of this model can be used for:

- Named Entity Recognition (NER)
- Sentiment Analysis
- Machine Translation
- Text Summarization
### Out-of-Scope Use

- Not suitable for real-time dialogue generation
- Not trained for code-mixed text (Telugu + English)
## Bias, Risks, and Limitations

- The model may reflect biases present in the training data.
- Accuracy may vary across dialectal variations of Telugu.
- It may generate incorrect or misleading predictions.
### Recommendations

Users should verify the model's outputs before relying on them for critical applications.
## How to Get Started with the Model

Use the code below to get started:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# Load the tokenizer and model from the Hugging Face Hub
model_name = "Mathiarasi/TMod"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Build a fill-mask pipeline and predict the masked word
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("మక్దూంపల్లి పేరుతో చాలా [MASK] ఉన్నాయి."))
```
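Under the hood, the fill-mask pipeline scores every vocabulary token at the `[MASK]` position with a softmax over the model's output logits and returns the highest-probability candidates. A minimal pure-Python sketch of that scoring step (the tokens and logit values below are made-up illustrative numbers, not real model output):

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits the model might emit at the [MASK] position
vocab = ["గ్రామాలు", "పేర్లు", "ఊర్లు"]
logits = [2.1, 0.3, -1.0]

# Rank candidate fillers by probability, as the pipeline does internally
probs = softmax(logits)
ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
for token, p in ranked:
    print(f"{token}: {p:.3f}")
```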
 
## Training Details

### Training Data

The model is trained on a Telugu corpus containing diverse text sources. Data preprocessing included text normalization, cleaning, and tokenization.

### Training Procedure

#### Preprocessing

Used a WordPiece tokenizer with a vocabulary of 30,000 tokens.
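WordPiece segments each word by greedy longest-match-first lookup against the learned vocabulary, prefixing non-initial pieces with `##`. A toy sketch of that lookup with a hypothetical miniature vocabulary (the real 30,000-token vocabulary is learned from the Telugu corpus):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece segmentation of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry ##
            if candidate in vocab:
                piece = candidate  # longest in-vocabulary match wins
                break
            end -= 1
        if piece is None:
            return [unk]  # word cannot be segmented with this vocabulary
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical miniature vocabulary for illustration
vocab = {"un", "##aff", "##able", "aff"}
print(wordpiece_tokenize("unaffable", vocab))  # → ['un', '##aff', '##able']
```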
 
#### Training Hyperparameters

- Batch Size: 16
- Learning Rate: 5e-5
- Epochs: 3
- Optimizer: AdamW
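These hyperparameters map onto a `transformers` Trainer configuration roughly as follows. This is a sketch, not the actual training script: the output directory name is hypothetical, dataset and tokenizer setup are elided, and AdamW is simply the Trainer's default optimizer.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; AdamW is the default optimizer.
training_args = TrainingArguments(
    output_dir="telugu-bert-mlm",     # hypothetical output path
    per_device_train_batch_size=16,   # Batch Size: 16
    learning_rate=5e-5,               # Learning Rate: 5e-5
    num_train_epochs=3,               # Epochs: 3
)
```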
 
### Testing Data

Evaluated on a held-out dataset of Telugu text.
## Technical Specifications

### Model Architecture and Objective

- Model Type: BERT (Bidirectional Encoder Representations from Transformers)
- Training Objective: Masked Language Modeling (MLM)
- Dataset library: datasets
## Citation

If you use this model, please cite:

```bibtex
@article{Mathiarasi2025,
  title={Telugu BERT: A Transformer-Based Language Model for Telugu},
  author={Mathiarasi},
  journal={Hugging Face Models},
  year={2025}
}
```
## Model Card Authors

MATHIARASI

## Model Card Contact

For questions, contact mathiarasie1710@gmail.com
 