tanuj437 committed
Commit 2279ad8 · verified · 1 Parent(s): 9ea3987

Upload 7 files

Files changed (7)
  1. README.md +81 -1
  2. config.json +24 -0
  3. model.safetensors +3 -0
  4. optimizer.pt +3 -0
  5. rng_state.pth +3 -0
  6. scheduler.pt +3 -0
  7. sp_unigram_64k.model +3 -0
README.md CHANGED
@@ -1,3 +1,83 @@
  ---
- license: mit
+ language: sa
+ tags:
+ - sanskrit
+ - bert
+ - masked-lm
+ - transformers
+ license: apache-2.0
+ datasets:
+ - sanskrit-corpus
+ widget:
+ - text: "सत्यमेव जयते [MASK]"
+ inference: true
  ---
+
+ # SanskritBERT (Light)
+
+ **SanskritBERT** is a lightweight Transformer model trained specifically for the Sanskrit language. It is based on the BERT architecture and trained with the masked language modeling (MLM) objective.
+
+ ## Model Description
+
+ - **Shared by:** [Your Name / Organization]
+ - **Model type:** Transformer encoder (BERT-like)
+ - **Language:** Sanskrit
+ - **License:** Apache 2.0
+ - **Finetuned from model:** None (trained from scratch)
+
+ ### Model Architecture
+ - **Layers**: 6
+ - **Hidden Size**: 256
+ - **Attention Heads**: 4
+ - **Feedforward Size**: 1024
+ - **Max Sequence Length**: 512
+ - **Vocab Size**: 120,000
+ - **Parameters**: ~36M
+
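+ These hyperparameters map directly onto a standard `BertConfig`. A minimal sketch mirroring this repo's `config.json` (the `type_vocab_size` and `pad_token_id` values below are taken from that file):
+
+ ```python
+ from transformers import BertConfig, BertForMaskedLM
+
+ # The "Light" architecture described above
+ config = BertConfig(
+     vocab_size=120000,
+     hidden_size=256,
+     num_hidden_layers=6,
+     num_attention_heads=4,
+     intermediate_size=1024,
+     max_position_embeddings=512,
+     type_vocab_size=8,   # from this repo's config.json
+     pad_token_id=0,      # from this repo's config.json
+ )
+ model = BertForMaskedLM(config)
+ print(f"{model.num_parameters() / 1e6:.1f}M parameters")  # ~36M, dominated by the 120k-entry embeddings
+ ```
+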
+ ## Intended Uses & Limitations
+
+ ### Intended Uses
+ - Masked word prediction
+ - Fine-tuning for Sanskrit NLP tasks (POS tagging, NER, text classification); see the sketch after this list
+ - Research into low-resource language modeling
+
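+ A minimal fine-tuning sketch for the classification case (the repo id and label count are placeholders; `AutoModelForSequenceClassification` replaces the MLM head with a freshly initialized classifier):
+
+ ```python
+ from transformers import AutoModelForSequenceClassification
+
+ # Load the pretrained encoder with a new classification head; num_labels is task-specific
+ clf = AutoModelForSequenceClassification.from_pretrained(
+     "YOUR-USERNAME/SanskritBERT", num_labels=3
+ )
+ # Train with the Trainer API or a plain PyTorch loop as usual
+ ```
+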
+ ### Limitations
+ - The model is "Light" (mobile-friendly), so it may not capture as much nuance as a `bert-base` or `bert-large` model.
+ - Performance depends heavily on the domain of the downstream task relative to the pre-training corpus.
+
+ ## Training Data
+
+ Trained on a corpus of Sanskrit texts including general literature, wikis, and classical texts.
+
+ ## Training Procedure
+
+ - **Optimizer**: AdamW
+ - **Precision**: Mixed Precision (bf16)
+ - **Batch Size**: 16
+ - **Epochs**: 6
+
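+ As a rough `TrainingArguments` equivalent (a sketch only; the learning rate, warmup, and other settings are not reported, and `output_dir` is a placeholder):
+
+ ```python
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="sanskritbert-mlm",   # placeholder
+     per_device_train_batch_size=16,  # reported batch size
+     num_train_epochs=6,              # reported epochs
+     bf16=True,                       # reported mixed precision
+     optim="adamw_torch",             # reported AdamW optimizer
+ )
+ ```
+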
+ ## How to Get Started
+
+ You can use the model directly with the Hugging Face `transformers` library:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("YOUR-USERNAME/SanskritBERT")
+ model = AutoModelForMaskedLM.from_pretrained("YOUR-USERNAME/SanskritBERT")
+
+ text = "सत्यमेव जयते [MASK]"
+ inputs = tokenizer(text, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Report the highest-scoring token for the [MASK] position
+ mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
+ predicted_id = outputs.logits[0, mask_pos].argmax(dim=-1)
+ print(tokenizer.decode(predicted_id))
+ ```
+
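+ Equivalently, the `fill-mask` pipeline returns ranked candidates for the masked token (assuming the repo ships a tokenizer that `AutoTokenizer` can load):
+
+ ```python
+ from transformers import pipeline
+
+ unmasker = pipeline("fill-mask", model="YOUR-USERNAME/SanskritBERT")
+ for pred in unmasker("सत्यमेव जयते [MASK]", top_k=5):
+     print(pred["token_str"], round(pred["score"], 3))
+ ```
+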
+ ## Citation
+
+ ```bibtex
+ @misc{sanskritbert2024,
+   title={SanskritBERT: A Light Transformer Model for Sanskrit},
+   author={[Your Name]},
+   year={2024},
+   publisher={Hugging Face}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "BertForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 256,
+   "initializer_range": 0.02,
+   "intermediate_size": 1024,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 4,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.56.0",
+   "type_vocab_size": 8,
+   "use_cache": true,
+   "vocab_size": 120000
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1886bef8fda23e1c3c346d78a073be916dca561a4933b3fd52186232f76a715f
+ size 143126368
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6434bdc90fc1fd3bff989181ee2229b8317a8a217cf51a974e811ae740b816c5
+ size 286318795
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98297750ec38d0e967da7997ba5e9fa0179b503372743eabc305abfeb5669800
+ size 14645
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5cdaf34fc8f85feab5c90bce4d80ad546f5f7957e46984713c176e02ab08b5b7
+ size 1465
sp_unigram_64k.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93e037cf410c9bf924c6bd7e6b373b624be51f4e4dadde1d9b2a69f2fdf713ac
+ size 2071529
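
Note: the commit ships the tokenizer as a raw SentencePiece unigram model (`sp_unigram_64k.model`). A minimal sketch for inspecting it with the `sentencepiece` library (assuming the file has been downloaded locally; how its 64k pieces relate to the model's 120,000-entry vocabulary is not documented in this commit):

```python
import sentencepiece as spm

# Load the uploaded SentencePiece model and tokenize a sample sentence
sp = spm.SentencePieceProcessor(model_file="sp_unigram_64k.model")
print(sp.vocab_size())
print(sp.encode("सत्यमेव जयते", out_type=str))
```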