Upload KenLM model
- BoKenlm-sp.arpa +2 -2
- README.md +17 -17
BoKenlm-sp.arpa
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:afae57135f995f6e5817bbb9d77bb9bf8879ad40e3b1353713660cbd7d938612
+size 1522379816
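The `BoKenlm-sp.arpa` entry is a Git LFS pointer, so the diff records only the SHA-256 digest and byte size of the real file. As a minimal sketch, a downloaded copy could be checked against those two values as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
import os

# Pointer text as it appears on the "+" side of the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:afae57135f995f6e5817bbb9d77bb9bf8879ad40e3b1353713660cbd7d938612
size 1522379816
"""


def parse_lfs_pointer(text):
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    # Cheap check first: a size mismatch avoids hashing a large file.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a ~1.5 GB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Calling `matches_pointer("BoKenlm-sp.arpa", parse_lfs_pointer(POINTER))` on a local download would then report whether the file is the one this commit points to.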
README.md
CHANGED
@@ -10,39 +10,39 @@ A KenLM n-gram language model trained on Tibetan text, tokenized with sentencepi
 | **Tokenizer** | [openpecha/BoSentencePiece](https://huggingface.co/openpecha/BoSentencePiece) (Unigram, 20k vocab) |
 | **Training Corpus** | `bo_corpus.txt` |
 | **Pruning** | 0 0 1 |
-| **Tokens** |
-| **Vocabulary Size** |
+| **Tokens** | 42,010,347 |
+| **Vocabulary Size** | 20,003 |
 
 ## N-gram Statistics
 
 | Order | Count | D1 | D2 | D3+ |
 | --- | --- | --- | --- | --- |
-| 1 |
-| 2 | 6,
-| 3 | 4,
-| 4 |
-| 5 |
+| 1 | 20,003 | 0.4921 | 0.3393 | 1.0317 |
+| 2 | 6,945,893 | 0.6676 | 1.1495 | 1.5504 |
+| 3 | 4,960,553 | 0.8443 | 1.2638 | 1.4835 |
+| 4 | 4,211,842 | 0.9154 | 1.3888 | 1.5332 |
+| 5 | 3,276,583 | 0.8525 | 1.5142 | 1.6453 |
 
 ## Memory Estimates
 
 | Type | MB | Details |
 | --- | --- | --- |
-| probing |
-| probing |
-| trie |
-| trie |
-| trie |
-| trie |
+| probing | 425 | assuming -p 1.5 |
+| probing | 517 | assuming -r models -p 1.5 |
+| trie | 211 | without quantization |
+| trie | 112 | assuming -q 8 -b 8 quantization |
+| trie | 180 | assuming -a 22 array pointer compression |
+| trie | 81 | assuming -a 22 -q 8 -b 8 array pointer compression and quantization |
 
 ## Training Resources
 
 | Metric | Value |
 | --- | --- |
 | **Peak Virtual Memory** | 12,333 MB |
-| **Peak RSS** |
-| **Wall Time** |
-| **User Time** |
-| **System Time** |
+| **Peak RSS** | 3,578 MB |
+| **Wall Time** | 42.9s |
+| **User Time** | 48.5s |
+| **System Time** | 19.7s |
 
 ## Usage
 
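The hunk stops at the `## Usage` heading, so the README's own usage text is not part of this diff. What follows is only a rough sketch of how the uploaded model might be scored from Python, assuming the `kenlm` and `sentencepiece` packages are installed. Because the README says the training text was tokenized with SentencePiece, the sketch assumes input must be split into pieces and space-joined before scoring. The tokenizer path `sentencepiece.model` and the helper names are hypothetical.

```python
def score_text(text, encode, score):
    """Tokenize `text` into pieces, join with spaces, and score the result.

    `encode` maps a string to a list of piece strings.
    `score` maps a space-separated piece string to a log10 probability.
    """
    pieces = encode(text)
    return score(" ".join(pieces))


def perplexity(log10_prob, n_pieces):
    """Per-token perplexity, counting one extra end-of-sentence token."""
    return 10.0 ** (-log10_prob / (n_pieces + 1))


def load_scorer(arpa_path="BoKenlm-sp.arpa", spm_path="sentencepiece.model"):
    """Build a text -> log10 probability function from the two model files.

    `spm_path` is a hypothetical file name; use whatever file the
    openpecha/BoSentencePiece repository actually provides.
    """
    # Imported lazily so the helpers above work without these packages.
    import kenlm
    import sentencepiece as spm

    lm = kenlm.Model(arpa_path)
    sp = spm.SentencePieceProcessor(model_file=spm_path)

    def encode(text):
        return sp.encode(text, out_type=str)

    def score(sentence):
        # bos/eos add sentence-boundary tokens around the input.
        return lm.score(sentence, bos=True, eos=True)

    return lambda text: score_text(text, encode, score)
```

With both files on disk, `load_scorer()("...")` would return a log10 probability for a Tibetan string. The ARPA file is about 1.5 GB according to its LFS pointer, so loading it directly may be slow.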