pt4c commited on
Commit
ca8a310
·
verified ·
1 Parent(s): f720cce

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -3,7 +3,7 @@
3
  ## Model Description
4
  This tokenizer was developed for Yambeta, a Bantu language from Cameroon. The tokenizer is based on the WordPiece model architecture and has been fine-tuned to handle the unique phonetic and diacritical features of the Yambeta language.
5
 
6
- - **Developed by**: DS4H-ICTU Research Group
7
  - **Language(s)**: Yambeta (Bantu language from Cameroon)
8
  - **License**: Apache 2.0 (or specify if different)
9
  - **Model Type**: Tokenizer (WordPiece)
@@ -24,7 +24,7 @@ This tokenizer was developed for Yambeta, a Bantu language from Cameroon. The to
24
  ## Training Details
25
  - **Training Data**: Extracted from Yambeta Bible text corpus (final_dataset.xlsx).
26
  - **Training Procedure**: Preprocessing of text involved normalization of diacritics, tokenization using WordPiece, and post-processing to handle special tokens.
27
- - **Training Hyperparameters**:
28
  - Vocabulary Size: 25,000
29
  - Special Tokens: [UNK], [PAD], [CLS], [SEP], [MASK]
30
 
 
3
  ## Model Description
4
  This tokenizer was developed for Yambeta, a Bantu language from Cameroon. The tokenizer is based on the WordPiece model architecture and has been fine-tuned to handle the unique phonetic and diacritical features of the Yambeta language.
5
 
6
+ - **Developed by**: DS4H-ICTU Research Group in Cooperation with the
7
  - **Language(s)**: Yambeta (Bantu language from Cameroon)
8
  - **License**: Apache 2.0 (or specify if different)
9
  - **Model Type**: Tokenizer (WordPiece)
 
24
  ## Training Details
25
  - **Training Data**: Extracted from Yambeta Bible text corpus (final_dataset.xlsx).
26
  - **Training Procedure**: Preprocessing of text involved normalization of diacritics, tokenization using WordPiece, and post-processing to handle special tokens.
27
+ - **Training Hyperparameters**:
28
  - Vocabulary Size: 25,000
29
  - Special Tokens: [UNK], [PAD], [CLS], [SEP], [MASK]
30