Upload folder using huggingface_hub

Files changed (8)

README.md +139 -0
added_tokens.json +3 -0
special_tokens_map.json +30 -0
spm.model +3 -0
spm.vocab +206 -0
tokenizer.json +779 -0
tokenizer.model +3 -0
tokenizer_config.json +59 -0

README.md ADDED

	@@ -0,0 +1,139 @@

+```markdown
+---
+language:
+- bci
+- fr
+tags:
+- African NLP
+- low-resource language
+- sentencepiece
+- tokenizer
+- Baoulé
+- Côte d'Ivoire
+- translation
+- tonal language
+datasets:
+- custom
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text2text-generation
+widget:
+- text: "Wafa sɛ yɛ ɔ fata kɛ be nga be lafi su kɛ bé trán asiɛ’n su wa’n, be bu be nga bé kɔ́ ɲanmiɛn"
+  example_title: "Basic translation"
+---
+# Baoulé Tokenizer for French-Baoulé Translation
+🌍 First SentencePiece tokenizer specialized for the Baoulé language (Côte d'Ivoire) 🇨🇮
+[![Hugging Face Hub](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Hub-blue)](https://huggingface.co/Adjoumani/BaouleTokenizer_V1)
+## Key Features
+✅ Full support for Baoulé tonal characters (ɛ́, ɩ̄, ɔ̀, etc.)
+✅ Optimized for machine translation models (Transformer)
+✅ 206-token vocabulary (plus an added `<pad>` token at id 206) with complete language coverage
+✅ Native integration with 🤗 Transformers and Tokenizers
+✅ Compatible with Google Translate custom models and Amazon Translate
+## Installation and Usage
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("Adjoumani/BaouleTokenizer_V1")
+# Tokenize a Baoulé sentence and round-trip it through encode/decode
+text = "Wafa sɛ yɛ ɔ fata kɛ be nga be lafi su kɛ bé trán asiɛ’n su wa’n, be bu be nga bé kɔ́ ɲanmiɛn"
+encoded = tokenizer.encode(text)
+decoded = tokenizer.decode(encoded)
+print(f"Tokens: {tokenizer.tokenize(text)}")
+# Output: ['W', 'a', 'f', 'a', '▁s', 'ɛ', '▁y', 'ɛ', '▁ɔ', '▁f', 'a', 't', 'a', '▁k', 'ɛ', '▁b', 'e', '▁n', 'g', 'a', '▁b', 'e', '▁l', 'a', 'f', 'i', '▁s', 'u', '▁k', 'ɛ', '▁b', 'é', '▁t', 'r', 'á', 'n', '▁a', 's', 'i', 'ɛ', '’', 'n', '▁s', 'u', '▁w', 'a', '’', 'n', ',', '▁b', 'e', '▁b', 'u', '▁b', 'e', '▁n', 'g', 'a', '▁b', 'é', '▁k', 'ɔ', '́', '▁ɲ', 'a', 'n', 'm', 'i', 'ɛ', 'n']
+```
+## Technical Details
+| Parameter          | Value                |
+|--------------------|----------------------|
+| Architecture       | SentencePiece BPE    |
+| Vocabulary size    | 206                  |
+| Character coverage | 1.0 (Unicode)        |
+| Special tokens     | `<s>` (BOS), `</s>` (EOS), `<unk>`, `<pad>` |
+| Target languages   | French ↔ Baoulé      |
+| Encoding           | UTF-8                |
+## Supported Tones
+The tokenizer handles all Baoulé tones according to the Unicode standard:
+| Character | Unicode code points | Example |
+|-----------|--------------|---------|
+| ɛ́         | U+025B U+0301| Mɔ́kɛ́    |
+| ɩ̄         | U+0269 U+0304| Ɩ̄tɩ̄     |
+| ɔ̀         | U+0254 U+0300| Kɔ̀lɔ̀    |
+| ɛ̂         | U+025B U+0302| Ɛ̂sɛ̂     |
+## Recommended Use Cases
+- French-Baoulé machine translation
+- Speech synthesis for voice assistant systems
+- Baoulé speech recognition
+- Digital educational tools
+- Preservation of linguistic heritage
+## Best Practices
+```python
+# Cap the maximum sequence length for long sentences (takes effect with truncation=True)
+tokenizer.model_max_length = 512
+# Add custom tokens (remember to resize the model's embedding matrix afterwards)
+new_tokens = ["<dialect:NDÊ>", "<dialect:SAFOUÈ>"]
+tokenizer.add_tokens(new_tokens)
+```
+## Training Data
+Data collected from:
+- Translations of biblical texts: most of the data was extracted from [Glosbe](https://www.glosbe.com/) and structured manually to ensure quality and accuracy. The content was cleaned to remove unwanted HTML tags and formatted consistently.
+- Transcribed oral corpus (UNESCO project)
+- Annotated everyday sentences
+- Bilingual government texts
+**Corpus size**: 1,500 aligned sentences (being expanded)
+## Citation
+If you use this tokenizer in your research, please cite:
+```bibtex
+@misc{BaouleTokenizer2023,
+  author = {Votre Nom},
+  title = {Baoulé Tokenizer for Low-Resource Machine Translation},
+  year = {2023},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/Adjoumani/BaouleTokenizer_V1}}
+}
+```
+## License
+Apache 2.0 - [See the full license](LICENSE)
+## Contributing
+We welcome contributions, particularly for:
+- Expanding the vocabulary
+- Annotating tones
+- Adding regional dialects
+Contact: [contact@les-experts-en-solutions-digitales.com](mailto:contact@les-experts-en-solutions-digitales.com)
+---
+**SEO keywords**: Baoulé Tokenizer, French-Baoulé Translation, African NLP, Tonal Languages, Côte d'Ivoire AI, Low-Resource Language Model, SentencePiece Baoulé, African Language Preservation
+```
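
The README's tone table lists combining sequences (base letter plus tone mark). A minimal sketch of how one could check such sequences against a vocabulary, code point by code point, is given here. `VOCAB_CHARS` is an assumption: a hand-copied subset of the single-character pieces shown in the `spm.vocab` listing of this commit, not the full file, and `missing_code_points` is a hypothetical helper name.

```python
import unicodedata

# Assumption: a hand-copied subset of the single-character pieces listed in
# spm.vocab (lowercase letters, open e/o, palatal n, combining acute accent).
LETTERS = "naiuelkmsbowftgyrdpzjcvhq" + "\u025B\u0254\u0272"
VOCAB_CHARS = set(LETTERS) | {"\u0301"}


def missing_code_points(text, vocab_chars):
    """Return the code points of `text` (after NFC) that are not in vocab_chars."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized if ch not in vocab_chars]


# The four sequences from the tone table, written with explicit escapes.
TONE_SEQUENCES = {
    "open e + acute": "\u025B\u0301",
    "iota + macron": "\u0269\u0304",
    "open o + grave": "\u0254\u0300",
    "open e + circumflex": "\u025B\u0302",
}

for label, sequence in TONE_SEQUENCES.items():
    print(label, missing_code_points(sequence, VOCAB_CHARS))
```

Any code point reported as missing would not map to a dedicated piece in this subset. Running the same check against the full vocabulary file is the more reliable test.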

added_tokens.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "<pad>": 206
+}
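
Since `<pad>` is appended after the 206 base pieces (ids 0 to 205), it takes id 206, and a model using this tokenizer would need at least 207 embedding rows. A minimal sketch of right-padding a batch with that id follows. `pad_batch` is a hypothetical helper, not part of any library, and the token ids in the example are read off the `tokenizer.json` vocabulary in this commit.

```python
PAD_ID = 206  # id of "<pad>" according to added_tokens.json


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad id sequences to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return input_ids, attention_mask


# "<s> W a </s>" and "<s> ▁b e ▁s u </s>" using ids from tokenizer.json
batch = [[1, 171, 106, 2], [1, 14, 110, 13, 109, 2]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)
print(attention_mask)
```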

special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

spm.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d8c538d19bc460d3b0730c26645f6b0ede1f462c5cbf9bc6cf5dc578f0049f1
+size 240063

spm.vocab ADDED

	@@ -0,0 +1,206 @@

+<unk>	0
+<s>	0
+</s>	0
+▁n	-0
+▁a	-1
+▁i	-2
+▁ɛ	-3
+▁u	-4
+▁e	-5
+▁l	-6
+▁’	-7
+▁k	-8
+▁m	-9
+▁s	-10
+▁b	-11
+▁ɔ	-12
+▁,	-13
+▁'	-14
+▁o	-15
+▁w	-16
+▁f	-17
+▁t	-18
+▁g	-19
+▁y	-20
+▁.	-21
+▁r	-22
+▁d	-23
+▁p	-24
+▁z	-25
+▁j	-26
+▁:	-27
+▁1	-28
+▁Z	-29
+▁c	-30
+▁?	-31
+▁S	-32
+▁A	-33
+▁N	-34
+▁K	-35
+▁v	-36
+▁á	-37
+▁2	-38
+▁Ɲ	-39
+▁M	-40
+▁ɲ	-41
+▁B	-42
+▁é	-43
+▁“	-44
+▁”	-45
+▁I	-46
+▁́	-47
+▁‘	-48
+▁3	-49
+▁Ɔ	-50
+▁4	-51
+▁0	-52
+▁)	-53
+▁E	-54
+▁5	-55
+▁-	-56
+▁(	-57
+▁L	-58
+▁—	-59
+▁Y	-60
+▁F	-61
+▁ó	-62
+▁í	-63
+▁ú	-64
+▁6	-65
+▁W	-66
+▁7	-67
+▁9	-68
+▁D	-69
+▁T	-70
+▁8	-71
+▁P	-72
+▁;	-73
+▁Ɛ	-74
+▁!	-75
+▁J	-76
+▁ń	-77
+▁G	-78
+▁R	-79
+▁U	-80
+▁[	-81
+▁]	-82
+▁C	-83
+▁O	-84
+▁h	-85
+▁•	-86
+▁É	-87
+▁▪	-88
+▁*	-89
+▁/	-90
+▁Ń	-91
+▁q	-92
+▁|	-93
+▁V	-94
+▁ḿ	-95
+▁–	-96
+▁è	-97
+▁Ḿ	-98
+▁H	-99
+▁Á	-100
+▁	-101
+n	-102
+a	-103
+i	-104
+ɛ	-105
+u	-106
+e	-107
+l	-108
+’	-109
+k	-110
+m	-111
+s	-112
+b	-113
+ɔ	-114
+,	-115
+'	-116
+o	-117
+w	-118
+f	-119
+t	-120
+g	-121
+y	-122
+.	-123
+r	-124
+d	-125
+p	-126
+z	-127
+j	-128
+:	-129
+1	-130
+Z	-131
+c	-132
+?	-133
+S	-134
+A	-135
+N	-136
+K	-137
+v	-138
+á	-139
+2	-140
+Ɲ	-141
+M	-142
+ɲ	-143
+B	-144
+é	-145
+“	-146
+”	-147
+I	-148
+́	-149
+‘	-150
+3	-151
+Ɔ	-152
+4	-153
+0	-154
+)	-155
+E	-156
+5	-157
+-	-158
+(	-159
+L	-160
+—	-161
+Y	-162
+F	-163
+ó	-164
+í	-165
+ú	-166
+6	-167
+W	-168
+7	-169
+9	-170
+D	-171
+T	-172
+8	-173
+P	-174
+;	-175
+Ɛ	-176
+!	-177
+J	-178
+ń	-179
+G	-180
+R	-181
+U	-182
+[	-183
+]	-184
+C	-185
+O	-186
+h	-187
+•	-188
+É	-189
+▪	-190
+*	-191
+/	-192
+Ń	-193
+q	-194
+|	-195
+V	-196
+ḿ	-197
+–	-198
+è	-199
+Ḿ	-200
+H	-201
+Á	-202
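
Each line of `spm.vocab` is a piece and a score separated by a tab, and the line order gives the token id (which is why `▁n` on the fourth line has id 3 in `tokenizer.json`). A minimal sketch of parsing that layout, using the first five lines of the file as sample input, could look like this. `parse_spm_vocab` is a hypothetical helper name.

```python
def parse_spm_vocab(lines):
    """Map each piece to (id, score); ids follow line order, starting at 0."""
    vocab = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the piece itself may contain any character.
        piece, score = line.rsplit("\t", 1)
        vocab[piece] = (len(vocab), float(score))
    return vocab


# First five lines of spm.vocab ("\u2581" is the SentencePiece space marker).
SAMPLE = ["<unk>\t0", "<s>\t0", "</s>\t0", "\u2581n\t-0", "\u2581a\t-1"]

vocab = parse_spm_vocab(SAMPLE)
print(vocab["\u2581n"])
print(vocab["\u2581a"])
```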

tokenizer.json ADDED

	@@ -0,0 +1,779 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "<unk>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "<s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 206,
+      "content": "<pad>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": {
+    "type": "Sequence",
+    "normalizers": [
+      {
+        "type": "Replace",
+        "pattern": {
+          "String": " "
+        },
+        "content": "▁"
+      }
+    ]
+  },
+  "pre_tokenizer": null,
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      {
+        "SpecialToken": {
+          "id": "<s>",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      }
+    ],
+    "pair": [
+      {
+        "SpecialToken": {
+          "id": "<s>",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "<s>",
+          "type_id": 1
+        }
+      },
+      {
+        "Sequence": {
+          "id": "B",
+          "type_id": 1
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 1
+        }
+      }
+    ],
+    "special_tokens": {
+      "</s>": {
+        "id": "</s>",
+        "ids": [
+          2
+        ],
+        "tokens": [
+          "</s>"
+        ]
+      },
+      "<s>": {
+        "id": "<s>",
+        "ids": [
+          1
+        ],
+        "tokens": [
+          "<s>"
+        ]
+      }
+    }
+  },
+  "decoder": {
+    "type": "Sequence",
+    "decoders": [
+      {
+        "type": "Replace",
+        "pattern": {
+          "String": "▁"
+        },
+        "content": " "
+      },
+      {
+        "type": "ByteFallback"
+      },
+      {
+        "type": "Fuse"
+      }
+    ]
+  },
+  "model": {
+    "type": "BPE",
+    "dropout": null,
+    "unk_token": "<unk>",
+    "continuing_subword_prefix": null,
+    "end_of_word_suffix": null,
+    "fuse_unk": true,
+    "byte_fallback": true,
+    "ignore_merges": false,
+    "vocab": {
+      "<unk>": 0,
+      "<s>": 1,
+      "</s>": 2,
+      "▁n": 3,
+      "▁a": 4,
+      "▁i": 5,
+      "▁ɛ": 6,
+      "▁u": 7,
+      "▁e": 8,
+      "▁l": 9,
+      "▁’": 10,
+      "▁k": 11,
+      "▁m": 12,
+      "▁s": 13,
+      "▁b": 14,
+      "▁ɔ": 15,
+      "▁,": 16,
+      "▁'": 17,
+      "▁o": 18,
+      "▁w": 19,
+      "▁f": 20,
+      "▁t": 21,
+      "▁g": 22,
+      "▁y": 23,
+      "▁.": 24,
+      "▁r": 25,
+      "▁d": 26,
+      "▁p": 27,
+      "▁z": 28,
+      "▁j": 29,
+      "▁:": 30,
+      "▁1": 31,
+      "▁Z": 32,
+      "▁c": 33,
+      "▁?": 34,
+      "▁S": 35,
+      "▁A": 36,
+      "▁N": 37,
+      "▁K": 38,
+      "▁v": 39,
+      "▁á": 40,
+      "▁2": 41,
+      "▁Ɲ": 42,
+      "▁M": 43,
+      "▁ɲ": 44,
+      "▁B": 45,
+      "▁é": 46,
+      "▁“": 47,
+      "▁”": 48,
+      "▁I": 49,
+      "▁́": 50,
+      "▁‘": 51,
+      "▁3": 52,
+      "▁Ɔ": 53,
+      "▁4": 54,
+      "▁0": 55,
+      "▁)": 56,
+      "▁E": 57,
+      "▁5": 58,
+      "▁-": 59,
+      "▁(": 60,
+      "▁L": 61,
+      "▁—": 62,
+      "▁Y": 63,
+      "▁F": 64,
+      "▁ó": 65,
+      "▁í": 66,
+      "▁ú": 67,
+      "▁6": 68,
+      "▁W": 69,
+      "▁7": 70,
+      "▁9": 71,
+      "▁D": 72,
+      "▁T": 73,
+      "▁8": 74,
+      "▁P": 75,
+      "▁;": 76,
+      "▁Ɛ": 77,
+      "▁!": 78,
+      "▁J": 79,
+      "▁ń": 80,
+      "▁G": 81,
+      "▁R": 82,
+      "▁U": 83,
+      "▁[": 84,
+      "▁]": 85,
+      "▁C": 86,
+      "▁O": 87,
+      "▁h": 88,
+      "▁•": 89,
+      "▁É": 90,
+      "▁▪": 91,
+      "▁*": 92,
+      "▁/": 93,
+      "▁Ń": 94,
+      "▁q": 95,
+      "▁|": 96,
+      "▁V": 97,
+      "▁ḿ": 98,
+      "▁–": 99,
+      "▁è": 100,
+      "▁Ḿ": 101,
+      "▁H": 102,
+      "▁Á": 103,
+      "▁": 104,
+      "n": 105,
+      "a": 106,
+      "i": 107,
+      "ɛ": 108,
+      "u": 109,
+      "e": 110,
+      "l": 111,
+      "’": 112,
+      "k": 113,
+      "m": 114,
+      "s": 115,
+      "b": 116,
+      "ɔ": 117,
+      ",": 118,
+      "'": 119,
+      "o": 120,
+      "w": 121,
+      "f": 122,
+      "t": 123,
+      "g": 124,
+      "y": 125,
+      ".": 126,
+      "r": 127,
+      "d": 128,
+      "p": 129,
+      "z": 130,
+      "j": 131,
+      ":": 132,
+      "1": 133,
+      "Z": 134,
+      "c": 135,
+      "?": 136,
+      "S": 137,
+      "A": 138,
+      "N": 139,
+      "K": 140,
+      "v": 141,
+      "á": 142,
+      "2": 143,
+      "Ɲ": 144,
+      "M": 145,
+      "ɲ": 146,
+      "B": 147,
+      "é": 148,
+      "“": 149,
+      "”": 150,
+      "I": 151,
+      "́": 152,
+      "‘": 153,
+      "3": 154,
+      "Ɔ": 155,
+      "4": 156,
+      "0": 157,
+      ")": 158,
+      "E": 159,
+      "5": 160,
+      "-": 161,
+      "(": 162,
+      "L": 163,
+      "—": 164,
+      "Y": 165,
+      "F": 166,
+      "ó": 167,
+      "í": 168,
+      "ú": 169,
+      "6": 170,
+      "W": 171,
+      "7": 172,
+      "9": 173,
+      "D": 174,
+      "T": 175,
+      "8": 176,
+      "P": 177,
+      ";": 178,
+      "Ɛ": 179,
+      "!": 180,
+      "J": 181,
+      "ń": 182,
+      "G": 183,
+      "R": 184,
+      "U": 185,
+      "[": 186,
+      "]": 187,
+      "C": 188,
+      "O": 189,
+      "h": 190,
+      "•": 191,
+      "É": 192,
+      "▪": 193,
+      "*": 194,
+      "/": 195,
+      "Ń": 196,
+      "q": 197,
+      "|": 198,
+      "V": 199,
+      "ḿ": 200,
+      "–": 201,
+      "è": 202,
+      "Ḿ": 203,
+      "H": 204,
+      "Á": 205
+    },
+    "merges": [
+      [
+        "▁",
+        "n"
+      ],
+      [
+        "▁",
+        "a"
+      ],
+      [
+        "▁",
+        "i"
+      ],
+      [
+        "▁",
+        "ɛ"
+      ],
+      [
+        "▁",
+        "u"
+      ],
+      [
+        "▁",
+        "e"
+      ],
+      [
+        "▁",
+        "l"
+      ],
+      [
+        "▁",
+        "’"
+      ],
+      [
+        "▁",
+        "k"
+      ],
+      [
+        "▁",
+        "m"
+      ],
+      [
+        "▁",
+        "s"
+      ],
+      [
+        "▁",
+        "b"
+      ],
+      [
+        "▁",
+        "ɔ"
+      ],
+      [
+        "▁",
+        ","
+      ],
+      [
+        "▁",
+        "'"
+      ],
+      [
+        "▁",
+        "o"
+      ],
+      [
+        "▁",
+        "w"
+      ],
+      [
+        "▁",
+        "f"
+      ],
+      [
+        "▁",
+        "t"
+      ],
+      [
+        "▁",
+        "g"
+      ],
+      [
+        "▁",
+        "y"
+      ],
+      [
+        "▁",
+        "."
+      ],
+      [
+        "▁",
+        "r"
+      ],
+      [
+        "▁",
+        "d"
+      ],
+      [
+        "▁",
+        "p"
+      ],
+      [
+        "▁",
+        "z"
+      ],
+      [
+        "▁",
+        "j"
+      ],
+      [
+        "▁",
+        ":"
+      ],
+      [
+        "▁",
+        "1"
+      ],
+      [
+        "▁",
+        "Z"
+      ],
+      [
+        "▁",
+        "c"
+      ],
+      [
+        "▁",
+        "?"
+      ],
+      [
+        "▁",
+        "S"
+      ],
+      [
+        "▁",
+        "A"
+      ],
+      [
+        "▁",
+        "N"
+      ],
+      [
+        "▁",
+        "K"
+      ],
+      [
+        "▁",
+        "v"
+      ],
+      [
+        "▁",
+        "á"
+      ],
+      [
+        "▁",
+        "2"
+      ],
+      [
+        "▁",
+        "Ɲ"
+      ],
+      [
+        "▁",
+        "M"
+      ],
+      [
+        "▁",
+        "ɲ"
+      ],
+      [
+        "▁",
+        "B"
+      ],
+      [
+        "▁",
+        "é"
+      ],
+      [
+        "▁",
+        "“"
+      ],
+      [
+        "▁",
+        "”"
+      ],
+      [
+        "▁",
+        "I"
+      ],
+      [
+        "▁",
+        "́"
+      ],
+      [
+        "▁",
+        "‘"
+      ],
+      [
+        "▁",
+        "3"
+      ],
+      [
+        "▁",
+        "Ɔ"
+      ],
+      [
+        "▁",
+        "4"
+      ],
+      [
+        "▁",
+        "0"
+      ],
+      [
+        "▁",
+        ")"
+      ],
+      [
+        "▁",
+        "E"
+      ],
+      [
+        "▁",
+        "5"
+      ],
+      [
+        "▁",
+        "-"
+      ],
+      [
+        "▁",
+        "("
+      ],
+      [
+        "▁",
+        "L"
+      ],
+      [
+        "▁",
+        "—"
+      ],
+      [
+        "▁",
+        "Y"
+      ],
+      [
+        "▁",
+        "F"
+      ],
+      [
+        "▁",
+        "ó"
+      ],
+      [
+        "▁",
+        "í"
+      ],
+      [
+        "▁",
+        "ú"
+      ],
+      [
+        "▁",
+        "6"
+      ],
+      [
+        "▁",
+        "W"
+      ],
+      [
+        "▁",
+        "7"
+      ],
+      [
+        "▁",
+        "9"
+      ],
+      [
+        "▁",
+        "D"
+      ],
+      [
+        "▁",
+        "T"
+      ],
+      [
+        "▁",
+        "8"
+      ],
+      [
+        "▁",
+        "P"
+      ],
+      [
+        "▁",
+        ";"
+      ],
+      [
+        "▁",
+        "Ɛ"
+      ],
+      [
+        "▁",
+        "!"
+      ],
+      [
+        "▁",
+        "J"
+      ],
+      [
+        "▁",
+        "ń"
+      ],
+      [
+        "▁",
+        "G"
+      ],
+      [
+        "▁",
+        "R"
+      ],
+      [
+        "▁",
+        "U"
+      ],
+      [
+        "▁",
+        "["
+      ],
+      [
+        "▁",
+        "]"
+      ],
+      [
+        "▁",
+        "C"
+      ],
+      [
+        "▁",
+        "O"
+      ],
+      [
+        "▁",
+        "h"
+      ],
+      [
+        "▁",
+        "•"
+      ],
+      [
+        "▁",
+        "É"
+      ],
+      [
+        "▁",
+        "▪"
+      ],
+      [
+        "▁",
+        "*"
+      ],
+      [
+        "▁",
+        "/"
+      ],
+      [
+        "▁",
+        "Ń"
+      ],
+      [
+        "▁",
+        "q"
+      ],
+      [
+        "▁",
+        "|"
+      ],
+      [
+        "▁",
+        "V"
+      ],
+      [
+        "▁",
+        "ḿ"
+      ],
+      [
+        "▁",
+        "–"
+      ],
+      [
+        "▁",
+        "è"
+      ],
+      [
+        "▁",
+        "Ḿ"
+      ],
+      [
+        "▁",
+        "H"
+      ],
+      [
+        "▁",
+        "Á"
+      ]
+    ]
+  }
+}
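
Every merge in this file joins the space marker `▁` with a single character, so the tokenizer works essentially at character level, with word-initial characters carrying the marker. A simplified sketch of the pipeline described by this JSON (space replacement, rank-ordered merges, then the `<s> A </s>` template) is given here. It is an illustration rather than the `tokenizers` implementation: it ignores byte fallback, and `MERGES` and `VOCAB` are subsets copied from the file.

```python
SPACE = "\u2581"  # SentencePiece space marker

# Assumption: only the first 13 merges of tokenizer.json, in file order.
MERGES = [(SPACE, ch) for ch in
          ["n", "a", "i", "\u025B", "u", "e", "l", "\u2019",
           "k", "m", "s", "b", "\u0254"]]
RANKS = {pair: rank for rank, pair in enumerate(MERGES)}

# Subset of the vocabulary, with ids as listed in tokenizer.json.
VOCAB = {"<unk>": 0, "<s>": 1, "</s>": 2, SPACE + "k": 11, SPACE + "s": 13,
         SPACE + "b": 14, "a": 106, "\u025B": 108, "e": 110, "f": 122, "W": 171}


def bpe_tokenize(text):
    """Replace spaces with the marker, then apply merges by ascending rank."""
    symbols = list(text.replace(" ", SPACE))
    while len(symbols) > 1:
        ranked = [(RANKS.get(pair, float("inf")), i)
                  for i, pair in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(ranked)
        if rank == float("inf"):
            break  # no applicable merge left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def encode(text):
    """Wrap the piece ids in <s> ... </s>, as the post-processor template does."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in bpe_tokenize(text)]
    return [VOCAB["<s>"]] + ids + [VOCAB["</s>"]]


print(bpe_tokenize("Wafa s\u025B k\u025B be"))
print(encode("Wafa s\u025B k\u025B be"))
```

The first token `W` carries no marker because the normalizer only replaces existing spaces and does not prepend one, which matches the token list shown in the README.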

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d8c538d19bc460d3b0730c26645f6b0ede1f462c5cbf9bc6cf5dc578f0049f1
+size 240063

tokenizer_config.json ADDED

	@@ -0,0 +1,59 @@

+{
+    "add_bos_token": true,
+    "add_eos_token": true,
+    "add_prefix_space": null,
+    "added_tokens_decoder": {
+        "0": {
+            "content": "<unk>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "1": {
+            "content": "<s>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "2": {
+            "content": "</s>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        },
+        "206": {
+            "content": "<pad>",
+            "lstrip": false,
+            "normalized": false,
+            "rstrip": false,
+            "single_word": false,
+            "special": true
+        }
+    },
+    "bos_token": "<s>",
+    "clean_up_tokenization_spaces": false,
+    "eos_token": "</s>",
+    "extra_special_tokens": {},
+    "legacy": true,
+    "model_max_length": 1000000000000000019884624838656,
+    "pad_token": "<pad>",
+    "sp_model_kwargs": {},
+    "spaces_between_special_tokens": false,
+    "tokenizer_class": "LlamaTokenizer",
+    "unk_token": "<unk>",
+    "use_default_system_prompt": false,
+    "special_tokens_map_file": "special_tokens_map.json",
+    "description": "Baoulé tokenizer for French-Baoulé translation",
+    "language": [
+        "bci",
+        "Baoule"
+    ],
+    "license": "Apache-2.0",
+    "do_lower_case": false
+}
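
The `model_max_length` value in this config equals `int(1e30)`, the placeholder that Transformers uses when no length limit has been set, so it should not be read as a real limit. A minimal sketch of detecting that placeholder and substituting a practical value follows. `effective_max_length` is a hypothetical helper, and the fallback of 512 is taken from the README's best-practices snippet.

```python
import json

# int(1e30) is not exactly 10**30 because 1e30 is a binary floating-point value.
VERY_LARGE = int(1e30)


def effective_max_length(config, fallback=512):
    """Return the configured limit, or `fallback` when it is the placeholder."""
    value = config.get("model_max_length", VERY_LARGE)
    return fallback if value >= VERY_LARGE else value


config = json.loads('{"model_max_length": 1000000000000000019884624838656}')
print(effective_max_length(config))
print(effective_max_length({"model_max_length": 128}))
```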