ShiWarai committed
Commit f7bb0cd · verified · 1 Parent(s): 30ccfd4

Auto-upload: model trained via CI/CD

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
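The pooling config above selects masked mean pooling (`pooling_mode_mean_tokens: true`) over the encoder's 768-dimensional token embeddings, with all other modes disabled. A minimal NumPy sketch of what that operation computes (toy shapes and random inputs, not the actual model):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the sequence axis; padding tokens are excluded.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len) of 0/1
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid div-by-zero
    return summed / counts

# Toy batch: 2 sentences, 4 token slots, dim=768 as in the config
emb = np.random.randn(2, 4, 768)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # (2, 768)
```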
2_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 768,
+   "out_features": 3072,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffbd357e603d1635ea0ffb287ac4d6b640e8410cef9392bbce696809229b2520
+ size 9437272
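The pointer's `size` is consistent with the layer's shape: a bias-free 768→3072 linear layer stores 768 × 3072 float32 weights, with the small remainder taken up by the safetensors header. A quick sanity check (assuming float32 storage, as the `"dtype": "float32"` in `config.json` indicates):

```python
# Bias-free Dense layer: in_features=768, out_features=3072 (from 2_Dense/config.json)
params = 768 * 3072            # 2,359,296 weights, no bias term
weight_bytes = params * 4      # float32 = 4 bytes each
lfs_size = 9437272             # "size" from the LFS pointer above

header_bytes = lfs_size - weight_bytes
print(weight_bytes)   # 9437184
print(header_bytes)   # 88 -> the safetensors JSON header
```

The 3072→768 layer in `3_Dense/` has the same parameter count, which is why its pointer reports the identical size of 9437272 bytes.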
3_Dense/config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "in_features": 3072,
+   "out_features": 768,
+   "bias": false,
+   "activation_function": "torch.nn.modules.linear.Identity"
+ }
3_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:144adf0dcbf3552f465aa5d893d42b0d81eac5c9e2d1dd8f4a8b0b354cbf88fb
+ size 9437272
README.md CHANGED
@@ -1,132 +1,226 @@
- ---
- library_name: setfit
- tags:
- - setfit
- - sentence-transformers
- - classification
- - russian
- - voice-commands
- - panda
- - commands
- language: ru
- license: mit
- pipeline_tag: text-classification
- ---
- # CVC-Panda
-
- A model for classifying Russian-language voice commands for controlling a panda robot (CVC-Panda).
-
- ## Description
-
- This model is a SetFit classifier trained to recognize and classify voice commands in Russian. It uses SetFit's few-shot learning approach, which allows it to train effectively on a small number of examples.
-
- ## Usage
-
- ### Basic usage
-
- ```python
- from setfit import SetFitModel
-
- # Load the model
- model = SetFitModel.from_pretrained("your-username/cvc-panda-commands", token="your_token")
-
- # Classify commands
- commands = ["равняйся", "отставить", "налево", "направо"]
- predictions = model(commands)
-
- # Get class probabilities
- probs = model.predict_proba(commands)
- ```
-
- ### Usage with a token
-
- If the model is private or requires authentication:
-
- ```python
- from setfit import SetFitModel
- from huggingface_hub import login
-
- # Authenticate
- login(token="your_hf_token")
-
- # Load the model
- model = SetFitModel.from_pretrained("your-username/cvc-panda-commands")
- ```
-
- ### Batch processing
-
- ```python
- from setfit import SetFitModel
-
- model = SetFitModel.from_pretrained("your-username/cvc-panda-commands", token="your_token")
-
- # Process a list of commands
- commands = [
-     "равняйся",
-     "отставить",
-     "налево",
-     "направо",
-     "шагом марш"
- ]
-
- predictions = model(commands)
- print(predictions)
- ```
-
- ## Model architecture
-
- The model is based on SetFit and uses the sentence-transformers architecture:
-
- - **Base encoder**: a pretrained model for Russian
- - **Classifier**: a trained classification head (SetFit head)
- - **Format**: SetFit/sentence-transformers-compatible format
-
- ## Training
-
- The model was trained on a dataset of voice commands using SetFit (few-shot learning). Training runs automatically in a CI/CD pipeline, after which the model is uploaded to the Hugging Face Hub.
-
- ### Training parameters
-
- - **Method**: SetFit (few-shot learning)
- - **Language**: Russian (ru)
- - **Task type**: multi-class classification
-
- ## Model structure
-
- The model is saved in the sentence-transformers format and contains:
-
- - `config.json` - main model configuration
- - `config_setfit.json` - SetFit configuration
- - `config_sentence_transformers.json` - sentence-transformers configuration
- - `model_head.pkl` - trained classification head
- - `model.safetensors` - model weights
- - `tokenizer.json` - tokenizer
- - Modules: `1_Pooling/`, `2_Dense/`, `3_Dense/`, `4_Normalize/`
-
- ## Versioning
-
- The model is automatically versioned via Git on every CI/CD upload. You can:
-
- - Browse the version history on the "Files and versions" tab
- - Pin a specific version via its commit hash
- - Tag stable versions for production
-
- ## Requirements
-
- ```python
- setfit>=0.7.0
- sentence-transformers>=2.2.0
- transformers>=4.21.0
- ```
-
- ## License
-
- MIT License
-
- ## Author
-
- ShiWarai
-
- ---
-
- **Note**: For private models, make sure the `HF_TOKEN` token is set on the production server.
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: робот не любит бегать
+ - text: надо равняться
+ - text: панда ложись
+ - text: ну остановись
+ - text: беги бы
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: true
+ base_model: google/embeddinggemma-300M
+ model-index:
+ - name: SetFit with google/embeddinggemma-300M
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.890282131661442
+       name: Accuracy
+ ---
+
+ # SetFit with google/embeddinggemma-300M
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [google/embeddinggemma-300M](https://huggingface.co/google/embeddinggemma-300M) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [google/embeddinggemma-300M](https://huggingface.co/google/embeddinggemma-300M)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 2048 tokens
+ - **Number of Classes:** 13 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:-------------------|:------------------------------------------------------------------------------------------------------------------------------|
+ | unknown | <ul><li>'чей робот'</li><li>'опять лежать'</li><li>'смотри панда'</li></ul> |
+ | stand_at_attention | <ul><li>'пора выравняться'</li><li>'не хочешь равняться'</li><li>'выравнялся бы'</li></ul> |
+ | dismiss | <ul><li>'давай поднимись'</li><li>'пора подняться'</li><li>'эй встань'</li></ul> |
+ | silence | <ul><li>'перестать говорить'</li><li>'перестаньте болтать'</li><li>'замолкать'</li></ul> |
+ | rotate | <ul><li>'переворачиваться'</li><li>'кувыркнись'</li><li>'ты кувыркайся'</li></ul> |
+ | give_paw | <ul><li>'ну лапу дай'</li><li>'хотела бы чтобы панда дала лапу'</li><li>'панда давай лапу'</li></ul> |
+ | stop_running | <ul><li>'а ну остановись'</li><li>'смирно'</li><li>'попроси робота остановиться'</li></ul> |
+ | reconnect_joystick | <ul><li>'не хочешь подключить джойстик'</li><li>'подключись к джойстику сейчас'</li><li>'можно подключить джойстик'</li></ul> |
+ | bind | <ul><li>'привязывать робота'</li><li>'панда привяжи панду'</li><li>'привяжите робота'</li></ul> |
+ | unbind | <ul><li>'панда отвяжи панду'</li><li>'отвяжешь панду'</li><li>'отвяжи панду панда'</li></ul> |
+ | lie_down | <ul><li>'быстро ложись'</li><li>'упасть'</li><li>'полежи'</li></ul> |
+ | run | <ul><li>'надо бежать'</li><li>'побеги'</li><li>'хотела бы чтобы панда бежала'</li></ul> |
+ | help | <ul><li>'надо помочь'</li><li>'помог бы'</li><li>'команды'</li></ul> |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.8903 |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("tmplxrs3m7t/panda_commands")
+ # Run inference
+ preds = model("беги бы")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 1 | 2.3697 | 7 |
+
+ | Label | Training Sample Count |
+ |:-------------------|:----------------------|
+ | bind | 44 |
+ | dismiss | 128 |
+ | give_paw | 83 |
+ | help | 18 |
+ | lie_down | 88 |
+ | reconnect_joystick | 108 |
+ | rotate | 109 |
+ | run | 85 |
+ | silence | 22 |
+ | stand_at_attention | 70 |
+ | stop_running | 108 |
+ | unbind | 30 |
+ | unknown | 381 |
+
+ ### Training Hyperparameters
+ - batch_size: (128, 128)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0025 | 1 | 0.2328 | - |
+ | 0.1253 | 50 | 0.0955 | - |
+ | 0.2506 | 100 | 0.0228 | - |
+ | 0.3759 | 150 | 0.0098 | - |
+ | 0.5013 | 200 | 0.0044 | - |
+ | 0.6266 | 250 | 0.0031 | - |
+ | 0.7519 | 300 | 0.0031 | - |
+ | 0.8772 | 350 | 0.0022 | - |
+
+ ### Framework Versions
+ - Python: 3.11.14
+ - SetFit: 1.1.3
+ - Sentence Transformers: 5.2.1
+ - Transformers: 4.57.6
+ - PyTorch: 2.9.1+cu128
+ - Datasets: 4.5.0
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "_sliding_window_pattern": 6,
+   "architectures": [
+     "Gemma3TextModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "attn_logit_softcapping": null,
+   "bos_token_id": 2,
+   "dtype": "float32",
+   "eos_token_id": 1,
+   "final_logit_softcapping": null,
+   "head_dim": 256,
+   "hidden_activation": "gelu_pytorch_tanh",
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 2048,
+   "model_type": "gemma3_text",
+   "num_attention_heads": 3,
+   "num_hidden_layers": 24,
+   "num_key_value_heads": 1,
+   "pad_token_id": 0,
+   "query_pre_attn_scalar": 256,
+   "rms_norm_eps": 1e-06,
+   "rope_local_base_freq": 10000.0,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": 257,
+   "transformers_version": "4.57.6",
+   "use_bidirectional_attention": true,
+   "use_cache": true,
+   "vocab_size": 262144
+ }
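The `layer_types` list follows `_sliding_window_pattern: 6`: in each group of six layers, the first five use sliding-window attention and every sixth uses full attention, across all 24 hidden layers. A small sketch reconstructing the list from those two config values (`build_layer_types` is a hypothetical helper for illustration, not a transformers API):

```python
def build_layer_types(num_layers: int, pattern: int) -> list[str]:
    # Every `pattern`-th layer (1-indexed) gets full attention; the rest slide.
    return [
        "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
        for i in range(num_layers)
    ]

layer_types = build_layer_types(num_layers=24, pattern=6)
print(layer_types.count("full_attention"))     # 4
print(layer_types.count("sliding_attention"))  # 20
```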
config_sentence_transformers.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.2.1",
+     "transformers": "4.57.6",
+     "pytorch": "2.9.1+cu128"
+   },
+   "prompts": {
+     "query": "task: search result | query: ",
+     "document": "title: none | text: ",
+     "BitextMining": "task: search result | query: ",
+     "Clustering": "task: clustering | query: ",
+     "Classification": "task: classification | query: ",
+     "InstructionRetrieval": "task: code retrieval | query: ",
+     "MultilabelClassification": "task: classification | query: ",
+     "PairClassification": "task: sentence similarity | query: ",
+     "Reranking": "task: search result | query: ",
+     "Retrieval": "task: search result | query: ",
+     "Retrieval-query": "task: search result | query: ",
+     "Retrieval-document": "title: none | text: ",
+     "STS": "task: sentence similarity | query: ",
+     "Summarization": "task: summarization | query: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
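These named prompts are prepended to each input string before it is tokenized and encoded, so a classification input is seen by the encoder with the `Classification` prefix in front of the raw text. A minimal illustration of the string the model actually receives (prompt values copied from the config above):

```python
# Prompt strings from config_sentence_transformers.json (subset)
prompts = {
    "query": "task: search result | query: ",
    "Classification": "task: classification | query: ",
    "STS": "task: sentence similarity | query: ",
}

def apply_prompt(text: str, prompt_name: str) -> str:
    # sentence-transformers prepends the named prompt to the input string
    return prompts[prompt_name] + text

print(apply_prompt("панда ложись", "Classification"))
# task: classification | query: панда ложись
```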
config_setfit.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "labels": [
+     "bind",
+     "dismiss",
+     "give_paw",
+     "help",
+     "lie_down",
+     "reconnect_joystick",
+     "rotate",
+     "run",
+     "silence",
+     "stand_at_attention",
+     "stop_running",
+     "unbind",
+     "unknown"
+   ],
+   "normalize_embeddings": false
+ }
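The logistic-regression head predicts an index into this 13-entry list, and SetFit maps it back to the string label. A sketch of that lookup with a fabricated probability vector (the probabilities here are illustrative placeholders, not real model output):

```python
import numpy as np

# Label order from config_setfit.json
labels = [
    "bind", "dismiss", "give_paw", "help", "lie_down",
    "reconnect_joystick", "rotate", "run", "silence",
    "stand_at_attention", "stop_running", "unbind", "unknown",
]

# Fake per-class probabilities for one input; a real model would
# produce these via SetFitModel.predict_proba
probs = np.zeros(len(labels))
probs[labels.index("lie_down")] = 0.9

print(labels[int(np.argmax(probs))])  # lie_down
```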
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a48b877b051dfd0d0f3db30d4902e9f7b400ad1f4baac77157f8af3d385b41d
+ size 1211486072
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c7c402de572d9f0275e22adf98c74ad367409a904d425b229ea42b7d01888d0
+ size 81743
modules.json ADDED
@@ -0,0 +1,32 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 3,
+     "name": "3",
+     "path": "3_Dense",
+     "type": "sentence_transformers.models.Dense"
+   },
+   {
+     "idx": 4,
+     "name": "4",
+     "path": "4_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
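`modules.json` fixes the encoder as a chain: Transformer → Pooling → Dense (768→3072) → Dense (3072→768) → Normalize. A toy NumPy walk-through of the post-transformer stages, with random stand-in weights (shapes taken from the configs above; the real weights live in the `2_Dense/` and `3_Dense/` safetensors files):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the stored weights; both Dense layers are bias-free
w_up = rng.standard_normal((768, 3072)) * 0.02    # 2_Dense/config.json
w_down = rng.standard_normal((3072, 768)) * 0.02  # 3_Dense/config.json

def encode(pooled: np.ndarray) -> np.ndarray:
    x = pooled @ w_up      # 2_Dense: 768 -> 3072, Identity activation
    x = x @ w_down         # 3_Dense: 3072 -> 768, Identity activation
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / norm        # 4_Normalize: unit-length embeddings

pooled = rng.standard_normal((2, 768))  # would come from 1_Pooling
emb = encode(pooled)
print(emb.shape)  # (2, 768)
```

The final Normalize step is why `config_sentence_transformers.json` can use plain cosine similarity: on unit vectors, cosine similarity reduces to a dot product.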
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 2048,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "boi_token": "<start_of_image>",
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eoi_token": "<end_of_image>",
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "image_token": "<image_soft_token>",
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:216e2a79606fe879c9f17c529c71cd241338407fd5646b595ffd3c4b9ea1d503
+ size 33385262
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff