MossaabDev committed on
Commit fe0650d · verified · 1 parent: 1ecbe0e

Upload 12 files

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+app/my_finetuned_modelV2/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+app/my_finetuned_modelV2/unigram.json filter=lfs diff=lfs merge=lfs -text
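The two new rules route the tokenizer files through Git LFS. A rough way to check which paths match the LFS patterns visible in this hunk is sketched below. It assumes a simplified reading of gitattributes matching (patterns containing a `/` are compared against the whole repository-relative path, other patterns only against the file name) and includes only the five patterns shown here, so files stored as LFS pointers elsewhere in this commit, such as `model.safetensors`, are covered by rules outside this hunk and will report `False`. `is_lfs_tracked` is a hypothetical helper; `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk (assumption: the real file has more rules).
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "app/my_finetuned_modelV2/tokenizer.json",
    "app/my_finetuned_modelV2/unigram.json",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for repository-relative POSIX paths."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        if "/" in pattern:
            # Patterns with a slash are anchored to the repository root.
            # Simplification: fnmatch's * also crosses '/', unlike git's.
            if fnmatchcase(path, pattern.lstrip("/")):
                return True
        elif fnmatchcase(name, pattern):
            # Patterns without a slash match the file name at any depth.
            return True
    return False


for p in [
    "app/my_finetuned_modelV2/tokenizer.json",
    "app/my_finetuned_modelV2/unigram.json",
    "app/my_finetuned_modelV2/config.json",
    "runs/events.out.tfevents.123",
]:
    print(p, is_lfs_tracked(p))
```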
app/my_finetuned_modelV2/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
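With `pooling_mode_mean_tokens` set to `true` and every other mode `false`, the sentence embedding is the average of the 384-dimensional token embeddings, counting only positions the attention mask marks as real tokens. The NumPy sketch below illustrates masked mean pooling on toy arrays; it is an illustration of the technique, not the library's own code, and the tiny dimensions are chosen only for readability.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the sequence axis.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)  # padding positions contribute 0
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Toy batch: 2 sentences, 3 positions, 4-dim embeddings (the real model uses 384).
emb = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sentence has one padding token
pooled = mean_pool(emb, mask)
print(pooled)
```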
app/my_finetuned_modelV2/README.md ADDED
@@ -0,0 +1,358 @@
+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:193
+- loss:CosineSimilarityLoss
+base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+widget:
+- source_sentence: I saw someone killing a cat in the street, I felt helpless and
+    sad
+  sentences:
+  - There is no god ?worthy of worship? except You. Glory be to You! I have certainly
+    done wrong.
+  - 'who say, when struck by a disaster, Surely to Allah we belong and to Him we
+    will ?all? return. '
+  - And never think that Allah is unaware of what the wrongdoers do. He only delays
+    them for a Day when eyes will stare [in horror]
+- source_sentence: I am really sad, I hate my life and I wanna suicide
+  sentences:
+  - And never think that Allah is unaware of what the wrongdoers do. He only delays
+    them for a Day when eyes will stare [in horror]
+  - And when the ignorant address them, they say words of peace
+  - And seek help through patience and prayer. Indeed, it is a burden except for the
+    humble
+- source_sentence: 'my cousin just died '
+  sentences:
+  - 'who say, when struck by a disaster, Surely to Allah we belong and to Him we
+    will ?all? return. '
+  - Again, no! Never obey him ?O Prophet?! Rather, ?continue to? prostrate and draw
+    near ?to Allah?.
+  - Do not do a favour expecting more ?in return?.
+- source_sentence: tell me about peace
+  sentences:
+  - O mankind, eat from whatever is on earth [that is] lawful and good and do not
+    follow the footsteps of Satan. Indeed, he is to you a clear enemy
+  - And when the ignorant address them, they say words of peace
+  - And if you divorce them before consummating the marriage but after deciding on
+    a dowry, pay half of the dowry, unless the wife graciously waives it or the husband
+    graciously pays in full. Graciousness is closer to righteousness. And do not forget
+    kindness among yourselves. Surely Allah is All-Seeing of what you do.
+- source_sentence: I lost my friend, he died and I miss him
+  sentences:
+  - Not equal are the good deed and the bad deed. Repel [evil] by that [deed] which
+    is better; and thereupon the one whom between you and him is enmity [will become]
+    as though he was a devoted friend
+  - Every soul will taste death. And you will only receive your full reward on the
+    Day of Judgment. Whoever is spared from the Fire and is admitted into Paradise
+    will ?indeed? triumph, whereas the life of this world is no more than the delusion
+    of enjoyment.
+  - Every soul will taste death, then to Us you will ?all? be returned.
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+
+# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) <!-- at revision 86741b4e3f5cb7765a600d3a3d55a0f6a6cb443d -->
+- **Maximum Sequence Length:** 128 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    'I lost my friend, he died and I miss him',
+    'Every soul will taste death. And you will only receive your full reward on the Day of Judgment. Whoever is spared from the Fire and is admitted into Paradise will ?indeed? triumph, whereas the life of this world is no more than the delusion of enjoyment.',
+    'Every soul will taste death, then to Us you will ?all? be returned.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 384)
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[1.0000, 0.9072, 0.9224],
+#         [0.9072, 1.0000, 0.9847],
+#         [0.9224, 0.9847, 1.0000]])
+```
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Dataset
+
+#### Unnamed Dataset
+
+* Size: 193 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 193 samples:
+  |         | sentence_0 | sentence_1 | label |
+  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------|
+  | type    | string | string | float |
+  | details | <ul><li>min: 5 tokens</li><li>mean: 12.27 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 39.33 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.9</li><li>max: 1.0</li></ul> |
+* Samples:
+  | sentence_0 | sentence_1 | label |
+  |:-------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+  | <code>I am afraid that my son is not in the right way</code> | <code>And those who say: Our Lord! Grant us comfort in our spouses and our offspring, and make us leaders of the righteous</code> | <code>1.0</code> |
+  | <code>my cat just died</code> | <code>And We will surely test you with something of fear and hunger and a loss of wealth and lives and fruits, but give good tidings to the patient</code> | <code>1.0</code> |
+  | <code>I do not have childre</code> | <code>And those who say: Our Lord! Grant us comfort in our spouses and our offspring, and make us leaders of the righteous</code> | <code>1.0</code> |
+* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+  ```json
+  {
+      "loss_fct": "torch.nn.modules.loss.MSELoss"
+  }
+  ```
+
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `num_train_epochs`: 10
+- `multi_dataset_batch_sampler`: round_robin
+
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 8
+- `per_device_eval_batch_size`: 8
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 10
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `bf16`: False
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `parallelism_config`: None
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `project`: huggingface
+- `trackio_space_id`: trackio
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: no
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: True
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+
+</details>
+
+### Framework Versions
+- Python: 3.12.7
+- Sentence Transformers: 5.1.1
+- Transformers: 4.57.1
+- PyTorch: 2.5.1
+- Accelerate: 1.11.0
+- Datasets: 4.3.0
+- Tokenizers: 0.22.1
+
+## Citation
+
+### BibTeX
+
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+
+<!--
+## Glossary
+
+*Clearly define terms in order to be accessible across audiences.*
+-->
+
+<!--
+## Model Card Authors
+
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+
+<!--
+## Model Card Contact
+
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
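The card lists `CosineSimilarityLoss` with `MSELoss` as its `loss_fct`: for each (`sentence_0`, `sentence_1`, `label`) pair, the cosine similarity of the two embeddings is compared with the float label using mean squared error. A NumPy sketch of that objective, and of the cosine similarity matrix that `model.similarity` produces for a model whose similarity function is cosine, is below. The random vectors are stand-ins for real 384-dimensional embeddings and are purely illustrative.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T


def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """MSE between the row-wise cosine similarity of (u_i, v_i) and the float labels."""
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels) ** 2))


rng = np.random.default_rng(42)
u = rng.normal(size=(3, 384))  # stand-ins for sentence_0 embeddings
v = rng.normal(size=(3, 384))  # stand-ins for sentence_1 embeddings
labels = np.array([1.0, 1.0, 0.0])  # float targets in [0, 1], like the label column

sims = cosine_matrix(u, u)
print(np.round(np.diag(sims), 4))  # self-similarity is 1.0 on the diagonal
print(cosine_similarity_loss(u, v, labels))
```

Because the README's label column has a mean of 0.9, most pairs push their cosine similarity toward 1.0, which is consistent with the high off-diagonal scores in the usage example.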
app/my_finetuned_modelV2/config.json ADDED
@@ -0,0 +1,25 @@
+{
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "dtype": "float32",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.57.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 250037
+}
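The `config.json` fields are enough to estimate the checkpoint size. The sketch below counts parameters for a standard BERT encoder, assuming the usual layout (word, position and token-type embeddings plus a LayerNorm; per layer the Q/K/V/output projections, a feed-forward block of `intermediate_size`, and two LayerNorms; plus the pooler dense layer) and float32 storage, as `"dtype": "float32"` suggests. Under those assumptions the estimate lands just below the 470,637,416-byte `model.safetensors` pointer in this commit; the remainder would plausibly be the safetensors header, but that is an inference, not something the commit states. `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(cfg: dict, include_pooler: bool = True) -> int:
    """Parameter count of a standard BertModel built from a config.json dict (assumed layout)."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias

    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + layer_norm
    )
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projection + LayerNorm
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm  # up, down + LayerNorm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = (h * h + h) if include_pooler else 0
    return embeddings + encoder + pooler


config = {
    "hidden_size": 384,
    "intermediate_size": 1536,
    "max_position_embeddings": 512,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 250037,
}

params = bert_param_count(config)
fp32_bytes = params * 4  # float32 = 4 bytes per parameter
print(f"{params:,} parameters, {fp32_bytes:,} bytes in float32")
# 117,653,760 parameters, 470,615,040 bytes in float32
print(f"{470_637_416 - fp32_bytes:,} bytes left over in model.safetensors")
# 22,376 bytes left over in model.safetensors
```

The multilingual word-embedding matrix (250,037 × 384, about 96M values) accounts for roughly 82% of the total, which is why this 12-layer, 384-wide model is still close to half a gigabyte on disk.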
app/my_finetuned_modelV2/config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+{
+  "__version__": {
+    "sentence_transformers": "5.1.1",
+    "transformers": "4.57.1",
+    "pytorch": "2.5.1"
+  },
+  "model_type": "SentenceTransformer",
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
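`config_sentence_transformers.json` stores named prompts that are prepended to the input text when selected, for example through the `prompt_name` argument of `SentenceTransformer.encode`. Both prompts here are empty strings and `default_prompt_name` is `null`, so queries and documents are encoded exactly as given. The pure-Python sketch below mimics that prefixing rule to make the effect explicit; `apply_prompt` is a hypothetical helper, not part of the library, and the `"query: "` prefix in the second call is an illustrative value that this model does not use.

```python
import json
from typing import Optional

CONFIG_JSON = """
{
  "prompts": {"query": "", "document": ""},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
"""


def apply_prompt(texts: list[str], config: dict, prompt_name: Optional[str] = None) -> list[str]:
    """Hypothetical helper: prefix each text with the selected prompt (if any)."""
    name = prompt_name if prompt_name is not None else config.get("default_prompt_name")
    if name is None:
        return list(texts)
    prompt = config["prompts"][name]  # KeyError for unknown prompt names
    return [prompt + text for text in texts]


cfg = json.loads(CONFIG_JSON)
queries = ["my cousin just died"]
print(apply_prompt(queries, cfg, "query"))  # ['my cousin just died'], the prompt is ""

# With a non-empty prompt (illustrative value, not from this model) the text is prefixed:
cfg_with_prefix = {"prompts": {"query": "query: "}, "default_prompt_name": None}
print(apply_prompt(queries, cfg_with_prefix, "query"))  # ['query: my cousin just died']
```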
app/my_finetuned_modelV2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:256726c15cf1c3568a8a75d936a5c205b017384461e1768d5bb26a4721ac1c38
+size 470637416
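`model.safetensors` (like `tokenizer.json` and `unigram.json` in this commit) is committed as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below reads such a pointer and checks a downloaded file against it by recomputing the size and SHA-256 locally; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
import os

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:256726c15cf1c3568a8a75d936a5c205b017384461e1768d5bb26a4721ac1c38
size 470637416
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, before hashing ~470 MB
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

Usage would be `matches_pointer("app/my_finetuned_modelV2/model.safetensors", info)` after the real file has been fetched; on a checkout without LFS the file is the pointer itself and the size check fails immediately.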
app/my_finetuned_modelV2/modules.json ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
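`modules.json` tells Sentence Transformers which modules to chain and where each one's files live: module 0 is the `Transformer` loaded from the model root (`"path": ""`), and module 1 is the `Pooling` layer configured by `1_Pooling/config.json`. The snippet below only reads the file and lists the pipeline in `idx` order; it does not import the library, `describe_pipeline` is a hypothetical helper, and the folder name is taken from this commit's paths.

```python
import json
from pathlib import PurePosixPath

MODULES_JSON = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]"""

MODEL_DIR = PurePosixPath("app/my_finetuned_modelV2")  # folder added in this commit


def describe_pipeline(modules_json: str, model_dir: PurePosixPath) -> list[tuple[str, str]]:
    """Return (class name, config directory) for each module, in execution order."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    steps = []
    for module in modules:
        class_name = module["type"].rsplit(".", 1)[-1]
        # An empty path means the module's files sit in the model root.
        directory = model_dir / module["path"] if module["path"] else model_dir
        steps.append((class_name, str(directory)))
    return steps


for class_name, directory in describe_pipeline(MODULES_JSON, MODEL_DIR):
    print(f"{class_name:<12} <- {directory}")
```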
app/my_finetuned_modelV2/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 128,
+  "do_lower_case": false
+}
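`max_seq_length: 128` caps every input at 128 tokens, special tokens included, and longer inputs are cut from the right (`truncation_side` is `right` in `tokenizer_config.json`). This is consistent with the `max: 128 tokens` reported for `sentence_1` in the README's dataset statistics. The sketch below shows the effect on a list of token ids, assuming a single input is wrapped as `<s> … </s>` with the ids 0 and 2 listed in `tokenizer_config.json`. `truncate_with_specials` is a hypothetical helper; in practice the tokenizer performs this step itself when called with truncation enabled.

```python
MAX_SEQ_LENGTH = 128
BOS_ID, EOS_ID = 0, 2  # <s> and </s> in tokenizer_config.json


def truncate_with_specials(content_ids: list[int], max_len: int = MAX_SEQ_LENGTH) -> list[int]:
    """Wrap ids in <s> ... </s>, dropping tokens from the right so the total fits max_len."""
    budget = max_len - 2  # reserve room for the two special tokens
    return [BOS_ID] + content_ids[:budget] + [EOS_ID]


short_ids = truncate_with_specials(list(range(10, 20)))  # 10 content tokens
long_ids = truncate_with_specials(list(range(10, 310)))  # 300 content tokens
print(len(short_ids), len(long_ids))  # 12 128
```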
app/my_finetuned_modelV2/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
app/my_finetuned_modelV2/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
+size 17082987
app/my_finetuned_modelV2/tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "250001": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": 128,
+  "model_max_length": 128,
+  "pad_to_multiple_of": null,
+  "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "</s>",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "<unk>"
+}
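`added_tokens_decoder` pins the special tokens to fixed ids, while `special_tokens_map.json` assigns them roles. Inverting the decoder gives the id for each role, which helps when inspecting raw `input_ids`. The sketch below hard-codes the relevant entries from those two files, reduced to their `content` strings, rather than loading them from disk.

```python
# Entries copied from tokenizer_config.json ("added_tokens_decoder") and
# special_tokens_map.json, reduced to the "content" strings.
added_tokens_decoder = {
    "0": {"content": "<s>", "special": True},
    "1": {"content": "<pad>", "special": True},
    "2": {"content": "</s>", "special": True},
    "3": {"content": "<unk>", "special": True},
    "250001": {"content": "<mask>", "special": True},
}
special_tokens_map = {
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "mask_token": "<mask>",
    "pad_token": "<pad>",
    "sep_token": "</s>",
    "unk_token": "<unk>",
}

# Invert id -> token into token -> id (ids are string keys in the JSON file).
token_to_id = {entry["content"]: int(idx) for idx, entry in added_tokens_decoder.items()}
role_to_id = {role: token_to_id[token] for role, token in special_tokens_map.items()}

for role, token_id in sorted(role_to_id.items(), key=lambda kv: kv[1]):
    print(f"{role:<11} {special_tokens_map[role]:<7} {token_id}")
```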
app/my_finetuned_modelV2/unigram.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:da145b5e7700ae40f16691ec32a0b1fdc1ee3298db22a31ea55f57a966c4a65d
+size 14763260
app/my_finetuned_modelV2/vocab.txt ADDED
The diff for this file is too large to render.