AlIshaq committed (verified)
Commit 457c8d2 · 1 Parent(s): 6606d60

Update README.md

Files changed (1):
  1. README.md (+1 −331)
README.md CHANGED
@@ -55,334 +55,4 @@ model-index:
  - type: spearman_cosine
    value: .nan
    name: Spearman Cosine
---

# SentenceTransformer based on intfloat/multilingual-e5-small

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) <!-- at revision c007d7ef6fd86656326059b28395a7a03a7c5846 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
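
The stack above is mean pooling over the transformer's token embeddings followed by L2 normalization. A minimal numpy sketch of those two modules (the token embeddings and attention mask below are invented stand-ins, not real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the BertModel output: 2 sentences, up to 4 tokens, dim 384.
token_embeddings = rng.normal(size=(2, 4, 384))
# 1 where a real token sits, 0 where the sequence is padded.
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 1, 1]])

# Pooling (mean mode): average only over non-padding tokens.
mask = attention_mask[:, :, None]
sentence_embeddings = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)

# Normalize(): scale each vector to unit length, so dot product == cosine.
sentence_embeddings /= np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)

print(sentence_embeddings.shape)                    # (2, 384)
print(np.linalg.norm(sentence_embeddings, axis=1))  # [1. 1.]
```

Because the outputs are unit length, dot products between embeddings coincide with cosine similarity, which is why the model's similarity function is cosine.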

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "Apakah ada kegiatan Maulid Nabi atau Isra Mi'raj?",
    'Ada, biasanya diisi dengan puasa, tahsin, dan murajaah.',
    'Jadwal disusun oleh bagian akademik agar merata dan efisien.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `eval`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value   |
|:--------------------|:--------|
| pearson_cosine      | nan     |
| **spearman_cosine** | **nan** |
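
Both correlations are reported as `nan`. That is usually a degenerate-input symptom rather than a model property: Pearson (and Spearman, which is Pearson over ranks) divides by a standard deviation, which is zero when the gold similarity scores are constant. A small numpy illustration of that failure mode (the scores below are made up):

```python
import numpy as np

def pearson(x, y):
    # Pearson r = cov(x, y) / (std(x) * std(y)); nan when either std is 0.
    x, y = np.asarray(x, float), np.asarray(y, float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

gold = [1.0, 1.0, 1.0, 1.0]          # constant labels -> zero variance
pred = [0.2, 0.9, 0.4, 0.7]

print(pearson(gold, pred))           # nan
print(pearson([0, 1, 2, 3], pred))   # a finite value
```

If that is the cause here, the gold scores fed to `EmbeddingSimilarityEvaluator` should be checked for variance before reading anything into this table.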
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 8,100 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                       | sentence_1                                                                        |
  |:--------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                             |
  | details | <ul><li>min: 7 tokens</li><li>mean: 11.2 tokens</li><li>max: 18 tokens</li></ul>  | <ul><li>min: 7 tokens</li><li>mean: 15.97 tokens</li><li>max: 42 tokens</li></ul>  |
* Samples:
  | sentence_0                                                          | sentence_1                                                                                  |
  |:--------------------------------------------------------------------|:---------------------------------------------------------------------------------------------|
  | <code>Apakah ada kuota penerimaan untuk jenjang Pesantren?</code>   | <code>Ya, setiap jenjang memiliki kuota terbatas sesuai kapasitas kelas dan asrama.</code>  |
  | <code>Bagaimana pengamanan saat ada tamu luar masuk?</code>         | <code>Tamu wajib lapor dan didampingi oleh petugas keamanan.</code>                         |
  | <code>Bagaimana cara melaporkan kondisi santri kepada wali?</code>  | <code>Pondok mengirimkan laporan tertulis dan progres belajar secara berkala.</code>        |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
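
`MultipleNegativesRankingLoss` uses in-batch negatives: for each `(sentence_0, sentence_1)` pair, the matching `sentence_1` must be ranked above every other `sentence_1` in the batch, via cross-entropy over scaled cosine similarities. A self-contained numpy sketch of that objective (random unit vectors stand in for the embeddings; `scale = 20.0` matches the parameters above):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, scale = 4, 384, 20.0

# Unit-normalized stand-ins for the anchor and positive embeddings.
a = rng.normal(size=(batch, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = rng.normal(size=(batch, dim))
p /= np.linalg.norm(p, axis=1, keepdims=True)

# Scaled cosine similarity of every anchor against every candidate passage.
logits = scale * (a @ p.T)          # shape (batch, batch)

# Cross-entropy with the diagonal as the target class: row i's positive is
# column i, every other column is an in-batch negative.
m = logits.max(axis=1, keepdims=True)
log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(float(loss))
```

With random vectors the cosine similarities are near zero, so the loss sits roughly near `log(batch)`; training pushes the diagonal (true pair) entries up relative to the rest of each row.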

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs

| Epoch  | Step | eval_spearman_cosine |
|:------:|:----:|:--------------------:|
| 0.1972 | 100  | nan                  |

### Framework Versions

- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
 
  - type: spearman_cosine
    value: .nan
    name: Spearman Cosine
+ ---