Add new CrossEncoder model
- README.md +48 -30
- config.json +15 -28
- model.safetensors +2 -2
- special_tokens_map.json +1 -15
- tokenizer.json +0 -0
- tokenizer_config.json +17 -17
- vocab.txt +0 -0
README.md
CHANGED
@@ -6,22 +6,22 @@ tags:
 - generated_from_trainer
 - dataset_size:553491
 - loss:CrossEntropyLoss
-base_model:
+base_model: dbmdz/bert-base-italian-uncased
 datasets:
 - software-si/kitchen-nli-it
 pipeline_tag: text-classification
 library_name: sentence-transformers
 ---

-# CrossEncoder based on
+# CrossEncoder based on dbmdz/bert-base-italian-uncased

-This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [
+This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [dbmdz/bert-base-italian-uncased](https://huggingface.co/dbmdz/bert-base-italian-uncased) on the [kitchen-nli-it](https://huggingface.co/datasets/software-si/kitchen-nli-it) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text pair classification.

 ## Model Details

 ### Model Description
 - **Model Type:** Cross Encoder
-- **Base model:** [
+- **Base model:** [dbmdz/bert-base-italian-uncased](https://huggingface.co/dbmdz/bert-base-italian-uncased) <!-- at revision 55058d75cf3bc75a67a412584491b774cb99d68a -->
 - **Maximum Sequence Length:** 512 tokens
 - **Number of Output Labels:** 3 labels
 - **Training Dataset:**
@@ -54,11 +54,11 @@ from sentence_transformers import CrossEncoder
 model = CrossEncoder("software-si/kitchen-it-nli")
 # Get scores for pairs of texts
 pairs = [
-    ['
-    ['
-    ['
-    ['
-    ['
+    ['piano cottura sopra forno preinstallato, dotata di 6 piastre di cottura, fornita di piastre quadrate, cucina alimentata a induzione,', 'la cucina è alimentata ad induzione'],
+    ['modulo cucina dimensione teglie di gn1/1 piastre di forma quadrata, di profondità 70 cm, con forno,', 'le piastre della cucina sono di tonde'],
+    ['modulo cucina modalità di alimentazione elettrica, con piastre tonde operative, forno alimentato a gas, 2 zone,', "l'alimentazione del forno è a gas"],
+    ['cucina con teglie di gn1/1 piastre tonde preinstallate, superficie di cottura elettrica, con forno incluso,', 'la cucina ha un forno integrato'],
+    ['cucina sei punti cottura, dimensione anteriore 70 cm, posta su vano, con cottura a gas,', 'la cucina è alimentata ad elettrico'],
 ]
 scores = model.predict(pairs)
 print(scores.shape)
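For a 3-label cross encoder, `predict` returns one score per class for each pair, so the usual next step is an argmax over the last axis. A minimal sketch continuing the snippet above (the pair reuses a sample from the dataset tables below; mapping the class indices to NLI-style names is an assumption, since the config only defines LABEL_0–LABEL_2):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("software-si/kitchen-it-nli")
pairs = [
    ['cucina con piastre tonde, 4 fuochi, su base con forno elettrico,', 'la cucina ha un forno elettrico'],
]
scores = model.predict(pairs)      # numpy array of shape (len(pairs), 3): one score per class
predicted = scores.argmax(axis=1)  # highest-scoring class index per pair
print(predicted)                   # e.g. [1] — class names are only LABEL_0/1/2 in the config
```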
@@ -114,13 +114,13 @@ You can finetune this model on your own dataset.
 | | premises | hypothesis | labels |
 |:--------|:---------|:-----------|:-------|
 | type | string | string | int |
-| details | <ul><li>min:
+| details | <ul><li>min: 51 characters</li><li>mean: 104.34 characters</li><li>max: 153 characters</li></ul> | <ul><li>min: 12 characters</li><li>mean: 33.34 characters</li><li>max: 50 characters</li></ul> | <ul><li>0: ~31.80%</li><li>1: ~37.40%</li><li>2: ~30.80%</li></ul> |
 * Samples:
-| premises
-
-| <code>
-| <code>
-| <code>
+| premises | hypothesis | labels |
+|:---------|:-----------|:-------|
+| <code>cucina con piastre tonde, 4 fuochi, su base con forno elettrico,</code> | <code>la cucina ha un forno elettrico</code> | <code>1</code> |
+| <code>piano cottura profonda 90 cm, con sei zone cottura, piastre tonde incluse,</code> | <code>la cucina è profonda 90 cm</code> | <code>1</code> |
+| <code>piano cottura dotata di 6 fuochi di cottura, di profondità 70 cm, con teglie di gn1/1 piastre tonde integrate,</code> | <code>la dimensione della teglie è di gn1/1</code> | <code>1</code> |
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)
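The statistics above summarise the `premises` / `hypothesis` / `labels` columns of the training data. A quick way to reproduce that layout is to load the dataset directly; a minimal sketch, assuming a default `train` split exists:

```python
from datasets import load_dataset

# Dataset id from the model card; the "train" split name is an assumption.
ds = load_dataset("software-si/kitchen-nli-it", split="train")
print(ds.column_names)  # expected: ['premises', 'hypothesis', 'labels']
print(ds[0])            # a single premise/hypothesis pair with its integer label
```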

 ### Evaluation Dataset
@@ -134,13 +134,13 @@ You can finetune this model on your own dataset.
 | | premises | hypothesis | labels |
 |:--------|:---------|:-----------|:-------|
 | type | string | string | int |
-| details | <ul><li>min:
+| details | <ul><li>min: 44 characters</li><li>mean: 103.86 characters</li><li>max: 149 characters</li></ul> | <ul><li>min: 12 characters</li><li>mean: 33.19 characters</li><li>max: 50 characters</li></ul> | <ul><li>0: ~31.60%</li><li>1: ~35.50%</li><li>2: ~32.90%</li></ul> |
 * Samples:
-| premises
-
-| <code>
-| <code>
-| <code>
+| premises | hypothesis | labels |
+|:---------|:-----------|:-------|
+| <code>piano cottura sopra forno preinstallato, dotata di 6 piastre di cottura, fornita di piastre quadrate, cucina alimentata a induzione,</code> | <code>la cucina è alimentata ad induzione</code> | <code>1</code> |
+| <code>modulo cucina dimensione teglie di gn1/1 piastre di forma quadrata, di profondità 70 cm, con forno,</code> | <code>le piastre della cucina sono di tonde</code> | <code>0</code> |
+| <code>modulo cucina modalità di alimentazione elettrica, con piastre tonde operative, forno alimentato a gas, 2 zone,</code> | <code>l'alimentazione del forno è a gas</code> | <code>1</code> |
 * Loss: [<code>CrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#crossentropyloss)

 ### Training Hyperparameters
@@ -195,7 +195,6 @@
 - `seed`: 42
 - `data_seed`: None
 - `jit_mode_eval`: False
-- `use_ipex`: False
 - `bf16`: True
 - `fp16`: False
 - `fp16_opt_level`: O1
@@ -230,6 +229,8 @@
 - `adafactor`: False
 - `group_by_length`: False
 - `length_column_name`: length
+- `project`: huggingface
+- `trackio_space_id`: trackio
 - `ddp_find_unused_parameters`: None
 - `ddp_bucket_cap_mb`: None
 - `ddp_broadcast_buffers`: False
@@ -262,7 +263,7 @@
 - `torch_compile_backend`: None
 - `torch_compile_mode`: None
 - `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`:
+- `include_num_input_tokens_seen`: no
 - `neftune_noise_alpha`: None
 - `optim_target_modules`: None
 - `batch_eval_metrics`: False
@@ -270,7 +271,7 @@
 - `use_liger_kernel`: False
 - `liger_kernel_config`: None
 - `eval_use_gather_object`: False
-- `average_tokens_across_devices`:
+- `average_tokens_across_devices`: True
 - `prompts`: None
 - `batch_sampler`: batch_sampler
 - `multi_dataset_batch_sampler`: proportional
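The flags above are the values logged for this run. A rough sketch of how such a run is wired together with the sentence-transformers cross-encoder trainer, using only values stated in this card; the output directory and split names are illustrative assumptions, not taken from the training script:

```python
from datasets import load_dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import CrossEntropyLoss

# Base model and label count as stated in the Model Description.
model = CrossEncoder("dbmdz/bert-base-italian-uncased", num_labels=3)

# Dataset id from the card; split names are assumptions.
dataset = load_dataset("software-si/kitchen-nli-it")

args = CrossEncoderTrainingArguments(
    output_dir="kitchen-it-nli",  # illustrative
    seed=42,                      # from the hyperparameter list
    bf16=True,                    # from the hyperparameter list
    fp16=False,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    loss=CrossEntropyLoss(model),  # the loss named in the dataset sections above
)
trainer.train()
```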
@@ -282,16 +283,33 @@
 ### Training Logs
 | Epoch | Step | Training Loss | Validation Loss |
 |:------:|:----:|:-------------:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.0462 | 400 | 1.1097 | 1.0908 |
+| 0.0925 | 800 | 1.074 | 1.0388 |
+| 0.1387 | 1200 | 1.0096 | 0.9463 |
+| 0.1850 | 1600 | 0.9181 | 0.8411 |
+| 0.2312 | 2000 | 0.8197 | 0.7405 |
+| 0.2775 | 2400 | 0.7356 | 0.6496 |
+| 0.3237 | 2800 | 0.6549 | 0.5535 |
+| 0.3700 | 3200 | 0.5595 | 0.4527 |
+| 0.4162 | 3600 | 0.4713 | 0.3730 |
+| 0.4625 | 4000 | 0.3963 | 0.3116 |
+| 0.5087 | 4400 | 0.3393 | 0.2627 |
+| 0.5550 | 4800 | 0.2966 | 0.2278 |
+| 0.6012 | 5200 | 0.2574 | 0.1980 |
+| 0.6475 | 5600 | 0.2278 | 0.1759 |
+| 0.6937 | 6000 | 0.2147 | 0.1613 |
+| 0.7400 | 6400 | 0.1944 | 0.1466 |
+| 0.7862 | 6800 | 0.1754 | 0.1387 |
+| 0.8325 | 7200 | 0.1658 | 0.1312 |
+| 0.8787 | 7600 | 0.1514 | 0.1244 |
+| 0.9250 | 8000 | 0.143 | 0.1133 |
+| 0.9712 | 8400 | 0.1313 | 0.1095 |


 ### Framework Versions
-- Python: 3.12.
+- Python: 3.12.3
 - Sentence Transformers: 5.1.1
-- Transformers: 4.
+- Transformers: 4.57.0
 - PyTorch: 2.8.0+cu128
 - Accelerate: 1.10.1
 - Datasets: 4.1.1
config.json
CHANGED
@@ -1,51 +1,38 @@
 {
   "architectures": [
-    "
+    "BertForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.1,
-  "
+  "classifier_dropout": null,
   "dtype": "float32",
-  "eos_token_id": 2,
   "hidden_act": "gelu",
   "hidden_dropout_prob": 0.1,
   "hidden_size": 768,
   "id2label": {
-    "0": "
-    "1": "
-    "2": "
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2"
   },
   "initializer_range": 0.02,
   "intermediate_size": 3072,
   "label2id": {
-    "
-    "
-    "
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_2": 2
   },
-  "layer_norm_eps": 1e-
-  "legacy": true,
+  "layer_norm_eps": 1e-12,
   "max_position_embeddings": 512,
-  "
-  "model_type": "deberta-v2",
-  "norm_rel_ebd": "layer_norm",
+  "model_type": "bert",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 0,
-  "
-  "pooler_hidden_act": "gelu",
-  "pooler_hidden_size": 768,
-  "pos_att_type": [
-    "p2c",
-    "c2p"
-  ],
-  "position_biased_input": false,
-  "position_buckets": 256,
-  "relative_attention": true,
+  "position_embedding_type": "absolute",
   "sentence_transformers": {
     "activation_fn": "torch.nn.modules.linear.Identity",
     "version": "5.1.1"
   },
-  "
-  "
-  "
-  "vocab_size":
+  "transformers_version": "4.57.0",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 31102
 }
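The new config declares a three-label BERT sequence-classification head (`BertForSequenceClassification`, labels `LABEL_0`–`LABEL_2`, vocab size 31102). A small sanity-check sketch against the uploaded repo, using the repo id from the README usage snippet:

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("software-si/kitchen-it-nli")
print(config.model_type, config.num_labels)  # expected: bert 3
print(config.id2label)                       # {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}

# The repo ships a standard classification checkpoint in model.safetensors,
# so it should also load through transformers directly.
model = AutoModelForSequenceClassification.from_pretrained("software-si/kitchen-it-nli")
print(model.num_labels)                      # 3
```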
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:ab435a94f874ed86c96fb3378132063758db4b6a31cb40897f511bdb71dd72d5
+size 439743484
special_tokens_map.json
CHANGED
@@ -1,11 +1,4 @@
 {
-  "bos_token": {
-    "content": "[CLS]",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
   "cls_token": {
     "content": "[CLS]",
     "lstrip": false,
@@ -13,13 +6,6 @@
     "rstrip": false,
     "single_word": false
   },
-  "eos_token": {
-    "content": "[SEP]",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  },
   "mask_token": {
     "content": "[MASK]",
     "lstrip": false,
@@ -44,7 +30,7 @@
   "unk_token": {
     "content": "[UNK]",
     "lstrip": false,
-    "normalized":
+    "normalized": false,
     "rstrip": false,
     "single_word": false
   }
tokenizer.json
CHANGED
The diff for this file is too large to render.
tokenizer_config.json
CHANGED
@@ -8,31 +8,31 @@
     "single_word": false,
     "special": true
   },
-  "
-    "content": "[
+  "101": {
+    "content": "[UNK]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
     "single_word": false,
     "special": true
   },
-  "
-    "content": "[
+  "102": {
+    "content": "[CLS]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,
     "single_word": false,
     "special": true
   },
-  "
-    "content": "[
+  "103": {
+    "content": "[SEP]",
     "lstrip": false,
-    "normalized":
+    "normalized": false,
     "rstrip": false,
     "single_word": false,
     "special": true
   },
-  "
+  "104": {
     "content": "[MASK]",
     "lstrip": false,
     "normalized": false,
@@ -41,26 +41,26 @@
       "special": true
     }
   },
-  "
-  "clean_up_tokenization_spaces": false,
+  "clean_up_tokenization_spaces": true,
   "cls_token": "[CLS]",
-  "
-  "
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
   "extra_special_tokens": {},
   "mask_token": "[MASK]",
+  "max_len": 512,
   "max_length": 512,
   "model_max_length": 512,
+  "never_split": null,
   "pad_to_multiple_of": null,
   "pad_token": "[PAD]",
   "pad_token_type_id": 0,
   "padding_side": "right",
   "sep_token": "[SEP]",
-  "sp_model_kwargs": {},
-  "split_by_punct": false,
   "stride": 0,
-  "
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
   "truncation_side": "right",
   "truncation_strategy": "longest_first",
-  "unk_token": "[UNK]"
-  "vocab_type": "spm"
+  "unk_token": "[UNK]"
 }
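The tokenizer now follows the base model's uncased `BertTokenizer` setup ([UNK]/[CLS]/[SEP]/[MASK] at ids 101–104, `do_lower_case: true`, `model_max_length: 512`). A short sketch of how a premise/hypothesis pair is encoded for the cross encoder; the repo id matches the usage snippet and the pair reuses a sample from the README:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("software-si/kitchen-it-nli")
enc = tok(
    "cucina con piastre tonde, 4 fuochi, su base con forno elettrico,",  # premise
    "la cucina ha un forno elettrico",                                   # hypothesis
    truncation=True,
    max_length=512,
)
print(tok.convert_ids_to_tokens(enc["input_ids"])[:8])  # starts with [CLS], lower-cased WordPiece tokens
print(enc["token_type_ids"][:12])                        # 0 for premise tokens, 1 for hypothesis tokens
```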
vocab.txt
ADDED
The diff for this file is too large to render.