AxelPCG committed on
Commit b62a8d0 · verified · 1 Parent(s): fdc0c4d

Upload SPLADE-PT-BR model v1.0.0

Files changed (2)
  1. README.md +5 -2
  2. model_metadata.json +2 -1
README.md CHANGED
@@ -9,6 +9,7 @@ tags:
  - bert
  datasets:
  - unicamp-dl/mmarco
+ - unicamp-dl/mrobust
  base_model: neuralmind/bert-base-portuguese-cased
  ---

@@ -36,8 +37,10 @@ SPLADE is a neural retrieval model that learns to expand queries and documents w

  ### Training Data

- - **Primary Dataset**: mMARCO Portuguese (MS MARCO translated to Portuguese)
- - **Validation**: Portuguese query-document pairs
+ - **Training Dataset**: mMARCO Portuguese (`unicamp-dl/mmarco`) - MS MARCO translated to Portuguese
+ - Used for training with triplets (query, positive document, negative document)
+ - **Validation Dataset**: mRobust (`unicamp-dl/mrobust`) - TREC Robust04 translated to Portuguese
+ - Used for validation and evaluation during training
  - **Format**: Triplets (query, positive document, negative document)

  ### Training Configuration
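
The README describes the training format as triplets (query, positive document, negative document). A minimal sketch of that structure follows. The `Triplet` type, the `parse_triplet` helper, and the tab-separated layout are assumptions for illustration, not something this commit specifies, and the sample sentence is made up.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One training example: a query with a relevant and a non-relevant document."""
    query: str
    positive: str
    negative: str


def parse_triplet(line: str) -> Triplet:
    """Parse one tab-separated line into a Triplet.

    The TSV layout is an assumption; the commit does not state the on-disk format.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(parts)}")
    return Triplet(*parts)


# Made-up sample line, for illustration only.
example = parse_triplet(
    "capital do Brasil\tBrasília é a capital do Brasil.\tO Rio de Janeiro tem praias famosas.\n"
)
print(example.query)
```

Rejecting lines that do not have exactly three fields keeps malformed rows from silently becoming training examples.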
model_metadata.json CHANGED
@@ -13,7 +13,8 @@
  },

  "training": {
- "dataset": "mMARCO Portuguese",
+ "training_dataset": "mMARCO Portuguese (unicamp-dl/mmarco)",
+ "validation_dataset": "mRobust (unicamp-dl/mrobust)",
  "num_iterations": 150000,
  "final_loss": 0.000047,
  "batch_size": 8,
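
Because this change replaces the single `dataset` key with `training_dataset` and `validation_dataset`, code that reads `model_metadata.json` may need to handle both layouts. The sketch below shows one way to do that. The `dataset_fields` helper is hypothetical, and the two JSON fragments contain only fields visible in this diff, not the full file.

```python
import json


def dataset_fields(metadata: dict) -> dict:
    """Return the training and validation dataset names from a metadata dict.

    Accepts both the old layout (a single "dataset" key) and the new layout
    ("training_dataset" plus "validation_dataset").
    """
    training = metadata.get("training", {})
    return {
        # Fall back to the old key when the new one is absent.
        "training_dataset": training.get("training_dataset", training.get("dataset")),
        # The old layout had no validation entry, so this may be None.
        "validation_dataset": training.get("validation_dataset"),
    }


# Fragments that mirror only the fields shown in this diff.
old = json.loads('{"training": {"dataset": "mMARCO Portuguese", "num_iterations": 150000}}')
new = json.loads(
    '{"training": {"training_dataset": "mMARCO Portuguese (unicamp-dl/mmarco)", '
    '"validation_dataset": "mRobust (unicamp-dl/mrobust)", "num_iterations": 150000}}'
)

print(dataset_fields(old))
print(dataset_fields(new))
```

Returning `None` for a missing validation dataset, rather than raising, lets callers treat metadata written before this commit as valid.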