EphAsad committed · Commit c3015be · verified · 1 Parent(s): 502da85

Upload README.md

Files changed (1): README.md (+42 -0)

README.md CHANGED
 
FireDevourerEmbedder introduces an **RL-based adaptive task weighting system** that automatically adjusts the importance of each training task based on validation performance. Instead of using fixed task weights, a policy network learns optimal weight distributions during training, leading to better overall performance across diverse NLU benchmarks.
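The README describes the policy network only at a high level; the following is a minimal, hypothetical sketch of the idea: a softmax policy over per-task logits, nudged by validation improvements with a simplified REINFORCE-style update. `TaskWeightPolicy` and every name in it are illustrative, not the model's actual code.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TaskWeightPolicy:
    """One learnable logit per task; the softmax of the logits gives the
    task-loss weights used for the next training round."""

    def __init__(self, tasks, lr=0.1):
        self.tasks = list(tasks)
        self.lr = lr
        self.logits = [0.0] * len(self.tasks)  # start from uniform weights

    def weights(self):
        return dict(zip(self.tasks, softmax(self.logits)))

    def update(self, val_improvement):
        """val_improvement maps task -> change in its validation metric.
        Tasks that improved more than average get a larger weight next
        round (reward minus a mean baseline, scaled by the local
        softmax gradient)."""
        rewards = [val_improvement[t] for t in self.tasks]
        baseline = sum(rewards) / len(rewards)
        probs = softmax(self.logits)
        for i, (r, p) in enumerate(zip(rewards, probs)):
            self.logits[i] += self.lr * (r - baseline) * (1.0 - p)
```

In a training loop the weighted objective would then be something like `sum(w[t] * task_loss[t] for t in tasks)`, with `update` called after each validation pass.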
 
## Why Multi-Task? Information-Dense Embeddings

The core philosophy behind FireDevourerEmbedder is that **multi-task learning creates richer, more information-dense embeddings** than single-task approaches.

By training with multiple task heads simultaneously, the shared encoder is forced to learn representations that capture:

| Dimension | Learned From | What It Captures |
|-----------|--------------|------------------|
| **Semantic Similarity** | STS-B | Fine-grained meaning overlap |
| **Logical Relationships** | MultiNLI | Entailment, contradiction, neutrality |
| **Question Semantics** | QQP | Intent and duplicate detection |
| **Adversarial Patterns** | PAWS | Word-order sensitivity, paraphrase robustness |
| **Domain Awareness** | All datasets | Context-appropriate representations |

This results in embeddings that are:
- **More robust**: trained to handle diverse linguistic phenomena
- **More transferable**: generalize better to unseen tasks
- **More informative**: each dimension of the embedding vector carries meaningful semantic signal

Unlike single-task embedders that optimize for one objective, FireDevourerEmbedder's embeddings simultaneously encode multiple facets of meaning, making them suitable for a wide range of downstream applications without fine-tuning.
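The shared-encoder/multi-head layout described above can be illustrated structurally; this sketch uses toy stand-ins (the actual encoder and heads are neural networks), so all names here are hypothetical.

```python
class MultiTaskEmbedder:
    """One shared encoder feeding several lightweight task heads.

    Because every task head reads the same embedding, each training
    task pushes extra structure into that one vector; this is the
    mechanism behind the 'information-dense' claim."""

    def __init__(self, encoder, heads):
        self.encoder = encoder  # sentence -> embedding (list of floats)
        self.heads = heads      # task name -> function(embedding) -> prediction

    def embed(self, sentence):
        # At inference time only the shared encoder is used; the task
        # heads exist to shape the embedding during training.
        return self.encoder(sentence)

    def predict(self, task, sentence):
        return self.heads[task](self.embed(sentence))

# Toy stand-ins so the sketch runs end to end.
def toy_encoder(sentence):
    words = sentence.lower().split()
    return [float(len(words)), sum(len(w) for w in words) / max(len(words), 1)]

toy_heads = {
    "stsb": lambda emb: emb[0],           # regression-style head
    "mnli": lambda emb: int(emb[0] > 3),  # classification-style head
}
model = MultiTaskEmbedder(toy_encoder, toy_heads)
```

The design point is that `embed` is shared across all tasks while each head stays small, so the gradients from every objective flow into the same representation.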
## Model Details

| Property | Value |

| [PAWS](https://huggingface.co/datasets/google-research-datasets/paws) | Paraphrase Detection | Adversarial | 100,000 |
| [MRPC](https://huggingface.co/datasets/nyu-mll/glue) | Paraphrase Detection | News | 100,000 |

### Data Augmentation Strategy

To prevent training bias, all datasets were balanced to exactly **100,000 samples** each:

| Dataset | Original Size | Augmentation Method |
|---------|---------------|---------------------|
| STS-B | ~8,600 | Repetition (~12x) + pair swapping |
| MultiNLI | ~433,000 | Subsampling |
| QQP | ~400,000 | Subsampling |
| PAWS | ~49,000 | Repetition (~2x) + pair swapping |
| MRPC | ~3,600 | Repetition (~10x, capped) + pair swapping |

**Why this matters:**
- Without balancing, larger datasets (QQP, MultiNLI) would dominate training
- Smaller but valuable datasets (MRPC, STS-B) would be underrepresented
- Equal representation ensures the model learns equally from all task types

**Augmentation techniques:**
- **Repetition**: smaller datasets are repeated, capped at 10x, to limit memorization
- **Sentence pair swapping**: for symmetric tasks, (A, B) pairs are also trained as (B, A)
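The balancing recipe above can be sketched as follows. `balance_dataset` is a hypothetical helper, not the project's actual preprocessing code, and the exact interaction of the repetition cap with pair swapping is an assumption here.

```python
import itertools
import random

def balance_dataset(pairs, target=100_000, symmetric=True, max_repeat=10, seed=0):
    """Bring a list of (sentence_a, sentence_b, label) examples toward
    `target` size.

    Oversized datasets (QQP, MultiNLI) are subsampled; undersized ones
    (STS-B, PAWS, MRPC) are grown by pair swapping (for symmetric
    tasks) and then repeated, capped at `max_repeat` copies of the
    expanded pool."""
    rng = random.Random(seed)
    if len(pairs) >= target:
        return rng.sample(pairs, target)  # subsampling branch
    pool = list(pairs)
    if symmetric:
        # Sentence pair swapping: train (A, B) also as (B, A).
        pool += [(b, a, label) for (a, b, label) in pairs]
    limit = min(target, max_repeat * len(pool))
    return list(itertools.islice(itertools.cycle(pool), limit))
```

Large sources hit the subsampling branch and come back at exactly `target`; tiny sources grow via swapping and repetition but stop at the cap rather than reaching `target`, which mirrors the "capped" note for MRPC in the table.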
### Training Configuration

| Parameter | Value |