A multi-task sentence embedding model that uses **Reinforcement Learning** to dynamically adjust task weights during training.

FireDevourerEmbedder introduces an **RL-based adaptive task weighting system** that automatically adjusts the importance of each training task based on validation performance. Instead of using fixed task weights, a policy network learns optimal weight distributions during training, leading to better overall performance across diverse NLU benchmarks.
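The card does not include the policy-network code, but the idea can be illustrated. Below is a toy sketch (hypothetical `AdaptiveTaskWeighter` class, a simple logit-nudging update standing in for the actual RL policy) of weights shifting toward tasks whose validation scores are still improving:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class AdaptiveTaskWeighter:
    """Toy stand-in for the RL policy: keeps one logit per task and
    nudges logits toward tasks whose validation score improved."""

    def __init__(self, task_names, lr=0.5):
        self.task_names = list(task_names)
        self.logits = np.zeros(len(self.task_names))
        self.prev_scores = None
        self.lr = lr

    def weights(self):
        # Current task-weight distribution (always sums to 1).
        return dict(zip(self.task_names, softmax(self.logits)))

    def update(self, val_scores):
        # Reward each task by its validation improvement since the last step.
        scores = np.array([val_scores[t] for t in self.task_names])
        if self.prev_scores is not None:
            reward = scores - self.prev_scores
            # Upweight tasks that are still improving relative to the rest.
            self.logits += self.lr * (reward - reward.mean())
        self.prev_scores = scores

weighter = AdaptiveTaskWeighter(["stsb", "mnli", "qqp", "paws", "mrpc"])
weighter.update({"stsb": 0.80, "mnli": 0.70, "qqp": 0.85, "paws": 0.60, "mrpc": 0.75})
weighter.update({"stsb": 0.81, "mnli": 0.75, "qqp": 0.85, "paws": 0.62, "mrpc": 0.75})
print(weighter.weights())  # mnli gets the largest weight: it improved most
```

The actual policy network may of course use a different reward (e.g. upweighting struggling tasks); this only shows the general shape of validation-driven weight adaptation.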
## Why Multi-Task? Information-Dense Embeddings

The core philosophy behind FireDevourerEmbedder is that **multi-task learning creates richer, more information-dense embeddings** than single-task approaches.

By training with multiple task heads simultaneously, the shared encoder is forced to learn representations that capture:

| Dimension | Learned From | What It Captures |
|-----------|--------------|------------------|
| **Semantic Similarity** | STS-B | Fine-grained meaning overlap |
| **Logical Relationships** | MultiNLI | Entailment, contradiction, neutrality |
| **Question Semantics** | QQP | Intent and duplicate detection |
| **Adversarial Patterns** | PAWS | Word-order sensitivity, paraphrase robustness |
| **Domain Awareness** | All datasets | Context-appropriate representations |

This results in embeddings that are:

- **More robust** - trained to handle diverse linguistic phenomena
- **More transferable** - generalize better to unseen tasks
- **More informative** - each dimension of the embedding vector carries meaningful semantic signal

Unlike single-task embedders that optimize for one objective, FireDevourerEmbedder's embeddings simultaneously encode multiple facets of meaning, making them suitable for a wide range of downstream applications without fine-tuning.
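The shared-encoder-with-task-heads design behind this claim reduces to a few lines. The following is a toy sketch with made-up shapes and random weights, not the real architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder: every task reads the same embedding, so gradients from
# all task heads shape a single representation space.
W_enc = rng.normal(size=(32, 8))     # toy "encoder": 32-dim input -> 8-dim embedding

def encode(x):
    return np.tanh(x @ W_enc)        # shared sentence embedding

# One lightweight head per task, all consuming the shared embedding.
heads = {
    "stsb": rng.normal(size=(8, 1)),  # regression: similarity score
    "mnli": rng.normal(size=(8, 3)),  # classification: entail/neutral/contradict
    "qqp":  rng.normal(size=(8, 2)),  # classification: duplicate or not
}

x = rng.normal(size=(4, 32))          # a batch of 4 toy "sentence" features
emb = encode(x)
outputs = {task: emb @ W for task, W in heads.items()}

# A (possibly RL-weighted) scalar loss combines all tasks into one objective,
# so the shared encoder is optimized for every task at once.
task_weights = {"stsb": 0.4, "mnli": 0.35, "qqp": 0.25}
total_loss = sum(task_weights[t] * np.mean(outputs[t] ** 2) for t in heads)
print(emb.shape, total_loss)
```

Because the encoder weights are shared, each task head's gradient pushes the same embedding space, which is what makes the resulting vectors multi-faceted.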
## Model Details

| Property | Value |
|----------|-------|

The model was trained on 5 balanced datasets with 100,000 samples each (500,000 total):

| Dataset | Task | Domain | Samples |
|---------|------|--------|---------|
| [PAWS](https://huggingface.co/datasets/google-research-datasets/paws) | Paraphrase Detection | Adversarial | 100,000 |
| [MRPC](https://huggingface.co/datasets/nyu-mll/glue) | Paraphrase Detection | News | 100,000 |

### Data Augmentation Strategy

To prevent training bias, all datasets were balanced to exactly **100,000 samples** each:

| Dataset | Original Size | Augmentation Method |
|---------|---------------|---------------------|
| STS-B | ~8,600 | Repetition (~12x) + pair swapping |
| MultiNLI | ~433,000 | Subsampling |
| QQP | ~400,000 | Subsampling |
| PAWS | ~49,000 | Repetition (~2x) + pair swapping |
| MRPC | ~3,600 | Repetition (~10x, capped) + pair swapping |

**Why this matters:**

- Without balancing, larger datasets (QQP, MultiNLI) would dominate training
- Smaller but valuable datasets (MRPC, STS-B) would be underrepresented
- Equal representation ensures the model learns equally from all task types

**Augmentation techniques:**

- **Repetition**: smaller datasets are repeated, with a 10x cap to limit memorization of duplicated pairs
- **Sentence pair swapping**: for symmetric tasks, each (A, B) pair is also trained as (B, A)

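The balancing recipe above can be sketched in a short helper (hypothetical `balance_pairs` function, not the actual training pipeline):

```python
import random

def balance_pairs(pairs, target, symmetric=True, max_repeats=10, seed=0):
    """Balance a list of (sent_a, sent_b, label) examples toward `target`:
    large datasets are subsampled; small ones get swapped (B, A) copies for
    symmetric tasks, then are repeated up to `max_repeats` times."""
    rng = random.Random(seed)
    if len(pairs) >= target:
        return rng.sample(pairs, target)           # subsampling (QQP, MultiNLI)
    pool = list(pairs)
    if symmetric:                                   # pair swapping (STS-B, PAWS, MRPC)
        pool += [(b, a, label) for a, b, label in pairs]
    repeats = min(max_repeats, -(-target // len(pool)))  # ceil division, capped
    pool = pool * repeats
    rng.shuffle(pool)
    return pool[:target]

small = [(f"a{i}", f"b{i}", 1.0) for i in range(3600)]   # MRPC-sized toy set
balanced = balance_pairs(small, target=100_000)
print(len(balanced))  # 72000: with the 10x cap, a set this small stays below target
```

Note how the repetition cap interacts with the target: an MRPC-sized set cannot quite reach 100,000 under a strict 10x cap, which is presumably why the card lists its repetition as "~10x, capped".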
### Training Configuration

| Parameter | Value |
|-----------|-------|