DomainEmbedder-v2.6

High-Information-Density Embeddings for Cross-Domain RAG and Retrieval

DomainEmbedder-v2.6 produces information-dense embeddings optimized for retrieval-augmented generation (RAG) and cross-domain similarity matching. It combines a multi-task base embedder with domain-adaptive LoRA routing.

What This Model Does

| Component | Description |
| --- | --- |
| Base Embedder | FireDevourerEmbedder-RL-v3.6 trained on 5 NLP tasks with RL-based task weighting |
| Domain LoRAs | 5 specialized adapters (Medical, Legal, Code, Finance, Scientific) |
| RL Policy | Automatically selects a domain adapter for each input |

Why this matters for RAG/Retrieval:

  • Embeddings encode multiple facets of meaning (similarity, entailment, paraphrase, question matching)
  • Domain routing provides context-appropriate representations
  • Together, these are intended to give more precise retrieval across diverse content types

Key Innovation: Dual RL Architecture

| Stage | RL Application | Purpose |
| --- | --- | --- |
| Base Model Training | Task Weight Policy | Dynamically balance 5 NLP objectives during training |
| Domain Extension | Adapter Selection Policy | Route to appropriate domain LoRA at inference |

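The dual use of RL is the model's distinguishing feature: one policy weights the tasks during base-model training, and a second policy, also trained with RL, routes each input to a domain adapter at inference.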

Quick Start

Installation

pip install torch transformers peft

Loading the Model

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Device setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Define the base embedder architecture
class FireDevourerEmbedder(nn.Module):
    def __init__(self, base_model_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model_name)
        self.hidden_size = 384

        # Task heads
        self.sts_head = nn.Sequential(nn.Linear(384, 1), nn.Sigmoid())
        self.nli_head = nn.Linear(384, 3)
        self.qqp_head = nn.Linear(384, 2)
        self.paws_head = nn.Linear(384, 2)
        self.domain_head = nn.Linear(384, 5)

    def mean_pool(self, token_embeddings, attention_mask):
        mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

    def forward(self, input_ids, attention_mask, task='encode'):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        embedding = self.mean_pool(outputs.last_hidden_state, attention_mask)

        if task == 'encode':
            return embedding
        elif task == 'domain':
            return self.domain_head(embedding)
        # Add other tasks as needed; fail loudly instead of returning None
        raise ValueError(f"Unsupported task: {task!r}")

# Define RL Policy Network
class RLPolicyNetwork(nn.Module):
    def __init__(self, input_dim=384, hidden_dim=128, num_actions=5):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        features = self.network(x)
        policy = torch.softmax(self.policy_head(features), dim=-1)
        value = self.value_head(features)
        return policy, value

# Load model
model_dir = "path/to/DomainEmbedder-v2.6"

# 1. Load base model with checkpoint
base_model = FireDevourerEmbedder()
checkpoint = torch.load(f"{model_dir}/FireDevourerEmbedder-RL-v3.6.pt", map_location=device)
base_model.load_state_dict(checkpoint['model_state_dict'], strict=False)
base_model.to(device)
base_model.eval()

# 2. Load RL policy
rl_policy = RLPolicyNetwork()
rl_checkpoint = torch.load(f"{model_dir}/rl_policy.pt", map_location=device)
rl_policy.load_state_dict(rl_checkpoint['policy_state_dict'])
rl_policy.to(device)
rl_policy.eval()

# 3. Load LoRA adapters (example: medical)
# PeftModel is already imported above. Note that PEFT injects the LoRA
# layers into base_model.encoder in place.
lora_model = PeftModel.from_pretrained(
    base_model.encoder,
    f"{model_dir}/medical_lora"
)

Computing Embeddings with Domain Selection

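def get_domain_embedding(text, base_model, rl_policy, lora_models, tokenizer, device):
    """Return the base embedding of one text and the domain chosen by the RL policy.

    `lora_models` is currently unused: the returned embedding comes from the
    base encoder, not from a domain adapter.
    """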
    # Tokenize
    inputs = tokenizer(text, return_tensors='pt', padding=True,
                       truncation=True, max_length=512).to(device)

    # Get base embedding
    with torch.no_grad():
        base_emb = base_model(inputs['input_ids'], inputs['attention_mask'], task='encode')

        # Get domain selection from RL policy
        policy_probs, _ = rl_policy(base_emb)
        domain_idx = torch.argmax(policy_probs, dim=-1).item()

    domains = ['medical', 'legal', 'code', 'finance', 'scientific']
    selected_domain = domains[domain_idx]
    confidence = policy_probs[0, domain_idx].item()

    return {
        'embedding': base_emb,
        'domain': selected_domain,
        'confidence': confidence,
        'all_probs': policy_probs[0].cpu().numpy()
    }

# Example usage
result = get_domain_embedding(
    "What are the symptoms of diabetes?",
    base_model, rl_policy, None, tokenizer, device
)
print(f"Domain: {result['domain']} (confidence: {result['confidence']:.2%})")
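The example above passes `None` for `lora_models` and returns the base embedding. One possible way to go from the routed domain to a domain-adapted embedding is sketched below. It assumes that all five adapters are attached to a single encoder under their domain names, using PEFT's `adapter_name`, `load_adapter`, and `set_adapter`. The helper names `load_domain_adapters` and `embed_with_adapter` are hypothetical and not part of this package.

```python
def load_domain_adapters(encoder, model_dir, domains):
    """Attach one named LoRA adapter per domain to a single encoder (assumed layout: <domain>_lora/)."""
    from peft import PeftModel  # imported lazily so the helper below works without peft installed

    first, *rest = domains
    peft_encoder = PeftModel.from_pretrained(
        encoder, f"{model_dir}/{first}_lora", adapter_name=first
    )
    for name in rest:
        peft_encoder.load_adapter(f"{model_dir}/{name}_lora", adapter_name=name)
    return peft_encoder


def embed_with_adapter(peft_encoder, pool, inputs, domain):
    """Activate the adapter named `domain`, then encode and pool the inputs."""
    peft_encoder.set_adapter(domain)
    outputs = peft_encoder(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    return pool(outputs.last_hidden_state, inputs["attention_mask"])
```

A call might look like `embed_with_adapter(peft_encoder, base_model.mean_pool, inputs, result['domain'])` inside `torch.no_grad()`. Because PEFT modifies the wrapped encoder in place, the routing embedding would probably need to be computed inside `with peft_encoder.disable_adapter():` once adapters are attached.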

Architecture

Input Text
    │
    ▼
┌────────────────────────────────────────────┐
│  MiniLM-L6-v2 Encoder (FROZEN)             │
│  + Optional LoRA Adapter (domain-specific) │
│  384-dimensional output                    │
└────────────────────────────────────────────┘
    │
    ├──────────────────────────────────────────┐
    │                                          │
    ▼                                          ▼
┌─────────────────┐                  ┌──────────────────┐
│  Base Embedding │                  │   RL Policy Net  │
│    (384-dim)    │                  │   (66K params)   │
└─────────────────┘                  └──────────────────┘
                                              │
                                              ▼
                                     Domain Selection
                                     [Medical, Legal, Code,
                                      Finance, Scientific]
                                              │
                                              ▼
                              Load corresponding LoRA adapter
                                              │
                                              ▼
                              Domain-Adapted Embedding

Component Details

| Component | Specification |
| --- | --- |
| Base Encoder | MiniLM-L6-v2 (22M params) |
| Embedding Dim | 384 |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| LoRA Target | Query, Value projections |
| LoRA Params | 147,456 per adapter (0.645%) |
| RL Policy | 66,566 params |
| Domains | Medical, Legal, Code, Finance, Scientific |
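The parameter counts in the table can be reproduced with simple arithmetic. The sketch below assumes 6 transformer layers with hidden size 384 (the public MiniLM-L6-v2 configuration), bias-free LoRA matrices on the query and value projections, and an encoder size of 22,713,216 parameters; the policy dimensions are taken from `RLPolicyNetwork` above.

```python
HIDDEN = 384
NUM_LAYERS = 6             # assumed: the "L6" in MiniLM-L6-v2
LORA_RANK = 16
TARGETS_PER_LAYER = 2      # query and value projections
BASE_PARAMS = 22_713_216   # assumed encoder size (about 22.7M)

def linear_params(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)

# Each LoRA-wrapped projection adds A (rank x in) and B (out x rank), no bias.
lora_per_module = LORA_RANK * HIDDEN + HIDDEN * LORA_RANK
lora_params = lora_per_module * TARGETS_PER_LAYER * NUM_LAYERS

# RLPolicyNetwork: 384 -> 128 -> 128, then a 5-way policy head and a scalar value head.
policy_params = (
    linear_params(384, 128)
    + linear_params(128, 128)
    + linear_params(128, 5)
    + linear_params(128, 1)
)

# Share of trainable parameters relative to encoder plus adapter.
share = lora_params / (BASE_PARAMS + lora_params)

print(lora_params)      # 147456
print(policy_params)    # 66566
print(f"{share:.3%}")   # 0.645%
```

Under these assumptions, the 0.645% figure corresponds to the adapter's share of encoder plus adapter parameters; relative to the encoder alone it would be about 0.649%.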

Performance

Base Model: Multi-Task Embedding Quality

The base FireDevourerEmbedder achieves an average score of 0.7134 across five NLP benchmarks:

| Task | Dataset | Score | What It Measures |
| --- | --- | --- | --- |
| Question Similarity | QQP | 0.8636 | Intent matching |
| Paraphrase Detection | PAWS | 0.8459 | Adversarial robustness |
| Paraphrase Detection | MRPC | 0.7744 | News domain paraphrase |
| NLI | MultiNLI | 0.7465 | Logical relationships |
| Semantic Similarity | STS-B | 0.3366 | Fine-grained similarity |
| Average | | 0.7134 | Cross-task capability |

Philosophy: Individual task scores are traded for cross-domain information density. This makes embeddings more versatile for RAG and retrieval across diverse content.

Domain Routing Accuracy

Training Results (In-Distribution)

| Metric | Value |
| --- | --- |
| Domain Accuracy | 92.5% |
| Average Reward | 1.527 |
| Training Steps | 5,000 |

Stress-Test Benchmark (Semantically Similar Cross-Domain Phrases)

The benchmark intentionally uses complex, semantically similar phrases across domains to test robustness:

| Metric | DomainEmbedder (RL+LoRA) | Base Model | Improvement |
| --- | --- | --- | --- |
| Domain Accuracy | 56.0% | 20.4% | +35.6 pp |
| Avg Confidence | 28.5% | 77.6% | More calibrated |

Per-Domain Breakdown

| Domain | DomainEmbedder | Base Model | Note |
| --- | --- | --- | --- |
| Finance | 78.0% | 0.0% | +78.0 pp |
| Medical | 73.0% | 0.0% | +73.0 pp |
| Legal | 53.0% | 15.0% | +38.0 pp |
| Scientific | 48.0% | 1.0% | +47.0 pp |
| Code | 28.0% | 86.0% | Base over-predicted code |

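Key Insight: The base model was biased toward predicting "code" with high confidence: it scored 86% on Code but only 0-15% on every other domain. The RL+LoRA system corrects this bias and spreads predictions more evenly across domains, at the cost of lower Code accuracy (28%).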
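The overall stress-test figures follow from the per-domain numbers. The short check below assumes the overall accuracy is an unweighted mean over the five domains, and measures calibration as average confidence minus accuracy, in percentage points. That gap is one simple choice of calibration measure; the source does not state which one was used.

```python
# Per-domain stress-test accuracy (%) from the table above: (DomainEmbedder, base model).
per_domain = {
    "finance": (78.0, 0.0),
    "medical": (73.0, 0.0),
    "legal": (53.0, 15.0),
    "scientific": (48.0, 1.0),
    "code": (28.0, 86.0),
}
avg_confidence = {"routed": 28.5, "base": 77.6}

routed_acc = sum(r for r, _ in per_domain.values()) / len(per_domain)
base_acc = sum(b for _, b in per_domain.values()) / len(per_domain)

# Positive gap means overconfident, negative means underconfident.
routed_gap = avg_confidence["routed"] - routed_acc
base_gap = avg_confidence["base"] - base_acc

print(f"accuracy: routed {routed_acc:.1f}%, base {base_acc:.1f}%")
print(f"confidence gap: routed {routed_gap:+.1f} pp, base {base_gap:+.1f} pp")
```

By this measure the routed system's gap is about half the size of the base model's, but on the underconfident side rather than the overconfident one.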

Training Details

Domain Training Data

| Domain | Samples | Sources |
| --- | --- | --- |
| Medical | 40,000 | MedQA-USMLE, MedQuAD, PubMedQA, Medical Meadow, ChatDoctor |
| Legal | 40,000 | EUR-LEX, CaseHold, ECTHR-A, ECTHR-B |
| Code | 40,000 | Code Alpaca, MBPP, Code Contests, Python Instructions |
| Finance | 40,000 | Finance Alpaca, FinGPT-FiQA, Financial QA |
| Scientific | 40,000 | arXiv, PubMed (87.3% real + 12.7% augmented) |
| Total | 200,000 | |

LoRA Training Configuration

| Parameter | Value |
| --- | --- |
| Epochs | 3 per domain |
| Batch Size | 32 |
| Learning Rate | 2e-4 |
| Loss | Contrastive (InfoNCE-style) |
| Trainable Params | 147,456 (0.645% of base) |
| Warmup Steps | 500 |
| Max Gradient Norm | 1.0 |
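For readers unfamiliar with the loss named in the table, here is a minimal pure-Python sketch of an InfoNCE-style objective with in-batch negatives. It is a generic illustration, not the training code used for these adapters: the cosine similarity, the temperature of 0.05, and the function names are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: positives[i] matches anchors[i]; all other positives act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor
print(info_nce(anchors, aligned) < info_nce(anchors, swapped))  # True
```

The loss is small when each anchor is most similar to its own positive and large when another item in the batch is a better match.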

RL Training (Supervised A2C)

| Parameter | Value |
| --- | --- |
| Algorithm | Actor-Critic (A2C) |
| Total Steps | 5,000 |
| Episodes per Step | 5 |
| Gamma (discount) | 0.99 |
| Entropy Coef | 0.1 (high exploration) |
| Value Coef | 0.5 |
| Correctness Bonus | +1.0 |
| Correctness Penalty | -0.5 |
| Baseline Decay | 0.99 |
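To show how the coefficients in the table typically combine, the sketch below computes a standard A2C loss for a single routing decision. It is an assumption-laden illustration: the reward uses only the correctness bonus and penalty, the advantage is one-step (so gamma and the baseline decay do not appear), and `a2c_step_loss` is a hypothetical name. The source does not specify the full reward or loss.

```python
import math

ENTROPY_COEF = 0.1
VALUE_COEF = 0.5
BONUS, PENALTY = 1.0, -0.5

def a2c_step_loss(probs, value, action, correct_action):
    """Loss for one routing decision: policy-gradient term + value regression - entropy bonus."""
    reward = BONUS if action == correct_action else PENALTY
    # One-step advantage; in an autograd framework this would be detached
    # before being used in the policy term.
    advantage = reward - value
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = advantage ** 2
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return policy_loss + VALUE_COEF * value_loss - ENTROPY_COEF * entropy

uniform = [0.2] * 5
print(a2c_step_loss(uniform, value=0.0, action=0, correct_action=0))
print(a2c_step_loss(uniform, value=0.0, action=1, correct_action=0))
```

A relatively large entropy coefficient such as 0.1 rewards spread-out policies, which is consistent with the low average confidence reported in the stress test.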

Curriculum Learning Phases

| Phase | Steps | Data | Accuracy |
| --- | --- | --- | --- |
| 1 (Easy) | 0-1,500 | Clear domain examples (10K) | 68.8% → 87.5% |
| 2 (Moderate) | 1,500-3,500 | Easy + ambiguous (20K) | 87.5% → 89.3% |
| 3 (Hard) | 3,500-5,000 | All data incl. hybrid (28K) | 89.3% → 92.5% |
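The curriculum can be expressed as a step-to-phase lookup. The sketch below assumes half-open step ranges (a boundary step such as 1,500 belongs to the later phase), which the table leaves ambiguous; `PHASES` and `phase_for_step` are illustrative names.

```python
# (phase name, first step, end step exclusive, data pool) from the table above.
PHASES = [
    ("easy", 0, 1_500, "clear domain examples"),
    ("moderate", 1_500, 3_500, "easy + ambiguous"),
    ("hard", 3_500, 5_000, "all data incl. hybrid"),
]

def phase_for_step(step):
    """Return (phase name, data pool) for a training step in [0, 5000)."""
    for name, start, end, pool in PHASES:
        if start <= step < end:
            return name, pool
    raise ValueError(f"step {step} is outside the 0-5,000 training range")

print(phase_for_step(0))      # ('easy', 'clear domain examples')
print(phase_for_step(1_500))  # ('moderate', 'easy + ambiguous')
print(phase_for_step(4_999))  # ('hard', 'all data incl. hybrid')
```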

Training Progress

| Version | Step | Accuracy | Reward |
| --- | --- | --- | --- |
| v2.1 | 500 | 68.8% | 1.100 |
| v2.2 | 1,000 | 80.1% | 1.336 |
| v2.3 | 1,500 | 87.5% | 1.454 |
| v2.4 | 2,000 | 88.9% | 1.480 |
| v2.5 | 3,000 | 89.3% | 1.507 |
| v2.6 | 4,000 | 92.5% | 1.527 |

Package Contents

DomainEmbedder-v2.6/
├── FireDevourerEmbedder-RL-v3.6.pt   # Base model checkpoint (86.7 MB)
├── rl_policy.pt                       # Trained RL policy (0.27 MB)
├── metadata.json                      # Training metadata
├── README.md                          # This file
├── medical_lora/                      # Medical domain adapter (0.6 MB)
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── legal_lora/                        # Legal domain adapter (0.6 MB)
├── code_lora/                         # Code domain adapter (0.6 MB)
├── finance_lora/                      # Finance domain adapter (0.6 MB)
└── scientific_lora/                   # Scientific domain adapter (0.6 MB)

Total Size: ~90 MB (self-contained)

Intended Use

Best Use Cases

  • RAG Systems: Domain-aware retrieval for multi-domain knowledge bases
  • Cross-Domain Search: Finding similar content across Medical, Legal, Code, Finance, Scientific domains
  • Document Classification: Automatic domain routing for document processing pipelines
  • Semantic Similarity: Information-dense embeddings for precise matching
  • Multi-Domain Chatbots: Context-appropriate responses based on detected domain

Limitations

  • English Only: Trained exclusively on English data
  • Max Length: Inputs are limited to 512 tokens
  • Domain Coverage: 5 domains only (Medical, Legal, Code, Finance, Scientific)
  • Stress-Test Accuracy: 56% on semantically similar cross-domain queries
  • STS-B Trade-off: Lower fine-grained similarity (0.34) for broader task coverage

Citation

@misc{domainembedder2025,
  author = {Asad, Zain},
  title = {DomainEmbedder: Domain-Adaptive Embeddings with Dual RL and LoRA},
  year = {2025},
  publisher = {Hugging Face},
  note = {Multi-task base embedder with RL-based task weighting + domain-specific LoRA adapters with curriculum learning}
}

Author

Zain Asad

License

MIT License
