DomainEmbedder-v2.6

High-Information-Density Embeddings for Cross-Domain RAG and Retrieval

DomainEmbedder-v2.6 produces information-dense embeddings optimized for retrieval-augmented generation (RAG) and cross-domain similarity matching. It combines a multi-task base embedder with domain-adaptive LoRA routing.

What This Model Does

Component	Description
Base Embedder	FireDevourerEmbedder-RL-v3.6 trained on 5 NLP tasks with RL-based task weighting
Domain LoRAs	5 specialized adapters (Medical, Legal, Code, Finance, Scientific)
RL Policy	Automatically selects a domain adapter for each input

Why this matters for RAG/Retrieval:

Embeddings encode multiple facets of meaning (similarity, entailment, paraphrase, questions)
Domain routing provides context-appropriate representations
Results in more precise retrieval across diverse content types

Key Innovation: Dual RL Architecture

Stage	RL Application	Purpose
Base Model Training	Task Weight Policy	Dynamically balance 5 NLP objectives during training
Domain Extension	Adapter Selection Policy	Route to appropriate domain LoRA at inference

RL is applied twice: a task-weighting policy during base-model training and an adapter-selection policy at inference.

Quick Start

Installation

pip install torch transformers peft

Loading the Model

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Device setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Define the base embedder architecture
class FireDevourerEmbedder(nn.Module):
    def __init__(self, base_model_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model_name)
        self.hidden_size = 384

        # Task heads
        self.sts_head = nn.Sequential(nn.Linear(384, 1), nn.Sigmoid())
        self.nli_head = nn.Linear(384, 3)
        self.qqp_head = nn.Linear(384, 2)
        self.paws_head = nn.Linear(384, 2)
        self.domain_head = nn.Linear(384, 5)

    def mean_pool(self, token_embeddings, attention_mask):
        mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
        return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

    def forward(self, input_ids, attention_mask, task='encode'):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        embedding = self.mean_pool(outputs.last_hidden_state, attention_mask)

        if task == 'encode':
            return embedding
        elif task == 'domain':
            return self.domain_head(embedding)
        # Other task heads (sts, nli, qqp, paws) are not wired up in this snippet
        raise ValueError(f"Unsupported task: {task!r}")

# Define RL Policy Network
class RLPolicyNetwork(nn.Module):
    def __init__(self, input_dim=384, hidden_dim=128, num_actions=5):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        features = self.network(x)
        policy = torch.softmax(self.policy_head(features), dim=-1)
        value = self.value_head(features)
        return policy, value

# Load model
model_dir = "path/to/DomainEmbedder-v2.6"

# 1. Load base model with checkpoint
base_model = FireDevourerEmbedder()
checkpoint = torch.load(f"{model_dir}/FireDevourerEmbedder-RL-v3.6.pt", map_location=device)
base_model.load_state_dict(checkpoint['model_state_dict'], strict=False)
base_model.to(device)
base_model.eval()

# 2. Load RL policy
rl_policy = RLPolicyNetwork()
rl_checkpoint = torch.load(f"{model_dir}/rl_policy.pt", map_location=device)
rl_policy.load_state_dict(rl_checkpoint['policy_state_dict'])
rl_policy.to(device)
rl_policy.eval()

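# 3. Load LoRA adapters onto the encoder (one PeftModel, five named adapters).
# Note: PEFT injects LoRA layers into base_model.encoder in place, so wrap
# base-model calls in `lora_model.disable_adapter()` to get unadapted output.
domains = ['medical', 'legal', 'code', 'finance', 'scientific']
lora_model = PeftModel.from_pretrained(
    base_model.encoder, f"{model_dir}/{domains[0]}_lora", adapter_name=domains[0]
)
for name in domains[1:]:
    lora_model.load_adapter(f"{model_dir}/{name}_lora", adapter_name=name)
lora_model.to(device)
lora_model.eval()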

Computing Embeddings with Domain Selection

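def get_domain_embedding(text, base_model, rl_policy, lora_models, tokenizer, device):
    """Get a domain-aware embedding for a single input text.

    lora_models: None (return the base embedding), or a PeftModel wrapping
    base_model.encoder with one adapter registered under each domain name.
    """
    domains = ['medical', 'legal', 'code', 'finance', 'scientific']

    # Tokenize
    inputs = tokenizer(text, return_tensors='pt', padding=True,
                       truncation=True, max_length=512).to(device)
    ids, mask = inputs['input_ids'], inputs['attention_mask']

    with torch.no_grad():
        # Get base embedding. PEFT patches the encoder in place, so adapters
        # must be switched off explicitly for the unadapted pass.
        if lora_models is None:
            base_emb = base_model(ids, mask, task='encode')
        else:
            with lora_models.disable_adapter():
                base_emb = base_model(ids, mask, task='encode')

        # Get domain selection from RL policy
        policy_probs, _ = rl_policy(base_emb)
        domain_idx = torch.argmax(policy_probs, dim=-1).item()
        selected_domain = domains[domain_idx]

        # Re-encode with the selected adapter active (assumed intended usage)
        embedding = base_emb
        if lora_models is not None:
            lora_models.set_adapter(selected_domain)
            embedding = base_model(ids, mask, task='encode')

    return {
        'embedding': embedding,
        'domain': selected_domain,
        'confidence': policy_probs[0, domain_idx].item(),
        'all_probs': policy_probs[0].cpu().numpy()
    }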

# Example usage
result = get_domain_embedding(
    "What are the symptoms of diabetes?",
    base_model, rl_policy, None, tokenizer, device
)
print(f"Domain: {result['domain']} (confidence: {result['confidence']:.2%})")
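
For retrieval, embeddings like the one returned above are typically compared by cosine similarity. The sketch below is dependency-free and uses short plain-Python vectors as stand-ins for the 384-dimensional embeddings; `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of this package.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors standing in for real embeddings
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 2.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # partially aligned
]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])
```

With real outputs, `result['embedding'].squeeze(0).tolist()` converts a single embedding to a list, or `torch.nn.functional.cosine_similarity` can be applied to the tensors directly.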

Architecture

Input Text
    │
    ▼
┌────────────────────────────────────────────┐
│  MiniLM-L6-v2 Encoder (FROZEN)             │
│  + Optional LoRA Adapter (domain-specific) │
│  384-dimensional output                     │
└────────────────────────────────────────────┘
    │
    ├──────────────────────────────────────────┐
    │                                          │
    ▼                                          ▼
┌─────────────────┐                  ┌──────────────────┐
│  Base Embedding │                  │   RL Policy Net  │
│    (384-dim)    │                  │   (66K params)   │
└─────────────────┘                  └──────────────────┘
                                              │
                                              ▼
                                     Domain Selection
                                     [Medical, Legal, Code,
                                      Finance, Scientific]
                                              │
                                              ▼
                              Load corresponding LoRA adapter
                                              │
                                              ▼
                              Domain-Adapted Embedding

Component Details

Component	Specification
Base Encoder	MiniLM-L6-v2 (22M params)
Embedding Dim	384
LoRA Rank	16
LoRA Alpha	32
LoRA Target	Query, Value projections
LoRA Params	147,456 per adapter (0.645%)
RL Policy	66,566 params
Domains	Medical, Legal, Code, Finance, Scientific
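
The parameter counts in the table can be reproduced from the layer shapes. The sketch below assumes the encoder has 6 transformer layers (as the "L6" in MiniLM-L6-v2 suggests), that LoRA adds two rank-16 matrices to each 384x384 query and value projection, and that the policy network matches the RLPolicyNetwork definition in Quick Start. The helper names are illustrative.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer."""
    return in_features * out_features + (out_features if bias else 0)

def lora_params(hidden, rank, num_layers, targets_per_layer):
    """LoRA adds A (rank x hidden) and B (hidden x rank) per adapted projection."""
    per_projection = rank * hidden + hidden * rank
    return per_projection * targets_per_layer * num_layers

# Assumed shapes: hidden size 384, 6 layers, query + value adapted in each layer
adapter_total = lora_params(hidden=384, rank=16, num_layers=6, targets_per_layer=2)

# RL policy: 384 -> 128 -> 128 trunk, then a 5-way policy head and a scalar value head
policy_total = (
    linear_params(384, 128)
    + linear_params(128, 128)
    + linear_params(128, 5)
    + linear_params(128, 1)
)

print(adapter_total)  # 147456
print(policy_total)   # 66566
```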

Performance

Base Model: Multi-Task Embedding Quality

The base FireDevourerEmbedder averages 0.71 across the 5 benchmarks below:

Task	Dataset	Score	What It Measures
Question Similarity	QQP	0.8636	Intent matching
Paraphrase Detection	PAWS	0.8459	Adversarial robustness
Paraphrase Detection	MRPC	0.7744	News domain paraphrase
NLI	MultiNLI	0.7465	Logical relationships
Semantic Similarity	STS-B	0.3366	Fine-grained similarity
Average		0.7134	Cross-task capability

Philosophy: Individual task scores are traded for cross-domain information density. This makes embeddings more versatile for RAG and retrieval across diverse content.

Domain Routing Accuracy

Training Results (In-Distribution)

Metric	Value
Domain Accuracy	92.5%
Average Reward	1.527
Training Steps	5,000

Stress-Test Benchmark (Semantically Similar Cross-Domain Phrases)

The benchmark intentionally uses complex, semantically similar phrases across domains to test robustness:

Metric	DomainEmbedder (RL+LoRA)	Base Model	Improvement
Domain Accuracy	56.0%	20.4%	+35.6 pp
Avg Confidence	28.5%	77.6%	Less overconfident

Per-Domain Breakdown

Domain	DomainEmbedder	Base Model	Note
Finance	78.0%	0.0%	+78.0 pp
Medical	73.0%	0.0%	+73.0 pp
Legal	53.0%	15.0%	+38.0 pp
Scientific	48.0%	1.0%	+47.0 pp
Code	28.0%	86.0%	Base over-predicted code

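Key Insight: The base model's domain head was biased toward "code": it scored 86.0% on Code but at most 15.0% on every other domain, at 77.6% average confidence. The RL+LoRA system spreads predictions more evenly across domains and is less overconfident, at the cost of Code accuracy falling to 28.0%.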
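
The overall stress-test figures follow from the per-domain rows if every domain contributes the same number of test phrases. That is an assumption here, since the benchmark size per domain is not stated. A quick check:

```python
# Per-domain stress-test accuracy (%) copied from the Per-Domain Breakdown table
embedder_acc = {"finance": 78.0, "medical": 73.0, "legal": 53.0,
                "scientific": 48.0, "code": 28.0}
base_acc = {"finance": 0.0, "medical": 0.0, "legal": 15.0,
            "scientific": 1.0, "code": 86.0}

def macro_average(per_domain):
    """Unweighted mean over domains (assumes equal sample counts per domain)."""
    return sum(per_domain.values()) / len(per_domain)

overall_embedder = macro_average(embedder_acc)
overall_base = macro_average(base_acc)
gain_pp = overall_embedder - overall_base  # difference in percentage points

print(f"{overall_embedder:.1f} {overall_base:.1f} {gain_pp:+.1f}")
```

Under that assumption the averages match the reported 56.0% and 20.4%, a gap of 35.6 percentage points.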

Training Details

Domain Training Data

Domain	Samples	Sources
Medical	40,000	MedQA-USMLE, MedQuAD, PubMedQA, Medical Meadow, ChatDoctor
Legal	40,000	EUR-LEX, CaseHold, ECTHR-A, ECTHR-B
Code	40,000	Code Alpaca, MBPP, Code Contests, Python Instructions
Finance	40,000	Finance Alpaca, FinGPT-FiQA, Financial QA
Scientific	40,000	arXiv, PubMed (87.3% real + 12.7% augmented)
Total	200,000

LoRA Training Configuration

Parameter	Value
Epochs	3 per domain
Batch Size	32
Learning Rate	2e-4
Loss	Contrastive (InfoNCE-style)
Trainable Params	147,456 (0.645% of base)
Warmup Steps	500
Max Gradient Norm	1.0
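
The table names an InfoNCE-style contrastive loss but does not give its exact form. As a rough illustration, the sketch below computes the standard in-batch InfoNCE loss in plain Python: each anchor should score its own positive higher than every other positive in the batch. The temperature of 0.05 and the function names are assumptions, not values taken from the training run.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(anchors, positives, temperature=0.05):
    """Mean cross-entropy over in-batch similarities; positives[i] matches anchors[i]."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    losses = []
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy batch: aligned pairs give a low loss, swapped pairs a high one
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < swapped)  # True
```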

RL Training (Supervised A2C)

Parameter	Value
Algorithm	Actor-Critic (A2C)
Total Steps	5,000
Episodes per Step	5
Gamma (discount)	0.99
Entropy Coef	0.1 (high exploration)
Value Coef	0.5
Correctness Bonus	+1.0
Correctness Penalty	-0.5
Baseline Decay	0.99
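
To make the coefficients above concrete, here is a minimal single-step sketch of how they could combine, assuming a standard advantage actor-critic objective and an exponential-moving-average reward baseline. The exact reward shaping is not documented: the reported average reward of 1.527 exceeds the +1.0 correctness bonus, so the actual reward evidently includes terms not listed here. All function names are hypothetical.

```python
import math

CORRECT_BONUS, WRONG_PENALTY = 1.0, -0.5
ENTROPY_COEF, VALUE_COEF, BASELINE_DECAY = 0.1, 0.5, 0.99

def correctness_reward(predicted, target):
    """Only the documented correctness term; other reward terms are unknown."""
    return CORRECT_BONUS if predicted == target else WRONG_PENALTY

def update_baseline(baseline, reward, decay=BASELINE_DECAY):
    """Exponential moving average of rewards (assumed meaning of 'Baseline Decay')."""
    return decay * baseline + (1.0 - decay) * reward

def a2c_loss(probs, action, reward, value):
    """Single-step actor-critic loss: policy + value terms minus an entropy bonus."""
    advantage = reward - value
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = (reward - value) ** 2
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return policy_loss + VALUE_COEF * value_loss - ENTROPY_COEF * entropy

# Example: the policy picks domain 0 with probability 0.6 and is correct
probs = [0.6, 0.1, 0.1, 0.1, 0.1]
reward = correctness_reward(predicted=0, target=0)
loss = a2c_loss(probs, action=0, reward=reward, value=0.4)
baseline = update_baseline(0.0, reward)
print(round(loss, 4), round(baseline, 4))
```

Gamma does not appear because this sketch treats each routing decision as a one-step episode, which is also an assumption.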

Curriculum Learning Phases

Phase	Steps	Data	Accuracy
1 (Easy)	0-1,500	Clear domain examples (10K)	68.8% → 87.5%
2 (Moderate)	1,500-3,500	Easy + ambiguous (20K)	87.5% → 89.3%
3 (Hard)	3,500-5,000	All data incl. hybrid (28K)	89.3% → 92.5%
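
The phase boundaries can be expressed as a simple step-to-phase lookup. The sketch treats each range as half-open (step 1,500 belongs to phase 2), which is an assumption because the table lists overlapping endpoints; `curriculum_phase` is an illustrative name.

```python
# (first_step, end_step_exclusive, phase_name, pool_size) taken from the table above
PHASES = [
    (0, 1500, "easy", 10_000),
    (1500, 3500, "moderate", 20_000),
    (3500, 5000, "hard", 28_000),
]

def curriculum_phase(step):
    """Return (phase_name, pool_size) for a training step in [0, 5000)."""
    for start, end, name, pool_size in PHASES:
        if start <= step < end:
            return name, pool_size
    raise ValueError(f"step {step} is outside the 5,000-step schedule")

print(curriculum_phase(0))
print(curriculum_phase(1500))
print(curriculum_phase(4999))
```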

Training Progress

Version	Step	Accuracy	Reward
v2.1	500	68.8%	1.100
v2.2	1,000	80.1%	1.336
v2.3	1,500	87.5%	1.454
v2.4	2,000	88.9%	1.480
v2.5	3,000	89.3%	1.507
v2.6	4,000	92.5%	1.527

Package Contents

DomainEmbedder-v2.6/
├── FireDevourerEmbedder-RL-v3.6.pt   # Base model checkpoint (86.7 MB)
├── rl_policy.pt                       # Trained RL policy (0.27 MB)
├── metadata.json                      # Training metadata
├── README.md                          # This file
├── medical_lora/                      # Medical domain adapter (0.6 MB)
│   ├── adapter_config.json
│   └── adapter_model.safetensors
├── legal_lora/                        # Legal domain adapter (0.6 MB)
├── code_lora/                         # Code domain adapter (0.6 MB)
├── finance_lora/                      # Finance domain adapter (0.6 MB)
└── scientific_lora/                   # Scientific domain adapter (0.6 MB)

Total Size: ~90 MB (self-contained)

Intended Use

Best Use Cases

RAG Systems: Domain-aware retrieval for multi-domain knowledge bases
Cross-Domain Search: Finding similar content across Medical, Legal, Code, Finance, Scientific domains
Document Classification: Automatic domain routing for document processing pipelines
Semantic Similarity: Information-dense embeddings for precise matching
Multi-Domain Chatbots: Context-appropriate responses based on detected domain

Limitations

English Only: Trained exclusively on English data
Max Length: 512 tokens maximum input length
Domain Coverage: 5 domains only (Medical, Legal, Code, Finance, Scientific)
Stress-Test Accuracy: 56% on semantically similar cross-domain queries
STS-B Trade-off: Lower fine-grained similarity (0.34) for broader task coverage

Citation

@misc{domainembedder2025,
  author = {Asad, Zain},
  title = {DomainEmbedder: Domain-Adaptive Embeddings with Dual RL and LoRA},
  year = {2025},
  publisher = {Hugging Face},
  note = {Multi-task base embedder with RL-based task weighting + domain-specific LoRA adapters with curriculum learning}
}

Author

Zain Asad

License

MIT License

