---
license: apache-2.0
language:
- en
tags:
- sentence-transformers
- embeddings
- retrieval
- agents
- memory
- rag
- semantic-search
- ai-agents
- llm-memory
- vector-search
library_name: transformers
pipeline_tag: sentence-similarity
datasets:
- custom
metrics:
- mrr
- recall
- ndcg
model-index:
- name: agentrank-base
  results:
  - task:
      type: retrieval
      name: Agent Memory Retrieval
    metrics:
    - type: mrr
      value: 0.6496
      name: MRR
    - type: recall
      value: 0.4440
      name: Recall@1
    - type: recall
      value: 0.9960
      name: Recall@5
    - type: ndcg
      value: 0.6786
      name: NDCG@10
---


<div align="center">

# 🧠 AgentRank-Base

### The First Embedding Model Built Specifically for AI Agent Memory Retrieval

<p>
  <img src="https://img.shields.io/badge/MRR-0.65-brightgreen?style=for-the-badge" alt="MRR">
  <img src="https://img.shields.io/badge/Recall%405-99.6%25-blue?style=for-the-badge" alt="Recall@5">
  <img src="https://img.shields.io/badge/Parameters-149M-orange?style=for-the-badge" alt="Parameters">
  <img src="https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge" alt="License">
</p>

**+23% MRR improvement over general-purpose embedders** | **Temporal awareness** | **Memory type understanding**

[πŸš€ Quick Start](#-quick-start) β€’ [πŸ“Š Benchmarks](#-benchmarks) β€’ [πŸ”§ Architecture](#-architecture) β€’ [πŸ’‘ Why AgentRank?](#-why-agentrank)

</div>

---

## 🎯 TL;DR

> **AgentRank-Base** is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:
> - ⏰ **When** something happened (temporal awareness)
> - πŸ“ **What type** of memory it is (episodic vs semantic vs procedural)
> - ⭐ **How important** the memory is

---

## πŸ’‘ Why AgentRank?

### The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

> *"What did we discuss about Python **yesterday**?"*

Current embedders fail because they:
- ❌ Don't understand "yesterday" means recent time
- ❌ Can't distinguish between a preference and an event
- ❌ Treat all memories as equally important

### The AgentRank Solution

| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
|-----------|---------------------|-----------|
| "What did I say **yesterday**?" | Random old results πŸ˜• | Recent memories first βœ… |
| "What's my **preference**?" | Mixed with events πŸ˜• | Only preferences βœ… |
| "What's **most important**?" | No priority πŸ˜• | Importance-aware retrieval βœ… |

---

## πŸ“Š Benchmarks

Evaluated on **AgentMemBench** (500 test samples, 8 candidates each):

| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
|-------|------------|-------|------------|------------|-----------|
| **AgentRank-Base** | 149M | **0.6496** | **0.4440** | **0.9960** | **0.6786** |
| AgentRank-Small | 33M | 0.6375 | 0.4460 | 0.9740 | 0.6797 |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |

### Improvement Over Baselines

| vs Baseline | MRR | Recall@1 | Recall@5 |
|-------------|-----|----------|----------|
| vs MiniLM | **+22.6%** | **+19.4%** | **+32.4%** |
| vs MPNet | **+21.4%** | **+21.3%** | **+25.1%** |
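
The ranking metrics above are standard and easy to recompute from ranked candidate lists. Below is a minimal, self-contained sketch (the helpers `mrr` and `recall_at_k` are our own illustration, not part of any released evaluation code):

```python
def mrr(ranked, gold):
    """Mean Reciprocal Rank: average of 1/rank of the gold candidate."""
    total = 0.0
    for cands, g in zip(ranked, gold):
        rank = cands.index(g) + 1  # 1-based rank of the correct candidate
        total += 1.0 / rank
    return total / len(gold)

def recall_at_k(ranked, gold, k):
    """Fraction of queries whose gold candidate appears in the top k."""
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)

# Toy example: 2 queries, 4 candidate memories each, ranked best-first
ranked = [["m2", "m0", "m1", "m3"], ["m1", "m3", "m0", "m2"]]
gold = ["m0", "m1"]
print(mrr(ranked, gold))             # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(ranked, gold, 1))  # 0.5
```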

---

## πŸš€ Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to L2-normalized embeddings via mean pooling."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
        # Mean pooling over real tokens only (mask out padding)
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)
        # L2 normalize so dot products equal cosine similarities
        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find the most similar memory
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx].item():.4f}")

# Output:
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
```
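
In practice an agent usually wants the top-k memories rather than a single best match. A small sketch, assuming you already have L2-normalized query and memory embeddings like those produced above (the helper name `top_k_memories` is ours; the toy 2-d vectors stand in for real 768-d embeddings):

```python
import torch

def top_k_memories(query_emb, memory_embs, memories, k=3):
    """Return the k most similar memories with their cosine scores."""
    sims = torch.mm(query_emb, memory_embs.T)[0]
    scores, idxs = torch.topk(sims, k=min(k, len(memories)))
    return [(memories[i], score.item()) for i, score in zip(idxs.tolist(), scores)]

# Toy example: hand-made unit vectors standing in for real embeddings
memories = ["likes Python", "asked about React", "3 years experience"]
memory_embs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]])
query_emb = torch.tensor([[1.0, 0.0]])

for text, score in top_k_memories(query_emb, memory_embs, memories, k=2):
    print(f"{score:.4f}  {text}")
# 1.0000  likes Python
# 0.7071  3 years experience
```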

### Advanced Usage with Metadata

For full temporal and memory type awareness, use the AgentRank package:

```python
# Coming soon: pip install agentrank
import torch
from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,             # Memory is 1 week old
    memory_type="semantic",  # It's a preference (not an event)
)

# Encode query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
```

---

## πŸ”§ Architecture

AgentRank-Base is built on **ModernBERT-base** (149M params) with novel additions:

```
┌─────────────────────────────────────────────────┐
│     ModernBERT Encoder (22 Transformer Layers)  │
│     - RoPE Positional Encoding                  │
│     - Flash Attention                           │
│     - 768 Hidden Dimension                      │
└─────────────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┐
       ↓               ↓               ↓
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Temporal   │ │  Memory     │ │ Importance  │
│  Position   │ │  Type       │ │ Prediction  │
│  Embeddings │ │  Embeddings │ │ Head        │
│  (10 × 768) │ │  (4 × 768)  │ │ (768→1)     │
└─────────────┘ └─────────────┘ └─────────────┘
       │               │               │
       └───────────────┼───────────────┘
                       ↓
          ┌─────────────────────┐
          │  Projection Layer   │
          │  (768 → 768)        │
          └─────────────────────┘
                       ↓
          ┌─────────────────────┐
          │  L2 Normalization   │
          │  768-dim Embedding  │
          └─────────────────────┘
```

### Novel Components

| Component | Purpose | How It Helps |
|-----------|---------|--------------|
| **Temporal Embeddings** | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| **Memory Type Embeddings** | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| **Importance Head** | Auxiliary task predicting memory priority | Helps learn better representations |
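
To make the diagram concrete, here is a conceptual PyTorch sketch of how these components could combine with the encoder's pooled output. All names (`AgentRankHead`, `temporal_emb`, etc.) are our own illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class AgentRankHead(nn.Module):
    """Sketch: metadata embeddings + importance head on top of a pooled encoder output."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.temporal_emb = nn.Embedding(10, hidden)   # 10 temporal buckets
        self.type_emb = nn.Embedding(4, hidden)        # 4 memory types
        self.importance_head = nn.Linear(hidden, 1)    # auxiliary importance score
        self.proj = nn.Linear(hidden, hidden)          # final projection

    def forward(self, pooled, bucket_ids, type_ids):
        # Add metadata embeddings to the mean-pooled encoder output
        h = pooled + self.temporal_emb(bucket_ids) + self.type_emb(type_ids)
        importance = self.importance_head(h).squeeze(-1)
        # Project and L2-normalize to get the final 768-dim embedding
        emb = torch.nn.functional.normalize(self.proj(h), p=2, dim=-1)
        return emb, importance

head = AgentRankHead()
pooled = torch.randn(2, 768)  # stand-in for a batch of 2 encoder outputs
emb, imp = head(pooled, torch.tensor([0, 3]), torch.tensor([1, 0]))
print(emb.shape, imp.shape)  # torch.Size([2, 768]) torch.Size([2])
```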

### Temporal Buckets

```
Bucket 0: Today (0-1 days)
Bucket 1: Recent (1-3 days)
Bucket 2: This week (3-7 days)
Bucket 3: Last week (7-14 days)
Bucket 4: This month (14-30 days)
Bucket 5: Last month (30-60 days)
Bucket 6: Few months (60-90 days)
Bucket 7: Half year (90-180 days)
Bucket 8: This year (180-365 days)
Bucket 9: Long ago (365+ days)
```
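
The bucketing above is a simple threshold lookup. A minimal sketch, assuming the bucket boundaries match the table (the helper name `temporal_bucket` is ours):

```python
# Upper bounds (in days) for buckets 0-8; anything beyond falls into bucket 9
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]

def temporal_bucket(days_ago: float) -> int:
    """Map a memory's age in days to a temporal bucket ID (0-9)."""
    for bucket, edge in enumerate(BUCKET_EDGES):
        if days_ago < edge:
            return bucket
    return 9  # 365+ days: "long ago"

print(temporal_bucket(0))    # 0 (today)
print(temporal_bucket(7))    # 3 (last week)
print(temporal_bucket(400))  # 9 (long ago)
```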

### Memory Types

```
Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
```

---

## πŸŽ“ Training Details

| Aspect | Details |
|--------|---------|
| **Base Model** | answerdotai/ModernBERT-base (149M params) |
| **Training Data** | 500K synthetic agent memory samples |
| **Memory Distribution** | Episodic (40%), Semantic (35%), Procedural (25%) |
| **Loss Function** | Multiple Negatives Ranking Loss + Importance MSE |
| **Hard Negatives** | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| **Batch Size** | 16-32 per GPU |
| **Hardware** | 2Γ— NVIDIA RTX 6000 Ada (48GB each) |
| **Training Time** | ~12 hours |
| **Precision** | FP16 Mixed Precision |
| **Final Val Loss** | 0.877 |

---

## πŸ—οΈ Use Cases

### 1. AI Agents with Long-Term Memory

```python
# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
```

### 2. RAG Systems for Conversational AI

```python
# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"
# AgentRank returns recent, relevant conversations
# Generic embedders return random topically-similar docs
```

### 3. Personal Knowledge Bases

```python
# User's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]
# AgentRank properly handles time-sensitive queries
```

---

## πŸ†š When to Use AgentRank vs Others

| Use Case | Best Model |
|----------|------------|
| **AI agents with memory** | βœ… AgentRank |
| **Time-sensitive retrieval** | βœ… AgentRank |
| **Conversational AI** | βœ… AgentRank |
| General document search | OpenAI / Cohere |
| Code search | CodeBERT |
| Scientific papers | SciBERT |

---

## πŸ“ Model Family

| Model | Parameters | Speed | Quality | Best For |
|-------|------------|-------|---------|----------|
| [agentrank-small](https://huggingface.co/vrushket/agentrank-small) | 33M | ⚑⚑⚑ Fast | Good | Real-time agents, edge |
| **agentrank-base** | 149M | ⚑⚑ Medium | **Best** | Quality-critical apps |
| agentrank-reranker (coming) | 149M | ⚑ Slower | Superior | Two-stage retrieval |

---

## πŸ“š Citation

```bibtex
@misc{agentrank2024,
  author = {Vrushket More},
  title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vrushket/agentrank-base}
}
```

---

## 🀝 Community & Support

- πŸ› **Issues**: [GitHub Issues](https://github.com/vmore2/AgentRank-base/issues)
- πŸ’¬ **Discussions**: [HuggingFace Community](https://huggingface.co/vrushket/agentrank-base/discussions)
- πŸ“§ **Contact**: vrushket2604@gmail.com

---

## πŸ“„ License

Apache 2.0 - **Free for commercial use!**

---

<div align="center">

### ⭐ If AgentRank helps your project, please star the repo!

**Built with ❀️ for the AI agent community**

</div>