---
license: apache-2.0
language:
- en
tags:
- sentence-transformers
- embeddings
- retrieval
- agents
- memory
- rag
- semantic-search
- ai-agents
- llm-memory
- vector-search
library_name: transformers
pipeline_tag: sentence-similarity
datasets:
- custom
metrics:
- mrr
- recall
- ndcg
model-index:
- name: agentrank-base
  results:
  - task:
      type: retrieval
      name: Agent Memory Retrieval
    metrics:
    - type: mrr
      value: 0.6496
      name: MRR
    - type: recall
      value: 0.4440
      name: Recall@1
    - type: recall
      value: 0.9960
      name: Recall@5
    - type: ndcg
      value: 0.6786
      name: NDCG@10
---
<div align="center">
# 🧠 AgentRank-Base
### The First Embedding Model Built Specifically for AI Agent Memory Retrieval
<p>
<img src="https://img.shields.io/badge/MRR-0.65-brightgreen?style=for-the-badge" alt="MRR">
<img src="https://img.shields.io/badge/Recall%405-99.6%25-blue?style=for-the-badge" alt="Recall@5">
<img src="https://img.shields.io/badge/Parameters-149M-orange?style=for-the-badge" alt="Parameters">
<img src="https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge" alt="License">
</p>
**+21-23% MRR over general-purpose baselines (MPNet, MiniLM)** | **Temporal awareness** | **Memory type understanding**
[🚀 Quick Start](#-quick-start) • [📊 Benchmarks](#-benchmarks) • [🔧 Architecture](#-architecture) • [💡 Why AgentRank?](#-why-agentrank)
</div>
---
## 🎯 TL;DR
> **AgentRank-Base** is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:
> - ⏰ **When** something happened (temporal awareness)
> - 📝 **What type** of memory it is (episodic vs semantic vs procedural)
> - ⭐ **How important** the memory is
---
## 💡 Why AgentRank?
### The Problem with Current Embedders
AI agents need memory. But when you ask an agent:
> *"What did we discuss about Python **yesterday**?"*
Current embedders fail because they:
- ❌ Don't understand "yesterday" means recent time
- ❌ Can't distinguish between a preference and an event
- ❌ Treat all memories as equally important
### The AgentRank Solution
| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
|-----------|---------------------|-----------|
| "What did I say **yesterday**?" | Random old results 😕 | Recent memories first ✅ |
| "What's my **preference**?" | Mixed with events 😕 | Only preferences ✅ |
| "What's **most important**?" | No priority 😕 | Importance-aware retrieval ✅ |
---
## 📊 Benchmarks
Evaluated on **AgentMemBench** (500 test samples, 8 candidates each):
| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
|-------|------------|-------|------------|------------|-----------|
| **AgentRank-Base** | 149M | **0.6496** | 0.4440 | **0.9960** | 0.6786 |
| AgentRank-Small | 33M | 0.6375 | **0.4460** | 0.9740 | **0.6797** |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
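For reference, the metrics in this table can be computed from the 1-based rank at which the correct memory appears among the candidates. The sketch below assumes exactly one relevant memory per query (an assumption; the benchmark's labeling scheme is not described here), in which case NDCG@k reduces to `1 / log2(1 + rank)`. The function name `retrieval_metrics` is illustrative and not part of any AgentRank tooling.

```python
import math

def retrieval_metrics(ranks, recall_ks=(1, 5), ndcg_k=10):
    """Compute MRR, Recall@k and NDCG@k from 1-based ranks of the single
    relevant candidate for each query (assumes one relevant item per query)."""
    n = len(ranks)
    if n == 0:
        raise ValueError("ranks must not be empty")
    metrics = {"mrr": sum(1.0 / r for r in ranks) / n}
    for k in recall_ks:
        metrics[f"recall@{k}"] = sum(r <= k for r in ranks) / n
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(1 + rank).
    metrics[f"ndcg@{ndcg_k}"] = sum(
        1.0 / math.log2(1 + r) for r in ranks if r <= ndcg_k
    ) / n
    return metrics

# Toy example: the correct memory was ranked 1st, 2nd, 1st and 6th.
print(retrieval_metrics([1, 2, 1, 6]))
```

With 8 candidates per query, every rank is at most 8, so the `ndcg_k=10` cutoff never excludes a query in this setting.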
### Improvement Over Baselines
| vs Baseline | MRR | Recall@1 | Recall@5 |
|-------------|-----|----------|----------|
| vs MiniLM | **+22.6%** | **+19.4%** | **+32.4%** |
| vs MPNet | **+21.4%** | **+21.3%** | **+25.1%** |
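These percentages are relative improvements, `(AgentRank - baseline) / baseline`, not absolute differences in points. A short check that reproduces them from the benchmark table (the helper name `relative_gain` is illustrative):

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100

# Scores copied from the benchmark table: (MRR, Recall@1, Recall@5)
agentrank_base = (0.6496, 0.4440, 0.9960)
baselines = {
    "MiniLM": (0.5297, 0.3720, 0.7520),
    "MPNet": (0.5351, 0.3660, 0.7960),
}

for name, scores in baselines.items():
    gains = [relative_gain(a, b) for a, b in zip(agentrank_base, scores)]
    print(name, [f"+{g:.1f}%" for g in gains])
# MiniLM ['+22.6%', '+19.4%', '+32.4%']
# MPNet ['+21.4%', '+21.3%', '+25.1%']
```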
---
## 🚀 Quick Start
### Installation
```bash
pip install transformers torch
```
### Basic Usage
```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to L2-normalized embeddings."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling over real tokens only, so padding does not skew the average
    hidden = outputs.last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # L2 normalize
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find most similar memory (dot product equals cosine similarity after L2 normalization)
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Example output (the exact similarity score will vary):
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
```
### Advanced Usage with Metadata
For full temporal and memory-type awareness, use the AgentRank package. It is not yet published, so treat this snippet as a preview of the planned API:
```python
# Coming soon: pip install agentrank
import torch
from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,              # Memory is 1 week old
    memory_type="semantic",  # It's a preference (not an event)
)

# Encode query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
```
---
## 🔧 Architecture
AgentRank-Base is built on **ModernBERT-base** (149M params) with novel additions:
```
┌─────────────────────────────────────────────────┐
│   ModernBERT Encoder (22 Transformer Layers)    │
│   - RoPE Positional Encoding                    │
│   - Flash Attention                             │
│   - 768 Hidden Dimension                        │
└─────────────────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ↓               ↓               ↓
  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
  │  Temporal   │ │   Memory    │ │ Importance  │
  │  Position   │ │    Type     │ │ Prediction  │
  │ Embeddings  │ │ Embeddings  │ │    Head     │
  │ (10 × 768)  │ │  (4 × 768)  │ │   (768→1)   │
  └─────────────┘ └─────────────┘ └─────────────┘
         │               │               │
         └───────────────┼───────────────┘
                         ↓
              ┌─────────────────────┐
              │  Projection Layer   │
              │     (768 → 768)     │
              └─────────────────────┘
                         ↓
              ┌─────────────────────┐
              │  L2 Normalization   │
              │  768-dim Embedding  │
              └─────────────────────┘
```
### Novel Components
| Component | Purpose | How It Helps |
|-----------|---------|--------------|
| **Temporal Embeddings** | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| **Memory Type Embeddings** | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| **Importance Head** | Auxiliary task predicting memory priority | Helps learn better representations |
### Temporal Buckets
```
Bucket 0: Today (0-1 days)
Bucket 1: Recent (1-3 days)
Bucket 2: This week (3-7 days)
Bucket 3: Last week (7-14 days)
Bucket 4: This month (14-30 days)
Bucket 5: Last month (30-60 days)
Bucket 6: Few months (60-90 days)
Bucket 7: Half year (90-180 days)
Bucket 8: This year (180-365 days)
Bucket 9: Long ago (365+ days)
```
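A minimal sketch of how a memory's age could be mapped to one of these buckets. The listed ranges share their endpoints (for example, day 1 appears in both bucket 0 and bucket 1), so this sketch assumes half-open intervals with an inclusive lower bound. That convention and the function name `days_to_bucket` are assumptions, not the released implementation.

```python
from bisect import bisect_right

# Upper edges (in days) of buckets 0-8; bucket 9 is open-ended (365+ days).
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]

def days_to_bucket(days_ago: float) -> int:
    """Return the temporal bucket index (0-9) for a memory's age in days.

    Assumes half-open ranges [lower, upper): exactly 1 day old -> bucket 1.
    """
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return bisect_right(BUCKET_EDGES, days_ago)

for age in (0, 2, 7, 45, 400):
    print(age, "->", days_to_bucket(age))
# 0 -> 0
# 2 -> 1
# 7 -> 3
# 45 -> 5
# 400 -> 9
```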
### Memory Types
```
Type 0: Episodic → Events, conversations ("We discussed X yesterday")
Type 1: Semantic → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown → Fallback
Type 3: Unknown β†’ Fallback
```
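The type IDs above map naturally onto an enum. The sketch below shows one way to turn a string label into a type ID with the `Unknown` fallback; the names `MemoryType` and `parse_memory_type` are hypothetical and not taken from the AgentRank package.

```python
from enum import IntEnum

class MemoryType(IntEnum):
    EPISODIC = 0    # events, conversations
    SEMANTIC = 1    # facts, preferences
    PROCEDURAL = 2  # instructions
    UNKNOWN = 3     # fallback

def parse_memory_type(label):
    """Map a label such as "semantic" to its type ID.

    Unrecognized or missing labels fall back to UNKNOWN.
    """
    if not isinstance(label, str):
        return MemoryType.UNKNOWN
    return MemoryType.__members__.get(label.strip().upper(), MemoryType.UNKNOWN)

print(int(parse_memory_type("semantic")))  # 1
print(int(parse_memory_type("reminder")))  # 3
```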
---
## 🎓 Training Details
| Aspect | Details |
|--------|---------|
| **Base Model** | answerdotai/ModernBERT-base (149M params) |
| **Training Data** | 500K synthetic agent memory samples |
| **Memory Distribution** | Episodic (40%), Semantic (35%), Procedural (25%) |
| **Loss Function** | Multiple Negatives Ranking Loss + Importance MSE |
| **Hard Negatives** | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| **Batch Size** | 16-32 per GPU |
| **Hardware** | 2× NVIDIA RTX 6000 Ada (48GB each) |
| **Training Time** | ~12 hours |
| **Precision** | FP16 Mixed Precision |
| **Final Val Loss** | 0.877 |
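To make the loss row concrete: Multiple Negatives Ranking Loss is a cross-entropy over scaled similarity scores in which the positive memory must outscore the negatives, and the importance term is a squared error on the predicted importance. The sketch below works on plain Python floats for a single query. The scale of 20 matches the default of `MultipleNegativesRankingLoss` in sentence-transformers; the `importance_weight` used to combine the two terms is a hypothetical parameter, since the actual weighting is not stated here.

```python
import math

def mnrl_loss(pos_score, neg_scores, scale=20.0):
    """Cross-entropy of the positive against the negatives for one query.

    Scores are cosine similarities; `scale` sharpens the softmax.
    """
    logits = [scale * pos_score] + [scale * s for s in neg_scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]

def combined_loss(pos_score, neg_scores, pred_importance, true_importance,
                  importance_weight=0.1):
    """Ranking loss plus a weighted squared error on importance (one sample).

    `importance_weight` is an assumed value for illustration only.
    """
    mse = (pred_importance - true_importance) ** 2
    return mnrl_loss(pos_score, neg_scores) + importance_weight * mse

# One positive and 7 hard negatives, mirroring the setup in the table.
negatives = [0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
print(round(combined_loss(0.80, negatives, pred_importance=0.7, true_importance=0.9), 4))
```

The ranking term shrinks as the positive's similarity pulls away from the hardest negative, which is why hard negatives that differ only in time or memory type give a stronger training signal than random ones.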
---
## 🏗️ Use Cases
### 1. AI Agents with Long-Term Memory
```python
# `agent` stands for your own agent-memory wrapper (illustrative, not a released API)
# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
```
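The `remember` / `recall` interface in this use case is not defined by the model itself. One way such a wrapper could look is sketched below, assuming a pluggable `embed` callable that maps text to a vector. `MemoryStore`, `Memory`, and `toy_embed` are hypothetical names; `toy_embed` is a keyword-count stand-in used only so the example runs without downloading a model, and in practice you would pass a function built on the `encode` helper from Basic Usage.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    memory_type: str = "unknown"
    importance: float = 0.0
    embedding: tuple = ()

class MemoryStore:
    """Tiny in-memory store; `embed` is any callable mapping text -> vector."""

    def __init__(self, embed):
        self.embed = embed
        self.memories = []

    def remember(self, text, memory_type="unknown", importance=0.0):
        # Metadata is stored alongside the vector; this sketch ranks by
        # cosine similarity only and does not re-weight by importance.
        vec = self._normalize(self.embed(text))
        self.memories.append(Memory(text, memory_type, importance, vec))

    def recall(self, query, k=3):
        q = self._normalize(self.embed(query))
        scored = [
            (sum(a * b for a, b in zip(q, m.embedding)), m) for m in self.memories
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m.text for _, m in scored[:k]]

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return tuple(x / norm for x in vec)

# Stand-in embedder for demonstration only: counts a few keywords.
VOCAB = ("allergic", "peanuts", "diet", "food", "python", "meeting")

def toy_embed(text):
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

store = MemoryStore(toy_embed)
store.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
store.remember("User prefers Python for backend work", memory_type="semantic")
print(store.recall("Is the user allergic to anything?", k=1))
# ['User is allergic to peanuts']
```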
### 2. RAG Systems for Conversational AI
```python
# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"
# AgentRank returns recent, relevant conversations
# Generic embedders return random topically-similar docs
```
### 3. Personal Knowledge Bases
```python
# User's notes and preferences
memories = [
"I prefer dark mode in all apps",
"My morning routine starts at 6 AM",
"Important: Tax deadline April 15",
]
# AgentRank properly handles time-sensitive queries
```
---
## 🆚 When to Use AgentRank vs Others
| Use Case | Best Model |
|----------|------------|
| **AI agents with memory** | ✅ AgentRank |
| **Time-sensitive retrieval** | ✅ AgentRank |
| **Conversational AI** | ✅ AgentRank |
| General document search | OpenAI / Cohere |
| Code search | CodeBERT |
| Scientific papers | SciBERT |
---
## 📁 Model Family
| Model | Parameters | Speed | Quality | Best For |
|-------|------------|-------|---------|----------|
| [agentrank-small](https://huggingface.co/vrushket/agentrank-small) | 33M | ⚑⚑⚑ Fast | Good | Real-time agents, edge |
| **agentrank-base** | 149M | ⚑⚑ Medium | **Best** | Quality-critical apps |
| agentrank-reranker (coming) | 149M | ⚑ Slower | Superior | Two-stage retrieval |
---
## 📚 Citation
```bibtex
@misc{agentrank2024,
author = {Vrushket More},
title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/vrushket/agentrank-base}
}
```
---
## 🤝 Community & Support
- 🐛 **Issues**: [GitHub Issues](https://github.com/vmore2/AgentRank-base/issues)
- 💬 **Discussions**: [HuggingFace Community](https://huggingface.co/vrushket/agentrank-base/discussions)
- 📧 **Contact**: vrushket2604@gmail.com
---
## 📄 License
Apache 2.0 - **Free for commercial use!**
---
<div align="center">
### ⭐ If AgentRank helps your project, please star the repo!
**Built with ❤️ for the AI agent community**
</div>