# AgentRank-Small: Embedding Model for AI Agent Memory Retrieval


AgentRank is the first embedding model family designed specifically for AI agent memory retrieval. Unlike general-purpose embedders, AgentRank understands temporal context, memory types, and importance, all of which are critical for agents that need to remember past interactions.

## πŸš€ Key Results

| Model | MRR | Recall@1 | Recall@5 | NDCG@10 |
|---|---|---|---|---|
| **AgentRank-Small** | **0.6375** | **0.4460** | **0.9740** | **0.6797** |
| all-MiniLM-L6-v2 | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
| all-mpnet-base-v2 | 0.5351 | 0.3660 | 0.7960 | 0.6335 |

**+20.4% MRR improvement** over the base MiniLM model!

## 🎯 Why AgentRank?

AI agents need memory that understands:

| Challenge | General Embedders | AgentRank |
|---|---|---|
| "What did I say yesterday?" | ❌ No temporal awareness | βœ… Temporal embeddings |
| "What's my preference?" | ❌ Mixes with events | βœ… Memory type awareness |
| "What's most important?" | ❌ No priority | βœ… Importance prediction |

## πŸ“¦ Installation

```bash
pip install transformers torch
```

## πŸ’» Usage

### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model
model = AutoModel.from_pretrained("vrushket/agentrank-small")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-small")

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        # Mean-pool over real tokens only, excluding padding
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
        embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Encode memories and query
memories = [
    "User prefers Python over JavaScript",
    "User asked about machine learning yesterday",
    "User is working on a web project",
]
query = "What programming language does the user like?"

memory_embeddings = encode(memories)
query_embedding = encode([query])

# Compute cosine similarities (embeddings are L2-normalized)
similarities = torch.mm(query_embedding, memory_embeddings.T)
print(f"Most relevant: {memories[similarities.argmax().item()]}")
# Output: "User prefers Python over JavaScript"
```
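To retrieve a ranked shortlist rather than a single best match, `torch.topk` over the similarity row works the same way. A standalone sketch with random stand-in embeddings (swap in the `encode` outputs from the snippet above):

```python
import torch
import torch.nn.functional as F

memories = [
    "User prefers Python over JavaScript",
    "User asked about machine learning yesterday",
    "User is working on a web project",
]
# Random stand-ins for encode(memories) / encode([query]) from above
memory_embeddings = F.normalize(torch.randn(3, 384), p=2, dim=1)
query_embedding = F.normalize(torch.randn(1, 384), p=2, dim=1)

similarities = (query_embedding @ memory_embeddings.T).squeeze(0)
top = torch.topk(similarities, k=2)  # best 2 of 3 candidates
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{score:+.3f}  {memories[idx]}")
```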

### With Temporal & Memory Type Metadata (Full Power)

```python
# For full AgentRank features including temporal awareness:
# pip install agentrank  (coming soon!)

from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-small")

# Encode with metadata
embedding = model.encode(
    "User mentioned they prefer morning meetings",
    days_ago=3,            # Memory is 3 days old
    memory_type="semantic" # It's a preference, not an event
)
```

πŸ—οΈ Architecture

AgentRank-Small (22.7M parameters) is based on all-MiniLM-L6-v2 with novel additions:

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  MiniLM Transformer Encoder (6 layers)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    ↓               ↓               ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Temporal β”‚   β”‚ Memory   β”‚   β”‚ Importance β”‚
β”‚ Position β”‚   β”‚ Type     β”‚   β”‚ Prediction β”‚
β”‚ Embed    β”‚   β”‚ Embed    β”‚   β”‚ Head       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β”‚               β”‚               β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    ↓
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚ L2-Normalized     β”‚
         β”‚ 384-dim Embedding β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

**Novel Features:**

  • Temporal Position Embeddings: 10 learnable buckets (today, 1-3 days, week, month, etc.)
  • Memory Type Embeddings: Episodic, Semantic, Procedural
  • Importance Prediction Head: Auxiliary task during training
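The exact temporal bucket boundaries are not published; a minimal sketch of how an age-to-bucket mapping could look, with hypothetical day cutoffs chosen to match the labels above (today, 1-3 days, week, month, etc.):

```python
# Hypothetical cutoffs (in days) for 10 temporal buckets; the boundaries
# actually used by AgentRank are not published.
BUCKET_CUTOFFS = [0, 1, 3, 7, 14, 30, 90, 180, 365]

def temporal_bucket(days_ago: int) -> int:
    """Return the index of the first cutoff that days_ago does not exceed."""
    for i, cutoff in enumerate(BUCKET_CUTOFFS):
        if days_ago <= cutoff:
            return i
    return len(BUCKET_CUTOFFS)  # bucket 9: older than a year

print(temporal_bucket(0))   # today -> bucket 0
print(temporal_bucket(2))   # 1-3 days -> bucket 2
print(temporal_bucket(45))  # 1-3 months -> bucket 6
```

The bucket index would then select one of the 10 learnable temporal embeddings added to the text representation.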

## πŸŽ“ Training

  • Dataset: 500K synthetic agent memory samples
  • Memory Types: Episodic (40%), Semantic (35%), Procedural (25%)
  • Loss: Multiple Negatives Ranking Loss + Importance MSE
  • Hard Negatives: 5 types (temporal, type confusion, topic drift, etc.)
  • Hardware: NVIDIA RTX 6000 Ada (48GB) with FP16
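Multiple Negatives Ranking Loss treats each query's paired memory as the positive and every other memory in the batch as a negative. A minimal PyTorch sketch of the ranking term only (not the training code; the `scale=20.0` temperature is an assumption borrowed from the sentence-transformers default):

```python
import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(query_emb, memory_emb, scale=20.0):
    """In-batch negatives: the memory at position i is the positive for
    query i; every other memory in the batch acts as a negative."""
    # Scaled cosine similarity matrix (embeddings assumed L2-normalized)
    scores = query_emb @ memory_emb.T * scale
    # The correct "class" for row i is column i
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Toy batch of 4 random, normalized 384-dim embedding pairs
q = F.normalize(torch.randn(4, 384), dim=1)
m = F.normalize(torch.randn(4, 384), dim=1)
print(multiple_negatives_ranking_loss(q, m).item())
```

In the actual setup, hard negatives (temporally shifted or type-confused memories) would be appended as extra columns of `memory_emb`, and the importance MSE term added on top.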

## πŸ“Š Benchmarks

Evaluated on AgentMemBench (500 test samples, 8 candidates each):

| Metric | AgentRank-Small | MiniLM | Improvement |
|---|---|---|---|
| MRR | 0.6375 | 0.5297 | +20.4% |
| Recall@1 | 0.4460 | 0.3720 | +19.9% |
| Recall@5 | 0.9740 | 0.7520 | +29.5% |
| NDCG@10 | 0.6797 | 0.6370 | +6.7% |
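For reference, MRR and Recall@k reduce to simple formulas when each query has exactly one relevant candidate, as in the 8-candidate setup above. A minimal sketch:

```python
def mrr(ranks):
    """Mean reciprocal rank; ranks are 1-based positions of the relevant item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose relevant item appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy example: relevant memory ranked 1st, 3rd, and 2nd for three queries
ranks = [1, 3, 2]
print(mrr(ranks))             # (1 + 1/3 + 1/2) / 3 = 0.6111...
print(recall_at_k(ranks, 1))  # 1 of 3 queries hit at rank 1
```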

## πŸ”œ Coming Soon

  • AgentRank-Base: 110M params, even better performance
  • AgentRank-Reranker: Cross-encoder for top-k refinement
  • Python Package: pip install agentrank

## πŸ“š Citation

```bibtex
@misc{agentrank2024,
  author = {Vrushket More},
  title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vrushket/agentrank-small}
}
```

## πŸ“„ License

Apache 2.0 - Free for commercial use!

## 🀝 Acknowledgments

Built on top of sentence-transformers and MiniLM.
