---
license: apache-2.0
language:
- en
tags:
- sentence-transformers
- embeddings
- retrieval
- agents
- memory
- rag
- semantic-search
- ai-agents
- llm-memory
- vector-search
library_name: transformers
pipeline_tag: sentence-similarity
datasets:
- custom
metrics:
- mrr
- recall
- ndcg
model-index:
- name: agentrank-base
  results:
  - task:
      type: retrieval
      name: Agent Memory Retrieval
    metrics:
    - type: mrr
      value: 0.6496
      name: MRR
    - type: recall
      value: 0.4440
      name: Recall@1
    - type: recall
      value: 0.9960
      name: Recall@5
    - type: ndcg
      value: 0.6786
      name: NDCG@10
---
<div align="center">

# 🧠 AgentRank-Base

### The First Embedding Model Built Specifically for AI Agent Memory Retrieval

<p>
<img src="https://img.shields.io/badge/MRR-0.65-brightgreen?style=for-the-badge" alt="MRR">
<img src="https://img.shields.io/badge/Recall%405-99.6%25-blue?style=for-the-badge" alt="Recall@5">
<img src="https://img.shields.io/badge/Parameters-149M-orange?style=for-the-badge" alt="Parameters">
<img src="https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge" alt="License">
</p>

**Up to +22.6% MRR over general-purpose embedders** | **Temporal awareness** | **Memory type understanding**

[🚀 Quick Start](#-quick-start) • [📊 Benchmarks](#-benchmarks) • [🔧 Architecture](#-architecture) • [💡 Why AgentRank?](#-why-agentrank)

</div>

---
## 🎯 TL;DR

> **AgentRank-Base** is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:
> - ⏰ **When** something happened (temporal awareness)
> - 📂 **What type** of memory it is (episodic vs semantic vs procedural)
> - ⭐ **How important** the memory is

---
## 💡 Why AgentRank?

### The Problem with Current Embedders

AI agents need memory. But when you ask an agent:

> *"What did we discuss about Python **yesterday**?"*

Current embedders fail because they:
- ❌ Don't understand that "yesterday" refers to a recent point in time
- ❌ Can't distinguish between a preference and an event
- ❌ Treat all memories as equally important

### The AgentRank Solution

| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
|-----------|---------------------|-----------|
| "What did I say **yesterday**?" | Random old results 😕 | Recent memories first ✅ |
| "What's my **preference**?" | Mixed with events 😕 | Only preferences ✅ |
| "What's **most important**?" | No priority 😕 | Importance-aware retrieval ✅ |

---
## 📊 Benchmarks

Evaluated on **AgentMemBench** (500 test samples, 8 candidates each):

| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
|-------|------------|-------|------------|------------|-----------|
| **AgentRank-Base** | 149M | **0.6496** | **0.4440** | **0.9960** | **0.6786** |
| AgentRank-Small | 33M | 0.6375 | 0.4460 | 0.9740 | 0.6797 |
| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
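
To reproduce these metrics on your own data, the sketch below shows one way to compute them when each query has exactly one correct memory among its candidates. That single-positive setup is an assumption about AgentMemBench, and the function names are illustrative rather than part of any AgentRank API.

```python
import math

def mrr(ranks):
    """Mean reciprocal rank; `ranks` holds the 1-based rank of the correct memory per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct memory appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG@k with one relevant item per query: ideal DCG is 1, so the gain is 1/log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

# Toy example: four queries whose correct memory was ranked 1st, 2nd, 4th and 1st.
ranks = [1, 2, 4, 1]
print(f"MRR={mrr(ranks):.4f}  R@1={recall_at_k(ranks, 1):.2f}  "
      f"R@5={recall_at_k(ranks, 5):.2f}  NDCG@10={ndcg_at_k(ranks, 10):.4f}")
```

The ranks would come from sorting each query's 8 candidates by cosine similarity and recording the position of the correct one.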

### Improvement Over Baselines

Relative improvement of AgentRank-Base over each baseline, computed from the benchmark scores above:

| vs Baseline | MRR | Recall@1 | Recall@5 |
|-------------|-----|----------|----------|
| vs MiniLM | **+22.6%** | **+19.4%** | **+32.4%** |
| vs MPNet | **+21.4%** | **+21.3%** | **+25.1%** |
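
The percentages are relative gains, i.e. `(AgentRank score / baseline score - 1) * 100`. A short sketch that recomputes them from the benchmark scores (the `scores` dictionary and the `relative_gain` helper are illustrative names):

```python
# Scores copied from the benchmark table.
scores = {
    "AgentRank-Base":    {"MRR": 0.6496, "Recall@1": 0.4440, "Recall@5": 0.9960},
    "all-mpnet-base-v2": {"MRR": 0.5351, "Recall@1": 0.3660, "Recall@5": 0.7960},
    "all-MiniLM-L6-v2":  {"MRR": 0.5297, "Recall@1": 0.3720, "Recall@5": 0.7520},
}

def relative_gain(model, baseline, metric):
    """Relative improvement of `model` over `baseline`, in percent."""
    return (scores[model][metric] / scores[baseline][metric] - 1.0) * 100.0

for baseline in ("all-MiniLM-L6-v2", "all-mpnet-base-v2"):
    gains = ", ".join(
        f"{metric} {relative_gain('AgentRank-Base', baseline, metric):+.1f}%"
        for metric in ("MRR", "Recall@1", "Recall@5")
    )
    print(f"vs {baseline}: {gains}")

# vs all-MiniLM-L6-v2: MRR +22.6%, Recall@1 +19.4%, Recall@5 +32.4%
# vs all-mpnet-base-v2: MRR +21.4%, Recall@1 +21.3%, Recall@5 +25.1%
```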

---

## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

The model is based on ModernBERT, which needs `transformers` 4.48.0 or newer.

### Basic Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModel.from_pretrained("vrushket/agentrank-base")
tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

def encode(texts, model, tokenizer):
    """Encode texts to embeddings."""
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling
    embeddings = outputs.last_hidden_state.mean(dim=1)
    # L2 normalize
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    return embeddings

# Your agent's memories
memories = [
    "User prefers Python over JavaScript for backend development",
    "User asked about React frameworks yesterday",
    "User mentioned they have 3 years of coding experience",
    "User is working on an e-commerce project",
]

# A query from the user
query = "What programming language does the user prefer?"

# Encode everything
memory_embeddings = encode(memories, model, tokenizer)
query_embedding = encode([query], model, tokenizer)

# Find most similar memory (dot product equals cosine similarity after L2 normalization)
similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
best_match_idx = similarities.argmax().item()

print(f"Query: {query}")
print(f"Best match: {memories[best_match_idx]}")
print(f"Similarity: {similarities[best_match_idx]:.4f}")

# Output:
# Query: What programming language does the user prefer?
# Best match: User prefers Python over JavaScript for backend development
# Similarity: 0.8234
```
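
Note that `last_hidden_state.mean(dim=1)` also averages over padding positions when the texts in a batch differ in length. A common alternative is attention-mask-weighted mean pooling, sketched below. Whether it matches the pooling used to train this checkpoint is an assumption, and `masked_mean_pool` is an illustrative helper name, not part of the model's API.

```python
import torch

def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings while ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # (batch, 1)
    return summed / counts

# Tiny demonstration with fake hidden states: the second sequence has one padded position.
hidden = torch.tensor([[[1.0, 1.0], [3.0, 3.0]],
                       [[2.0, 2.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1], [1, 0]])
pooled = masked_mean_pool(hidden, mask)
print(pooled)  # the padded [100, 100] vector does not affect the second row
```

Inside `encode`, this would replace the mean-pooling line with `masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`.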

### Advanced Usage with Metadata

For full temporal and memory-type awareness, use the AgentRank package (coming soon):

```python
# Coming soon: pip install agentrank
import torch
from agentrank import AgentRankEmbedder

model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

# Encode with temporal context
memory_embedding = model.encode(
    text="User mentioned they prefer morning meetings",
    days_ago=7,              # Memory is 1 week old
    memory_type="semantic"   # It's a preference (not an event)
)

# Encode query (no metadata needed for queries)
query_embedding = model.encode("When does the user like to have meetings?")

# The model now knows this is a week-old preference!
similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
```

---
## 🔧 Architecture

AgentRank-Base is built on **ModernBERT-base** (149M params) with novel additions:

```
┌─────────────────────────────────────────────┐
│  ModernBERT Encoder (22 Transformer Layers) │
│  - RoPE Positional Encoding                 │
│  - Flash Attention                          │
│  - 768 Hidden Dimension                     │
└─────────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┐
       │               │               │
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Temporal   │ │   Memory    │ │ Importance  │
│  Position   │ │    Type     │ │ Prediction  │
│ Embeddings  │ │ Embeddings  │ │    Head     │
│ (10 × 768)  │ │  (4 × 768)  │ │   (768→1)   │
└─────────────┘ └─────────────┘ └─────────────┘
       │               │               │
       └───────────────┼───────────────┘
                       │
            ┌─────────────────────┐
            │  Projection Layer   │
            │     (768 → 768)     │
            └─────────────────────┘
                       │
            ┌─────────────────────┐
            │  L2 Normalization   │
            │  768-dim Embedding  │
            └─────────────────────┘
```
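
The diagram can be read as a small head on top of the pooled encoder output. The PyTorch sketch below mirrors the component shapes shown above. How the branches are combined is not specified in this card, so the additive fusion is an assumption, and `MetadataHead` is a hypothetical class name rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetadataHead(nn.Module):
    """Hypothetical head mirroring the diagram; the additive fusion is an assumption."""

    def __init__(self, hidden=768, num_buckets=10, num_types=4):
        super().__init__()
        self.temporal = nn.Embedding(num_buckets, hidden)   # 10 x 768
        self.memory_type = nn.Embedding(num_types, hidden)  # 4 x 768
        self.importance = nn.Linear(hidden, 1)              # 768 -> 1 (auxiliary task)
        self.projection = nn.Linear(hidden, hidden)         # 768 -> 768

    def forward(self, pooled, bucket_ids, type_ids):
        # Assumed fusion: add the metadata embeddings to the pooled encoder output.
        fused = pooled + self.temporal(bucket_ids) + self.memory_type(type_ids)
        embedding = F.normalize(self.projection(fused), p=2, dim=-1)
        importance = self.importance(fused).squeeze(-1)
        return embedding, importance

head = MetadataHead()
pooled = torch.randn(2, 768)  # stand-in for pooled ModernBERT outputs
embedding, importance = head(pooled, torch.tensor([0, 3]), torch.tensor([1, 0]))
print(embedding.shape, importance.shape)
```

The returned embedding is unit-length, so a dot product between two embeddings is their cosine similarity.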

### Novel Components

| Component | Purpose | How It Helps |
|-----------|---------|--------------|
| **Temporal Embeddings** | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
| **Memory Type Embeddings** | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
| **Importance Head** | Auxiliary task predicting memory priority | Extra training signal that helps the encoder learn importance-aware representations |

### Temporal Buckets

```
Bucket 0: Today       (0-1 days)
Bucket 1: Recent      (1-3 days)
Bucket 2: This week   (3-7 days)
Bucket 3: Last week   (7-14 days)
Bucket 4: This month  (14-30 days)
Bucket 5: Last month  (30-60 days)
Bucket 6: Few months  (60-90 days)
Bucket 7: Half year   (90-180 days)
Bucket 8: This year   (180-365 days)
Bucket 9: Long ago    (365+ days)
```
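
A memory's age can be mapped to one of these buckets with a simple lookup. The sketch below treats each range as half-open (for example, exactly 7 days falls into bucket 3), which is an assumption because the listed ranges share their boundaries. `temporal_bucket` is an illustrative helper name.

```python
from bisect import bisect_right

# Upper edges (in days) of buckets 0-8; anything at or beyond 365 days falls into bucket 9.
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]

def temporal_bucket(days_ago):
    """Map a memory's age in days to a bucket index (0-9), treating ranges as half-open."""
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return bisect_right(BUCKET_EDGES, days_ago)

for age in (0, 1, 7, 45, 400):
    print(age, "->", temporal_bucket(age))

# 0 -> 0
# 1 -> 1
# 7 -> 3
# 45 -> 5
# 400 -> 9
```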

### Memory Types

```
Type 0: Episodic   → Events, conversations ("We discussed X yesterday")
Type 1: Semantic   → Facts, preferences ("User likes Python")
Type 2: Procedural → Instructions ("To deploy, run npm build")
Type 3: Unknown    → Fallback
```
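
In application code, these IDs could be represented as an enum with the fallback built in. This is a sketch only: `MemoryType` and `memory_type_id` are hypothetical names, not part of a released AgentRank API.

```python
from enum import IntEnum

class MemoryType(IntEnum):
    EPISODIC = 0    # events, conversations
    SEMANTIC = 1    # facts, preferences
    PROCEDURAL = 2  # instructions
    UNKNOWN = 3     # fallback

def memory_type_id(name):
    """Return the type ID for a label like 'semantic'; unrecognized labels map to UNKNOWN."""
    try:
        return int(MemoryType[name.strip().upper()])
    except KeyError:
        return int(MemoryType.UNKNOWN)

print(memory_type_id("semantic"), memory_type_id("Episodic"), memory_type_id("gossip"))  # 1 0 3
```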

---

## 🎓 Training Details

| Aspect | Details |
|--------|---------|
| **Base Model** | answerdotai/ModernBERT-base (149M params) |
| **Training Data** | 500K synthetic agent memory samples |
| **Memory Distribution** | Episodic (40%), Semantic (35%), Procedural (25%) |
| **Loss Function** | Multiple Negatives Ranking Loss + Importance MSE |
| **Hard Negatives** | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
| **Batch Size** | 16-32 per GPU |
| **Hardware** | 2× NVIDIA RTX 6000 Ada (48GB each) |
| **Training Time** | ~12 hours |
| **Precision** | FP16 Mixed Precision |
| **Final Val Loss** | 0.877 |
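
To make the loss row concrete, here is a rough sketch of a Multiple Negatives Ranking Loss over in-batch positives plus hard negatives, combined with an MSE term on predicted importance. It is not the actual training code: the function name, the similarity `scale` of 20, and the `importance_weight` of 0.1 are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

def agentrank_style_loss(query, positive, negatives, importance_pred, importance_target,
                         scale=20.0, importance_weight=0.1):
    """Ranking loss over in-batch positives plus hard negatives, with an importance MSE term.

    query, positive: (B, D) L2-normalized embeddings
    negatives:       (B, N, D) L2-normalized hard negatives (N = 7 per sample here)
    """
    batch, dim = query.shape
    candidates = torch.cat([positive, negatives.reshape(-1, dim)], dim=0)  # (B + B*N, D)
    scores = scale * (query @ candidates.T)   # scaled cosine similarities, (B, B + B*N)
    labels = torch.arange(batch)              # the positive for query i sits in column i
    ranking = F.cross_entropy(scores, labels)
    importance = F.mse_loss(importance_pred, importance_target)
    return ranking + importance_weight * importance

B, N, D = 4, 7, 768
q = F.normalize(torch.randn(B, D), dim=-1)
p = F.normalize(torch.randn(B, D), dim=-1)
n = F.normalize(torch.randn(B, N, D), dim=-1)
loss = agentrank_style_loss(q, p, n, torch.rand(B), torch.rand(B))
print(loss.item())
```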

---

## 🏗️ Use Cases

### 1. AI Agents with Long-Term Memory

```python
# `agent` stands for your own memory-enabled agent wrapper (illustrative API)

# Store memories with metadata
agent.remember(
    text="User is allergic to peanuts",
    memory_type="semantic",
    importance=10,  # Critical medical info!
)

# Later, when discussing food...
relevant_memories = agent.recall("What should I know about the user's diet?")
# Returns: "User is allergic to peanuts" (even if stored months ago)
```
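
The `remember` / `recall` interface above is illustrative. A minimal in-memory version could look like the sketch below. `MemoryStore`, `Memory`, and `toy_embed` are hypothetical names, and the keyword-count embedding is a stand-in so the example runs without downloading a model.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    memory_type: str
    importance: int
    vector: list

def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Hypothetical in-memory store; `embed` is any function mapping text to a list of floats."""

    def __init__(self, embed):
        self.embed = embed
        self.memories = []

    def remember(self, text, memory_type="unknown", importance=5):
        # Metadata is stored alongside the vector; this sketch ranks by similarity only.
        self.memories.append(Memory(text, memory_type, importance, self.embed(text)))

    def recall(self, query, top_k=1):
        q = self.embed(query)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m.vector), reverse=True)
        return [m.text for m in ranked[:top_k]]

# Toy embedding for demonstration only: counts of a few keywords.
VOCAB = ["allergic", "peanuts", "diet", "python", "food"]

def toy_embed(text):
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]

agent = MemoryStore(toy_embed)
agent.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
agent.remember("User prefers Python for scripting", memory_type="semantic", importance=3)
print(agent.recall("Is the user allergic to any food?"))
```

In a real agent, `embed` would wrap the `encode` function from Basic Usage and return each embedding as a list or tensor.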

### 2. RAG Systems for Conversational AI

```python
# Better retrieval for chatbots
query = "What did we talk about in our last meeting?"
# AgentRank returns recent, relevant conversations
# Generic embedders return random topically-similar docs
```

### 3. Personal Knowledge Bases

```python
# User's notes and preferences
memories = [
    "I prefer dark mode in all apps",
    "My morning routine starts at 6 AM",
    "Important: Tax deadline April 15",
]
# AgentRank properly handles time-sensitive queries
```

---
## 🆚 When to Use AgentRank vs Others

| Use Case | Best Model |
|----------|------------|
| **AI agents with memory** | ✅ AgentRank |
| **Time-sensitive retrieval** | ✅ AgentRank |
| **Conversational AI** | ✅ AgentRank |
| General document search | OpenAI / Cohere |
| Code search | CodeBERT |
| Scientific papers | SciBERT |

---
## 🔗 Model Family

| Model | Parameters | Speed | Quality | Best For |
|-------|------------|-------|---------|----------|
| [agentrank-small](https://huggingface.co/vrushket/agentrank-small) | 33M | ⚡⚡⚡ Fast | Good | Real-time agents, edge |
| **agentrank-base** | 149M | ⚡⚡ Medium | **Best** | Quality-critical apps |
| agentrank-reranker (coming) | 149M | ⚡ Slower | Superior | Two-stage retrieval |

---
## 📚 Citation

```bibtex
@misc{agentrank2024,
  author = {Vrushket More},
  title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/vrushket/agentrank-base}
}
```

---
## 🤝 Community & Support

- 🐛 **Issues**: [GitHub Issues](https://github.com/vmore2/AgentRank-base/issues)
- 💬 **Discussions**: [HuggingFace Community](https://huggingface.co/vrushket/agentrank-base/discussions)
- 📧 **Contact**: vrushket2604@gmail.com

---
## 📄 License

Apache 2.0 - **Free for commercial use!**

---

<div align="center">

### ⭐ If AgentRank helps your project, please star the repo!

**Built with ❤️ for the AI agent community**

</div>