	---
	license: apache-2.0
	language:
	- en
	tags:
	- sentence-transformers
	- embeddings
	- retrieval
	- agents
	- memory
	- rag
	- semantic-search
	- ai-agents
	- llm-memory
	- vector-search
	library_name: transformers
	pipeline_tag: sentence-similarity
	datasets:
	- custom
	metrics:
	- mrr
	- recall
	- ndcg
	model-index:
	- name: agentrank-base
	  results:
	  - task:
	      type: retrieval
	      name: Agent Memory Retrieval
	    metrics:
	    - type: mrr
	      value: 0.6496
	      name: MRR
	    - type: recall
	      value: 0.4440
	      name: Recall@1
	    - type: recall
	      value: 0.9960
	      name: Recall@5
	    - type: ndcg
	      value: 0.6786
	      name: NDCG@10
	---

	<div align="center">

	# 🧠 AgentRank-Base

	### The First Embedding Model Built Specifically for AI Agent Memory Retrieval

	<p>
	<img src="https://img.shields.io/badge/MRR-0.65-brightgreen?style=for-the-badge" alt="MRR">
	<img src="https://img.shields.io/badge/Recall%405-99.6%25-blue?style=for-the-badge" alt="Recall@5">
	<img src="https://img.shields.io/badge/Parameters-149M-orange?style=for-the-badge" alt="Parameters">
	<img src="https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge" alt="License">
	</p>

	+21% to +23% MRR improvement over general-purpose embedders (MPNet, MiniLM) | Temporal awareness | Memory type understanding

	[🚀 Quick Start](#-quick-start) • [📊 Benchmarks](#-benchmarks) • [🔧 Architecture](#-architecture) • [💡 Why AgentRank?](#-why-agentrank)

	</div>

	---

	## 🎯 TL;DR

	> AgentRank-Base is an embedding model designed for AI agents that need to remember. Unlike generic embedders (OpenAI, Cohere, MiniLM), AgentRank understands:
	> - ⏰ When something happened (temporal awareness)
	> - 📁 What type of memory it is (episodic vs semantic vs procedural)
	> - ⭐ How important the memory is

	---

	## 💡 Why AgentRank?

	### The Problem with Current Embedders

	AI agents need memory. But when you ask an agent:

	> "What did we discuss about Python yesterday?"

	Current embedders fail because they:
	- ❌ Don't understand that "yesterday" refers to a recent point in time
	- ❌ Can't distinguish between a preference and an event
	- ❌ Treat all memories as equally important

	### The AgentRank Solution

	| Challenge | OpenAI/Cohere/MiniLM | AgentRank |
	|-----------|---------------------|-----------|
	| "What did I say yesterday?" | Random old results 😕 | Recent memories first ✅ |
	| "What's my preference?" | Mixed with events 😕 | Only preferences ✅ |
	| "What's most important?" | No priority 😕 | Importance-aware retrieval ✅ |

	---

	## 📊 Benchmarks

	Evaluated on AgentMemBench (500 test samples, 8 candidates each):

	| Model | Parameters | MRR ↑ | Recall@1 ↑ | Recall@5 ↑ | NDCG@10 ↑ |
	|-------|------------|-------|------------|------------|-----------|
	| AgentRank-Base | 149M | 0.6496 | 0.4440 | 0.9960 | 0.6786 |
	| AgentRank-Small | 33M | 0.6375 | 0.4460 | 0.9740 | 0.6797 |
	| all-mpnet-base-v2 | 109M | 0.5351 | 0.3660 | 0.7960 | 0.6335 |
	| all-MiniLM-L6-v2 | 22M | 0.5297 | 0.3720 | 0.7520 | 0.6370 |
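
For readers unfamiliar with the metrics in this table, here is a minimal sketch of how they are commonly computed from the rank of the correct memory in each candidate list. It assumes exactly one relevant memory per query (the README does not state this), and the function names are illustrative rather than AgentMemBench's actual evaluation code.

```python
import math

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose relevant memory is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG@k under the single-relevant-item assumption.

    The ideal DCG is 1, so each query contributes 1/log2(1 + rank)
    when rank <= k and 0 otherwise.
    """
    gains = [1.0 / math.log2(1 + r) if r <= k else 0.0 for r in ranks]
    return sum(gains) / len(gains)

# Hypothetical ranks for four queries (illustration only, not benchmark data)
ranks = [1, 2, 1, 4]
print(mean_reciprocal_rank(ranks))  # 0.6875
print(recall_at_k(ranks, 1))        # 0.5
print(recall_at_k(ranks, 5))        # 1.0
print(ndcg_at_k(ranks, 10))
```

With only 8 candidates per query, every rank is at most 8, so NDCG@10 never truncates a result under this assumption.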

	### Improvement Over Baselines

	| vs Baseline | MRR | Recall@1 | Recall@5 |
	|-------------|-----|----------|----------|
	| vs MiniLM | +22.6% | +19.4% | +32.4% |
	| vs MPNet | +21.4% | +21.3% | +25.1% |
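
These percentages are relative gains of AgentRank-Base over each baseline. The short sketch below recomputes them from the benchmark scores; `relative_gain` is a helper name introduced here for illustration.

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0

# Scores copied from the benchmark table
agentrank_base = {"MRR": 0.6496, "Recall@1": 0.4440, "Recall@5": 0.9960}
baselines = {
    "MiniLM": {"MRR": 0.5297, "Recall@1": 0.3720, "Recall@5": 0.7520},
    "MPNet": {"MRR": 0.5351, "Recall@1": 0.3660, "Recall@5": 0.7960},
}

for name, scores in baselines.items():
    gains = {m: round(relative_gain(agentrank_base[m], s), 1) for m, s in scores.items()}
    print(name, gains)
# MiniLM {'MRR': 22.6, 'Recall@1': 19.4, 'Recall@5': 32.4}
# MPNet {'MRR': 21.4, 'Recall@1': 21.3, 'Recall@5': 25.1}
```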

	---

	## 🚀 Quick Start

	### Installation

	```bash
	pip install transformers torch
	```

	### Basic Usage

	```python
	from transformers import AutoModel, AutoTokenizer
	import torch

	# Load model and tokenizer
	model = AutoModel.from_pretrained("vrushket/agentrank-base")
	tokenizer = AutoTokenizer.from_pretrained("vrushket/agentrank-base")

	def encode(texts, model, tokenizer):
	    """Encode texts to embeddings."""
	    inputs = tokenizer(
	        texts,
	        padding=True,
	        truncation=True,
	        max_length=512,
	        return_tensors="pt"
	    )
	    with torch.no_grad():
	        outputs = model(**inputs)
	    # Mean pooling over all token positions (note: padding positions are included)
	    embeddings = outputs.last_hidden_state.mean(dim=1)
	    # L2 normalize
	    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
	    return embeddings

	# Your agent's memories
	memories = [
	    "User prefers Python over JavaScript for backend development",
	    "User asked about React frameworks yesterday",
	    "User mentioned they have 3 years of coding experience",
	    "User is working on an e-commerce project",
	]

	# A query from the user
	query = "What programming language does the user prefer?"

	# Encode everything
	memory_embeddings = encode(memories, model, tokenizer)
	query_embedding = encode([query], model, tokenizer)

	# Find most similar memory
	similarities = torch.mm(query_embedding, memory_embeddings.T)[0]
	best_match_idx = similarities.argmax().item()

	print(f"Query: {query}")
	print(f"Best match: {memories[best_match_idx]}")
	print(f"Similarity: {similarities[best_match_idx]:.4f}")

	# Output:
	# Query: What programming language does the user prefer?
	# Best match: User prefers Python over JavaScript for backend development
	# Similarity: 0.8234
	```

	### Advanced Usage with Metadata

	For full temporal and memory-type awareness, use the upcoming AgentRank package:

	```python
	# Coming soon (not yet released): pip install agentrank
	import torch
	from agentrank import AgentRankEmbedder

	model = AgentRankEmbedder.from_pretrained("vrushket/agentrank-base")

	# Encode with temporal context
	memory_embedding = model.encode(
	    text="User mentioned they prefer morning meetings",
	    days_ago=7,              # Memory is 1 week old
	    memory_type="semantic",  # It's a preference (not an event)
	)

	# Encode query (no metadata needed for queries)
	query_embedding = model.encode("When does the user like to have meetings?")

	# The model now knows this is a week-old preference!
	similarity = torch.cosine_similarity(query_embedding, memory_embedding, dim=0)
	```

	---

	## 🔧 Architecture

	AgentRank-Base is built on ModernBERT-base (149M params) with novel additions:

	```
	┌─────────────────────────────────────────────────┐
	│  ModernBERT Encoder (22 Transformer Layers)     │
	│    - RoPE Positional Encoding                   │
	│    - Flash Attention                            │
	│    - 768 Hidden Dimension                       │
	└─────────────────────────────────────────────────┘
	                         │
	         ┌───────────────┼───────────────┐
	         ↓               ↓               ↓
	  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
	  │  Temporal   │ │   Memory    │ │ Importance  │
	  │  Position   │ │    Type     │ │ Prediction  │
	  │ Embeddings  │ │ Embeddings  │ │    Head     │
	  │ (10 × 768)  │ │  (4 × 768)  │ │   (768→1)   │
	  └─────────────┘ └─────────────┘ └─────────────┘
	         │               │               │
	         └───────────────┼───────────────┘
	                         ↓
	              ┌─────────────────────┐
	              │  Projection Layer   │
	              │     (768 → 768)     │
	              └─────────────────────┘
	                         ↓
	              ┌─────────────────────┐
	              │  L2 Normalization   │
	              │  768-dim Embedding  │
	              └─────────────────────┘
	```

	### Novel Components

	| Component | Purpose | How It Helps |
	|-----------|---------|--------------|
	| Temporal Embeddings | Encodes memory age (today, this week, last month, etc.) | "Yesterday" queries match recent memories |
	| Memory Type Embeddings | Distinguishes episodic/semantic/procedural | "What do I like?" matches preferences, not events |
	| Importance Head | Auxiliary task predicting memory priority | Helps learn better representations |

	### Temporal Buckets

	```
	Bucket 0: Today (0-1 days)
	Bucket 1: Recent (1-3 days)
	Bucket 2: This week (3-7 days)
	Bucket 3: Last week (7-14 days)
	Bucket 4: This month (14-30 days)
	Bucket 5: Last month (30-60 days)
	Bucket 6: Few months (60-90 days)
	Bucket 7: Half year (90-180 days)
	Bucket 8: This year (180-365 days)
	Bucket 9: Long ago (365+ days)
	```
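
The bucket list above can be turned into a lookup from memory age to bucket index. This is a minimal sketch, not the model's own preprocessing code. Because adjacent ranges share their endpoints, it assumes each bucket includes its lower bound and excludes its upper bound. `temporal_bucket` and `BUCKET_EDGES` are names introduced here for illustration.

```python
from bisect import bisect_right

# Upper edges (in days) of buckets 0-8, taken from the list above.
# Bucket 9 ("Long ago") is open-ended.
BUCKET_EDGES = [1, 3, 7, 14, 30, 60, 90, 180, 365]

def temporal_bucket(days_ago):
    """Map a memory's age in days to a bucket index from 0 to 9.

    Assumption: lower bounds are inclusive and upper bounds exclusive,
    so an age of exactly 7 days lands in bucket 3 ("Last week").
    """
    if days_ago < 0:
        raise ValueError("days_ago must be non-negative")
    return bisect_right(BUCKET_EDGES, days_ago)

print(temporal_bucket(0.5))  # 0
print(temporal_bucket(7))    # 3
print(temporal_bucket(400))  # 9
```

If the model instead treats upper bounds as inclusive, swapping `bisect_right` for `bisect_left` gives that behavior.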

	### Memory Types

	```
	Type 0: Episodic → Events, conversations ("We discussed X yesterday")
	Type 1: Semantic → Facts, preferences ("User likes Python")
	Type 2: Procedural → Instructions ("To deploy, run npm build")
	Type 3: Unknown → Fallback
	```
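
One way to represent these type IDs in client code is an integer enum with a fallback to `Unknown`. This is a sketch under that assumption. `MemoryType` and `memory_type_id` are illustrative names and not part of any released AgentRank API.

```python
from enum import IntEnum

class MemoryType(IntEnum):
    EPISODIC = 0    # events, conversations
    SEMANTIC = 1    # facts, preferences
    PROCEDURAL = 2  # instructions
    UNKNOWN = 3     # fallback

def memory_type_id(label):
    """Return the integer type ID for a label, falling back to UNKNOWN."""
    if label is None:
        return int(MemoryType.UNKNOWN)
    try:
        return int(MemoryType[label.strip().upper()])
    except KeyError:
        return int(MemoryType.UNKNOWN)

print(memory_type_id("semantic"))  # 1
print(memory_type_id("reminder"))  # 3
```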

	---

	## 🎓 Training Details

	| Aspect | Details |
	|--------|---------|
	| Base Model | answerdotai/ModernBERT-base (149M params) |
	| Training Data | 500K synthetic agent memory samples |
	| Memory Distribution | Episodic (40%), Semantic (35%), Procedural (25%) |
	| Loss Function | Multiple Negatives Ranking Loss + Importance MSE |
	| Hard Negatives | 7 per sample (5 types: temporal, type confusion, topic drift, etc.) |
	| Batch Size | 16-32 per GPU |
	| Hardware | 2× NVIDIA RTX 6000 Ada (48GB each) |
	| Training Time | ~12 hours |
	| Precision | FP16 Mixed Precision |
	| Final Val Loss | 0.877 |
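
To make the loss row concrete, here is a dependency-free sketch of a multiple negatives ranking loss combined with an importance MSE term. It is not the actual training code. The similarity scale of 20 and the importance weight of 0.1 are assumptions, since the README gives neither. The sketch scores only the positive and its explicit hard negatives. The sentence-transformers implementation additionally treats the other positives in the batch as negatives.

```python
import math

def mnr_loss(scores, scale=20.0):
    """Multiple negatives ranking loss for one query.

    `scores` holds cosine similarities between the query and its candidates,
    with the positive at index 0 and the negatives after it. The loss is the
    cross-entropy of a softmax over the scaled scores, with the positive as target.
    """
    logits = [scale * s for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

def importance_mse(predicted, target):
    """Mean squared error between predicted and labeled importance."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def combined_loss(batch_scores, pred_importance, true_importance, importance_weight=0.1):
    """Mean ranking loss over the batch plus a weighted importance term."""
    ranking = sum(mnr_loss(s) for s in batch_scores) / len(batch_scores)
    return ranking + importance_weight * importance_mse(pred_importance, true_importance)

# Toy batch: one positive followed by 7 hard negatives per query (made-up scores)
batch_scores = [
    [0.82, 0.40, 0.35, 0.31, 0.28, 0.22, 0.15, 0.10],
    [0.55, 0.60, 0.30, 0.25, 0.20, 0.18, 0.12, 0.05],
]
print(combined_loss(batch_scores, [0.7, 0.2], [0.9, 0.1]))
```

The second query, where a hard negative outscores the positive, contributes the larger share of the ranking loss.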

	---

	## 🏗️ Use Cases

	### 1. AI Agents with Long-Term Memory

	```python
	# Store memories with metadata (`agent` is an illustrative memory-enabled agent object)
	agent.remember(
	    text="User is allergic to peanuts",
	    memory_type="semantic",
	    importance=10,  # Critical medical info!
	)

	# Later, when discussing food...
	relevant_memories = agent.recall("What should I know about the user's diet?")
	# Returns: "User is allergic to peanuts" (even if stored months ago)
	```
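
The `remember`/`recall` interface above could be backed by a small in-memory store, sketched below. `MemoryStore` and `toy_embed` are hypothetical names. `toy_embed` is a hashed bag-of-words stand-in so the example runs without downloading a model, and it only matches on shared words. In practice you would pass an AgentRank-based embedding function instead. The metadata is stored but not used for scoring here.

```python
import math
import re

def toy_embed(text, dim=64):
    """Stand-in embedder: hashed bag-of-words, L2-normalized (not a real model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[sum(token.encode()) % dim] += 1.0  # deterministic toy hash
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class MemoryStore:
    """Minimal in-memory store exposing remember() and recall()."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.items = []

    def remember(self, text, memory_type="unknown", importance=5):
        """Embed the text and keep it together with its metadata."""
        self.items.append({
            "text": text,
            "memory_type": memory_type,
            "importance": importance,
            "vector": self.embed(text),
        })

    def recall(self, query, top_k=3):
        """Return the top_k stored texts by cosine similarity to the query."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, item["vector"])), item["text"])
            for item in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]

agent = MemoryStore()
agent.remember("User is allergic to peanuts", memory_type="semantic", importance=10)
agent.remember("User asked about React frameworks", memory_type="episodic")
print(agent.recall("Is the user allergic to anything?", top_k=1))
# ['User is allergic to peanuts']
```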

	### 2. RAG Systems for Conversational AI

	```python
	# Better retrieval for chatbots
	query = "What did we talk about in our last meeting?"
	# AgentRank returns recent, relevant conversations
	# Generic embedders return random topically-similar docs
	```

	### 3. Personal Knowledge Bases

	```python
	# User's notes and preferences
	memories = [
	    "I prefer dark mode in all apps",
	    "My morning routine starts at 6 AM",
	    "Important: Tax deadline April 15",
	]
	# AgentRank properly handles time-sensitive queries
	```

	---

	## 🆚 When to Use AgentRank vs Others

	| Use Case | Best Model |
	|----------|------------|
	| AI agents with memory | ✅ AgentRank |
	| Time-sensitive retrieval | ✅ AgentRank |
	| Conversational AI | ✅ AgentRank |
	| General document search | OpenAI / Cohere |
	| Code search | CodeBERT |
	| Scientific papers | SciBERT |

	---

	## 📁 Model Family

	| Model | Parameters | Speed | Quality | Best For |
	|-------|------------|-------|---------|----------|
	| [agentrank-small](https://huggingface.co/vrushket/agentrank-small) | 33M | ⚡⚡⚡ Fast | Good | Real-time agents, edge |
	| agentrank-base | 149M | ⚡⚡ Medium | Best | Quality-critical apps |
	| agentrank-reranker (coming) | 149M | ⚡ Slower | Superior | Two-stage retrieval |

	---

	## 📚 Citation

	```bibtex
	@misc{agentrank2024,
	author = {Vrushket More},
	title = {AgentRank: Embedding Models for AI Agent Memory Retrieval},
	year = {2024},
	publisher = {HuggingFace},
	url = {https://huggingface.co/vrushket/agentrank-base}
	}
	```

	---

	## 🤝 Community & Support

	- 🐛 Issues: [GitHub Issues](https://github.com/vmore2/AgentRank-base/issues)
	- 💬 Discussions: [HuggingFace Community](https://huggingface.co/vrushket/agentrank-base/discussions)
	- 📧 Contact: vrushket2604@gmail.com

	---

	## 📄 License

	Apache 2.0 - Free for commercial use!

	---

	<div align="center">

	### ⭐ If AgentRank helps your project, please star the repo!

	Built with ❤️ for the AI agent community

	</div>