---
license: mit
tags:
- vector-database
- semantic-search
- embeddings
- llm
- memory
- hnsw
- rust
- python
library_name: arms-hat
pipeline_tag: feature-extraction
---

# HAT: Hierarchical Attention Tree

**A novel index structure for AI memory systems that achieves 100% recall at 70x faster build times than HNSW.**

**Also: A new database paradigm for any domain with known hierarchy + semantic similarity.**

[![PyPI](https://img.shields.io/pypi/v/arms-hat.svg)](https://pypi.org/project/arms-hat/)
[![crates.io](https://img.shields.io/crates/v/arms-hat.svg)](https://crates.io/crates/arms-hat)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/Rust-1.70+-orange.svg)](https://www.rust-lang.org/)
[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)

---

## Architecture

*(Figure: HAT Architecture)*

HAT exploits the **known hierarchy** in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall.

---

## Key Results

*(Figure: Summary Results)*

| Metric | HAT | HNSW | Improvement |
|--------|-----|------|-------------|
| **Recall@10** | **100%** | 70% | +30 pts |
| **Build Time** | 30ms | 2.1s | **70x faster** |
| **Query Latency** | 3.1ms | - | Production-ready |

*Benchmarked on hierarchically-structured AI conversation data*

---

## Recall Comparison

*(Figure: HAT vs HNSW Recall)*

HAT achieves **100% recall** where HNSW achieves only ~70% on hierarchically-structured data.

---

## Build Time

*(Figure: Build Time Comparison)*

HAT builds indexes **70x faster** than HNSW - critical for real-time applications.

---

## The Problem

Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history.

**Current solutions fall short:**

- Longer context models: Expensive to train and run
- Summarization: Lossy compression that discards detail
- RAG retrieval: Re-embeds and recomputes attention every query

---

## The HAT Solution

*(Figure: HAT vs RAG)*

HAT exploits **known structure** in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy:

```
Session (conversation boundary)
└── Document (topic or turn)
    └── Chunk (individual message)
```
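As a minimal sketch of how a conversation maps onto that hierarchy through the Python API (shown in full under Quick Start below) - here `embed()` is a hypothetical placeholder for whatever embedding model you use, not part of arms-hat:

```python
from arms_hat import HatIndex

def embed(text: str) -> list[float]:
    """Hypothetical placeholder: plug in your own 1536-dim embedding model."""
    ...

index = HatIndex.cosine(1536)

index.new_session()                                 # Session: conversation boundary
index.new_document()                                # Document: first topic
index.add(embed("How do I reset my password?"))     # Chunk: one message
index.add(embed("Go to Settings, then Security..."))  # Chunk: the reply

index.new_document()                                # Document: topic shift, same session
index.add(embed("Separately - a billing question."))
```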

### The Hippocampus Analogy

*(Figure: Hippocampus Analogy)*

HAT mirrors human memory architecture - functioning as an **artificial hippocampus** for AI systems.

---

## How It Works

### Beam Search Query

*(Figure: Beam Search)*

HAT uses beam search through the hierarchy:

```
1. Start at root
2. At each level, score children by cosine similarity to query
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
```

**Complexity:** O(b · d · c), where b is the beam width, d the tree depth, and c the fan-out per node - O(log n) when the tree is balanced. For example, with b = 4, c = 32, and d = 3, a query scores at most 4 · 32 · 3 = 384 vectors out of n = 32³ = 32,768 chunks.
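To make the algorithm concrete, here is an illustrative re-implementation in plain NumPy - a sketch of the technique, not the library's internals. It assumes a balanced tree (all leaves at the same depth) whose internal nodes carry summary vectors:

```python
import numpy as np

class Node:
    """Tree node: internal nodes hold a summary vector; leaves hold chunks."""
    def __init__(self, vector, children=(), node_id=None):
        self.vector = np.asarray(vector, dtype=np.float32)
        self.children = list(children)   # empty list => leaf (chunk)
        self.id = node_id

def cosine(a, b):
    return float(a @ b) / (float(np.linalg.norm(a) * np.linalg.norm(b)) + 1e-12)

def beam_search(root, query, k=10, b=16):
    """Descend level by level, keeping the b best-scoring nodes (use b >= k)."""
    query = np.asarray(query, dtype=np.float32)
    beam = [root]                                    # 1. start at root
    while beam and beam[0].children:                 # balanced: all leaves at same depth
        candidates = [c for node in beam for c in node.children]
        # 2-3. score children by cosine similarity, keep top-b
        candidates.sort(key=lambda c: cosine(c.vector, query), reverse=True)
        beam = candidates[:b]
    # 4. return top-k from leaf level, with scores
    return [(n.id, cosine(n.vector, query)) for n in beam[:k]]
```

Each of the d levels scores at most b · c children, which is exactly the O(b · d · c) bound above.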

### Consolidation Phases

*(Figure: Consolidation Phases)*

Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation.

---

## Scale Performance

*(Figure: Scale Performance)*

HAT maintains **100% recall** across all tested scales, while HNSW recall stays far lower (44.5-67.5%) throughout.

| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 |
|-------|-----------|------------|----------|-----------|
| 500   | 16ms      | 1.0s       | **100%** | 55%       |
| 1000  | 25ms      | 2.0s       | **100%** | 44.5%     |
| 2000  | 50ms      | 4.3s       | **100%** | 67.5%     |
| 5000  | 127ms     | 11.9s      | **100%** | 55%       |

---

## End-to-End Pipeline

*(Figure: Integration Pipeline)*

### Core Claim

> **A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.**

| Messages | Tokens | Context % | Recall | Latency | Memory |
|----------|--------|-----------|--------|---------|--------|
| 1000     | 30K    | 33%       | 100%   | 1.7ms   | 1.6MB  |
| 2000     | 60K    | 17%       | 100%   | 3.1ms   | 3.3MB  |
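A minimal sketch of the pipeline behind this claim: the full history lives in HAT, and each turn packs only the retrieved top-k messages into the model's small window. Here `embed()` and `llm()` are hypothetical placeholders for your embedding model and LLM client, not part of arms-hat:

```python
from arms_hat import HatIndex

def embed(text: str) -> list[float]: ...   # hypothetical: your embedding model
def llm(prompt: str) -> str: ...           # hypothetical: your LLM client

index = HatIndex.cosine(1536)
texts = {}  # our own id -> message bookkeeping alongside the index

def remember(message: str):
    texts[index.add(embed(message))] = message

def answer(question: str, k: int = 10) -> str:
    hits = index.near(embed(question), k=k)          # ~3ms at 2000 messages
    recalled = "\n".join(texts[h.id] for h in hits)  # k messages fit easily in 10K tokens
    return llm(f"Context:\n{recalled}\n\nQuestion: {question}")
```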
---

## Quick Start

### Python

```python
from arms_hat import HatIndex

# Create index (1536 dimensions for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Add messages with automatic hierarchy
index.add(embedding)  # Returns ID

# Session/document management
index.new_session()   # Start new conversation
index.new_document()  # Start new topic

# Query
results = index.near(query_embedding, k=10)
for result in results:
    print(f"ID: {result.id}, Score: {result.score:.4f}")

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")
```

### Rust

```rust
use hat::{HatIndex, HatConfig};

// Create index
let config = HatConfig::default();
let mut index = HatIndex::new(config, 1536);

// Add points
let id = index.add(&embedding);

// Query
let results = index.search(&query, 10);
```

---

## Installation

### Python

```bash
pip install arms-hat
```

### From Source (Rust)

```bash
git clone https://github.com/automate-capture/hat.git
cd hat
cargo build --release
```

### Python Development

```bash
cd python
pip install maturin
maturin develop
```

---

## Project Structure

```
hat/
├── src/                  # Rust implementation
│   ├── lib.rs            # Library entry point
│   ├── index.rs          # HatIndex implementation
│   ├── container.rs      # Tree node types
│   ├── consolidation.rs  # Background maintenance
│   └── persistence.rs    # Save/load functionality
├── python/               # Python bindings (PyO3)
│   └── arms_hat/         # Python package
├── benchmarks/           # Performance comparisons
├── examples/             # Usage examples
├── paper/                # Research paper (PDF)
├── images/               # Figures and diagrams
└── tests/                # Test suite
```

---

## Reproducing Results

```bash
# Run HAT vs HNSW benchmark
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Run real embedding dimension tests
cargo test --test phase32_real_embeddings -- --nocapture

# Run persistence tests
cargo test --test phase33_persistence -- --nocapture

# Run end-to-end LLM demo
python examples/demo_hat_memory.py
```

---

## When to Use HAT

**HAT is ideal for:**

- AI conversation memory (chatbots, agents)
- Session-based retrieval systems
- Any hierarchically-structured vector data
- Systems requiring deterministic behavior
- Cold-start scenarios (no training needed)

**Use HNSW instead for:**

- Unstructured point clouds (random embeddings)
- Static knowledge bases (handbooks, catalogs)
- When approximate recall is acceptable

---

## Beyond AI Memory: A New Database Paradigm

HAT represents a fundamentally new approach to indexing: **exploiting known structure rather than learning it**.

| Database Type | Structure | Semantics |
|---------------|-----------|-----------|
| Relational | Explicit (foreign keys) | None |
| Document | Implicit (nesting) | None |
| Vector (HNSW) | Learned from data | Yes |
| **HAT** | **Explicit + exploited** | **Yes** |

Traditional vector databases treat embeddings as unstructured point clouds, spending compute to *discover* topology. HAT inverts this: **known hierarchy is free information - use it.**

### General Applications

Any domain with **hierarchical structure + semantic similarity** benefits from HAT:

- **Legal/Medical Documents:** Case → Filing → Paragraph → Sentence
- **Code Search:** Repository → Module → Function → Line
- **IoT/Sensor Networks:** Facility → Zone → Device → Reading
- **E-commerce:** Catalog → Category → Product → Variant
- **Research Corpora:** Journal → Paper → Section → Citation

### The Core Insight

> *"Position IS relationship. No foreign keys needed - proximity defines connection."*

HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch.

---

## Citation

```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={Young, Lucas and Automate Capture Research},
  year={2026},
  url={https://research.automate-capture.com/hat}
}
```

---

## Paper

📄 **[Read the Full Paper (PDF)](paper/HAT_Context_Extension_Young_2026.pdf)**

---

## License

MIT License - see [LICENSE](LICENSE) for details.

---

## Links

- **Research Site:** [research.automate-capture.com/hat](https://research.automate-capture.com/hat)
- **Main Site:** [automate-capture.com](https://automate-capture.com)