Update README.md
README.md
CHANGED
@@ -20,6 +20,10 @@ base_model: Alibaba-NLP/gte-modernbert-base
 
 TL;DR: Stateful embedding model that replaces sliding-window attention with RWKV recurrence, allowing for incremental encoding and streaming semantic search.
 
+Live Demo:
+[Demo](https://huggingface.co/spaces/SixOpen/HARE)
+
+
 
 (figure)
 
 Conventional embedding models are stateless: adding new content requires re-encoding from scratch because token representations depend on the entire sequence.
@@ -28,9 +32,6 @@ Each recurrent layer maintains a fixed-size state matrix that summarizes all pri
 
 Essentially, the biggest advantage is being able to perform semantic search on large files well before they are fully available, and across multiple streams simultaneously (for example, parallel distributed downloads, concurrent transcripts, or documents arriving from different sources on the same topic).
 
-Demo:
-[Demo](https://huggingface.co/spaces/SixOpen/HARE)
-
 ## Results
 
 ### LongEmbed (Needle/Passkey: nDCG@1; others: nDCG@10)
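The statefulness described above can be illustrated with a toy linear recurrence. This is a sketch, not HARE's actual BiRWKV-7 code; the decay constant and feature dimension are illustrative. The point it demonstrates: because the per-layer state has a fixed size and is updated token by token, carrying the state across chunks reproduces the full-sequence result exactly, which is what makes appending newly arrived text cheap.

```python
# Toy recurrence standing in for an RWKV-style layer (illustrative only):
# the state is a fixed-size vector summarizing all prior tokens.

def step(state, token, decay=0.9):
    # Decay the running summary, then mix in the new token's features.
    return [decay * s + t for s, t in zip(state, token)]

def encode(tokens, state=None, dim=3):
    # Start from a blank state, or resume from a carried-over one.
    state = state if state is not None else [0.0] * dim
    for tok in tokens:
        state = step(state, tok)
    return state

stream = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0], [3.0, 0.0, 0.5]]

full = encode(stream)                        # encode everything at once
carried = encode(stream[:2])                 # encode what has arrived so far...
resumed = encode(stream[2:], state=carried)  # ...then append the rest later

assert all(abs(a - b) < 1e-9 for a, b in zip(full, resumed))
```

A stateless bidirectional encoder has no such carried summary: each token's representation depends on the whole sequence, so appended text forces a re-encode from scratch.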
@@ -243,7 +244,9 @@ Three-stage pipeline:
 | `tokenizer.json` | Tokenizer |
 | `tokenizer_config.json` | Tokenizer config |
 | `surgery.py` | Standalone surgery CLI tool (inspect layers, perform surgery from scratch) |
-| `birwkv7.py` | BiRWKV-7 recurrence layer (required for loading) |
+| `birwkv7.py` | BiRWKV-7 recurrence layer w/ Triton kernel (required for loading) |
+| `modeling_hare.py` | Model wrapper |
+| `configuration_hare.py` | Config class |
 | `streaming.py` | SpanEncoder for stateful incremental encoding |
 
 ## Intended uses
@@ -252,6 +255,11 @@ Three-stage pipeline:
 - Incremental indexing where text arrives sequentially and must be searchable before completion: live transcription, real-time meeting/dispatch indexing, distributed (i.e. torrent) content search, incremental document editing
 - Multi-vector retrieval with chunk-level or token-level scoring
 
+## Limitations
+
+- This is a research-grade model: although some numbers indicate long-context SOTA in specific categories, it could benefit from seeing more diverse data during training, as shown by the scores on legal case reports and StackOverflow above.
+- Asymmetric streaming context: streaming mode uses forward (left-to-right) state carry, which accumulates the full left context incrementally; the backward scan only sees within each piece, so right context is local.
+
 ## Citation
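The "chunk-level or token-level scoring" use case above can be sketched with late-interaction MaxSim scoring: each query vector is matched against its best-scoring document vector, and the matches are summed. The model card does not specify HARE's exact scoring function, so the function name, vectors, and dimensions here are purely illustrative.

```python
# Hedged sketch of multi-vector (late-interaction / MaxSim) retrieval scoring.
# Each text is represented by several vectors (per chunk or per token).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    # For each query vector, keep only its best match among the doc vectors.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

q = [[1.0, 0.0], [0.0, 1.0]]           # two query vectors
doc_a = [[0.9, 0.1], [0.2, 0.8]]       # aligned with the query
doc_b = [[-1.0, 0.0], [0.0, -1.0]]     # opposed to the query

assert maxsim_score(q, doc_a) > maxsim_score(q, doc_b)
```

Because scoring operates over many small vectors rather than one pooled embedding, partially encoded streams can already contribute their finished chunks to the index.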