HYDRA: Hybrid Dynamic Recurrent Architecture

A novel non-transformer language model built from scratch. Trained on CPU using a custom architecture that combines Mamba's selective state spaces, Griffin's Real-Gated Linear Recurrence (RG-LRU), and RWKV's channel mixing, with zero attention layers.

Architecture

Component                      Source    Paper / Notes
Selective State Spaces         Mamba     arXiv:2312.00752
Real-Gated Linear Recurrence   Griffin   arXiv:2402.19427
Time/Channel Mixing            RWKV      arXiv:2305.13048
Multi-Scale Compression        Novel     Parallel recurrences at different timescales (sketch below)
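
The multi-scale row is the novel piece. Below is a minimal sketch of parallel recurrences at different timescales, assuming a power-of-two update schedule per scale; the class name, the dilation scheme, and the merge layer are illustrative guesses, not the model.py implementation:

import torch
import torch.nn as nn

class MultiScaleRecurrence(nn.Module):
    """Run one linear recurrence per timescale in parallel, then merge.

    Hypothetical sketch: scale k updates its state every 2**k steps,
    so slower scales integrate information over longer horizons.
    """
    def __init__(self, d_model: int, num_scales: int = 2):
        super().__init__()
        self.num_scales = num_scales
        # One learned decay per scale and channel, squashed into (0, 1).
        self.decay_logits = nn.Parameter(torch.zeros(num_scales, d_model))
        self.merge = nn.Linear(num_scales * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        B, T, D = x.shape
        outs = []
        for k in range(self.num_scales):
            a = torch.sigmoid(self.decay_logits[k])       # (D,)
            h = x.new_zeros(B, D)
            ys = []
            for t in range(T):
                if t % (2 ** k) == 0:                     # slower updates for larger k
                    h = a * h + (1 - a) * x[:, t]
                ys.append(h)
            outs.append(torch.stack(ys, dim=1))           # (B, T, D)
        return self.merge(torch.cat(outs, dim=-1))        # back to (B, T, D)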

Key Properties

  • NOT a transformer: no attention anywhere in the stack
  • O(n) time: linear in sequence length, vs. O(n²) for transformers
  • Constant memory at inference: no KV cache (see the sketch after this list)
  • Content-aware selective gating (Mamba + Griffin fusion)
  • Multi-scale temporal processing
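
Constant memory follows directly from the recurrent formulation: generation carries a fixed-size state instead of a KV cache that grows with every token. A toy decoding loop, with an assumed state shape and a stand-in constant gate rather than the real step function:

import torch

d_model, d_state = 256, 16
state = torch.zeros(d_model, d_state)        # fixed size, independent of sequence length

def step(token_emb: torch.Tensor, state: torch.Tensor):
    """One hypothetical recurrent step: decay the state, write in the new token."""
    a = torch.full_like(state, 0.9)          # stand-in for a learned, content-aware gate
    state = a * state + token_emb.unsqueeze(-1)
    y = state.sum(dim=-1)                    # readout, shape (d_model,)
    return y, state

for t in range(128):                         # memory use stays flat across all steps
    y, state = step(torch.randn(d_model), state)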

Architecture Diagram

Token Embedding → N × HydraBlock → RMSNorm → LM Head

HydraBlock:
  ├── RMSNorm → SelectiveGatedRecurrence → + residual
  └── RMSNorm → GatedChannelMixing (GeGLU) → + residual
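
A minimal PyTorch sketch of this block wiring, assuming a GeGLU-style channel mixer and pre-norm residuals; class and argument names are guesses, and the real definitions live in model.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedChannelMixing(nn.Module):
    """GeGLU-style mixer: a value branch modulated by a GELU-activated gate."""
    def __init__(self, d_model: int, expand: int = 4):
        super().__init__()
        self.up = nn.Linear(d_model, 2 * expand * d_model)
        self.down = nn.Linear(expand * d_model, d_model)

    def forward(self, x):
        v, g = self.up(x).chunk(2, dim=-1)
        return self.down(v * F.gelu(g))

class HydraBlock(nn.Module):
    """Pre-norm residual block: temporal sublayer, then channel sublayer."""
    def __init__(self, d_model: int, mixer: nn.Module):
        super().__init__()
        self.norm1 = nn.RMSNorm(d_model)     # nn.RMSNorm needs PyTorch >= 2.4
        self.norm2 = nn.RMSNorm(d_model)
        self.mixer = mixer                   # e.g. a SelectiveGatedRecurrence
        self.mlp = GatedChannelMixing(d_model)

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))    # recurrence sublayer + residual
        x = x + self.mlp(self.norm2(x))      # channel mixing + residual
        return x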

SelectiveGatedRecurrence (per timescale):
  ├── Input projection (2 branches)
  ├── Branch 1: Separable Conv1D → SiLU → Selective B,C projection
  ├── Input gate + Recurrence gate (from Griffin's RG-LRU)
  ├── Gated recurrence: h_t = a_t·h_{t-1} + √(1 - a_t²)·(i_t·B_t)
  └── Gated merge + output projection
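
The recurrence line is the Griffin-flavored update: the √(1 - a_t²) factor scales down the input contribution as the gate a_t approaches 1, keeping the state magnitude bounded. A sequential reference implementation under assumed (B, T, D) shapes; the real layer also applies the selective B, C projections, omitted here for brevity:

import torch

def selective_gated_recurrence(x, a_gate, i_gate):
    """h_t = a_t * h_{t-1} + sqrt(1 - a_t**2) * (i_t * x_t), elementwise.

    x      : input branch after conv + SiLU, shape (B, T, D)
    a_gate : recurrence gate in (0, 1),      shape (B, T, D)
    i_gate : input gate in (0, 1),           shape (B, T, D)
    """
    B, T, D = x.shape
    h = x.new_zeros(B, D)
    ys = []
    for t in range(T):
        a = a_gate[:, t]
        h = a * h + torch.sqrt(1 - a * a) * (i_gate[:, t] * x[:, t])
        ys.append(h)
    return torch.stack(ys, dim=1)  # (B, T, D)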

Specs

  • Parameters: 19,274,816 (19.3M)
  • d_model: 256
  • Layers: 6
  • State dim: 16
  • Timescales: 2
  • Context length: 128
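
For reference, these specs would map onto a config dict like the one below. The field names are hypothetical; the authoritative schema is whatever HydraConfig in model.py defines, so check config.json rather than copying this.

# Hypothetical field names; see config.json for the real schema.
config = {
    "vocab_size": 50257,      # GPT-2 BPE tokenizer (see Usage below)
    "d_model": 256,
    "n_layers": 6,
    "d_state": 16,
    "num_timescales": 2,
    "context_length": 128,
}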

Training

  • Dataset: TinyStories (a 5,000-story subset, for quick training)
  • From scratch: random initialization, no pretrained components
  • Best val loss: 3.7988
  • Val perplexity: 44.6 (exp(3.7988) ≈ 44.6)
  • Hardware: CPU only
  • Optimizer: AdamW (β₁ = 0.9, β₂ = 0.95)
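
A minimal training-step sketch matching this setup. Only the AdamW betas come from the card; the learning rate, weight decay, and batching are assumed values, and `model` is taken to be the HydraModel from the Usage section, returning logits of shape (B, T, vocab):

import torch
import torch.nn.functional as F

# `model` is assumed to be a HydraModel instance (see Usage below).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)

def train_step(batch):                        # batch: (B, T+1) token ids
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)                    # (B, T, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()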

Generated Samples

Four prompts and the model's verbatim continuations:

Prompt: "Once upon a time"
Output: Once upon a time, there was a little girl named Lily. She loved to play outside. She had a big, feeling very excited! She couldn't wait for a big, but she said. The little boy was so happy to make the bird who loved

Prompt: "A little dog"
Output: A little dog who ran to be happy. She was very excited that the park. She wanted to play with the window. "I'm sorry, Lily. It is a voice?" She was so happy! Ben did not give it and

Prompt: "A girl named Lily"
Output: A girl named Lily. She was very happy he was very tall. She saw what he could not want to the forest. She says, "I'm sorry, we can have to go to the water. She was very very excited and happy. Lily

Prompt: "One day a boy"
Output: One day a boy named Lily liked to play with it. He said, "Don't be careful." "We are happy, we have to go to the man that he decided to play with his friends. He was very happy that the dog had a man

Usage

import torch, json
from model import HydraModel, HydraConfig
from transformers import AutoTokenizer

# Rebuild the architecture from the saved config, then load the trained weights.
config = HydraConfig(**json.load(open("config.json")))
model = HydraModel(config)
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

# The model was trained on GPT-2 BPE tokens, so reuse the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "Once upon a time"
ids = torch.tensor([tokenizer.encode(prompt)])
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50, temperature=0.8, top_k=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Model Files

  • model.pt: trained weights
  • config.json: model configuration
  • model.py: full architecture source code

Research References

  1. Gu & Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces", arXiv:2312.00752, 2023
  2. De et al., "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models", arXiv:2402.19427, 2024
  3. Peng et al., "RWKV: Reinventing RNNs for the Transformer Era", arXiv:2305.13048, 2023
  4. Eldan & Li, "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?", arXiv:2305.07759, 2023