Aether — 25M RWKV Language Model

24.7M parameters · RWKV v4 · CPU-first · Emotion-aware

A compact conversational language model built on the RWKV v4 architecture. Designed to run efficiently on CPU with no GPU required. Trained on 22K multi-turn conversations in English and Greek.

Try it live: GitHub Pages | Download v1.0 | Dataset

Quick Start

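pip install torch huggingface_hub

import torch
from huggingface_hub import hf_hub_download
# tokenizer.py and model.py are local modules, not pip packages; they are
# assumed to come from the project repository and must be on the import path.
from tokenizer import Tokenizer
from model import RWKV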

# Download model
path = hf_hub_download("konpep/aether-rwkv-25m", "aether_model.pt")
ckpt = torch.load(path, map_location="cpu")

# Load tokenizer
tokenizer = Tokenizer(vocab=ckpt["tokenizer_vocab"])

# Create model
model = RWKV(
    vocab_size=ckpt["vocab_size"],
    hidden_size=ckpt["hidden_size"],
    num_layers=ckpt["num_layers"],
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Chat
prompt = "User: Hello!\n\nAether:"
ids = tokenizer.encode(prompt, add_bos=True)
out = model.generate(ids, max_new=80, temperature=0.5, top_k=40, repetition_penalty=1.1)
print(tokenizer.decode(out[len(ids):], skip_special=True))
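For multi-turn chat, the single-turn prompt in the Quick Start can be extended by concatenating earlier turns. The helper below is a minimal sketch: it assumes every turn follows the same "User: ... / Aether: ..." layout and that turns are separated by a blank line. `build_prompt` is a hypothetical name, not part of the repository, and the exact multi-turn format used during training is not stated in this card.

```python
def build_prompt(history, user_message):
    """Build a chat prompt that ends with an open 'Aether:' turn.

    history: list of (user_text, aether_text) pairs from earlier turns.
    user_message: the new user message to answer.
    """
    parts = []
    for user_text, aether_text in history:
        parts.append(f"User: {user_text.strip()}")
        parts.append(f"Aether: {aether_text.strip()}")
    parts.append(f"User: {user_message.strip()}")
    parts.append("Aether:")  # left open for the model to complete
    # Assumption: turns are separated by a blank line, as in the Quick Start.
    return "\n\n".join(parts)


prompt = build_prompt([("Hello!", "Hi there!")], "How are you?")
print(prompt)
```

With an empty history this reproduces the Quick Start prompt exactly, so the result can be passed to `tokenizer.encode` in the same way.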

Download from CLI

huggingface-cli download konpep/aether-rwkv-25m aether_model.pt --local-dir .

Files

File                   Size    Description
aether_model.pt        93 MB   Full model weights + tokenizer vocab
aether_tokenizer.json  116 KB  Standalone tokenizer vocab
aether_dataset.jsonl   2.7 MB  Training data (22K conversations)

Architecture

Property         Value
Architecture     RWKV v4
Layers           12
Hidden dim       480
Feed-forward     960
Vocab            4,645 (word-level)
Parameters       24,383,520
Context          1024 tokens
Inference state  2 vectors/layer (no KV cache)
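The practical effect of a recurrent inference state is that memory stays constant as the conversation grows. The arithmetic below is a rough sketch using the values from the architecture table. It assumes each state vector has the hidden dimension (480), and it compares against a hypothetical Transformer of the same depth and width that caches one key and one value vector per layer per token. The function names are illustrative only.

```python
LAYERS = 12
HIDDEN = 480
CONTEXT = 1024
STATE_VECTORS_PER_LAYER = 2  # value taken from the architecture table


def rwkv_state_floats(layers=LAYERS, hidden=HIDDEN, per_layer=STATE_VECTORS_PER_LAYER):
    """Size of the recurrent state; independent of how many tokens were seen."""
    # Assumption: every state vector has `hidden` entries.
    return layers * per_layer * hidden


def kv_cache_floats(tokens, layers=LAYERS, hidden=HIDDEN):
    """Size of a key/value cache for a hypothetical same-sized Transformer."""
    return layers * 2 * hidden * tokens  # one key + one value per layer per token


print(rwkv_state_floats())        # 11520
print(kv_cache_floats(CONTEXT))   # 11796480
```

Under these assumptions the state is 1024 times smaller than a full-context cache, and the gap grows linearly with the number of tokens processed.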

Recommended Parameters

Setting        Temp  Top-K  Rep. Penalty  Best for
Default        0.5   40     1.1           General chat
Precise        0.3   20     1.0           Q&A, facts
Creative       0.8   60     1.2           Stories, jokes
Deterministic  0.1   10     1.0           Reproducible output
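To show how the three settings interact, here is a minimal pure-Python sketch of one sampling step. It uses a common formulation (repetition penalty applied to logits of already generated tokens, then top-k filtering, then a temperature-scaled softmax) and is not necessarily what `model.generate` does internally. `PRESETS` and `sample_next` are hypothetical names introduced for illustration.

```python
import math
import random

# Presets copied from the table above.
PRESETS = {
    "default": dict(temperature=0.5, top_k=40, repetition_penalty=1.1),
    "precise": dict(temperature=0.3, top_k=20, repetition_penalty=1.0),
    "creative": dict(temperature=0.8, top_k=60, repetition_penalty=1.2),
    "deterministic": dict(temperature=0.1, top_k=10, repetition_penalty=1.0),
}


def sample_next(logits, previous_ids, temperature, top_k, repetition_penalty, rng=random):
    """Pick the next token id from raw logits (illustrative formulation only)."""
    adjusted = list(logits)
    # 1. Repetition penalty: push down tokens that were already generated.
    for i in set(previous_ids):
        if 0 <= i < len(adjusted):
            if adjusted[i] > 0:
                adjusted[i] /= repetition_penalty
            else:
                adjusted[i] *= repetition_penalty
    # 2. Top-k: keep only the k highest-scoring token ids.
    k = max(1, min(top_k, len(adjusted)))
    top = sorted(range(len(adjusted)), key=adjusted.__getitem__, reverse=True)[:k]
    # 3. Temperature: lower values sharpen the distribution before sampling.
    scaled = [adjusted[i] / temperature for i in top]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
token = sample_next(logits, previous_ids=[0], rng=random.Random(0), **PRESETS["default"])
print(token)
```

Under this formulation even the Deterministic preset still samples. The low temperature concentrates almost all probability on the top token, and fixing the random seed is what would make the output exactly repeatable.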

Emotion Tags

Aether supports emotion tags in responses. Tags appear as <tag> prefixes:

Tag              Meaning
<joy>            Happy, positive
<neutral>        Neutral tone
<curiosity>      Curious, thoughtful
<sarcasm>        Playful sarcasm
<confusion>      Uncertain
<encouragement>  Supportive

Example: User: Tell me a story\n\nAether: <curiosity> A man wrote a letter every day for a year to his late wife.
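An application that wants to display the emotion separately from the text can split the tag off the decoded response. The sketch below assumes the tag survives decoding as literal text at the start of the response (this may depend on the `skip_special` setting, which this card does not specify). `split_emotion` and `KNOWN_TAGS` are hypothetical names.

```python
import re

# The six tags listed in the table above.
KNOWN_TAGS = {"joy", "neutral", "curiosity", "sarcasm", "confusion", "encouragement"}
_TAG_PREFIX = re.compile(r"^\s*<([a-z]+)>\s*")


def split_emotion(response):
    """Return (tag, text); tag is None when no known tag prefixes the text."""
    match = _TAG_PREFIX.match(response)
    if match and match.group(1) in KNOWN_TAGS:
        return match.group(1), response[match.end():].strip()
    # Unknown or missing tag: leave the text untouched apart from whitespace.
    return None, response.strip()


tag, text = split_emotion("<curiosity> A man wrote a letter every day for a year to his late wife.")
print(tag)
print(text)
```

Tags outside the known set are left in the text, so unrelated angle-bracket content is not silently dropped.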

Reproducing Training

git clone https://github.com/konpep-dev/Aether-RWKV-
cd Aether-RWKV-
python train.py           # ~15 hours on 4-core CPU

Training uses the dataset at konpep/aether-conversations.
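Before retraining, it can help to inspect the data. The sketch below reads a JSONL file one record at a time; it assumes only that each non-empty line is a standalone JSON value and makes no assumption about field names, which this card does not document. `iter_jsonl` is a hypothetical helper, not part of the repository.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    # UTF-8 is specified explicitly because the data includes Greek text.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Example usage, assuming aether_dataset.jsonl has been downloaded locally:
# for index, record in enumerate(iter_jsonl("aether_dataset.jsonl")):
#     print(record)
#     if index == 2:
#         break
```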

Limitations

  • The small word-level vocabulary (4,645 entries) handles out-of-vocabulary words poorly
  • Trained on a single CPU machine; outputs are not production-grade
  • English and Greek only
  • No instruction tuning or RLHF
  • At 25M parameters, reasoning capability is limited

Source

License

MIT
