Aether — 25M RWKV Language Model

24.4M parameters · RWKV v4 · CPU-first · Emotion-aware

Aether is a compact conversational language model built on the RWKV v4 architecture. It is designed to run efficiently on CPU, with no GPU required, and was trained on 22K multi-turn conversations in English and Greek.

Try it live: GitHub Pages | Download v1.0 | Dataset

Quick Start

pip install torch huggingface_hub

import torch
from huggingface_hub import hf_hub_download
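# Assumed to be the local tokenizer.py and model.py modules from the project
# repository (konpep-dev/Aether-RWKV-); run this script from a clone of it.
from tokenizer import Tokenizer
from model import RWKV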

# Download model
path = hf_hub_download("konpep/aether-rwkv-25m", "aether_model.pt")
ckpt = torch.load(path, map_location="cpu")

# Load tokenizer
tokenizer = Tokenizer(vocab=ckpt["tokenizer_vocab"])

# Create model
model = RWKV(
    vocab_size=ckpt["vocab_size"],
    hidden_size=ckpt["hidden_size"],
    num_layers=ckpt["num_layers"],
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Chat
prompt = "User: Hello!\n\nAether:"
ids = tokenizer.encode(prompt, add_bos=True)
out = model.generate(ids, max_new=80, temperature=0.5, top_k=40, repetition_penalty=1.1)
print(tokenizer.decode(out[len(ids):], skip_special=True))
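
For conversations longer than one turn, the prompt string has to carry the earlier exchanges. The following is a minimal sketch, assuming every turn repeats the "User: ... / Aether: ..." layout of the single-turn prompt above with a blank line between turns; the exact multi-turn training format is not stated here, and `build_prompt` is a hypothetical helper, not part of the repository.

```python
def build_prompt(history, user_message):
    """Join past (user, reply) pairs and the new message into one prompt.

    Assumption: turns are separated by a blank line and use the same
    "User:" / "Aether:" prefixes as the single-turn example.
    """
    parts = []
    for user_text, aether_text in history:
        parts.append(f"User: {user_text}")
        parts.append(f"Aether: {aether_text}")
    parts.append(f"User: {user_message}")
    parts.append("Aether:")  # left open so the model writes the reply
    return "\n\n".join(parts)


# One earlier exchange plus a new question.
prompt = build_prompt([("Hello!", "Hi there!")], "How are you?")
print(prompt)
```

With an empty history this produces the same string as the single-turn prompt, so the result can be passed to `tokenizer.encode` in the same way. Long histories should be trimmed to stay within the 1024-token context.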

Download from CLI

huggingface-cli download konpep/aether-rwkv-25m aether_model.pt --local-dir .

Files

File	Size	Description
`aether_model.pt`	93 MB	Full model weights + tokenizer vocab
`aether_tokenizer.json`	116 KB	Standalone tokenizer vocab
`aether_dataset.jsonl`	2.7 MB	Training data (22K conversations)

Architecture

Property	Value
Architecture	RWKV v4
Layers	12
Hidden dim	480
Feed-forward	960
Vocab	4,645 (word-level)
Parameters	24,383,520
Context	1024 tokens
Inference state	2 vectors/layer (no KV cache)
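
To put the "no KV cache" row in perspective, the arithmetic below compares the fixed recurrent state with the key/value cache that a Transformer of the same depth and width would hold at full context. It is a back-of-the-envelope sketch using the figures from the table; float32 storage and the same-sized Transformer are assumptions made only for the comparison.

```python
# Figures taken from the architecture table above.
NUM_LAYERS = 12
HIDDEN_DIM = 480
CONTEXT = 1024
STATE_VECTORS_PER_LAYER = 2
BYTES_PER_FLOAT32 = 4  # assumption: values are stored as float32

# Recurrent state: a fixed number of hidden-size vectors per layer,
# independent of how many tokens have been processed.
rwkv_state_floats = NUM_LAYERS * STATE_VECTORS_PER_LAYER * HIDDEN_DIM

# Hypothetical comparison: a Transformer with the same depth and width caches
# one key and one value vector per layer for every token in the context.
kv_cache_floats = NUM_LAYERS * 2 * CONTEXT * HIDDEN_DIM

rwkv_state_bytes = rwkv_state_floats * BYTES_PER_FLOAT32
kv_cache_bytes = kv_cache_floats * BYTES_PER_FLOAT32

print(rwkv_state_bytes)                  # bytes of recurrent state
print(kv_cache_bytes)                    # bytes of a full-context KV cache
print(kv_cache_bytes // rwkv_state_bytes)  # ratio at 1024 tokens
```

Under these assumptions the recurrent state is 46,080 bytes (45 KiB), while the hypothetical cache is 47,185,920 bytes (45 MiB), a factor of 1024 that grows with context length.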

Recommended Parameters

Setting	Temperature	Top-K	Repetition penalty	Best for
Default	0.5	40	1.1	General chat
Precise	0.3	20	1.0	Q&A, facts
Creative	0.8	60	1.2	Stories, jokes
Deterministic	0.1	10	1.0	Low-variance output (fix the random seed for exact reproducibility)
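
The three settings interact in a fixed order during sampling. The sketch below shows one common way to combine them on a plain list of logits using only the standard library: a CTRL-style repetition penalty, then top-k filtering, then temperature scaling. It is an illustration under those assumptions and not necessarily how `model.generate` implements them; `PRESETS` and `sample_next` are hypothetical names.

```python
import math
import random

# Presets copied from the table above.
PRESETS = {
    "default": {"temperature": 0.5, "top_k": 40, "repetition_penalty": 1.1},
    "precise": {"temperature": 0.3, "top_k": 20, "repetition_penalty": 1.0},
    "creative": {"temperature": 0.8, "top_k": 60, "repetition_penalty": 1.2},
    "deterministic": {"temperature": 0.1, "top_k": 10, "repetition_penalty": 1.0},
}


def sample_next(logits, previous_ids, temperature, top_k, repetition_penalty, rng=None):
    """Pick the next token id from raw logits (one float per vocabulary entry)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty (assumed CTRL-style): lower the score of every
    # token that has already been generated.
    for token_id in set(previous_ids):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty

    # Top-k: keep only the k highest-scoring candidates.
    k = min(top_k, len(scores))
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [scores[i] / temperature for i in top_ids]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]

    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy example with a four-token vocabulary and a seeded generator.
rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0]
next_id = sample_next(toy_logits, previous_ids=[0], rng=rng, **PRESETS["default"])
print(next_id)
```

Even at temperature 0.1 the result is still drawn at random, which is why a seeded generator is passed in when repeatable output matters.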

Emotion Tags

Aether's responses can carry an emotion tag, written as a <tag> prefix at the start of the reply:

Tag	Meaning
`<joy>`	Happy, positive
`<neutral>`	Neutral tone
`<curiosity>`	Curious, thoughtful
`<sarcasm>`	Playful sarcasm
`<confusion>`	Uncertain
`<encouragement>`	Supportive

Example: User: Tell me a story\n\nAether: <curiosity> A man wrote a letter every day for a year to his late wife.
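
An application that wants to style or route replies by emotion needs to separate the tag from the text. The following is a minimal sketch, assuming the tag survives decoding as literal text at the start of the reply (whether `skip_special=True` strips it is not stated here); `split_emotion` is a hypothetical helper.

```python
import re

# Tags listed in the table above.
KNOWN_TAGS = {"joy", "neutral", "curiosity", "sarcasm", "confusion", "encouragement"}

# A lowercase word in angle brackets at the very start of the reply.
TAG_PATTERN = re.compile(r"^\s*<([a-z]+)>\s*")


def split_emotion(reply):
    """Return (tag, text); tag is None when no known tag prefixes the reply."""
    match = TAG_PATTERN.match(reply)
    if match and match.group(1) in KNOWN_TAGS:
        return match.group(1), reply[match.end():].strip()
    return None, reply.strip()


tag, text = split_emotion("<curiosity> A man wrote a letter every day for a year to his late wife.")
print(tag)
print(text)
```

Unknown bracketed words are left in the text untouched, so ordinary angle-bracket content is not mistaken for an emotion tag.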

Reproducing Training

git clone https://github.com/konpep-dev/Aether-RWKV-
cd Aether-RWKV-
python train.py           # ~15 hours on a 4-core CPU

Training uses the dataset at konpep/aether-conversations.

Limitations

The small word-level vocabulary (4,645 entries) handles out-of-vocabulary words poorly
Trained on a single CPU machine; outputs are not production-grade
English and Greek only
No instruction-tuning or RLHF
At roughly 25M parameters, reasoning capability is limited
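
Because the vocabulary is word-level, it can help to check a prompt for unknown words before sending it to the model. The sketch below assumes the vocabulary is available as a set of lowercase word strings and that words are split on whitespace and punctuation; the real tokenizer may split and normalize differently, and `out_of_vocab` is a hypothetical helper.

```python
import re


def out_of_vocab(text, vocab):
    """List the words and punctuation marks in text that are missing from vocab.

    Assumption: vocab is a set of lowercase strings. The pattern is
    Unicode-aware, so Greek words are matched as well as English ones.
    """
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    return [word for word in words if word not in vocab]


# Toy vocabulary for illustration only, not the model's real one.
toy_vocab = {"hello", "world", ",", "!"}
missing = out_of_vocab("Hello, quantum world!", toy_vocab)
print(missing)
```

A non-empty result suggests rephrasing the prompt with more common words.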

Source

GitHub: konpep-dev/Aether-RWKV-
Dataset: konpep/aether-conversations
Project page: https://konpep-dev.github.io/Aether-RWKV-/

License

MIT
