Aether — 25M RWKV Language Model
24.4M parameters · RWKV v4 · CPU-first · Emotion-aware
A compact conversational language model built on the RWKV v4 architecture. Designed to run efficiently on CPU with no GPU required. Trained on 22K multi-turn conversations in English and Greek.
Try it live: GitHub Pages | Download v1.0 | Dataset
Quick Start
pip install torch huggingface_hub
import torch
from huggingface_hub import hf_hub_download
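# Local modules: tokenizer.py and model.py are assumed to come from the project's GitHub repo (see Source)
from tokenizer import Tokenizer
from model import RWKV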
# Download model
path = hf_hub_download("konpep/aether-rwkv-25m", "aether_model.pt")
ckpt = torch.load(path, map_location="cpu")
# Load tokenizer
tokenizer = Tokenizer(vocab=ckpt["tokenizer_vocab"])
# Create model
model = RWKV(
    vocab_size=ckpt["vocab_size"],
    hidden_size=ckpt["hidden_size"],
    num_layers=ckpt["num_layers"],
)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()
# Chat
prompt = "User: Hello!\n\nAether:"
ids = tokenizer.encode(prompt, add_bos=True)
out = model.generate(ids, max_new=80, temperature=0.5, top_k=40, repetition_penalty=1.1)
print(tokenizer.decode(out[len(ids):], skip_special=True))
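For multi-turn chat, the prompt has to carry earlier turns as well. The sketch below builds such a prompt in the same `User: ...\n\nAether:` layout used above. `build_prompt` is a hypothetical helper, not part of the repo, and it assumes that turns are separated by the same blank line as the single-turn example.

```python
def build_prompt(turns, user_name="User", bot_name="Aether"):
    """Build a chat prompt from (user_text, bot_text) pairs.

    Pass bot_text=None for the final turn to leave the reply open
    for the model to complete.
    """
    parts = []
    for user_text, bot_text in turns:
        parts.append(f"{user_name}: {user_text}")
        if bot_text is None:
            parts.append(f"{bot_name}:")  # open slot for generation
        else:
            parts.append(f"{bot_name}: {bot_text}")
    # Assumption: turns are separated by a blank line, as in the Quick Start prompt.
    return "\n\n".join(parts)


history = [("Hi", "Hello!"), ("How are you?", None)]
prompt = build_prompt(history)
```

The resulting `prompt` can be passed to `tokenizer.encode(prompt, add_bos=True)` exactly as in the single-turn example. Keep the total length under the 1024-token context.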
Download from CLI
huggingface-cli download konpep/aether-rwkv-25m aether_model.pt --local-dir .
Files
| File | Size | Description |
|---|---|---|
| aether_model.pt | 93 MB | Full model weights + tokenizer vocab |
| aether_tokenizer.json | 116 KB | Standalone tokenizer vocab |
| aether_dataset.jsonl | 2.7 MB | Training data (22K conversations) |
Architecture
| Property | Value |
|---|---|
| Architecture | RWKV v4 |
| Layers | 12 |
| Hidden dim | 480 |
| Feed-forward | 960 |
| Vocab | 4,645 (word-level) |
| Parameters | 24,383,520 |
| Context | 1024 tokens |
| Inference state | 2 vectors/layer (no KV cache) |
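The numbers in the table allow a quick back-of-the-envelope memory estimate. The sketch below assumes float32 storage and that each of the 2 state vectors per layer has the hidden dimension as its length. The KV-cache figure is for a hypothetical Transformer with the same layer count and width, shown only for comparison.

```python
LAYERS = 12
HIDDEN = 480
PARAMS = 24_383_520
CONTEXT = 1024
BYTES_PER_FLOAT32 = 4  # assumption: weights and state are stored as float32

# Size of the weights alone (the checkpoint also holds the tokenizer vocab).
weights_mib = PARAMS * BYTES_PER_FLOAT32 / 2**20

# Recurrent state: 2 vectors per layer, independent of how many tokens were seen.
state_floats = LAYERS * 2 * HIDDEN

# Hypothetical same-shape Transformer: one key and one value vector
# per layer for every token in the context.
kv_floats = LAYERS * 2 * CONTEXT * HIDDEN

print(f"weights: {weights_mib:.1f} MiB")
print(f"recurrent state: {state_floats * BYTES_PER_FLOAT32 / 1024:.0f} KiB")
print(f"KV cache at {CONTEXT} tokens: {kv_floats * BYTES_PER_FLOAT32 / 2**20:.0f} MiB")
```

Under these assumptions the weights come to about 93 MiB, in line with the size listed for aether_model.pt. The recurrent state stays at about 45 KiB however long the conversation gets, whereas the comparison KV cache grows linearly with context length.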
Recommended Parameters
| Setting | Temp | Top-K | Rep. Penalty | Best for |
|---|---|---|---|---|
| Default | 0.5 | 40 | 1.1 | General chat |
| Precise | 0.3 | 20 | 1.0 | Q&A, facts |
| Creative | 0.8 | 60 | 1.2 | Stories, jokes |
| Deterministic | 0.1 | 10 | 1.0 | Near-reproducible output |
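To see how the three settings interact, here is a minimal plain-Python sketch of one common sampling order: a CTRL-style repetition penalty, then temperature scaling, then top-k filtering. It is an illustration under those assumptions, not a copy of the repo's `generate`, which may differ. `PRESETS` and `sample_next` are hypothetical names.

```python
import math
import random

# Presets copied from the table above.
PRESETS = {
    "default": {"temperature": 0.5, "top_k": 40, "repetition_penalty": 1.1},
    "precise": {"temperature": 0.3, "top_k": 20, "repetition_penalty": 1.0},
    "creative": {"temperature": 0.8, "top_k": 60, "repetition_penalty": 1.2},
    "deterministic": {"temperature": 0.1, "top_k": 10, "repetition_penalty": 1.0},
}


def sample_next(logits, history, temperature, top_k, repetition_penalty, rng=random):
    """Pick the next token id from raw logits (a list of floats)."""
    scores = list(logits)
    # 1. Repetition penalty (CTRL style): make already-seen tokens less likely.
    for tok in set(history):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # 2. Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # 3. Top-k: keep only the k highest-scoring token ids.
    k = min(top_k, len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # 4. Softmax over the survivors (max subtracted for numerical stability).
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


next_id = sample_next([1.0, 3.0, 2.0], history=[], **PRESETS["default"])
```

Even at temperature 0.1 this is still sampling, so exact reproducibility also requires a fixed random seed, for example by passing `rng=random.Random(0)`.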
Emotion Tags
Aether can open a response with an emotion tag, written as a <tag> prefix:
| Tag | Meaning |
|---|---|
| <joy> | Happy, positive |
| <neutral> | Neutral tone |
| <curiosity> | Curious, thoughtful |
| <sarcasm> | Playful sarcasm |
| <confusion> | Uncertain |
| <encouragement> | Supportive |
Example: User: Tell me a story\n\nAether: <curiosity> A man wrote a letter every day for a year to his late wife.
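Downstream code will usually want the tag and the text separately. A minimal sketch, assuming the tag arrives as literal text at the start of the decoded response. `split_emotion` is a hypothetical helper, not part of the repo.

```python
import re

# Tags listed in the table above.
KNOWN_TAGS = {"joy", "neutral", "curiosity", "sarcasm", "confusion", "encouragement"}

# A lowercase <tag> at the very start of the response, with optional whitespace.
_TAG_RE = re.compile(r"^\s*<([a-z]+)>\s*")


def split_emotion(response):
    """Return (tag, text); tag is None when no known tag leads the response."""
    match = _TAG_RE.match(response)
    if match and match.group(1) in KNOWN_TAGS:
        return match.group(1), response[match.end():].strip()
    return None, response.strip()


tag, text = split_emotion("<curiosity> A man wrote a letter every day for a year.")
```

Whether the tags survive `tokenizer.decode(..., skip_special=True)` depends on how the tokenizer classifies them. If they are stripped, decoding with `skip_special=False` may be needed before parsing.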
Reproducing Training
git clone https://github.com/konpep-dev/Aether-RWKV-
cd Aether-RWKV-
python train.py # ~15 hours on 4-core CPU
Training uses the dataset at konpep/aether-conversations.
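Before committing to a long training run, it can be worth sanity-checking the dataset file. The record schema is not documented here, so the sketch below only counts records and top-level keys without assuming any field names. `summarize_jsonl` is a hypothetical helper, not part of the repo.

```python
import json
from collections import Counter


def summarize_jsonl(lines):
    """Count records and top-level keys in an iterable of JSONL lines."""
    n_records = 0
    key_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        n_records += 1
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return n_records, dict(key_counts)


# Usage (assumes aether_dataset.jsonl is in the working directory):
# with open("aether_dataset.jsonl", encoding="utf-8") as f:
#     print(summarize_jsonl(f))
```

The record count should land near the 22K conversations stated above.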
Limitations
- The small word-level vocabulary (4,645 entries) leaves many words out of vocabulary
- Trained on a single CPU machine; outputs are not production-grade
- English and Greek only
- No instruction-tuning or RLHF
- At 25M parameters, reasoning capability is limited
Source
- GitHub: konpep-dev/Aether-RWKV-
- Dataset: konpep/aether-conversations
- Project page: https://konpep-dev.github.io/Aether-RWKV-/
License
MIT