metadata
language: ru
license: mit
library_name: keras
tags:
- gpt
- russian
- transformer
- gqa
- alibi
- rmsnorm
pipeline_tag: text-generation
datasets:
- HawkLabofficial/HawkGPT-v0.5
metrics:
- accuracy
HawkGPT v0.5
Russian-language GPT-style transformer language model (24M params) trained from scratch on synthetic Q&A data.
Architecture
| Param | Value |
|---|---|
| Embed dim | 512 |
| Layers | 8 |
| Query heads | 8 |
| KV heads (GQA) | 2 |
| FF dim | 2048 |
| Vocab size | ~3200 (BPE) |
| Max seq len | 256 |
| Parameters | 24,384,000 |
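As a rough sanity check, the parameter count can be estimated from the table. The sketch below rests on assumptions the card does not confirm: a vocabulary of exactly 3200 tokens, a plain two-matrix feed-forward block, two RMSNorm gain vectors per layer plus a final norm, and no other parameters. The function name `estimate_params` is hypothetical.

```python
# Rough parameter estimate from the architecture table.
# Assumptions (not confirmed by the card): vocab of exactly 3200, a plain
# two-matrix feed-forward block, two RMSNorm gains per layer, one final norm.

def estimate_params(vocab=3200, d_model=512, n_layers=8,
                    n_q_heads=8, n_kv_heads=2, d_ff=2048):
    head_dim = d_model // n_q_heads        # 64
    kv_dim = n_kv_heads * head_dim         # 128: K and V are narrower under GQA
    attn = d_model * d_model               # query projection
    attn += 2 * d_model * kv_dim           # key and value projections
    attn += d_model * d_model              # output projection
    ffn = 2 * d_model * d_ff               # up and down projections, no biases
    norms = 2 * d_model                    # two RMSNorm gain vectors per layer
    per_layer = attn + ffn + norms
    embedding = vocab * d_model            # shared with the output projection
    final_norm = d_model
    return n_layers * per_layer + final_norm + embedding

print(f"{estimate_params():,}")
```

Under these assumptions the estimate is 23,667,200, in the same range as the reported 24,384,000. The remaining gap would come from details the card does not spell out, such as the exact vocabulary size.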
Key design choices:
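- Grouped Query Attention (GQA): 8 query heads share 2 KV heads (4 query heads per KV head), which shrinks the KV projections and cache for faster inference
- ALiBi: linear, distance-based attention biases instead of learned position embeddings, which lets the model extrapolate to sequences longer than those seen in training
- RMSNorm: normalization by root mean square only, skipping the mean subtraction that LayerNorm performs
- No bias terms in any linear layer
- Weight tying: the embedding and output projection share weights
- BPE tokenizer: digit-aware (each digit is its own token), vocab ~3200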
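The first three design choices can be sketched in plain Python. This is an illustrative sketch, not the code in `model.py`: the function names are hypothetical, the slope formula is the standard ALiBi geometric sequence for a power-of-two head count, and the query-to-KV mapping assumes consecutive query heads are grouped together.

```python
import math

def alibi_slopes(n_heads):
    """Per-head ALiBi slopes for a power-of-two head count: 2^(-8*i/n)."""
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(slope, seq_len):
    """Causal bias added to attention scores: -slope * distance, -inf for future keys."""
    return [[-slope * (q - k) if k <= q else -math.inf for k in range(seq_len)]
            for q in range(seq_len)]

def kv_head_for_query(q_head, n_q_heads=8, n_kv_heads=2):
    """GQA: consecutive query heads share one KV head (assumed grouping)."""
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: scale by the root mean square, with no mean subtraction."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

print(alibi_slopes(8)[:3])                        # slopes of the first three heads
print([kv_head_for_query(h) for h in range(8)])   # KV head used by each query head
```

With 8 query heads and 2 KV heads, query heads 0 to 3 read from KV head 0 and heads 4 to 7 from KV head 1, so only two sets of keys and values need to be cached per layer.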
Training
- Mixed precision (bfloat16) with XLA JIT compilation
- AdamW optimizer, cosine LR schedule with 1000-step warmup
- EMA (exponential moving average) of weights
- Batch size 96, max 30 epochs (early stopping patience 10)
- Trained on NVIDIA RTX 4070 12GB
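The learning-rate schedule and weight averaging listed above can be sketched as follows. Only the 1000-step warmup comes from the card; the peak learning rate, total step count, minimum learning rate, and EMA decay are placeholder assumptions, and the function names are hypothetical.

```python
import math

def lr_at(step, peak_lr, total_steps, warmup_steps=1000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: keep `decay` of the running average, blend in the rest."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]

# Placeholder values: peak_lr and total_steps are not stated in the card.
for step in (0, 500, 1000, 5500, 10000):
    print(step, lr_at(step, peak_lr=1e-3, total_steps=10000))
```

The EMA copy of the weights changes more smoothly than the raw weights, which is why it is commonly the copy used for evaluation and checkpointing.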
Training history
| Epoch | Loss | Throughput |
|---|---|---|
| 1 | 0.0663 | 57K t/s |
| 5 | 0.0520 | 157K t/s |
| 10 | 0.0512 | 360K t/s |
| 13 (best) | 0.0479 | 153K t/s |
Benchmark
Overall: 40/72 (55.6%)
| Category | Score |
|---|---|
| Division | 90% |
| Knowledge | 80% |
| Algebra | 75% |
| Addition | 60% |
| Multiplication | 60% |
| Multi-step | 50% |
| Subtraction | 40% |
| Word problems | 33% |
| Sequences | 20% |
Dataset
Synthetic Russian Q&A corpus (more than 200K pairs and 80M characters) covering:
- Arithmetic (add, sub, mul, div, multi-step)
- Algebra (linear, quadratic, systems)
- Sequences, geometry, physics
- Python code tracing
- General knowledge (science, history, geography)
- Dialogue & conversations
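To illustrate what a synthetic arithmetic pair might look like, here is a minimal generator sketch. It is hypothetical: the card does not describe the actual generation pipeline, and only the `Вопрос:` ("Question:") prefix is taken from the usage example in this README. The function name and the answer format are assumptions.

```python
import random

def make_addition_pair(rng, max_value=99):
    """Return a (question, answer) pair for a random addition problem.

    Hypothetical format: the question mirrors the prompt style used in this
    README, and the answer is just the sum as a string.
    """
    a = rng.randint(0, max_value)
    b = rng.randint(0, max_value)
    question = f"Вопрос: {a} + {b} ="   # "Question: a + b ="
    answer = str(a + b)
    return question, answer

rng = random.Random(0)  # fixed seed so the sample is reproducible
for _ in range(3):
    print(make_addition_pair(rng))
```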
Usage
import tensorflow as tf
from tokenizers import Tokenizer

from model import build_model  # model.py from this repository

# Load the tokenizer; disable padding and truncation so prompts are encoded as-is
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.no_padding()
tokenizer.no_truncation()

# Build the model and load the best checkpoint
model = build_model(vocab_size=tokenizer.get_vocab_size())
model.load_weights("model_best.weights.h5")

def generate(prompt, temperature=0.7, top_k=50, max_new=200):
    bos_id = tokenizer.token_to_id("[BOS]")
    stop_ids = (tokenizer.token_to_id("[EOS]"), tokenizer.token_to_id("[PAD]"))
    ids = [bos_id] + tokenizer.encode(prompt).ids
    prompt_len = len(ids)
    for _ in range(max_new):
        # Feed at most the last 256 tokens (the model's max sequence length)
        ctx = tf.constant([ids[-256:]], dtype=tf.int32)
        # Cast to float32 in case the model emits lower-precision logits
        logits = tf.cast(model(ctx, training=False)[0, -1, :], tf.float32) / temperature
        if top_k:
            # Mask everything below the k-th largest logit
            vals, _ = tf.math.top_k(logits, k=top_k)
            logits = tf.where(logits < vals[-1], -1e9, logits)
        # tf.random.categorical expects unnormalized logits, not probabilities
        next_id = int(tf.random.categorical(logits[None], 1)[0, 0])
        if next_id in stop_ids:
            break
        ids.append(next_id)
    # Decode only the newly generated tokens
    return tokenizer.decode(ids[prompt_len:])

print(generate("Вопрос: 2 + 2 ="))  # prompt: "Question: 2 + 2 ="
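The sampling step inside `generate` (temperature scaling followed by top-k filtering) can be tried without TensorFlow. The sketch below is a pure-Python stand-in for that step, not the code the model uses; `sample_top_k` and its `rng` parameter are hypothetical names.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample an index from logits after temperature scaling and top-k filtering."""
    scaled = [x / temperature for x in logits]
    if top_k and top_k < len(scaled):
        # Keep only logits at or above the k-th largest value
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else -math.inf for x in scaled]
    # Softmax weights, shifted by the max for numerical stability
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

toy_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_top_k(toy_logits, temperature=0.3, top_k=2, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution toward the largest logit, and a smaller `top_k` removes unlikely tokens entirely. With `top_k=1` the function always returns the argmax when the largest logit is unique.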
CLI
python3 generate.py --prompt "Вопрос: Сколько будет 5 * 7?" --temperature 0.3 --top_k 20
Files
| File | Description |
|---|---|
| model_best.weights.h5 | Best checkpoint weights (94 MB) |
| tokenizer.json | BPE tokenizer |
| config.py | Full model & training config |
| model.py | Model definition (GQA, RMSNorm, ALiBi) |
| generate.py | Inference script |
License
MIT