---
language: ru
license: mit
library_name: keras
tags:
- gpt
- russian
- transformer
- gqa
- alibi
- rmsnorm
pipeline_tag: text-generation
datasets:
- HawkLabofficial/HawkGPT-v0.5 # synthetic
metrics:
- accuracy
---
# HawkGPT v0.5

A Russian-language GPT-style transformer language model (24M parameters) trained from scratch on synthetic Q&A data.

## Architecture

| Param | Value |
|-------|-------|
| Embed dim | 512 |
| Layers | 8 |
| Query heads | 8 |
| KV heads (GQA) | 2 |
| FF dim | 2048 |
| Vocab size | ~3200 (BPE) |
| Max seq len | 256 |
| Parameters | 24,384,000 |

**Key design choices:**

- **Grouped Query Attention (GQA)**: 8 query heads share 2 KV heads, which shrinks the KV cache and speeds up inference
- **ALiBi**: linear position biases on attention scores instead of learned position embeddings (can extrapolate to sequences longer than those seen in training)
- **RMSNorm**: normalizes by the root mean square only, skipping the mean subtraction of LayerNorm
- **No bias terms**: in all Linear layers
- **Weight tying**: the embedding and the output projection share weights
- **BPE tokenizer**: digit-aware (each digit is its own token), vocab ~3200
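
The attention and normalization choices above can be illustrated with a small pure-Python sketch. It is not taken from `model.py`: the slope formula is the standard geometric sequence used for ALiBi with power-of-two head counts, and the function names (`alibi_slopes`, `alibi_bias`, `rms_norm`, `kv_head_for`) and the `eps` value are assumptions made for illustration.

```python
import math


def alibi_slopes(num_heads):
    """Per-head ALiBi slopes: 2^(-8/n), 2^(-16/n), ... (assumes num_heads is a power of two)."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Causal bias for one head: -slope * distance to past tokens, -inf for future tokens."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale by the root mean square; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def kv_head_for(query_head, num_q_heads=8, num_kv_heads=2):
    """GQA: index of the shared K/V head that a given query head attends with."""
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size


slopes = alibi_slopes(8)            # one slope per query head
bias = alibi_bias(slopes[0], 4)     # added to attention scores before softmax
groups = [kv_head_for(h) for h in range(8)]
```

With 8 query heads and 2 KV heads, each K/V head serves 4 query heads, so the cached keys and values take a quarter of the space that full multi-head attention would need.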

## Training

- Mixed precision (bfloat16) with XLA JIT compilation
- AdamW optimizer, cosine LR schedule with 1000-step warmup
- EMA (exponential moving average) of weights
- Batch size 96, max 30 epochs (early stopping patience 10)
- Trained on an NVIDIA RTX 4070 12GB
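
As a rough illustration of the schedule and weight averaging listed above, here is a minimal pure-Python sketch of linear warmup followed by cosine decay, plus one EMA update. Only the 1000-step warmup comes from this card; `peak_lr`, `total_steps`, `min_lr`, and the EMA `decay` are hypothetical values, and the actual settings live in `config.py`.

```python
import math


def lr_at(step, peak_lr, warmup_steps=1000, total_steps=100_000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema, weights, decay=0.999):
    """One EMA step over flat lists of weights: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


# Hypothetical peak learning rate, for illustration only
schedule = [lr_at(s, peak_lr=1e-3) for s in (0, 999, 1000, 50_500, 100_000)]
```

EMA weights are typically the ones used for evaluation and export, since they smooth out step-to-step noise in the raw optimizer weights.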

### Training history

| Epoch | Loss | Throughput |
|-------|------|------------|
| 1 | 0.0663 | 57K t/s |
| 5 | 0.0520 | 157K t/s |
| 10 | 0.0512 | 360K t/s |
| 13 (best) | **0.0479** | 153K t/s |

## Benchmark

**Overall: 40/72 (55.6%)**

| Category | Score |
|----------|-------|
| Division | 90% |
| Knowledge | 80% |
| Algebra | 75% |
| Addition | 60% |
| Multiplication | 60% |
| Multi-step | 50% |
| Subtraction | 40% |
| Word problems | 33% |
| Sequences | 20% |

## Dataset

Synthetic Russian Q&A corpus (more than 200K pairs and 80M characters) covering:

- Arithmetic (add, sub, mul, div, multi-step)
- Algebra (linear, quadratic, systems)
- Sequences, geometry, physics
- Python code tracing
- General knowledge (science, history, geography)
- Dialogue & conversations

## Usage

```python
import tensorflow as tf
from tokenizers import Tokenizer

from model import build_model  # model.py from this repository

# Load the tokenizer; disable padding/truncation so prompts are encoded as-is
tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.no_padding()
tokenizer.no_truncation()

# Build the model and load the best checkpoint
model = build_model(vocab_size=tokenizer.get_vocab_size())
model.load_weights("model_best.weights.h5")

MAX_SEQ_LEN = 256  # context window from the architecture table


def generate(prompt, temperature=0.7, top_k=50, max_new=200):
    bos_id = tokenizer.token_to_id("[BOS]")
    eos_id = tokenizer.token_to_id("[EOS]")
    pad_id = tokenizer.token_to_id("[PAD]")
    ids = [bos_id] + tokenizer.encode(prompt).ids
    prompt_len = len(ids)
    for _ in range(max_new):
        # Feed only the most recent MAX_SEQ_LEN tokens
        ctx = tf.constant([ids[-MAX_SEQ_LEN:]], dtype=tf.int32)
        # Cast to float32 in case the model emits bfloat16 logits
        logits = tf.cast(model(ctx, training=False)[0, -1, :], tf.float32) / temperature
        if top_k:
            # Mask everything below the k-th largest logit
            vals, _ = tf.math.top_k(logits, k=top_k)
            logits = tf.where(logits < vals[-1], -1e9, logits)
        # tf.random.categorical expects logits, so do not apply softmax first
        next_id = int(tf.random.categorical(logits[None, :], 1)[0, 0])
        if next_id in (eos_id, pad_id):
            break
        ids.append(next_id)
    # Decode only the newly generated tokens
    return tokenizer.decode(ids[prompt_len:])


print(generate("Вопрос: 2 + 2 ="))
```
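
The sampling step inside `generate` can be studied in isolation without TensorFlow or the model weights. The following is a minimal pure-Python sketch of temperature scaling followed by top-k filtering; the function name `sample_top_k` and the toy logits are made up for illustration and are not part of this repository.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample one index from logits after temperature scaling and top-k filtering."""
    scaled = [x / temperature for x in logits]
    if top_k and top_k < len(scaled):
        # Keep only logits at or above the k-th largest value
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Numerically stable softmax weights; masked entries get weight 0
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0]  # hypothetical scores for a 3-token vocabulary
greedy = sample_top_k(toy_logits, top_k=1)  # top_k=1 always picks the argmax
```

Lower temperatures sharpen the distribution toward the highest-scoring token, and smaller `top_k` values remove unlikely tokens entirely, which is why the CLI example for arithmetic prompts uses conservative settings.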

### CLI

```bash
python3 generate.py --prompt "Вопрос: Сколько будет 5 * 7?" --temperature 0.3 --top_k 20
```

## Files

| File | Description |
|------|-------------|
| `model_best.weights.h5` | Best checkpoint weights (94 MB) |
| `tokenizer.json` | BPE tokenizer |
| `config.py` | Full model & training config |
| `model.py` | Model definition (GQA, RMSNorm, ALiBi) |
| `generate.py` | Inference script |

## License

MIT