Henyo-153M-CulturaX

Henyo is a 153M-parameter Tagalog language model trained on the MaAIos/culturax-filipino-subset dataset. It uses a custom, efficiency-oriented architecture heavily inspired by Llama 2/3 and PaLM.

Architecture Details

This model uses a custom Decoder-Only Transformer architecture built from scratch in PyTorch.

Hyperparameter	Value
Parameters	~153M
Context Window	1024 tokens
Embedding Dim	768
Layers (Depth)	12
Attention Heads	12
KV Heads (GQA)	4
Vocab Size	50,257 (GPT-2 tokenizer)
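
The table above is enough to estimate the parameter count with simple arithmetic. The sketch below does that under two assumptions that are not stated in this card: a SwiGLU hidden size of 2048 and an output head that is not tied to the input embedding. The function name count_params and both of those values are hypothetical; the actual configuration is defined in train_henyo.py.

```python
def count_params(vocab=50257, dim=768, layers=12, heads=12, kv_heads=4,
                 ffn_hidden=2048, tied_embeddings=False):
    """Rough parameter count for a bias-free Llama-style decoder.

    ffn_hidden and tied_embeddings are ASSUMPTIONS, not values from the card.
    """
    head_dim = dim // heads                 # 768 / 12 = 64
    kv_dim = kv_heads * head_dim            # 4 * 64 = 256 (GQA shrinks K and V)

    embedding = vocab * dim
    lm_head = 0 if tied_embeddings else vocab * dim

    # Q and O projections are dim x dim; K and V are dim x kv_dim.
    attention = 2 * dim * dim + 2 * dim * kv_dim
    # SwiGLU uses three matrices: gate, up, and down.
    feed_forward = 3 * dim * ffn_hidden
    # Two RMSNorm weight vectors per layer.
    norms = 2 * dim

    per_layer = attention + feed_forward + norms
    final_norm = dim
    return embedding + lm_head + layers * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to about 152.7M, which is consistent with the ~153M figure, although that agreement does not confirm the assumed values.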

Key Features

SwiGLU Activation: A gated linear unit in the feed-forward block that uses Swish (SiLU) as the gate.
Grouped Query Attention (GQA): 12 Query heads sharing 4 KV heads (3:1 ratio) for efficient inference.
Rotary Positional Embeddings (RoPE): Encodes position by rotating query and key vectors, for better generalization across sequence lengths.
RMSNorm: Applied as pre-normalization (before each sublayer) for training stability.
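
The features listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the code from train_henyo.py: the class names, the eps value, and the feed-forward hidden size of 2048 are assumptions, and RoPE is left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scales each vector by the reciprocal of its root mean square."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed value
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight


class SwiGLU(nn.Module):
    """Feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class GroupedQueryAttention(nn.Module):
    """Causal self-attention where groups of query heads share one KV head."""

    def __init__(self, dim: int = 768, n_heads: int = 12, n_kv_heads: int = 4):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # RoPE would rotate q and k here; omitted in this sketch.
        # Repeat each KV head so that 3 query heads attend to the same K and V.
        repeats = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeats, dim=1)
        v = v.repeat_interleave(repeats, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))


x = torch.randn(2, 16, 768)
h = x + GroupedQueryAttention()(RMSNorm(768)(x))   # pre-norm attention sublayer
h = h + SwiGLU(768, 2048)(RMSNorm(768)(h))         # pre-norm feed-forward sublayer
print(h.shape)
```

With 4 KV heads of size 64, the K and V projections output 256 features instead of 768, so a KV cache would hold one third of what full multi-head attention needs. F.scaled_dot_product_attention requires PyTorch 2.0 or later.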

Training Configuration

Dataset: MaAIos/culturax-filipino-subset
Mode: Streaming (Iterable Dataset)
Optimizer: AdamW
Scheduler: Cosine Decay
Gradient Accumulation: 8 steps (Effective batch size ~32)
Precision: Mixed Precision (FP16)
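
The batch size and schedule arithmetic implied by this configuration can be sketched as below. The micro-batch size of 4 is inferred from 32 / 8 and is not stated in the card; the peak learning rate, total step count, and the cosine_lr helper are hypothetical placeholders, and the real values are in train_henyo.py.

```python
import math


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0, warmup_steps=0):
    """Cosine decay from peak_lr to min_lr, with optional linear warmup."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


ACCUM_STEPS = 8          # from the card
MICRO_BATCH = 4          # ASSUMPTION: inferred from an effective batch of ~32
CONTEXT_WINDOW = 1024    # from the card

effective_batch = MICRO_BATCH * ACCUM_STEPS
# Upper bound: assumes every sequence fills the full context window.
max_tokens_per_update = effective_batch * CONTEXT_WINDOW

PEAK_LR = 3e-4           # hypothetical value, for illustration only
TOTAL_STEPS = 1000       # hypothetical value, for illustration only

print("effective batch:", effective_batch)
print("max tokens per optimizer step:", max_tokens_per_update)
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS, PEAK_LR))
```

With gradient accumulation, the optimizer and scheduler advance once every 8 micro-batches, so the step passed to the schedule counts optimizer updates rather than forward passes.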

Usage

Since this model uses a custom architecture, you must include the class definitions (provided in the train_henyo.py file in this repo) or use the inference script below.

# See inference_henyo.py in files for full class definitions
from transformers import AutoTokenizer

model_id = "marcuscedricridia/Henyo-153M-CulturaX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load model using custom class wrapper...

Reproducibility

The full training script (train_henyo.py) is included in the file listing of this repository.
