Veyra3 5M Base

Veyra3 5M Base is a tiny Gemma4-style causal language model trained as an architecture and pipeline smoke test.

This repository is a compatibility export for native Hugging Face Transformers, using the gemma4_text model type and the Gemma4ForCausalLM class.

Important fidelity note

The faithful checkpoint artifact is the ONNX export in veyra-ai/veyra3-5m-base-onnx-int8. This Transformers repo uses HF's native Gemma4 implementation, which has stricter architecture expectations than the training smoke-test model. The exporter copied every compatible trained tensor into the native Gemma4ForCausalLM and left HF-only tensors at their default Transformers initialization, so outputs from this repo may differ from those of the original checkpoint. See conversion_report.json for details.

Model metadata

Repo: veyra-ai/veyra3-5m-base
Native architecture: Gemma4ForCausalLM
Native model type: gemma4_text
Faithful local parameter count before native conversion: 4,377,856
Checkpoint tokens seen: 350,486,528
Best validation loss from checkpoint: 4.382341027259827
Context length: 4096
Sliding window: 512
Vocab size: 4096
HF layer types: ['sliding_attention', 'sliding_attention', 'sliding_attention', 'full_attention', 'sliding_attention', 'full_attention']
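
A few derived quantities can help when reading the metadata above. The sketch below is illustrative only and rests on two assumptions not stated in this card: that the validation loss is a mean per-token cross-entropy in nats (so perplexity is its exponential), and that the 512-token sliding window counts the current token. The helper name attended_span is hypothetical.

```python
import math
from collections import Counter

# Values copied from the metadata list above.
params = 4_377_856
tokens_seen = 350_486_528
val_loss = 4.382341027259827
sliding_window = 512
layer_types = [
    "sliding_attention", "sliding_attention", "sliding_attention",
    "full_attention", "sliding_attention", "full_attention",
]

# Training tokens per parameter (about 80 for these numbers).
tokens_per_param = tokens_seen / params

# ASSUMPTION: the loss is in nats per token, so perplexity = exp(loss).
perplexity = math.exp(val_loss)

# How many layers use each attention pattern.
layer_counts = Counter(layer_types)


def attended_span(position, layer_type):
    """Count the positions a causal token at `position` (0-based) can attend to.

    ASSUMPTION: the sliding window includes the current token.
    """
    visible = position + 1  # causal: itself plus everything before it
    if layer_type == "sliding_attention":
        return min(visible, sliding_window)
    return visible


print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"perplexity (if loss is in nats): {perplexity:.1f}")
print(dict(layer_counts))
```

Under these assumptions, the last token of a full 4096-token context sees 512 positions in a sliding_attention layer and all 4096 in a full_attention layer.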

Conversion summary

copied_exact: 54
copied_partial_overlap: 1
hf_only_keep_native_init: 30
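
To put the summary in proportion, the counts can be turned into shares of the native model's tensors. This is a minimal sketch using only the three numbers above. It counts tensors, not parameters, so it says nothing about how much of the weight mass each category holds; conversion_report.json remains the authoritative source.

```python
# Tensor counts copied from the conversion summary above.
summary = {
    "copied_exact": 54,
    "copied_partial_overlap": 1,
    "hf_only_keep_native_init": 30,
}

total_tensors = sum(summary.values())

# Share of native tensors in each category (by tensor count, not parameter count).
shares = {name: count / total_tensors for name, count in summary.items()}

for name, share in shares.items():
    print(f"{name}: {summary[name]} of {total_tensors} tensors ({share:.1%})")
```

By tensor count, roughly two thirds of the native tensors carry trained values and roughly one third keep their native initialization.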

Load

from transformers import AutoTokenizer, AutoModelForCausalLM

# The repo uses the native gemma4_text model type, so no remote code is needed.
repo_id = 'veyra-ai/veyra3-5m-base'
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=False)
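
Once loaded, the model can be exercised with a short greedy generation as an end-to-end check. The following is a sketch rather than a documented recipe: the helper generate_sample and its defaults are hypothetical, and it assumes torch is installed and that the tokenizer returns input_ids (and usually attention_mask) in the standard way.

```python
def generate_sample(prompt, max_new_tokens=32, repo_id="veyra-ai/veyra3-5m-base"):
    """Greedy-decode a short continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=False)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=False)
    model.eval()

    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs.get("attention_mask"),
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding keeps the check deterministic
        )
    # The returned sequence contains the prompt followed by the new tokens.
    return tok.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_sample("Once upon a time") downloads the repo on first use. Because this is a 5M-parameter smoke-test model with some natively initialized tensors, the call is useful for confirming that the pipeline runs, not for judging text quality.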

Safetensors

Model size: 4.5M params
Tensor type: BF16
