VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
Paper • 2605.13989 • Published
VectraYX-Base is a 260M-parameter Spanish cybersecurity language model trained from scratch using the same three-phase curriculum and replay-buffer recipe as VectraYX-Nano, scaled to a mid-tier architecture (d_model=1024, n_layers=16).
| Model | Params | B1 KW | B3 TM | B4 Tool | B5 Chat |
|---|---|---|---|---|---|
| VectraYX-Nano v7 (N=4) | 42M | 0.332±0.005 | — | 0.230±0.052 | 0.725±0.130 |
| VectraYX-Base 260M | 260M | 0.325 | 0.114 | 0.000 | 0.800 |
| Base + LoRA mini (ratio 1:21, N=4) | 260M | 0.019±0.003 | — | 0.445±0.201 | 0.600 |
| VectraYX-Pro 3B | 3.2B | 0.341 | 0.686 | 0.600 | 0.800 |
The B4=0.000 score for Base after mixed SFT is a corpus-density artifact: at ratio 1:21 (LoRA mini), Base reaches B4=0.445±0.201.
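To make the trade-off in the results table easier to see, the following minimal sketch recomputes the score changes between the mixed-SFT Base model and the LoRA mini variant. The numbers are the mean values copied from the table; the dictionary layout and the `delta` helper are illustrative names, not part of the released code.

```python
# Mean scores copied from the results table (standard deviations omitted).
# B3 is left out because two of these rows report no value for it.
REPORTED = {
    "VectraYX-Nano v7 (N=4)": {"B1": 0.332, "B4": 0.230, "B5": 0.725},
    "VectraYX-Base 260M": {"B1": 0.325, "B4": 0.000, "B5": 0.800},
    "Base + LoRA mini (ratio 1:21, N=4)": {"B1": 0.019, "B4": 0.445, "B5": 0.600},
}

BASE = "VectraYX-Base 260M"
LORA = "Base + LoRA mini (ratio 1:21, N=4)"


def delta(metric: str, after: str, before: str) -> float:
    """Change in a reported mean score when going from `before` to `after`."""
    return round(REPORTED[after][metric] - REPORTED[before][metric], 3)


if __name__ == "__main__":
    for metric in ("B1", "B4", "B5"):
        print(f"{metric}: {delta(metric, LORA, BASE):+.3f}")
```

On the reported means, the LoRA mini variant gains 0.445 on B4 while losing 0.306 on B1 and 0.200 on B5, so the tool-use gain comes with a drop on the other two benchmarks.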
| Component | Value |
|---|---|
| Parameters | 260M |
| Layers | 16 |
| Hidden dim | 1024 |
| Attention heads | 16 (GQA 16q/4kv) |
| FFN | SwiGLU |
| Positional encoding | RoPE |
| Normalization | RMSNorm + QK-Norm |
| Tokenizer | BPE-16384 (same as Nano) |
The architecture matches `configs/base.json` in vectrayx-paper-code.
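As a rough sanity check on the architecture table, the sketch below estimates the parameter count from the listed dimensions. Two values are assumptions that this card does not state: an FFN hidden size of 4096 and tied input/output embeddings. Norm weights and any biases are ignored. Treat `configs/base.json` as the authoritative source; `BaseConfig` and `estimate_params` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseConfig:
    # Values taken from the architecture table.
    d_model: int = 1024
    n_layers: int = 16
    n_q_heads: int = 16
    n_kv_heads: int = 4       # GQA 16q/4kv
    vocab_size: int = 16384   # BPE-16384
    # ASSUMPTIONS (not stated in this card):
    ffn_hidden: int = 4096
    tied_embeddings: bool = True


def estimate_params(cfg: BaseConfig) -> int:
    """Approximate weight count, ignoring norm weights and biases."""
    head_dim = cfg.d_model // cfg.n_q_heads
    kv_dim = cfg.n_kv_heads * head_dim
    # Attention: Q and output projections are d_model x d_model;
    # K and V projections are d_model x kv_dim under GQA.
    attn = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim
    # SwiGLU uses three matrices: gate, up, and down projections.
    ffn = 3 * cfg.d_model * cfg.ffn_hidden
    embed = cfg.vocab_size * cfg.d_model
    if not cfg.tied_embeddings:
        embed *= 2  # separate output projection
    return cfg.n_layers * (attn + ffn) + embed


if __name__ == "__main__":
    print(f"{estimate_params(BaseConfig()):,}")  # 260,046,848
```

Under these assumptions the estimate lands close to the stated 260M, which suggests the assumed values are plausible but does not confirm them.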
| File | Description |
|---|---|
| `base_sft_v1_s42.pt` | Base 260M post-SFT, seed 42 (~3.1 GB) |
Training ran on an AWS SageMaker ml.g5.xlarge instance (NVIDIA A10G, 24 GB) for about 11 wall-clock hours, at a cost of about $11 USD.
@misc{santillana2026vectrayx,
title = {VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model
with Curriculum Learning and Native Tool Use},
author = {Santillana, Juan S.},
year = {2026},
eprint = {2605.13989},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2605.13989}
}