Model Card for Villanova-2B-Base-2512-Preview


Villanova is a family of multilingual and multimodal Large Language Models (LLMs). VillanovaAI/Villanova-2B-Base-2512-Preview is a base text-only LLM.

DISCLAIMER: This model is a preview.

Model Summary

Villanova-2B-Base-2512-Preview is a decoder-only transformer with 2B parameters.

Villanova-2B-Base-2512-Preview was pre-trained from scratch in two stages on 2.2 trillion tokens drawn from a curated, high-quality corpus.

It supports 5 languages: English, Italian, Spanish, French and German.

Stage 1 (0T → 2T tokens)

Broad, diverse multilingual data mixture with a primary focus on the five core languages of the Villanova project.

Stage 2 (2T → 2.2T tokens)

Cosine annealing learning rate schedule over a mixture of 200B higher-quality tokens.
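As a rough illustration of what a cosine annealing schedule looks like over the 200B-token second stage, the sketch below evaluates the standard cosine decay formula at a few points. The peak and final learning rates (`PEAK_LR`, `FINAL_LR`) are hypothetical placeholders, since the model card does not state them, and the function name is chosen here for illustration only. Measuring progress in tokens rather than optimizer steps is also an assumption.

```python
import math


def cosine_annealed_lr(progress_units, total_units, lr_max, lr_min=0.0):
    """Return the learning rate after `progress_units` out of `total_units`.

    Uses the standard cosine annealing formula:
        lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t)),  t in [0, 1]
    """
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    # Clamp progress to [0, 1] so out-of-range inputs stay well defined.
    t = min(max(progress_units / total_units, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


# Stage 2 length comes from the model card; the learning rates are
# hypothetical values picked only to make the example concrete.
STAGE2_TOKENS = 200e9
PEAK_LR = 3e-4   # assumption, not from the model card
FINAL_LR = 3e-5  # assumption, not from the model card

for tokens_seen in (0, 50e9, 100e9, 150e9, 200e9):
    lr = cosine_annealed_lr(tokens_seen, STAGE2_TOKENS, PEAK_LR, FINAL_LR)
    print(f"{tokens_seen / 1e9:>5.0f}B tokens into stage 2: lr = {lr:.2e}")
```

The schedule starts at the peak rate, passes through the midpoint of the two rates halfway through the stage, and ends at the final rate.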

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-Base-2512-Preview"
device = "cuda"  # use "cuda" for GPU or "cpu" for CPU

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# prepare the model input
prompt = "What is gravity?"
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
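Because this is a base model rather than an instruction-tuned one, it continues text instead of following instructions, so framing the prompt as a pattern to complete may be worth trying. The sketch below builds such a few-shot prompt as a plain string. The helper `build_few_shot_prompt`, the `Question:`/`Answer:` layout, and the example pairs are all illustrative assumptions, not a format recommended by the model authors.

```python
def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs followed by an unanswered final question.

    The trailing "Answer:" leaves the completion open for the model to continue.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs, chosen only to establish the pattern.
examples = [
    ("What is the capital of Italy?", "Rome."),
    ("What is the boiling point of water at sea level?", "100 degrees Celsius."),
]

prompt = build_few_shot_prompt(examples, "What is gravity?")
print(prompt)
```

The resulting string can be assigned to `prompt` in the snippet above in place of the bare question.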

Evaluation

Overall performance of Villanova-2B-Base-2512-Preview on English and Multilingual Benchmarks.

[Figure: model size versus performance]

Detailed results are listed in the following tables.

Global evaluation:

| Model | Training Tokens (T) | Average | arc_easy | hellaswag | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | openbookqa | piqa | sciq | winogrande | xcopa_it | xnli_de | xnli_en | xnli_es | xnli_fr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 0.66 | 47.20 | 62.33 | 46.28 | 27.20 | 29.69 | 29.02 | 40.01 | 24.60 | 74.27 | 88.00 | 56.75 | 69.60 | 34.54 | 52.13 | 36.31 | 37.35 |
| EuroLLM-1.7B | 4 | 52.35 | 69.07 | 45.04 | 37.97 | 40.98 | 40.05 | 39.46 | 29.80 | 72.20 | 90.60 | 61.25 | 66.00 | 47.99 | 50.24 | 45.58 | 49.00 |
| OLMo-2-0425-1B | 4 | 49.15 | 72.73 | 50.79 | 29.79 | 31.34 | 32.60 | 29.19 | 30.00 | 75.95 | 95.30 | 64.72 | 52.60 | 40.00 | 51.77 | 37.63 | 42.89 |
| salamandra-2b | 13 | 52.90 | 71.04 | 47.19 | 38.01 | 42.07 | 40.60 | 38.56 | 26.80 | 72.69 | 91.90 | 61.72 | 65.40 | 47.79 | 51.97 | 49.08 | 48.67 |
| Qwen3-1.7B-Base | - | 53.32 | 73.61 | 49.29 | 37.54 | 40.73 | 39.27 | 38.45 | 30.20 | 75.90 | 95.80 | 64.01 | 64.20 | 46.47 | 54.50 | 44.06 | 45.78 |
| Villanova-2B-Base-2512-Preview | 2.2 | 55.25 | 75.13 | 48.57 | 42.06 | 45.72 | 44.62 | 43.32 | 26.60 | 75.08 | 94.40 | 61.96 | 68.40 | 49.36 | 52.21 | 49.04 | 52.33 |

English only:

| Model | Average | arc_easy | hellaswag | openbookqa | piqa | sciq | winogrande | xnli_en |
|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 57.76 | 62.33 | 46.28 | 24.60 | 74.27 | 88.00 | 56.75 | 52.13 |
| EuroLLM-1.7B | 59.74 | 69.07 | 45.04 | 29.80 | 72.20 | 90.60 | 61.25 | 50.24 |
| OLMo-2-0425-1B | 63.04 | 72.73 | 50.79 | 30.00 | 75.95 | 95.30 | 64.72 | 51.77 |
| salamandra-2b | 60.47 | 71.04 | 47.19 | 26.80 | 72.69 | 91.90 | 61.72 | 51.97 |
| Qwen3-1.7B-Base | 63.33 | 73.61 | 49.29 | 30.20 | 75.90 | 95.80 | 64.01 | 54.50 |
| Villanova-2B-Base-2512-Preview | 61.99 | 75.13 | 48.57 | 26.60 | 75.08 | 94.40 | 61.96 | 52.21 |

Multilingual Benchmarks:

| Model | Average | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | xcopa_it | xnli_de | xnli_es | xnli_fr |
|---|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 37.96 | 27.20 | 29.69 | 29.02 | 40.01 | 69.60 | 34.54 | 36.31 | 37.35 |
| EuroLLM-1.7B | 45.88 | 37.97 | 40.98 | 40.05 | 39.46 | 66.00 | 47.99 | 45.58 | 49.00 |
| OLMo-2-0425-1B | 37.01 | 29.79 | 31.34 | 32.60 | 29.19 | 52.60 | 40.00 | 37.63 | 42.89 |
| salamandra-2b | 46.27 | 38.01 | 42.07 | 40.60 | 38.56 | 65.40 | 47.79 | 49.08 | 48.67 |
| Qwen3-1.7B-Base | 44.56 | 37.54 | 40.73 | 39.27 | 38.45 | 64.20 | 46.47 | 44.06 | 45.78 |
| Villanova-2B-Base-2512-Preview | 49.36 | 42.06 | 45.72 | 44.62 | 43.32 | 68.40 | 49.36 | 49.04 | 52.33 |
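The Average columns appear to be unweighted means of the per-task scores. This is inferred from the numbers, not stated by the authors. The sketch below recomputes the three averages for Villanova-2B-Base-2512-Preview from the scores reported in the tables; the results agree with the published values to within rounding. The function name `macro_average` is chosen here for illustration.

```python
from statistics import mean

# Per-task scores for Villanova-2B-Base-2512-Preview, copied from the tables.
english = {
    "arc_easy": 75.13, "hellaswag": 48.57, "openbookqa": 26.60, "piqa": 75.08,
    "sciq": 94.40, "winogrande": 61.96, "xnli_en": 52.21,
}
multilingual = {
    "hellaswag_de": 42.06, "hellaswag_es": 45.72, "hellaswag_fr": 44.62,
    "hellaswag_it": 43.32, "xcopa_it": 68.40, "xnli_de": 49.36,
    "xnli_es": 49.04, "xnli_fr": 52.33,
}


def macro_average(scores):
    """Unweighted mean over tasks (assumed to match the Average column)."""
    return mean(scores.values())


english_avg = macro_average(english)
multilingual_avg = macro_average(multilingual)
# The global table covers all 15 tasks from both groups.
global_avg = macro_average({**english, **multilingual})

print(f"English only:  {english_avg:.2f}")
print(f"Multilingual:  {multilingual_avg:.2f}")
print(f"Global:        {global_avg:.2f}")
```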