metadata

license: apache-2.0
language:
  - en
  - de
  - es
  - fr
  - it
pipeline_tag: text-generation
library_name: transformers

Model Card for Villanova-2B-Base-2512-Preview

Villanova is a family of multilingual and multimodal Large Language Models (LLMs). VillanovaAI/Villanova-2B-Base-2512-Preview is a base text-only LLM.

DISCLAIMER: This model is a preview.

Model Summary

Villanova-2B-Base-2512-Preview is a decoder-only transformer with 2B parameters.

It was pre-trained from scratch in two stages on 2.2 trillion tokens drawn from a curated, high-quality corpus.

It supports five languages: English, Italian, Spanish, French, and German.

Stage 1 (0T → 2T tokens)

Broad, diverse multilingual data mixture with a primary focus on the five core languages of the Villanova project.

Stage 2 (2T → 2.2T tokens)

The learning rate is annealed with a cosine schedule over a mixture of 200B higher-quality tokens.
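
As a rough illustration of what cosine annealing over the 200B Stage 2 tokens looks like, the sketch below computes the learning rate at a few points of the stage. The peak and minimum learning rates and the tokens-per-step value are hypothetical placeholders chosen for the example; the model card does not state the actual hyperparameters.

```python
import math


def cosine_annealed_lr(step: int, total_steps: int, peak_lr: float, min_lr: float) -> float:
    """Cosine annealing from peak_lr (step 0) down to min_lr (step total_steps)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical values, NOT taken from the model card.
PEAK_LR = 3e-4
MIN_LR = 3e-5
TOKENS_PER_STEP = 4e6  # hypothetical global batch size, in tokens

# Stage 2 spans 2T -> 2.2T tokens, i.e. 200B tokens (from the text above).
STAGE2_TOKENS = 200e9
TOTAL_STEPS = int(STAGE2_TOKENS / TOKENS_PER_STEP)

for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    step = int(frac * TOTAL_STEPS)
    lr = cosine_annealed_lr(step, TOTAL_STEPS, PEAK_LR, MIN_LR)
    print(f"{frac:.2f} of Stage 2 -> lr = {lr:.2e}")
```

The curve starts at the peak value, passes through the midpoint of the two rates halfway through the stage, and flattens out as it approaches the minimum.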

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-Base-2512-Preview"
device = "cuda"  # use "cuda" for GPU or "cpu" for CPU

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# prepare the model input
prompt = "What is gravity?"
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
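
Since this is a base model rather than an instruction-tuned one, it may continue a bare question instead of answering it. One common workaround is a few-shot, completion-style prompt. The sketch below builds such a prompt; the `build_few_shot_prompt` helper and the `Question:`/`Answer:` template are illustrative assumptions, not a format documented for this model.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Format (question, answer) pairs plus a final open question as one completion prompt."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # Leave the last answer empty so the model completes it.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Illustrative examples only; pick ones that match your task.
examples = [
    ("What is the capital of Italy?", "Rome"),
    ("What is the capital of France?", "Paris"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Spain?")
print(prompt)

# The resulting string can be used in place of `prompt` in the snippet above:
# model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
```

With a prompt like this, a small `max_new_tokens` value is usually enough, and the output can be cut at the first blank line to keep only the completed answer.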

Evaluation

Overall performance of Villanova-2B-Base-2512-Preview on English and multilingual benchmarks.

Detailed results are listed in the following tables.

Global evaluation:

Model	Training Tokens (T)	Average	arc_easy	hellaswag	hellaswag_de	hellaswag_es	hellaswag_fr	hellaswag_it	openbookqa	piqa	sciq	winogrande	xcopa_it	xnli_de	xnli_en	xnli_es	xnli_fr
Minerva-3B-base-v1.0	0.66	47.20	62.33	46.28	27.20	29.69	29.02	40.01	24.60	74.27	88.00	56.75	69.60	34.54	52.13	36.31	37.35
EuroLLM-1.7B	4	52.35	69.07	45.04	37.97	40.98	40.05	39.46	29.80	72.20	90.60	61.25	66.00	47.99	50.24	45.58	49.00
OLMo-2-0425-1B	4	49.15	72.73	50.79	29.79	31.34	32.60	29.19	30.00	75.95	95.30	64.72	52.60	40.00	51.77	37.63	42.89
salamandra-2b	13	52.90	71.04	47.19	38.01	42.07	40.60	38.56	26.80	72.69	91.90	61.72	65.40	47.79	51.97	49.08	48.67
Qwen3-1.7B-Base	-	53.32	73.61	49.29	37.54	40.73	39.27	38.45	30.20	75.90	95.80	64.01	64.20	46.47	54.50	44.06	45.78
Villanova-2B-Base-2512-Preview	2.2	55.25	75.13	48.57	42.06	45.72	44.62	43.32	26.60	75.08	94.40	61.96	68.40	49.36	52.21	49.04	52.33

English only:

Model	Average	arc_easy	hellaswag	openbookqa	piqa	sciq	winogrande	xnli_en
Minerva-3B-base-v1.0	57.76	62.33	46.28	24.60	74.27	88.00	56.75	52.13
EuroLLM-1.7B	59.74	69.07	45.04	29.80	72.20	90.60	61.25	50.24
OLMo-2-0425-1B	63.04	72.73	50.79	30.00	75.95	95.30	64.72	51.77
salamandra-2b	60.47	71.04	47.19	26.80	72.69	91.90	61.72	51.97
Qwen3-1.7B-Base	63.33	73.61	49.29	30.20	75.90	95.80	64.01	54.50
Villanova-2B-Base-2512-Preview	61.99	75.13	48.57	26.60	75.08	94.40	61.96	52.21

Multilingual benchmarks:

Model	Average	hellaswag_de	hellaswag_es	hellaswag_fr	hellaswag_it	xcopa_it	xnli_de	xnli_es	xnli_fr
Minerva-3B-base-v1.0	37.96	27.20	29.69	29.02	40.01	69.60	34.54	36.31	37.35
EuroLLM-1.7B	45.88	37.97	40.98	40.05	39.46	66.00	47.99	45.58	49.00
OLMo-2-0425-1B	37.01	29.79	31.34	32.60	29.19	52.60	40.00	37.63	42.89
salamandra-2b	46.27	38.01	42.07	40.60	38.56	65.40	47.79	49.08	48.67
Qwen3-1.7B-Base	44.56	37.54	40.73	39.27	38.45	64.20	46.47	44.06	45.78
Villanova-2B-Base-2512-Preview	49.36	42.06	45.72	44.62	43.32	68.40	49.36	49.04	52.33
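
The Average columns are consistent with an unweighted mean of the per-task scores. This is an inference from the numbers, since the card does not state how the averages are computed. A minimal check for the Villanova-2B-Base-2512-Preview row, with scores copied from the tables above:

```python
from statistics import mean

# Per-task scores for Villanova-2B-Base-2512-Preview, copied from the tables above.
english = {
    "arc_easy": 75.13, "hellaswag": 48.57, "openbookqa": 26.60, "piqa": 75.08,
    "sciq": 94.40, "winogrande": 61.96, "xnli_en": 52.21,
}
multilingual = {
    "hellaswag_de": 42.06, "hellaswag_es": 45.72, "hellaswag_fr": 44.62,
    "hellaswag_it": 43.32, "xcopa_it": 68.40, "xnli_de": 49.36,
    "xnli_es": 49.04, "xnli_fr": 52.33,
}

# Assumption: each average is the plain mean over tasks, with no per-task weighting.
english_avg = mean(list(english.values()))
multilingual_avg = mean(list(multilingual.values()))
global_avg = mean(list(english.values()) + list(multilingual.values()))

print(f"English only: {english_avg:.2f}")       # 61.99
print(f"Multilingual: {multilingual_avg:.2f}")  # 49.36
print(f"Global:       {global_avg:.2f}")        # 55.25
```

Note that the global average covers 15 tasks (7 English, 8 multilingual), so it is not simply the midpoint of the two sub-averages.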