license: apache-2.0
language:
  - en
  - de
  - es
  - fr
  - it
pipeline_tag: text-generation
library_name: transformers

Model Card for Villanova-2B-Base-2512-Preview

Villanova is a family of multilingual and multimodal Large Language Models (LLMs). VillanovaAI/Villanova-2B-Base-2512-Preview is a base text-only LLM.

DISCLAIMER: This model is a preview.

Model Summary

Villanova-2B-Base-2512-Preview is a decoder-only transformer with 2B parameters.

It was pre-trained from scratch in two stages on 2.2 trillion tokens drawn from a curated, high-quality corpus.

It supports five languages: English, Italian, Spanish, French, and German.

Stage 1 (0T → 2T tokens)

Broad, diverse multilingual data mixture with a primary focus on the five core languages of the Villanova project.

Stage 2 (2T → 2.2T tokens)

Cosine annealing learning rate schedule over a mixture of 200B higher-quality tokens.
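
As a rough illustration of what a cosine-annealed schedule over the Stage 2 window looks like, here is a minimal sketch. Only the token boundaries (2T and 2.2T) come from this card; the learning-rate values `PEAK_LR` and `FINAL_LR` and the helper name `stage2_lr` are hypothetical, and the actual training configuration may differ.

```python
import math

STAGE2_START = 2.0e12  # tokens seen when Stage 2 begins (from the card)
STAGE2_END = 2.2e12    # tokens seen when Stage 2 ends (from the card)
PEAK_LR = 3e-4         # hypothetical value, not stated in the card
FINAL_LR = 3e-5        # hypothetical value, not stated in the card


def stage2_lr(tokens_seen: float) -> float:
    """Cosine-annealed learning rate as a function of total tokens seen."""
    # Fraction of Stage 2 completed, clamped to [0, 1].
    progress = (tokens_seen - STAGE2_START) / (STAGE2_END - STAGE2_START)
    progress = min(max(progress, 0.0), 1.0)
    # The cosine factor decays smoothly from 1 (start) to 0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    for tokens in (2.0e12, 2.1e12, 2.2e12):
        print(f"{tokens / 1e12:.1f}T tokens -> lr = {stage2_lr(tokens):.2e}")
```

The schedule is expressed in tokens rather than optimizer steps so that it maps directly onto the 2T → 2.2T range described above; converting to steps would require the batch size, which the card does not give.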

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-Base-2512-Preview"
device = "cuda"  # use "cpu" to run without a GPU

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Prepare the model input
prompt = "What is gravity?"
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Drop the prompt tokens and decode only the newly generated ones
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
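
Since this is a base model, it continues text rather than following instructions, so a completion-style prompt may give more predictable output than a bare question. The sketch below builds a simple few-shot prompt; the helper name `build_few_shot_prompt` and the Question/Answer layout are illustrative assumptions, not a format documented for this model.

```python
def build_few_shot_prompt(examples, query):
    """Format (question, answer) pairs followed by an open question."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # Leave the final answer empty so the model completes it.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs, chosen only for illustration.
examples = [
    ("What is the capital of Italy?", "Rome."),
    ("What is the capital of France?", "Paris."),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Spain?")
print(prompt)
```

The resulting `prompt` string can be passed to the tokenizer in place of `"What is gravity?"` in the snippet above.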

Evaluation

Overall performance of Villanova-2B-Base-2512-Preview on English and Multilingual Benchmarks.

[Figure: model size vs. performance]

Detailed results are listed in the following tables.

Global evaluation:

| Model | Training Tokens (T) | Average | arc_easy | hellaswag | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | openbookqa | piqa | sciq | winogrande | xcopa_it | xnli_de | xnli_en | xnli_es | xnli_fr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 0.66 | 47.20 | 62.33 | 46.28 | 27.20 | 29.69 | 29.02 | 40.01 | 24.60 | 74.27 | 88.00 | 56.75 | 69.60 | 34.54 | 52.13 | 36.31 | 37.35 |
| EuroLLM-1.7B | 4 | 52.35 | 69.07 | 45.04 | 37.97 | 40.98 | 40.05 | 39.46 | 29.80 | 72.20 | 90.60 | 61.25 | 66.00 | 47.99 | 50.24 | 45.58 | 49.00 |
| OLMo-2-0425-1B | 4 | 49.15 | 72.73 | 50.79 | 29.79 | 31.34 | 32.60 | 29.19 | 30.00 | 75.95 | 95.30 | 64.72 | 52.60 | 40.00 | 51.77 | 37.63 | 42.89 |
| salamandra-2b | 13 | 52.90 | 71.04 | 47.19 | 38.01 | 42.07 | 40.60 | 38.56 | 26.80 | 72.69 | 91.90 | 61.72 | 65.40 | 47.79 | 51.97 | 49.08 | 48.67 |
| Qwen3-1.7B-Base | - | 53.32 | 73.61 | 49.29 | 37.54 | 40.73 | 39.27 | 38.45 | 30.20 | 75.90 | 95.80 | 64.01 | 64.20 | 46.47 | 54.50 | 44.06 | 45.78 |
| Villanova-2B-Base-2512-Preview | 2.2 | 55.25 | 75.13 | 48.57 | 42.06 | 45.72 | 44.62 | 43.32 | 26.60 | 75.08 | 94.40 | 61.96 | 68.40 | 49.36 | 52.21 | 49.04 | 52.33 |

English only:

| Model | Average | arc_easy | hellaswag | openbookqa | piqa | sciq | winogrande | xnli_en |
|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 57.76 | 62.33 | 46.28 | 24.60 | 74.27 | 88.00 | 56.75 | 52.13 |
| EuroLLM-1.7B | 59.74 | 69.07 | 45.04 | 29.80 | 72.20 | 90.60 | 61.25 | 50.24 |
| OLMo-2-0425-1B | 63.04 | 72.73 | 50.79 | 30.00 | 75.95 | 95.30 | 64.72 | 51.77 |
| salamandra-2b | 60.47 | 71.04 | 47.19 | 26.80 | 72.69 | 91.90 | 61.72 | 51.97 |
| Qwen3-1.7B-Base | 63.33 | 73.61 | 49.29 | 30.20 | 75.90 | 95.80 | 64.01 | 54.50 |
| Villanova-2B-Base-2512-Preview | 61.99 | 75.13 | 48.57 | 26.60 | 75.08 | 94.40 | 61.96 | 52.21 |

Multilingual Benchmarks:

| Model | Average | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | xcopa_it | xnli_de | xnli_es | xnli_fr |
|---|---|---|---|---|---|---|---|---|---|
| Minerva-3B-base-v1.0 | 37.96 | 27.20 | 29.69 | 29.02 | 40.01 | 69.60 | 34.54 | 36.31 | 37.35 |
| EuroLLM-1.7B | 45.88 | 37.97 | 40.98 | 40.05 | 39.46 | 66.00 | 47.99 | 45.58 | 49.00 |
| OLMo-2-0425-1B | 37.01 | 29.79 | 31.34 | 32.60 | 29.19 | 52.60 | 40.00 | 37.63 | 42.89 |
| salamandra-2b | 46.27 | 38.01 | 42.07 | 40.60 | 38.56 | 65.40 | 47.79 | 49.08 | 48.67 |
| Qwen3-1.7B-Base | 44.56 | 37.54 | 40.73 | 39.27 | 38.45 | 64.20 | 46.47 | 44.06 | 45.78 |
| Villanova-2B-Base-2512-Preview | 49.36 | 42.06 | 45.72 | 44.62 | 43.32 | 68.40 | 49.36 | 49.04 | 52.33 |
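
The "Average" columns appear to be unweighted means of the per-task scores. The card does not state this explicitly, so treat it as an assumption; the sketch below checks it against the Villanova-2B-Base-2512-Preview rows, with scores copied from the tables above. The names `scores`, `ENGLISH_TASKS`, `MULTILINGUAL_TASKS`, and `average` are introduced here for illustration only.

```python
from statistics import mean

# Per-task scores for Villanova-2B-Base-2512-Preview, copied from the tables.
scores = {
    "arc_easy": 75.13, "hellaswag": 48.57, "hellaswag_de": 42.06,
    "hellaswag_es": 45.72, "hellaswag_fr": 44.62, "hellaswag_it": 43.32,
    "openbookqa": 26.60, "piqa": 75.08, "sciq": 94.40, "winogrande": 61.96,
    "xcopa_it": 68.40, "xnli_de": 49.36, "xnli_en": 52.21,
    "xnli_es": 49.04, "xnli_fr": 52.33,
}

ENGLISH_TASKS = [
    "arc_easy", "hellaswag", "openbookqa", "piqa",
    "sciq", "winogrande", "xnli_en",
]
MULTILINGUAL_TASKS = [
    "hellaswag_de", "hellaswag_es", "hellaswag_fr", "hellaswag_it",
    "xcopa_it", "xnli_de", "xnli_es", "xnli_fr",
]


def average(tasks):
    """Unweighted mean of the selected task scores, rounded to two decimals."""
    return round(mean(scores[t] for t in tasks), 2)


global_avg = average(scores)                    # all 15 tasks
english_avg = average(ENGLISH_TASKS)            # 7 English tasks
multilingual_avg = average(MULTILINGUAL_TASKS)  # 8 non-English tasks
print(global_avg, english_avg, multilingual_avg)
```

For this model, the three computed values match the reported averages in the global, English-only, and multilingual tables.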