Model Card for Villanova-2B-2603


Villanova-2B-2603 is a fully open, multilingual, instruction-tuned large language model developed by Villanova.AI. As part of the Villanova project, it is designed to advance open European language technology, with native support for five European languages. All model weights, training data sources, and training details are publicly released.

Built on top of Villanova-2B-Base-2603 — a 2.4B-parameter model pretrained from scratch — this instruction-tuned model offers strong multilingual instruction following and safety alignment under a fully open Apache 2.0 license.


Model Family

Villanova-2B-Base-2603 — Base model (4.4T tokens)
 ↳ Villanova-2B-2603 — SFT / Instruct — 📍 This model
  ↳ Villanova-2B-2603-GGUF — Quantized
 ↳ Villanova-2B-VL-2603 — Vision-Language Instruct
  ↳ Villanova-2B-VL-2603-GGUF — Quantized

Villanova-2B-Base-2512-Preview — Base model (2.2T tokens) (previous version, not recommended)
 ↳ Villanova-2B-2512-Preview — SFT / Instruct (previous version, not recommended)


Highlights

  • European-focused, fully open model released under Apache 2.0
  • Native multilingual support for 5 European languages: English, French, German, Italian, and Spanish
  • Strong instruction following, competitive with larger commercial models
  • Robust multilingual safety alignment across all supported languages
  • +58% overall improvement over our previous release (Villanova-2B-2512-Preview)
  • Only 2B parameters, efficient enough for edge and on-device deployment

Model Summary

| Attribute | Value |
|---|---|
| Architecture | Decoder-only Transformer (LLaMA-based) |
| Parameters | 2.4B |
| Base Model | VillanovaAI/Villanova-2B-Base-2603 (pretrained from scratch) |
| Pre-training Data | 4.4T tokens (multilingual, two-stage) |
| Fine-tuning Data | VillanovaAI/villanova-sft-2603 |
| Languages | English, French, German, Italian, Spanish |
| Context Length | 32,768 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
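At bfloat16 precision (2 bytes per parameter), the raw weight footprint is roughly 4.8 GB; this back-of-the-envelope figure excludes activations, KV cache, and framework overhead, so actual memory use will be higher:

```python
# Approximate weight memory for a 2.4B-parameter model stored in bfloat16.
# Excludes activations, KV cache, and framework overhead.
params = 2.4e9
bytes_per_param = 2  # bfloat16 = 16 bits
weight_gb = params * bytes_per_param / 1e9
print(f"{weight_gb:.1f} GB")  # → 4.8 GB
```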

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-2603"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]

# Render the chat into the model's prompt format, then tokenize
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

# Generate, then decode only the newly produced tokens (skip the prompt)
generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
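`apply_chat_template` renders the message list into the prompt string the model was trained on. The actual template ships inside the tokenizer config; purely as an illustration (assuming a ChatML-style format, which may differ from Villanova's real template), the rendering works like this:

```python
# Hypothetical ChatML-style rendering, for illustration only.
# The real template is defined in the tokenizer's chat_template field.
def render_chatml(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        out += "<|im_start|>assistant\n"
    return out

prompt = render_chatml([{"role": "user", "content": "Hello"}])
print(prompt)
```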

Evaluation

Villanova-2B-2603 was extensively evaluated across 25 benchmarks covering Reasoning, Question Answering, Safety, and Instruction Following in both English and multilingual settings. All evaluations were performed using identical settings and prompts for fair comparison.

Tables are sorted by the main metric (descending). Models are grouped into Fully Open and Open Weight categories.

Overall Performance

Villanova-2B-2603 is the #1 fully open model in overall average across all benchmarks.

| Model | Size | Reasoning | QA | Safety | Instr. Follow | Overall |
|---|---|---|---|---|---|---|
| **Fully Open** | | | | | | |
| Villanova-2B-2603 | 2.4B | 31.0 | 33.1 | 39.5 | 45.1 | 36.9 |
| OLMo-2-0425-1B-Instruct | 1.2B | 38.7 | 35.6 | 19.4 | 39.3 | 33.9 |
| Minerva-7B-instruct-v1.0 | 7.4B | 27.1 | 36.2 | 30.1 | 16.9 | 28.5 |
| salamandra-2b-instruct | 2.3B | 23.6 | 26.6 | 9.6 | 15.7 | 20.0 |
| EuroLLM-1.7B-Instruct | 1.7B | 26.0 | 24.7 | 3.8 | 19.5 | 19.5 |
| **Open Weight** | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 51.2 | 48.1 | 56.8 | 48.1 | 50.4 |
| Qwen2.5-3B-Instruct | 3.1B | 39.4 | 35.8 | 54.7 | 46.8 | 42.9 |
| Llama-3.2-1B-Instruct | 1.2B | 37.5 | 38.1 | 56.6 | 35.5 | 41.1 |
| gemma-3-1b-it | 1.0B | 28.5 | 27.0 | 53.6 | 39.9 | 35.7 |
| Qwen3-1.7B | 1.7B | 37.4 | 37.5 | 2.6 | 19.5 | 26.2 |

Instruction Following

Villanova-2B-2603 is the #1 fully open model for instruction following, and is competitive with larger open weight models. The MARCO benchmark evaluates structured instruction following across all five languages.

| Model | Size | IFEval | MARCO-EN | MARCO-DE | MARCO-ES | MARCO-FR | MARCO-IT | Avg |
|---|---|---|---|---|---|---|---|---|
| **Fully Open** | | | | | | | | |
| Villanova-2B-2603 | 2.4B | 62.0 | 39.4 | 40.5 | 44.2 | 42.5 | 42.1 | 45.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | 77.9 | 52.9 | 23.1 | 29.0 | 27.9 | 24.9 | 39.3 |
| EuroLLM-1.7B-Instruct | 1.7B | 34.5 | 18.3 | 15.9 | 15.9 | 17.4 | 15.2 | 19.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.6 | 17.0 | 12.2 | 13.9 | 13.9 | 15.0 | 16.9 |
| salamandra-2b-instruct | 2.3B | 26.4 | 17.7 | 12.2 | 12.0 | 12.9 | 12.9 | 15.7 |
| **Open Weight** | | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 82.2 | 54.0 | 39.9 | 38.8 | 37.5 | 35.9 | 48.1 |
| Qwen2.5-3B-Instruct | 3.1B | 71.5 | 47.3 | 37.5 | 42.5 | 41.0 | 40.7 | 46.8 |
| gemma-3-1b-it | 1.0B | 74.5 | 42.7 | 27.5 | 33.3 | 27.9 | 33.3 | 39.9 |
| Llama-3.2-1B-Instruct | 1.2B | 64.8 | 43.2 | 25.3 | 29.0 | 24.2 | 26.6 | 35.5 |
| Qwen3-1.7B | 1.7B | 48.4 | 27.4 | 8.9 | 10.3 | 13.1 | 9.1 | 19.5 |

Key insight: While some models score higher on English-only IFEval, Villanova-2B-2603 delivers the most balanced multilingual instruction following, with MARCO scores of 40-44 across DE, ES, FR, and IT. This is far ahead of OLMo (23-29) and Gemma (27-33) on non-English languages.

Safety (M-ALERT)

Villanova-2B-2603 is the #1 fully open model for safety. Safety was evaluated using the M-ALERT benchmark across all five languages.

| Model | Size | EN | DE | ES | FR | IT | Avg |
|---|---|---|---|---|---|---|---|
| **Fully Open** | | | | | | | |
| Villanova-2B-2603 | 2.4B | 31.0 | 4.1 | 56.0 | 62.2 | 44.2 | 39.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 31.6 | 4.3 | 26.9 | 24.8 | 62.9 | 30.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | 58.0 | 5.7 | 13.4 | 10.7 | 9.1 | 19.4 |
| salamandra-2b-instruct | 2.3B | 4.9 | 3.0 | 15.6 | 15.4 | 9.2 | 9.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 5.4 | 0.8 | 2.6 | 8.4 | 1.7 | 3.8 |
| **Open Weight** | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 54.5 | 26.4 | 70.3 | 63.3 | 69.4 | 56.8 |
| Llama-3.2-1B-Instruct | 1.2B | 47.1 | 32.9 | 67.4 | 68.6 | 67.2 | 56.6 |
| Qwen2.5-3B-Instruct | 3.1B | 60.2 | 23.2 | 71.7 | 64.0 | 54.4 | 54.7 |
| gemma-3-1b-it | 1.0B | 58.6 | 28.7 | 58.8 | 68.4 | 53.3 | 53.6 |
| Qwen3-1.7B | 1.7B | 10.2 | 0.0 | 0.5 | 0.8 | 1.3 | 2.6 |
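The Avg column is the unweighted mean of the five per-language scores. For instance, for Villanova-2B-2603:

```python
# M-ALERT per-language safety scores for Villanova-2B-2603 (from the table above)
scores = {"EN": 31.0, "DE": 4.1, "ES": 56.0, "FR": 62.2, "IT": 44.2}

# Unweighted mean across the five languages
avg = sum(scores.values()) / len(scores)
print(round(avg, 1))  # → 39.5
```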

Reasoning & Question Answering

Selected benchmarks are shown below; the Avg Reasoning and Avg QA columns are category averages computed over the full evaluation suite, not only the benchmarks listed here.

| Model | Size | BBH | LB-BBH | GSM8K | DROP | TruthfulQA | Avg Reasoning | Avg QA |
|---|---|---|---|---|---|---|---|---|
| **Fully Open** | | | | | | | | |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.0 | 30.0 | 10.6 | 29.2 | 29.6 | 27.1 | 36.2 |
| OLMo-2-0425-1B-Instruct | 1.2B | 27.6 | 33.8 | 67.4 | 30.2 | 33.8 | 38.7 | 35.6 |
| Villanova-2B-2603 | 2.4B | 29.3 | 33.2 | 23.4 | 34.8 | 28.5 | 31.0 | 33.1 |
| salamandra-2b-instruct | 2.3B | 22.5 | 29.2 | 2.3 | 20.6 | 27.8 | 23.6 | 26.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 28.5 | 29.8 | 12.7 | 22.2 | 29.2 | 26.0 | 24.7 |
| **Open Weight** | | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 59.3 | 44.6 | 77.2 | 48.3 | 36.1 | 51.2 | 48.1 |
| Qwen2.5-3B-Instruct | 3.1B | 12.2 | 46.9 | 76.0 | 12.5 | 41.4 | 39.4 | 35.8 |
| Qwen3-1.7B | 1.7B | 9.8 | 43.5 | 74.2 | 34.4 | 29.6 | 37.4 | 37.5 |
| Llama-3.2-1B-Instruct | 1.2B | 39.3 | 35.7 | 45.6 | 31.8 | 28.9 | 37.5 | 38.1 |
| gemma-3-1b-it | 1.0B | 25.0 | 35.1 | 34.0 | 21.1 | 26.6 | 28.5 | 27.0 |

Improvement over Previous Release

Villanova-2B-2603 represents a major leap over our previous model (Villanova-2B-2512-Preview):

| Category | 2512-Preview | 2603 | Improvement |
|---|---|---|---|
| Overall | 23.3 | 36.9 | +58% |
| Instruction Following | 28.9 | 45.1 | +56% |
| Safety | 2.4 | 39.5 | +1546% |
| Reasoning | 27.5 | 31.0 | +13% |
| QA | 29.0 | 33.1 | +14% |
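The Improvement column is the relative change between the two releases, (new − old) / old, which can be reproduced directly from the scores above:

```python
# Category scores for the two releases (from the table above)
prev = {"Overall": 23.3, "Instruction Following": 28.9, "Safety": 2.4,
        "Reasoning": 27.5, "QA": 29.0}
new = {"Overall": 36.9, "Instruction Following": 45.1, "Safety": 39.5,
       "Reasoning": 31.0, "QA": 33.1}

# Relative improvement, rounded to whole percent
for category in prev:
    pct = (new[category] - prev[category]) / prev[category] * 100
    print(f"{category}: +{pct:.0f}%")
```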

License

This model is released under the Apache 2.0 License.
