Model Card for Villanova-2B-2603
Villanova-2B-2603 is a fully open, multilingual instruction-tuned Large Language Model developed by Villanova.AI. Part of the Villanova project, it is designed to advance open European language technology with native support for five European languages. All model weights, training data sources, and training details are publicly released.
Built on top of Villanova-2B-Base-2603 — a 2.4B-parameter model pretrained from scratch — this instruction-tuned model offers strong multilingual instruction following and safety alignment under a fully open Apache 2.0 license.
Model Family
Villanova-2B-Base-2603 — Base model (4.4T)
↳ Villanova-2B-2603 — SFT / Instruct — 📍 This model
↳ Villanova-2B-2603-GGUF — Quantized
↳ Villanova-2B-VL-2603 — Vision-Language Instruct
↳ Villanova-2B-VL-2603-GGUF — Quantized
Villanova-2B-Base-2512-Preview — Base model (2.2T) (previous version, not recommended)
↳ Villanova-2B-2512-Preview — SFT / Instruct (previous version, not recommended)
Highlights
- European-focused, fully open model released under Apache 2.0
- Native multilingual support for 5 European languages: English, French, German, Italian, and Spanish
- Strong instruction following, competitive with larger open-weight models
- Robust multilingual safety alignment across all supported languages
- +58% overall improvement over our previous release (Villanova-2B-2512-Preview)
- Only 2.4B parameters, small enough for edge and on-device deployment
Model Summary
| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer (LLaMA-based) |
| Parameters | 2.4B |
| Base Model | VillanovaAI/Villanova-2B-Base-2603 (pretrained from scratch) |
| Pre-training Data | 4.4T tokens (multilingual, two-stage) |
| Fine-tuning Data | VillanovaAI/villanova-sft-2603 |
| Languages | English, French, German, Italian, Spanish |
| Context Length | 32,768 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
How to Use
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-2603"
device = "cuda"  # or "cpu"

# Load the tokenizer and model, then move the model to the target device.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Build a single-turn chat prompt with the model's chat template.
messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

# Sample up to 256 new tokens.
generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Drop the prompt tokens and decode only the newly generated text.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
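The Model Summary lists bfloat16 as the model's precision, so loading the weights in that dtype can reduce memory use compared with loading them in float32. The following is a minimal sketch under stated assumptions, not an official recipe: the helper names `build_messages` and `generate_reply` are hypothetical, and it assumes a single user turn as in the example above and a device that supports bfloat16.

```python
def build_messages(user_prompt):
    """Wrap a prompt in the single-turn chat format used in the example above."""
    return [{"role": "user", "content": user_prompt}]


def generate_reply(user_prompt, model_name="VillanovaAI/Villanova-2B-2603",
                   device="cuda", max_new_tokens=256):
    """Hypothetical helper: load the model in bfloat16 and return one sampled reply."""
    # Heavy imports stay inside the function so build_messages can be
    # used without torch or transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: the target device supports bfloat16.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    ).to(device)

    input_text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    model_inputs = tokenizer([input_text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        **model_inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )

    # Keep only the tokens generated after the prompt.
    prompt_length = model_inputs.input_ids.shape[1]
    return tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
```

For instance, `generate_reply("Spiega l'entanglement quantistico in parole semplici.")` would prompt the model in Italian, one of the five supported languages. Because sampling is enabled, the reply varies from run to run.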
Evaluation
Villanova-2B-2603 was extensively evaluated across 25 benchmarks covering Reasoning, Question Answering, Safety, and Instruction Following in both English and multilingual settings. All evaluations were performed using identical settings and prompts for fair comparison.
Tables are sorted by the main metric (descending). Models are grouped into Fully Open and Open Weight categories.
Overall Performance
Villanova-2B-2603 is the #1 fully open model in overall average across all benchmarks.
| Model | Size | Reasoning | QA | Safety | Instr. Follow | Overall |
|---|---|---|---|---|---|---|
| Fully Open | ||||||
| Villanova-2B-2603 | 2.4B | 31.0 | 33.1 | 39.5 | 45.1 | 36.9 |
| OLMo-2-0425-1B-Instruct | 1.2B | 38.7 | 35.6 | 19.4 | 39.3 | 33.9 |
| Minerva-7B-instruct-v1.0 | 7.4B | 27.1 | 36.2 | 30.1 | 16.9 | 28.5 |
| salamandra-2b-instruct | 2.3B | 23.6 | 26.6 | 9.6 | 15.7 | 20.0 |
| EuroLLM-1.7B-Instruct | 1.7B | 26.0 | 24.7 | 3.8 | 19.5 | 19.5 |
| Open Weight | ||||||
| Llama-3.2-3B-Instruct | 3.2B | 51.2 | 48.1 | 56.8 | 48.1 | 50.4 |
| Qwen2.5-3B-Instruct | 3.1B | 39.4 | 35.8 | 54.7 | 46.8 | 42.9 |
| Llama-3.2-1B-Instruct | 1.2B | 37.5 | 38.1 | 56.6 | 35.5 | 41.1 |
| gemma-3-1b-it | 1.0B | 28.5 | 27.0 | 53.6 | 39.9 | 35.7 |
| Qwen3-1.7B | 1.7B | 37.4 | 37.5 | 2.6 | 19.5 | 26.2 |
Instruction Following
Villanova-2B-2603 is the #1 fully open model for instruction following, and is competitive with larger open weight models. The MARCO benchmark evaluates structured instruction following across all five languages.
| Model | Size | IFEval | MARCO-EN | MARCO-DE | MARCO-ES | MARCO-FR | MARCO-IT | Avg |
|---|---|---|---|---|---|---|---|---|
| Fully Open | ||||||||
| Villanova-2B-2603 | 2.4B | 62.0 | 39.4 | 40.5 | 44.2 | 42.5 | 42.1 | 45.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | 77.9 | 52.9 | 23.1 | 29.0 | 27.9 | 24.9 | 39.3 |
| EuroLLM-1.7B-Instruct | 1.7B | 34.5 | 18.3 | 15.9 | 15.9 | 17.4 | 15.2 | 19.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.6 | 17.0 | 12.2 | 13.9 | 13.9 | 15.0 | 16.9 |
| salamandra-2b-instruct | 2.3B | 26.4 | 17.7 | 12.2 | 12.0 | 12.9 | 12.9 | 15.7 |
| Open Weight | ||||||||
| Llama-3.2-3B-Instruct | 3.2B | 82.2 | 54.0 | 39.9 | 38.8 | 37.5 | 35.9 | 48.1 |
| Qwen2.5-3B-Instruct | 3.1B | 71.5 | 47.3 | 37.5 | 42.5 | 41.0 | 40.7 | 46.8 |
| gemma-3-1b-it | 1.0B | 74.5 | 42.7 | 27.5 | 33.3 | 27.9 | 33.3 | 39.9 |
| Llama-3.2-1B-Instruct | 1.2B | 64.8 | 43.2 | 25.3 | 29.0 | 24.2 | 26.6 | 35.5 |
| Qwen3-1.7B | 1.7B | 48.4 | 27.4 | 8.9 | 10.3 | 13.1 | 9.1 | 19.5 |
Key insight: Some models score higher on English-only IFEval, but Villanova-2B-2603 delivers the most balanced multilingual instruction following, with MARCO scores of 40-44 across DE, ES, FR, and IT. That is well ahead of OLMo (23-29) and Gemma (27-33) on the non-English languages.
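The per-language spread can be read directly off the table. As a small illustrative sketch (the scores are copied from the MARCO-DE, MARCO-ES, MARCO-FR, and MARCO-IT columns above; the `score_range` helper is hypothetical), the following computes the lowest and highest non-English MARCO score for three of the models:

```python
# Non-English MARCO scores copied from the table above, in (DE, ES, FR, IT) order.
marco_non_english = {
    "Villanova-2B-2603": (40.5, 44.2, 42.5, 42.1),
    "OLMo-2-0425-1B-Instruct": (23.1, 29.0, 27.9, 24.9),
    "gemma-3-1b-it": (27.5, 33.3, 27.9, 33.3),
}


def score_range(scores):
    """Return the (lowest, highest) value in a sequence of scores."""
    return min(scores), max(scores)


ranges = {model: score_range(scores) for model, scores in marco_non_english.items()}

for model, (low, high) in ranges.items():
    print(f"{model}: {low:.1f} to {high:.1f}")
```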
Safety (M-ALERT)
Villanova-2B-2603 is the #1 fully open model for safety. Safety was evaluated using the M-ALERT benchmark across all five languages.
| Model | Size | EN | DE | ES | FR | IT | Avg |
|---|---|---|---|---|---|---|---|
| Fully Open | |||||||
| Villanova-2B-2603 | 2.4B | 31.0 | 4.1 | 56.0 | 62.2 | 44.2 | 39.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 31.6 | 4.3 | 26.9 | 24.8 | 62.9 | 30.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | 58.0 | 5.7 | 13.4 | 10.7 | 9.1 | 19.4 |
| salamandra-2b-instruct | 2.3B | 4.9 | 3.0 | 15.6 | 15.4 | 9.2 | 9.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 5.4 | 0.8 | 2.6 | 8.4 | 1.7 | 3.8 |
| Open Weight | |||||||
| Llama-3.2-3B-Instruct | 3.2B | 54.5 | 26.4 | 70.3 | 63.3 | 69.4 | 56.8 |
| Llama-3.2-1B-Instruct | 1.2B | 47.1 | 32.9 | 67.4 | 68.6 | 67.2 | 56.6 |
| Qwen2.5-3B-Instruct | 3.1B | 60.2 | 23.2 | 71.7 | 64.0 | 54.4 | 54.7 |
| gemma-3-1b-it | 1.0B | 58.6 | 28.7 | 58.8 | 68.4 | 53.3 | 53.6 |
| Qwen3-1.7B | 1.7B | 10.2 | 0.0 | 0.5 | 0.8 | 1.3 | 2.6 |
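The Avg column is consistent with an unweighted mean of the five per-language scores. That is an inference from the numbers in the table, not a documented formula. A quick sketch that checks it for three rows (values copied from the table above; the `mean` helper and variable names are illustrative):

```python
# (EN, DE, ES, FR, IT) scores and the reported Avg, copied from the table above.
m_alert = {
    "Villanova-2B-2603": ((31.0, 4.1, 56.0, 62.2, 44.2), 39.5),
    "Minerva-7B-instruct-v1.0": ((31.6, 4.3, 26.9, 24.8, 62.9), 30.1),
    "Llama-3.2-3B-Instruct": ((54.5, 26.4, 70.3, 63.3, 69.4), 56.8),
}


def mean(scores):
    """Unweighted arithmetic mean of a sequence of scores."""
    return sum(scores) / len(scores)


# Absolute gap between the recomputed mean and the reported Avg, per model.
gaps = {
    model: abs(mean(scores) - reported)
    for model, (scores, reported) in m_alert.items()
}

for model, gap in gaps.items():
    print(f"{model}: gap to reported Avg = {gap:.2f}")
```

Each gap stays within the rounding error of a one-decimal table.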
Reasoning & Question Answering
| Model | Size | BBH | LB-BBH | GSM8K | DROP | TruthfulQA | Avg Reasoning | Avg QA |
|---|---|---|---|---|---|---|---|---|
| Fully Open | ||||||||
| Minerva-7B-instruct-v1.0 | 7.4B | 29.0 | 30.0 | 10.6 | 29.2 | 29.6 | 27.1 | 36.2 |
| OLMo-2-0425-1B-Instruct | 1.2B | 27.6 | 33.8 | 67.4 | 30.2 | 33.8 | 38.7 | 35.6 |
| Villanova-2B-2603 | 2.4B | 29.3 | 33.2 | 23.4 | 34.8 | 28.5 | 31.0 | 33.1 |
| salamandra-2b-instruct | 2.3B | 22.5 | 29.2 | 2.3 | 20.6 | 27.8 | 23.6 | 26.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 28.5 | 29.8 | 12.7 | 22.2 | 29.2 | 26.0 | 24.7 |
| Open Weight | ||||||||
| Llama-3.2-3B-Instruct | 3.2B | 59.3 | 44.6 | 77.2 | 48.3 | 36.1 | 51.2 | 48.1 |
| Qwen2.5-3B-Instruct | 3.1B | 12.2 | 46.9 | 76.0 | 12.5 | 41.4 | 39.4 | 35.8 |
| Qwen3-1.7B | 1.7B | 9.8 | 43.5 | 74.2 | 34.4 | 29.6 | 37.4 | 37.5 |
| Llama-3.2-1B-Instruct | 1.2B | 39.3 | 35.7 | 45.6 | 31.8 | 28.9 | 37.5 | 38.1 |
| gemma-3-1b-it | 1.0B | 25.0 | 35.1 | 34.0 | 21.1 | 26.6 | 28.5 | 27.0 |
Improvement over Previous Release
Villanova-2B-2603 represents a major leap over our previous model (Villanova-2B-2512-Preview):
| Category | 2512-Preview | 2603 | Improvement |
|---|---|---|---|
| Overall | 23.3 | 36.9 | +58% |
| Instruction Following | 28.9 | 45.1 | +56% |
| Safety | 2.4 | 39.5 | +1546% |
| Reasoning | 27.5 | 31.0 | +13% |
| QA | 29.0 | 33.1 | +14% |
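The Improvement column matches a relative change, (new - old) / old, expressed as a percentage. A minimal sketch that recomputes it from the two score columns (the function and variable names are illustrative):

```python
def relative_improvement(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100.0


# (2512-Preview, 2603) scores copied from the table above.
scores = {
    "Overall": (23.3, 36.9),
    "Instruction Following": (28.9, 45.1),
    "Safety": (2.4, 39.5),
    "Reasoning": (27.5, 31.0),
    "QA": (29.0, 33.1),
}

improvements = {
    category: relative_improvement(old, new)
    for category, (old, new) in scores.items()
}

for category, pct in improvements.items():
    print(f"{category}: {pct:+.0f}%")
```

The Safety percentage is large mainly because the baseline is small: the absolute gain is 37.1 points, from 2.4 to 39.5.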
License
This model is released under the Apache 2.0 License.