	---
	license: apache-2.0
	language:
	- en
	- de
	- es
	- fr
	- it
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Model Card for Villanova-2B-Base-2512-Preview

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6426a5c798a5be164d38ae44/aULFKOno-4UaH1Le3nkGY.webp" alt="Villanova.AI logo" width="96"/>

	Villanova is a family of multilingual and multimodal Large Language Models (LLMs).
	[VillanovaAI/Villanova-2B-Base-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2512-Preview) is a base text-only LLM.

	> [!WARNING]
	> DISCLAIMER: This model is a preview.

	## Model Summary
	Villanova-2B-Base-2512-Preview is a decoder-only transformer with 2B parameters.

	Villanova-2B-Base-2512-Preview was pre-trained from scratch on 2.2 trillion tokens drawn from a curated, high-quality corpus, in a two-stage fashion.

	It supports 5 languages: English, Italian, Spanish, French and German.


	Stage 1 (0T → 2T tokens)

	A broad, diverse multilingual data mixture with a primary focus on the five core languages of the Villanova project.

	Stage 2 (2T → 2.2T tokens)

	Cosine annealing learning rate schedule over a mixture of 200B higher-quality tokens.
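
To make the Stage 2 description concrete, here is a minimal sketch of a cosine annealing schedule indexed by tokens seen. Only the 200B-token stage length comes from this card; the peak and final learning rates (`PEAK_LR`, `MIN_LR`) and the helper name `cosine_annealed_lr` are hypothetical, and the actual training configuration may differ.

```python
import math


def cosine_annealed_lr(tokens_seen, peak_lr, min_lr, total_tokens):
    """Learning rate after `tokens_seen` tokens of an annealing phase of `total_tokens`."""
    # Clamp progress to [0, 1] so values outside the phase stay at the endpoints.
    progress = min(max(tokens_seen / total_tokens, 0.0), 1.0)
    # Standard cosine decay from peak_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


STAGE2_TOKENS = 200e9  # stage 2 length, taken from the model card
PEAK_LR = 3e-4         # hypothetical value, not stated in the model card
MIN_LR = 3e-5          # hypothetical value, not stated in the model card

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    lr = cosine_annealed_lr(fraction * STAGE2_TOKENS, PEAK_LR, MIN_LR, STAGE2_TOKENS)
    print(f"{fraction:.0%} of stage 2: lr = {lr:.2e}")
```

With this shape, the learning rate falls slowly at first, fastest around the midpoint, and flattens out as it approaches the final value.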


	## How to Use
	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "VillanovaAI/Villanova-2B-Base-2512-Preview"
	# use the GPU when one is available, otherwise fall back to the CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# load the tokenizer and the model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

	# prepare the model input
	prompt = "What is gravity?"
	model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

	# Generate the output
	generated_ids = model.generate(**model_inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

	# Get and decode the output
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
	print(tokenizer.decode(output_ids, skip_special_tokens=True))
	```

	## Evaluation
	Overall performance of Villanova-2B-Base-2512-Preview on English and Multilingual Benchmarks.

	<img src="https://cdn-uploads.huggingface.co/production/uploads/6426a5c798a5be164d38ae44/F2Epnvtr34egkz8hKEQ4T.png" alt="Model size/performance" width="672"/>


	Detailed results are listed in the following tables.

	Global evaluation:

	| Model | Training Tokens (T) | Average | arc_easy | hellaswag | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | openbookqa | piqa | sciq | winogrande | xcopa_it | xnli_de | xnli_en | xnli_es | xnli_fr |
	|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
	| Minerva-3B-base-v1.0 | 0.66 | 47.20 | 62.33 | 46.28 | 27.20 | 29.69 | 29.02 | 40.01 | 24.60 | 74.27 | 88.00 | 56.75 | 69.60 | 34.54 | 52.13 | 36.31 | 37.35 |
	| EuroLLM-1.7B | 4 | 52.35 | 69.07 | 45.04 | 37.97 | 40.98 | 40.05 | 39.46 | 29.80 | 72.20 | 90.60 | 61.25 | 66.00 | 47.99 | 50.24 | 45.58 | 49.00 |
	| OLMo-2-0425-1B | 4 | 49.15 | 72.73 | 50.79 | 29.79 | 31.34 | 32.60 | 29.19 | 30.00 | 75.95 | 95.30 | 64.72 | 52.60 | 40.00 | 51.77 | 37.63 | 42.89 |
	| salamandra-2b | 13 | 52.90 | 71.04 | 47.19 | 38.01 | 42.07 | 40.60 | 38.56 | 26.80 | 72.69 | 91.90 | 61.72 | 65.40 | 47.79 | 51.97 | 49.08 | 48.67 |
	| Qwen3-1.7B-Base | - | 53.32 | 73.61 | 49.29 | 37.54 | 40.73 | 39.27 | 38.45 | 30.20 | 75.90 | 95.80 | 64.01 | 64.20 | 46.47 | 54.50 | 44.06 | 45.78 |
	| Villanova-2B-Base-2512-Preview | 2.2 | 55.25 | 75.13 | 48.57 | 42.06 | 45.72 | 44.62 | 43.32 | 26.60 | 75.08 | 94.40 | 61.96 | 68.40 | 49.36 | 52.21 | 49.04 | 52.33 |

	English only:

	| Model | Average | arc_easy | hellaswag | openbookqa | piqa | sciq | winogrande | xnli_en |
	|---|---|---|---|---|---|---|---|---|
	| Minerva-3B-base-v1.0 | 57.76 | 62.33 | 46.28 | 24.60 | 74.27 | 88.00 | 56.75 | 52.13 |
	| EuroLLM-1.7B | 59.74 | 69.07 | 45.04 | 29.80 | 72.20 | 90.60 | 61.25 | 50.24 |
	| OLMo-2-0425-1B | 63.04 | 72.73 | 50.79 | 30.00 | 75.95 | 95.30 | 64.72 | 51.77 |
	| salamandra-2b | 60.47 | 71.04 | 47.19 | 26.80 | 72.69 | 91.90 | 61.72 | 51.97 |
	| Qwen3-1.7B-Base | 63.33 | 73.61 | 49.29 | 30.20 | 75.90 | 95.80 | 64.01 | 54.50 |
	| Villanova-2B-Base-2512-Preview | 61.99 | 75.13 | 48.57 | 26.60 | 75.08 | 94.40 | 61.96 | 52.21 |

	Multilingual Benchmarks:

	| Model | Average | hellaswag_de | hellaswag_es | hellaswag_fr | hellaswag_it | xcopa_it | xnli_de | xnli_es | xnli_fr |
	|---|---|---|---|---|---|---|---|---|---|
	| Minerva-3B-base-v1.0 | 37.96 | 27.20 | 29.69 | 29.02 | 40.01 | 69.60 | 34.54 | 36.31 | 37.35 |
	| EuroLLM-1.7B | 45.88 | 37.97 | 40.98 | 40.05 | 39.46 | 66.00 | 47.99 | 45.58 | 49.00 |
	| OLMo-2-0425-1B | 37.01 | 29.79 | 31.34 | 32.60 | 29.19 | 52.60 | 40.00 | 37.63 | 42.89 |
	| salamandra-2b | 46.27 | 38.01 | 42.07 | 40.60 | 38.56 | 65.40 | 47.79 | 49.08 | 48.67 |
	| Qwen3-1.7B-Base | 44.56 | 37.54 | 40.73 | 39.27 | 38.45 | 64.20 | 46.47 | 44.06 | 45.78 |
	| Villanova-2B-Base-2512-Preview | 49.36 | 42.06 | 45.72 | 44.62 | 43.32 | 68.40 | 49.36 | 49.04 | 52.33 |
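
The card does not say how the Average columns are computed. The numbers are consistent with an unweighted mean over the benchmarks in each table, and the short check below illustrates that reading for the Villanova-2B-Base-2512-Preview row. This is an inference from the reported scores, not a description of the authors' evaluation code; the variable names are illustrative.

```python
from statistics import mean

# Villanova-2B-Base-2512-Preview scores copied from the tables in this card.
english = {
    "arc_easy": 75.13, "hellaswag": 48.57, "openbookqa": 26.60, "piqa": 75.08,
    "sciq": 94.40, "winogrande": 61.96, "xnli_en": 52.21,
}
multilingual = {
    "hellaswag_de": 42.06, "hellaswag_es": 45.72, "hellaswag_fr": 44.62,
    "hellaswag_it": 43.32, "xcopa_it": 68.40, "xnli_de": 49.36,
    "xnli_es": 49.04, "xnli_fr": 52.33,
}

# Unweighted means over each benchmark group, rounded as in the tables.
english_avg = round(mean(english.values()), 2)
multilingual_avg = round(mean(multilingual.values()), 2)
# The global average pools all 15 benchmarks rather than averaging the two group means.
global_avg = round(mean([*english.values(), *multilingual.values()]), 2)

print(f"English: {english_avg}, Multilingual: {multilingual_avg}, Global: {global_avg}")
```

Because the global figure pools all 15 benchmarks, the eight multilingual tasks carry slightly more weight in it than the seven English ones.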