Llama 3.1 8B Ita: Italian Cultural Alignment [V1]
Llama 3.1 8B Ita [V1] is a LoRA adapter fine-tuned on top of DeepMount00/Llama-3.1-8b-ITA to improve Italian cultural alignment. It was trained on the Mult-IT dataset and evaluated on the ITALIC benchmark. Unlike Qwen3, Llama 3.1 is a standard causal language model without a hybrid reasoning architecture, so no thinking-mode considerations apply.
Author: Maruf Bepary, King's College London
Research report: Alignment in Large Language Models
Model Summary
| Property | Value |
|---|---|
| Base model | DeepMount00/Llama-3.1-8b-ITA |
| PEFT type | LoRA |
| Task | Causal language modelling (Italian Q&A / instruction following) |
| Training dataset | Mult-IT (~86,929 samples) |
| Evaluation benchmark | ITALIC (10,000 questions) |
| ITALIC accuracy (V1) | 73.91% (+3.42 pp over baseline) |
| Trainable parameters | See research report |
Intended Use
This model is intended for:
- Italian language understanding: multiple-choice Q&A, cultural knowledge, and general instruction following in Italian.
- Research: comparing the effect of SFT on Italian cultural alignment across model families.
- Benchmarking: comparing Italian-specific models against multilingual and fine-tuned baselines.
Not recommended for:
- High-stakes or safety-critical applications.
- Languages other than Italian.
Key Finding: Cultural Alignment
Training on the Italian multiple-choice Q&A dataset (Mult-IT) improves accuracy in 11 of the 12 ITALIC categories; only Events is unchanged:
| Metric | Baseline | V1 | Delta |
|---|---|---|---|
| Total | 70.49% | 73.91% | +3.42 pp |
| Culture | 72.96% | 75.45% | +2.49 pp |
| Language | 66.83% | 71.63% | +4.80 pp |
Language competence improved more than culture knowledge. The largest gains were in Synonyms (+8.76 pp), Morphology (+8.29 pp), Orthography (+7.03 pp), and Civic (+6.07 pp). Events remained flat (0.00 pp change). As Llama 3.1 does not have a hybrid reasoning architecture, fine-tuning carries no risk of reasoning-mode degradation.
Training Details
LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 24 |
| LoRA alpha | 48 |
| LoRA dropout | 0.1 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
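The summary table defers the trainable-parameter count to the research report. As a rough cross-check, the count implied by the configuration above can be estimated from the layer shapes. The sketch below assumes the standard Llama 3.1 8B shapes (hidden size 4096, MLP size 14336, 8 KV heads of 128 dimensions, 32 decoder layers) and standard LoRA scaling (alpha / r); these shapes are an assumption about the base model, not figures taken from this card, so treat the result as an estimate rather than the reported value.

```python
# Rough estimate of LoRA trainable parameters for the configuration above.
# ASSUMPTION: projection shapes follow the standard Llama 3.1 8B architecture.

R = 24           # LoRA rank, from the table
ALPHA = 48       # LoRA alpha, from the table
NUM_LAYERS = 32  # assumed number of decoder layers

# (in_features, out_features) of each targeted projection (assumed shapes)
TARGET_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank); bias is "none".
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
print(f"scaling:   {scaling}")
```

Under these assumptions the adapter holds roughly 63M trainable parameters, which is under 1% of an 8B-parameter base model.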
Training Hyperparameters
| Parameter | Value |
|---|---|
| Sequence packing | Yes (max 2,048 tokens per slot) |
| Max sequence length | 2,048 tokens |
Note: full training hyperparameters are detailed in the research report.
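To illustrate what sequence packing into 2,048-token slots means, here is a minimal first-fit sketch in plain Python. It is not the packing routine used in training (TRL's own implementation may use a different strategy); the function name and the token counts are hypothetical.

```python
def pack_first_fit(lengths, max_len=2048):
    """Greedily place each sequence in the first slot that still has room."""
    slots = []  # each slot is a list of sequence lengths
    free = []   # remaining capacity of each slot
    for n in lengths:
        n = min(n, max_len)  # over-long sequences are truncated to the slot size
        for i, room in enumerate(free):
            if n <= room:
                slots[i].append(n)
                free[i] -= n
                break
        else:
            # no existing slot has room: open a new one
            slots.append([n])
            free.append(max_len - n)
    return slots


lengths = [1500, 600, 400, 1200, 800, 100]  # hypothetical token counts
slots = pack_first_fit(lengths)
utilisation = sum(map(sum, slots)) / (len(slots) * 2048)

print(slots)
print(f"{len(lengths)} sequences -> {len(slots)} slots, {utilisation:.0%} of tokens used")
```

Without packing, each of the six sequences would occupy its own padded slot; packing fits them into three.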
Framework & Hardware
| Component | Version / Spec |
|---|---|
| TRL | 0.21.0 |
| PEFT | 0.17.0 |
| Transformers | 4.55.0 |
| PyTorch | 2.5.1+cu121 |
| Hardware | NVIDIA GeForce RTX 3090 |
Training Dataset: Mult-IT
- Dataset: Mult-IT (Multiple Choice Questions on Multiple Topics in Italian)
- Source: CALAMITA Shared Task @ CLiC-it 2024
- Language: Italian
- Size: ~86,929 training samples
- Format: JSONL, multiple-choice Q&A
- Reference: Mult-IT: Multiple Choice Questions on Multiple Topics in Italian (2024)
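As an illustration of how a JSONL multiple-choice record could be turned into a chat-style training example, here is a minimal sketch. The field names (`question`, `choices`, `answer`) are hypothetical and may not match the actual Mult-IT schema, and reusing the evaluation system prompt for training is an assumption.

```python
import json

SYSTEM_PROMPT = "Sei un assistente utile."  # prompt quoted for evaluation; assumed here
LETTERS = "ABCDE"


def record_to_messages(line):
    """Convert one JSONL line (hypothetical schema) into chat messages."""
    rec = json.loads(line)
    options = "\n".join(
        f"{LETTERS[i]}) {choice}" for i, choice in enumerate(rec["choices"])
    )
    user = (
        f"{rec['question']}\n{options}\n\n"
        "Rispondi con la lettera della risposta corretta."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": rec["answer"]},
    ]


# Hypothetical record, written the way it would appear on one JSONL line
sample = json.dumps(
    {
        "question": "Qual è la capitale d'Italia?",
        "choices": ["Milano", "Roma", "Napoli", "Torino"],
        "answer": "B",
    },
    ensure_ascii=False,
)
messages = record_to_messages(sample)
print(messages[1]["content"])
```

The resulting list can be passed to a tokenizer's chat template in the same way as the inference example in the Usage section.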
ITALIC Benchmark Results
Benchmark: ITALIC (NAACL 2025), Italian Culture-Aware Natural Language Benchmark
Format: Zero-shot, multiple-choice (12 categories, 10,000 questions)
System prompt: "Sei un assistente utile."
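Scoring a zero-shot multiple-choice run comes down to extracting an option letter from each reply and comparing it with the gold answer. The sketch below shows one simple way to do this; it is an assumption for illustration, not the official ITALIC scoring script, and the replies and gold labels are made up.

```python
import re


def extract_choice(text, letters="ABCD"):
    """Return the first standalone uppercase option letter in a reply, or None."""
    match = re.search(rf"\b([{letters}])\b", text.strip())
    return match.group(1) if match else None


def accuracy(replies, gold):
    """Percentage of replies whose extracted letter matches the gold letter."""
    hits = sum(extract_choice(r) == g for r, g in zip(replies, gold))
    return 100.0 * hits / len(gold)


# Hypothetical model replies and gold answers
replies = ["B", "La risposta corretta è B.", "C) Napoli", "Non lo so"]
gold = ["B", "B", "A", "D"]

print([extract_choice(r) for r in replies])
print(f"accuracy: {accuracy(replies, gold):.2f}%")
```

A reply with no recognisable letter counts as wrong here. A stricter parser would be needed for replies that begin with a capitalised one-letter word.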
V1 vs Baseline
| Category | Baseline (%) | V1 (%) | Δ (pp) |
|---|---|---|---|
| Art | 70.10 | 71.31 | +1.21 |
| Civic | 71.22 | 77.29 | +6.07 |
| Events | 82.61 | 82.61 | 0.00 |
| Geography | 79.26 | 80.90 | +1.64 |
| History | 77.40 | 79.28 | +1.88 |
| Literature | 67.17 | 71.24 | +4.07 |
| Tourism | 71.73 | 72.04 | +0.31 |
| Lexicon | 81.51 | 83.76 | +2.25 |
| Morphology | 52.14 | 60.43 | +8.29 |
| Orthography | 53.04 | 60.07 | +7.03 |
| Synonyms | 81.15 | 89.91 | +8.76 |
| Syntax | 53.65 | 54.31 | +0.66 |
| Culture (subtotal) | 72.96 | 75.45 | +2.49 |
| Language (subtotal) | 66.83 | 71.63 | +4.80 |
| Total | 70.49 | 73.91 | +3.42 |
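The per-category deltas can be recomputed directly from the table, which also makes it easy to rank the categories by gain. The sketch below copies the baseline and V1 scores from the rows above; the final estimate of additional correct answers assumes the Total row is plain accuracy over all 10,000 questions.

```python
# (baseline %, V1 %) per category, copied from the table above
scores = {
    "Art": (70.10, 71.31),
    "Civic": (71.22, 77.29),
    "Events": (82.61, 82.61),
    "Geography": (79.26, 80.90),
    "History": (77.40, 79.28),
    "Literature": (67.17, 71.24),
    "Tourism": (71.73, 72.04),
    "Lexicon": (81.51, 83.76),
    "Morphology": (52.14, 60.43),
    "Orthography": (53.04, 60.07),
    "Synonyms": (81.15, 89.91),
    "Syntax": (53.65, 54.31),
}

# Delta in percentage points, rounded to the table's precision
deltas = {cat: round(v1 - base, 2) for cat, (base, v1) in scores.items()}

top_gains = sorted(deltas, key=deltas.get, reverse=True)[:4]
unchanged = [cat for cat, d in deltas.items() if d == 0]

# ASSUMPTION: Total is accuracy over all 10,000 questions
extra_correct = round((73.91 - 70.49) / 100 * 10_000)

print("largest gains:", top_gains)
print("unchanged:", unchanged)
print("additional correct answers (approx.):", extra_correct)
```

Under that assumption, the +3.42 pp gain corresponds to about 342 more correct answers out of 10,000.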
Comparison with Other Models (ITALIC Total)
| Model | Total | Parameters |
|---|---|---|
| Llama 3.1 70B | 83.61% | 70B |
| GPT-4o Mini | 82.22% | ~8B (estimate; not officially disclosed) |
| Magistral Small (No Thinking) | 76.06% | 24B |
| Llama 3.1 8B Ita [V1] | 73.91% | 8B |
| Qwen3 8B (No Thinking) [V3] | 73.81% | 8B |
| Qwen3 8B (No Thinking) [V1] | 73.77% | 8B |
| Llama 3.1 8B Ita (baseline) | 70.49% | 8B |
| Qwen3 8B (No Thinking) baseline | 70.17% | 8B |
| LLaMAntino-3 8B | 68.37% | 8B |
| Llama 3.1 8B | 66.38% | 8B |
All scores were obtained under identical zero-shot conditions on the ITALIC benchmark.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "DeepMount00/Llama-3.1-8b-ITA"
# Repository ID of this model card on the Hugging Face Hub
adapter_id = "m-beps/llama31-8b-finetune-multit"

# Load tokeniser and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: Italian multiple-choice question
messages = [
    {"role": "system", "content": "Sei un assistente utile."},
    {
        "role": "user",
        "content": (
            "Qual è la capitale d'Italia?\n"
            "A) Milano\nB) Roma\nC) Napoli\nD) Torino\n\n"
            "Rispondi con la lettera della risposta corretta."
        ),
    },
]

# Apply the Llama 3 chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding; sampling parameters are unset to avoid warnings
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
# Expected output: "B"
Limitations
- Morphology (60.43%) and Syntax (54.31%) remain the weakest categories despite improvement.
- Benchmark scope: evaluation was conducted solely on ITALIC; performance on other Italian benchmarks is unverified.
- Single-GPU training: training used one RTX 3090; multi-GPU configurations may yield different results.
- Dataset bias: Mult-IT is a multiple-choice dataset; generalisation to open-ended Italian generation tasks is unverified.
- Events category showed no improvement (0.00 pp), suggesting the training data may lack current-events coverage.
References
Related resources:
- Research report: Alignment in Large Language Models
- Base model: DeepMount00/Llama-3.1-8b-ITA
- ITALIC benchmark: RiTA-nlp/ITALIC
- Mult-IT dataset: sapienzanlp/Mult-IT
- PEFT documentation: huggingface.co/docs/peft