AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document)
This model is a fine-tuned version of Meta-Llama-3.1-8B-Instruct, specialized for the AiXPA project in the domain of Italian Public Administration (PA). It was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques on a dialogue dataset between an assistant and a PA user, without reference documents as context.
Model Details
Model Description
This model is based on Meta-Llama-3.1-8B-Instruct and has been fine-tuned using the Stefano-M-Community/final_all_no_ground dataset for Italian Public Administration dialogue tasks. The model uses 4-bit quantization and LoRA adapters for efficient training and inference, making it suitable for deployment on consumer hardware while maintaining strong performance in PA-specific conversations without reference documents as context.
- Developed by: LanD (FBK)
- Model type: Causal Language Model (Fine-tuned)
- Language(s) (NLP): Italian (primarily)
- License: Please refer to the original Llama 3.1 license
- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
This model can be used directly for text generation tasks, particularly those related to the domain it was fine-tuned on. The model maintains the instruction-following capabilities of the base Llama 3.1 model while being specialized for specific use cases defined in the training dataset. This variant is particularly suited for scenarios where reference documents are not available as context.
Downstream Use
The model can be further fine-tuned for specific tasks or integrated into larger applications that require text generation capabilities. The LoRA adapters make it easy to switch between different specialized versions. This variant may be particularly useful for applications that need to operate without reference ground truth data.
Out-of-Scope Use
This model should not be used for generating harmful, misleading, or inappropriate content. It may not perform well on tasks significantly different from its training domain without additional fine-tuning. The model is specifically designed for scenarios without ground truth, so it may not be optimal for tasks that heavily rely on reference data.
Bias, Risks, and Limitations
This model inherits the biases and limitations present in the base Llama 3.1 model and may have additional biases introduced through the fine-tuning dataset. Key considerations include:
- Domain Specificity: The model has been fine-tuned on a specific dataset and may not generalize well to domains outside its training scope
- No Ground Document Dependency: This variant is trained without reference documents as context, which may affect its performance on tasks requiring document-based evaluation
- Quantization Effects: 4-bit quantization may introduce minor degradation in model performance compared to full precision
- Context Limitations: Maximum context length of 4,200 tokens may limit performance on very long documents
- Language Bias: Primarily trained on Italian content, may have limited performance in other languages
Recommendations
- Thoroughly evaluate the model on your specific use case before deployment
- Consider the potential for biased outputs and implement appropriate safeguards
- Monitor model performance and outputs in production environments
- Be aware of the model's training domain when applying to new tasks
- Consider additional fine-tuning for specialized applications outside the training domain
- This variant is particularly suitable for scenarios where reference documents are not available as context
How to Get Started with the Model
Use the code below to get started with the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
base_model_id,
torch_dtype=torch.float16,
device_map="auto"
)
# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Stefano-M-Community/aixpa_no_ground")
# Generate text
prompt = "Ciao, mi aiuti a scrivere un'azione sullo sport?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Training Details
Training Data
The model was fine-tuned on the Stefano-M-Community/final_all_no_ground dataset from Hugging Face, which contains Italian Public Administration dialogue data between an assistant and PA users without reference documents. This dataset was used for both training and evaluation.
Training Procedure
The model was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation) techniques. The training utilized 4-bit quantization for memory efficiency and multi-GPU training with 4 processes.
Training Hyperparameters
- Training regime: Mixed precision training with 4-bit quantization
- LoRA Configuration:
- Rank: 16
- Alpha: 32
- Dropout: 0.0
- Sequence Length: 4,200 tokens
- Learning Rate: 5e-5
- Scheduler: Cosine annealing
- Batch Size: 4 (training), 1 (evaluation)
- Gradient Accumulation Steps: 2
- Number of Epochs: 10
- Weight Decay: 0.01
- Warmup Ratio: 0.03
- Early Stopping Patience: 5 epochs
Training Infrastructure
- Hardware: Multi-GPU setup (4 processes)
- Framework:
- Accelerate for distributed training
- DeepSpeed for optimization
- PEFT for LoRA implementation
- Logging: Weights & Biases (WandB)
- Evaluation Frequency: Every 35 steps
- Checkpoint Saving: Every 35 steps
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated using the same dataset used for training: Stefano-M-Community/final_all_no_ground. Evaluation was performed every 35 training steps to monitor training progress and prevent overfitting.
Factors
- Training Progress: Monitored throughout training with early stopping patience of 5 epochs
- Loss Metrics: Custom loss function implementation for supervised fine-tuning
- Computational Efficiency: Evaluated performance with 4-bit quantization
- No Ground Document Scenarios: Specialized evaluation for scenarios without reference documents as context
Metrics
- Training Loss: Monitored during training with logging every 10 steps
- Evaluation Loss: Computed every 35 steps on the evaluation dataset
- Early Stopping: Implemented with patience of 5 epochs to prevent overfitting
Results
Evaluation results are logged in Weights & Biases during training. The model was trained for up to 10 epochs with early stopping mechanism to ensure optimal performance without overfitting.
Evaluation Loss Performance:
- The model (purple line in eval/loss graph) shows a rapid decrease from ~1.23 at step 0 to ~0.86 around step 18-20
- Minimum loss achieved: approximately 0.86 around step 18-20
- Loss then increases to ~0.97-0.98 between steps 35-40, and ~1.03 at step 43
- The model shows signs of overfitting after the minimum point, which is typical for this training approach
Summary
The fine-tuned model demonstrates improved performance on Italian Public Administration dialogue tasks while maintaining the general capabilities of the base Llama 3.1 model. The LoRA adaptation approach allows for efficient fine-tuning while preserving most of the original model's knowledge. This variant is specifically optimized for PA conversations without reference documents as context.
Model Examination
The model uses LoRA (Low-Rank Adaptation) which allows for parameter-efficient fine-tuning. This approach:
- Preserves the original model weights while adding small adapter modules
- Enables efficient switching between different task-specific adaptations
- Reduces memory requirements during training and inference
- Maintains interpretability by keeping the base model architecture intact
- This variant is specifically designed for Italian language tasks without reference documents as context
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
The environmental impact of this model is reduced compared to training from scratch due to:
- Efficient Training: LoRA adaptation requires significantly less compute than full model training
- 4-bit Quantization: Reduces memory usage and energy consumption during training
- Hardware Type: Multi-GPU setup (specific hardware configuration may vary)
- Training Approach: Parameter-efficient fine-tuning reduces overall computational requirements
Note: Specific carbon emission calculations would require detailed hardware specifications and training duration measurements.
Technical Specifications
Model Architecture and Objective
- Base Architecture: Llama 3.1 (8B parameters)
- Adaptation Method: LoRA (Low-Rank Adaptation)
- Objective: Supervised Fine-tuning for Italian Public Administration dialogue tasks without reference documents as context
- Quantization: 4-bit quantization for efficient training and inference
- Maximum Context Length: 4,200 tokens
Compute Infrastructure
Hardware
- Training Setup: Multi-GPU configuration (4 processes)
- Memory Optimization: 4-bit quantization with LoRA adapters
- Distributed Training: Accelerate framework for multi-GPU coordination
Software
- Framework: PyTorch with Transformers library
- Training Libraries:
- PEFT 0.17.1 (Parameter-Efficient Fine-Tuning)
- Accelerate (distributed training)
- DeepSpeed (optimization)
- TRL (Transformer Reinforcement Learning)
- Monitoring: Weights & Biases (WandB)
- Configuration Management: DeepSpeed configuration for memory optimization
Citation
BibTeX:
@misc{aixpa_llama31_8b_lora_no_ground,
title={AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document)},
author={LanD (FBK)},
year={2025},
howpublished={Hugging Face Model Repository},
note={Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data without reference documents}
}
APA:
LanD (FBK). (2025). AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document). Hugging Face Model Repository. Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data without reference documents.
Glossary
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning technique that adds trainable low-rank matrices to existing model weights
- SFT (Supervised Fine-Tuning): Training method using labeled data to improve model performance on specific tasks
- 4-bit Quantization: Technique to reduce model memory usage by representing weights with 4-bit precision
- Multi-GPU Training: Distributed training approach using multiple GPUs to accelerate training
- No Ground Document: Training approach that does not rely on reference documents as context
Model Card Authors
LanD (FBK)
Model Card Contact
For questions or issues regarding this model, please contact the LanD (FBK) team through the appropriate channels.
Framework versions
- PEFT 0.17.1
- Downloads last month
- 3
Model tree for Stefano-M-Community/aixpa_no_ground
Base model
meta-llama/Llama-3.1-8B