danie08 committed
Commit 0782d09 · 1 Parent(s): 90b770d

added model and card
README.md ADDED
@@ -0,0 +1,265 @@
---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:meta-llama/Meta-Llama-3.1-8B-Instruct
- lora
- sft
- transformers
- trl
---

# AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document)

This model is a fine-tuned version of Meta-Llama-3.1-8B-Instruct, specialized for the AiXPA project in the domain of Italian Public Administration (PA). It was trained with supervised fine-tuning (SFT) and LoRA (Low-Rank Adaptation) on a dataset of dialogues between an assistant and a PA user, without reference documents provided as context.

## Model Details

### Model Description

This model is based on Meta-Llama-3.1-8B-Instruct and has been fine-tuned on the Stefano-M-Community/final_all_no_ground dataset for Italian Public Administration dialogue tasks. The model uses 4-bit quantization and LoRA adapters for efficient training and inference, making it suitable for deployment on consumer hardware while maintaining strong performance in PA-specific conversations without reference documents as context.

- **Developed by:** LanD (FBK)
- **Model type:** Causal Language Model (Fine-tuned)
- **Language(s) (NLP):** Italian (primarily)
- **License:** Please refer to the original Llama 3.1 license
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B-Instruct

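The card states only that 4-bit quantization was used; the exact settings are not given. A typical QLoRA-style quantization config might look like the sketch below, where the `bnb_4bit_*` values (NF4, bfloat16 compute, double quantization) are assumptions, not details taken from the actual training run:

```python
import torch
from transformers import BitsAndBytesConfig

# Hypothetical 4-bit config in the usual QLoRA style; the card only says
# "4-bit quantization", so quant type and compute dtype are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
```

This object would then be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained` before attaching the LoRA adapter.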
### Model Sources

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

### Direct Use

This model can be used directly for text generation tasks, particularly those related to the domain it was fine-tuned on. The model maintains the instruction-following capabilities of the base Llama 3.1 model while being specialized for specific use cases defined in the training dataset. This variant is particularly suited for scenarios where reference documents are not available as context.

### Downstream Use

The model can be further fine-tuned for specific tasks or integrated into larger applications that require text generation capabilities. The LoRA adapters make it easy to switch between different specialized versions. This variant may be particularly useful for applications that need to operate without reference ground truth data.

### Out-of-Scope Use

This model should not be used for generating harmful, misleading, or inappropriate content. It may not perform well on tasks significantly different from its training domain without additional fine-tuning. The model is specifically designed for scenarios without ground truth, so it may not be optimal for tasks that heavily rely on reference data.

## Bias, Risks, and Limitations

This model inherits the biases and limitations present in the base Llama 3.1 model and may have additional biases introduced through the fine-tuning dataset. Key considerations include:

- **Domain Specificity:** The model has been fine-tuned on a specific dataset and may not generalize well to domains outside its training scope
- **No Ground Document Dependency:** This variant is trained without reference documents as context, which may affect its performance on tasks requiring document-based evaluation
- **Quantization Effects:** 4-bit quantization may introduce minor degradation in model performance compared to full precision
- **Context Limitations:** The maximum context length of 4,200 tokens may limit performance on very long documents
- **Language Bias:** Primarily trained on Italian content; performance in other languages may be limited

### Recommendations

- Thoroughly evaluate the model on your specific use case before deployment
- Consider the potential for biased outputs and implement appropriate safeguards
- Monitor model performance and outputs in production environments
- Be aware of the model's training domain when applying it to new tasks
- Consider additional fine-tuning for specialized applications outside the training domain
- This variant is particularly suitable for scenarios where reference documents are not available as context

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "Stefano-M-Community/aixpa_no_ground")
model.eval()

# Format the prompt with the Llama 3.1 chat template
messages = [{"role": "user", "content": "Ciao, mi aiuti a scrivere un'azione sullo sport?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate text and decode only the newly generated tokens
with torch.no_grad():
    outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

## Training Details

### Training Data

The model was fine-tuned on the `Stefano-M-Community/final_all_no_ground` dataset from Hugging Face, which contains Italian Public Administration dialogue data between an assistant and PA users without reference documents. This dataset was used for both training and evaluation.

### Training Procedure

The model was trained using supervised fine-tuning (SFT) with LoRA (Low-Rank Adaptation). Training used 4-bit quantization for memory efficiency and multi-GPU training with 4 processes.

#### Training Hyperparameters

- **Training regime:** Mixed precision training with 4-bit quantization
- **LoRA Configuration:**
  - Rank: 16
  - Alpha: 32
  - Dropout: 0.0
- **Sequence Length:** 4,200 tokens
- **Learning Rate:** 5e-5
- **Scheduler:** Cosine annealing
- **Batch Size:** 4 (training), 1 (evaluation)
- **Gradient Accumulation Steps:** 2
- **Number of Epochs:** 10
- **Weight Decay:** 0.01
- **Warmup Ratio:** 0.03
- **Early Stopping Patience:** 5 epochs

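Taken together, the batch settings above imply an effective global batch size per optimizer step. A quick check, assuming the training batch size of 4 is per device and applies to each of the 4 training processes:

```python
# Effective global batch size implied by the hyperparameters above
# (assumes the batch size of 4 is per device/process)
per_device_batch_size = 4
num_processes = 4        # multi-GPU training with 4 processes
grad_accum_steps = 2     # gradient accumulation steps

effective_batch_size = per_device_batch_size * num_processes * grad_accum_steps
print(effective_batch_size)  # 32 sequences per optimizer step
```
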
#### Training Infrastructure

- **Hardware:** Multi-GPU setup (4 processes)
- **Frameworks:**
  - Accelerate for distributed training
  - DeepSpeed for optimization
  - PEFT for LoRA implementation
- **Logging:** Weights & Biases (WandB)
- **Evaluation Frequency:** Every 35 steps
- **Checkpoint Saving:** Every 35 steps

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated using the same dataset used for training: `Stefano-M-Community/final_all_no_ground`. Evaluation was performed every 35 training steps to monitor training progress and prevent overfitting.

#### Factors

- **Training Progress:** Monitored throughout training with an early stopping patience of 5 epochs
- **Loss Metrics:** Custom loss function implementation for supervised fine-tuning
- **Computational Efficiency:** Performance evaluated with 4-bit quantization
- **No Ground Document Scenarios:** Specialized evaluation for scenarios without reference documents as context

#### Metrics

- **Training Loss:** Monitored during training with logging every 10 steps
- **Evaluation Loss:** Computed every 35 steps on the evaluation dataset
- **Early Stopping:** Implemented with a patience of 5 epochs to prevent overfitting

### Results

Evaluation results are logged in Weights & Biases during training. The model was trained for up to 10 epochs with an early stopping mechanism to ensure optimal performance without overfitting.

**Evaluation Loss Performance:**

![Evaluation Loss Curve](eval_loss_no_ground.png)

- The model (purple line in the eval/loss graph) shows a rapid decrease from ~1.23 at step 0 to ~0.86 around steps 18-20
- Minimum loss achieved: approximately 0.86 around steps 18-20
- The loss then increases to ~0.97-0.98 between steps 35-40, and ~1.03 at step 43
- The model shows signs of overfitting after the minimum point, which is typical for this training approach

#### Summary

The fine-tuned model demonstrates improved performance on Italian Public Administration dialogue tasks while maintaining the general capabilities of the base Llama 3.1 model. The LoRA adaptation approach allows for efficient fine-tuning while preserving most of the original model's knowledge. This variant is specifically optimized for PA conversations without reference documents as context.

## Model Examination

The model uses LoRA (Low-Rank Adaptation), which allows for parameter-efficient fine-tuning. This approach:

- Preserves the original model weights while adding small adapter modules
- Enables efficient switching between different task-specific adaptations
- Reduces memory requirements during training and inference
- Maintains interpretability by keeping the base model architecture intact

This variant is specifically designed for Italian-language tasks without reference documents as context.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

The environmental impact of this model is reduced compared to training from scratch due to:

- **Efficient Training:** LoRA adaptation requires significantly less compute than full model training
- **4-bit Quantization:** Reduces memory usage and energy consumption during training
- **Hardware Type:** Multi-GPU setup (specific hardware configuration may vary)
- **Training Approach:** Parameter-efficient fine-tuning reduces overall computational requirements

*Note: Specific carbon emission calculations would require detailed hardware specifications and training duration measurements.*

## Technical Specifications

### Model Architecture and Objective

- **Base Architecture:** Llama 3.1 (8B parameters)
- **Adaptation Method:** LoRA (Low-Rank Adaptation)
- **Objective:** Supervised fine-tuning for Italian Public Administration dialogue tasks without reference documents as context
- **Quantization:** 4-bit quantization for efficient training and inference
- **Maximum Context Length:** 4,200 tokens

### Compute Infrastructure

#### Hardware

- **Training Setup:** Multi-GPU configuration (4 processes)
- **Memory Optimization:** 4-bit quantization with LoRA adapters
- **Distributed Training:** Accelerate framework for multi-GPU coordination

#### Software

- **Framework:** PyTorch with the Transformers library
- **Training Libraries:**
  - PEFT 0.17.1 (Parameter-Efficient Fine-Tuning)
  - Accelerate (distributed training)
  - DeepSpeed (optimization)
  - TRL (Transformer Reinforcement Learning)
- **Monitoring:** Weights & Biases (WandB)
- **Configuration Management:** DeepSpeed configuration for memory optimization

## Citation

**BibTeX:**

```bibtex
@misc{aixpa_llama31_8b_lora_no_ground,
  title={AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document)},
  author={LanD (FBK)},
  year={2025},
  howpublished={Hugging Face Model Repository},
  note={Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data without reference documents}
}
```

**APA:**

LanD (FBK). (2025). *AiXPA Fine-tuned Llama 3.1 8B Model (No Ground Document)*. Hugging Face Model Repository. Fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using LoRA, trained on Italian Public Administration dialogue data without reference documents.

## Glossary

- **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning technique that adds trainable low-rank matrices to existing model weights
- **SFT (Supervised Fine-Tuning):** Training method using labeled data to improve model performance on specific tasks
- **4-bit Quantization:** Technique to reduce model memory usage by representing weights with 4-bit precision
- **Multi-GPU Training:** Distributed training approach using multiple GPUs to accelerate training
- **No Ground Document:** Training approach that does not rely on reference documents as context

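The LoRA entry above can be written as W' = W + (α/r)·B·A, where A and B are the trainable low-rank matrices. A small numpy sketch with toy dimensions (not the actual Llama shapes) shows the forward pass and why the zero-initialized adapter starts as an exact no-op:

```python
import numpy as np

# Toy dimensions standing in for one projection matrix (not Llama's real sizes)
d_out, d_in, r, alpha = 32, 32, 16, 32
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable "down" projection
B = np.zeros((d_out, r))                    # trainable "up" projection, zero-initialized

x = rng.standard_normal((2, d_in))
scaling = alpha / r  # 32 / 16 = 2.0, matching this model's rank and alpha

# LoRA forward pass: frozen path plus scaled low-rank update
y = x @ W.T + scaling * (x @ A.T) @ B.T

# With B zero-initialized, the adapted model starts identical to the base model
assert np.allclose(y, x @ W.T)
```

Only A and B are updated during training, which is why the adapter checkpoint is a few tens of megabytes instead of the full 8B-parameter model.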
## Model Card Authors

LanD (FBK)

## Model Card Contact

For questions or issues regarding this model, please contact the LanD (FBK) team through the appropriate channels.

### Framework versions

- PEFT 0.17.1
adapter_config.json ADDED
@@ -0,0 +1,42 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 32,
  "lora_bias": false,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "qalora_group_size": 16,
  "r": 16,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "o_proj",
    "down_proj",
    "gate_proj",
    "q_proj",
    "k_proj",
    "v_proj",
    "up_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
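As a quick sanity check, the adapter configuration above can be cross-checked against the model card's stated hyperparameters. A minimal sketch, parsing a subset of the JSON fields shown:

```python
import json

# Subset of the adapter_config.json above
config_text = """
{
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": ["o_proj", "down_proj", "gate_proj",
                     "q_proj", "k_proj", "v_proj", "up_proj"]
}
"""
cfg = json.loads(config_text)

# Matches the card: rank 16, alpha 32, dropout 0.0,
# and all seven Llama attention/MLP projections adapted
assert cfg["r"] == 16 and cfg["lora_alpha"] == 32
assert cfg["lora_dropout"] == 0.0
assert len(cfg["target_modules"]) == 7
```
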
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d9ec405aca27b8548f72739eddef63f69f48e46df8274bd9951f74bd3730055a
size 83946192
eval_loss_no_ground.png ADDED