Update README.md
Browse files
README.md
CHANGED

@@ -38,18 +38,73 @@ improved ability to accomplish the tasks of assisting individuals with budgeting

- **Developed by:** Timothy Austin Rodriguez
- **Funded by:** University of Virginia
- **Training type:** LoRA - Few Shot Prompting (3)
- **Language(s) (NLP):** Python
- **License:** MIT
- **Finetuned from model:** Fino1-8B [which is fine-tuned from Llama 3.1 8B Instruct]

### Training Data

This model is trained on a procedurally generated synthetic dataset of structured prompts and responses that teach the underlying Fino-1 8B model to create executable Python code that builds a budget spreadsheet and exports it to Microsoft Excel .xlsx format. The dataset (attached to this repository) comprises 3,000 examples, divided into a train/validation split of 2,500 for training and 500 for validation. The code used to create and randomize the dataset, including the seeds (42 for randomization, 60 for creation), is located in the .ipynb files attached to this repository. The dataset file is budget_dataset.csv.

While not used for training this model, a secondary dataset was developed via procedural generation to improve the model's performance on short-, medium-, and long-term goal planning. Like the first, it was generated through random procedural generation of 3,000 examples of prompts and responses; the random seeds and train/validation split code are located in the same .ipynb file as the budget dataset. This dataset is called goals_dataset.csv. It was not used to train the final model due to the poor performance encountered when leveraging LoRA for additional training: the model actually performed worse when prompted with an example from the validation dataset after training than before training. A deeper exploration of why this occurred is warranted, and training/tuning methods beyond LoRA should be considered for future enhancement of this model.

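The procedural generation described above can be sketched roughly as follows. This is an illustrative assumption, not the repository's actual generation code (which lives in the attached .ipynb files): the budget categories, prompt wording, and CSV field names here are all placeholders, while the seeds (60 for creation, 42 for randomization) and the 2,500/500 split follow the numbers stated above.

```python
import csv
import random

# Illustrative budget categories; the real dataset's fields may differ.
CATEGORIES = ["Rent", "Groceries", "Transportation", "Utilities", "Savings"]

def make_example(rng):
    """Procedurally generate one (prompt, response) training pair."""
    income = rng.randrange(2500, 9000, 100)
    amounts = {c: round(income * rng.uniform(0.05, 0.30), 2) for c in CATEGORIES}
    prompt = (f"Create a monthly budget spreadsheet for an income of ${income} "
              f"covering {', '.join(CATEGORIES)} and export it to .xlsx.")
    # The real responses contain runnable Python; a stub stands in here.
    response = "import openpyxl  # ...code writing {} budget rows...".format(len(amounts))
    return {"prompt": prompt, "response": response}

def build_dataset(n=3000, seed=60, path="budget_dataset.csv"):
    rng = random.Random(seed)  # seed 60 for creation, per the model card
    rows = [make_example(rng) for _ in range(n)]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
        writer.writeheader()
        writer.writerows(rows)
    # Shuffle with the separate randomization seed (42) before splitting.
    random.Random(42).shuffle(rows)
    return rows[:2500], rows[2500:]

train, val = build_dataset()
```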
## Training Method

The method of training/tuning for this model is the Parameter-Efficient Fine-Tuning method called Low-Rank Adaptation, or LoRA. LoRA is a fine-tuning approach well suited to tuning a model for domain-specific tasks such as creating personal financial plans. LoRA is significantly more efficient than full fine-tuning, requiring fewer compute resources, and is much more memory efficient because far fewer model weights are changed. In many cases a LoRA implementation yields results very similar to full fine-tuning without the heavy computational expense inherent in full fine-tuning. This method was chosen given the time allocated for training this model, the limited compute resources due to competing requests for GPU time on the University of Virginia's Rivanna High Performance Computing cluster, and the desire to achieve results similar to full fine-tuning despite those constraints. The LoRA tuning hyperparameter values were selected through experimentation and can be found in one of the .ipynb files attached to this repository and in the summary below.

Hyperparameters

- LORA_R = 64
- LORA_ALPHA = 64
- LORA_DROPOUT = 0.05

Tuning/Training Settings

- learning_rate = 0.00001
- epochs = 5

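To make the role of these values concrete, the generic LoRA arithmetic can be sketched in plain Python. This is a toy illustration of the standard LoRA formulation (effective weight W' = W + (alpha/r)·B·A), not code from this repository; note that with LORA_R = LORA_ALPHA = 64 the scaling factor alpha/r is exactly 1.

```python
import random

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight LoRA uses at inference.

    W is d_out x d_in; B is d_out x r; A is r x d_in. Only A and B are
    trained, so the number of trainable parameters scales with the small
    rank r rather than with the full d_out * d_in weight matrix.
    """
    scale = alpha / r  # with LORA_ALPHA = LORA_R = 64 this is exactly 1.0
    d_out, d_in = len(W), len(W[0])
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(d_in)] for i in range(d_out)]
    return [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]

# Toy dimensions (a real layer is thousands wide, with r = 64).
rng = random.Random(0)
d_out, d_in, r = 4, 6, 2
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
B = [[0.0] * r for _ in range(d_out)]  # B is initialized to zero in LoRA
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]

# With B = 0 the adapter is a no-op: W' == W at the start of training.
W_prime = lora_effective_weight(W, A, B, alpha=2, r=r)
```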
Secondarily, this model makes use of few-shot prompting due to the aforementioned poor performance of LoRA when training on the goals dataset. It was found that few-shot prompting improves the model's ability to produce the desired response structure without degrading its performance, as was noted with the LoRA implementation regardless of the hyperparameters selected. Example code for implementing the appropriate few-shot prompting is available in one of the provided .ipynb files in this repository.

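A three-shot prompt of the kind described above can be assembled along these lines. The `### Prompt:`/`### Response:` delimiters and the worked goal examples are illustrative assumptions, not drawn from goals_dataset.csv or the repository's notebooks.

```python
def build_few_shot_prompt(examples, query):
    """Prepend k worked (prompt, response) pairs before the real query.

    The model card uses 3 shots; the delimiter format is an assumption.
    """
    parts = []
    for prompt, response in examples:
        parts.append(f"### Prompt:\n{prompt}\n### Response:\n{response}\n")
    # The final block leaves the response empty for the model to complete.
    parts.append(f"### Prompt:\n{query}\n### Response:\n")
    return "\n".join(parts)

# Placeholder shots covering short-, medium-, and long-term goals.
shots = [
    ("Plan a short term goal: save $1,000 in 6 months.",
     "Goal type: short term. Monthly savings: $166.67. Vehicle: savings account."),
    ("Plan a medium term goal: save $12,000 in 3 years.",
     "Goal type: medium term. Monthly savings: $333.33. Vehicle: CD ladder."),
    ("Plan a long term goal: save $300,000 in 25 years.",
     "Goal type: long term. Monthly savings: $1,000.00. Vehicle: index funds."),
]
prompt = build_few_shot_prompt(shots, "Plan a long term goal: save $50,000 in 10 years.")
```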
## Evaluation

| Model | GSM8K | MMLU | Budget Example | Goals Example |
|---|---|---|---|---|
| Fino-1 8B | 63.33 | 66.84 | Provides code but not in desired format; provides narrative (not desired) | Reasonable response, not desired structure |
| Llama-3.2-3B-Instruct | 50.00 | 61.11 | Provides code but not in desired format | Close to desired format but recommends inappropriate savings vehicle for long-term goal |
| Ministral-8B-Instruct-2410 | 66.66 | 64.50 | Provides code but not in desired format | Reasonable format but recommends inappropriate savings vehicle for long-term goal |
| FinPlan-1 | 53.33 | 65.73 | Provides code in desired format | Reasonable format, and reasonable savings/investment vehicles recommended |

The benchmarks chosen, GSM8K, MMLU, and the two synthetic dataset examples, were selected to provide a view of the model's performance in terms of both its generalization ability and its ability to perform the tasks it is trained to accomplish. As the underlying model that FinPlan-1 is based on, Fino-1 8B is a natural comparison model for benchmarking. Further, the Llama 3.2-3B Instruct model is a newer, albeit smaller (parameter-wise), version of the model that underlies Fino-1 8B; given its rather decent performance on the financial planning tasks, it serves as a good comparison for FinPlan-1. Finally, the Ministral-8B-Instruct-2410 model is of comparable parameter size to FinPlan-1 and was originally considered as a potential base model for FinPlan-1, making it a good model for comparison.

Since the tasks this model is tuned to accomplish are non-standard and domain specific, the benchmark for these tasks comes from the validation/hold-out split of the training dataset, and its evaluation is somewhat subjective. For each of these models, the Budget and Goals examples were presented in either a zero-shot prompt (budget) or a three-shot prompt (goals). Only the trained FinPlan-1 model was able to provide the desired format for the Excel file on the budget task, while both Fino-1 8B and FinPlan-1 performed well on the goals dataset. To measure generalizability and retention of reasoning skill, all four models were benchmarked on GSM8K (grade-school mathematics reasoning) as well as MMLU (general reasoning). While the domain-specific LoRA tuning certainly led to a degradation in FinPlan-1's benchmark scores relative to its underlying model Fino-1 8B, the drop is rather small for MMLU, and GSM8K performance remains above Llama 3.2-3B Instruct.

## Uses

@@ -208,10 +263,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

[More Information Needed]

## Model Card Authors

Timothy Austin Rodriguez

## Model Card Contact

tar3kh@virginia.edu