ThinkTim21 committed on Commit 5b2eee0 · verified · 1 Parent(s): 84f5979

Update README.md

Files changed (1): README.md (+62 -7)
README.md CHANGED
@@ -38,18 +38,73 @@ improved ability to accomplish the tasks of assisting individuals with budgeting
 
 
  - **Developed by:** Timothy Austin Rodriguez
- - **Funded by [optional]:** University of Virginia
  - **Training type:** LoRA - Few Shot Prompting (3)
  - **Language(s) (NLP):** Python
  - **License:** MIT
- - **Finetuned from model [optional]:** Fino1-8B [which is fine tuned from Llama 3.1 8B Instruct]
 
 ### Training Data
 
 This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to assist the underlying Fino-1 8B model
 with creating executable Python code that builds and exports budget spreadsheets to Microsoft Excel .xlsx format. This dataset (attached to this repository) is comprised
- of 3000 examples which were divided into a train/validation split of 2500 for training and 500 for validation. The code used to create this dataset including the seeds() can be
- located in the ipynb files attached to this repository.
 
 ## Uses
 
@@ -208,10 +263,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
- ## Model Card Authors [optional]
 
- [More Information Needed]
 
 ## Model Card Contact
 
- [More Information Needed]
 
 
 
  - **Developed by:** Timothy Austin Rodriguez
+ - **Funded by:** University of Virginia
  - **Training type:** LoRA - Few Shot Prompting (3)
  - **Language(s) (NLP):** Python
  - **License:** MIT
+ - **Finetuned from model:** Fino1-8B [which is fine-tuned from Llama 3.1 8B Instruct]
 
 ### Training Data
 
 This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to assist the underlying Fino-1 8B model
 with creating executable Python code that builds and exports budget spreadsheets to Microsoft Excel .xlsx format. This dataset (attached to this repository) is comprised
+ of 3,000 examples, which were divided into a train/validation split of 2,500 for training and 500 for validation. The code used to create and randomize this dataset, including
+ the seeds (42 for randomization, 60 for creation), can be located in the ipynb files attached to this repository. This dataset is called budget_dataset.csv.
+
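As a rough illustration of the procedure described above: the actual generation code lives in the repository's ipynb files, and the category names and prompt wording below are invented placeholders, but the seeding and 2,500/500 split might be sketched like this:

```python
import random

# Illustrative sketch only: CATEGORIES and the prompt template are hypothetical;
# the real fields and templates are defined in the repository's ipynb files.
CATEGORIES = ["rent", "groceries", "transportation", "utilities", "savings"]

def make_example(rng: random.Random) -> dict:
    """Procedurally generate one prompt/response pair for the budget task."""
    income = rng.randrange(2000, 8001, 100)
    prompt = (f"Create a monthly budget spreadsheet for a ${income} income "
              f"covering {', '.join(CATEGORIES)} and export it to .xlsx.")
    response = "<python code that writes the budget to an .xlsx file>"
    return {"prompt": prompt, "response": response}

creation_rng = random.Random(60)      # seed 60 for creation, per the README
examples = [make_example(creation_rng) for _ in range(3000)]

random.Random(42).shuffle(examples)   # seed 42 for randomizing the split
train, val = examples[:2500], examples[2500:]
```

Fixing both seeds makes the dataset and the split reproducible from run to run.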
+ While not used for training this model, a secondary dataset, intended to improve the model's performance on short-, medium- and long-term goal planning, was developed
+ via procedural generation. This dataset was generated much like the first, through random procedural generation of 3,000 examples of prompts and responses. Random seeds and
+ train/validation split code can be located in the same ipynb file as the budget dataset. This dataset is called goals_dataset.csv. It was not used to train the final
+ model due to poor performance encountered when leveraging LoRA for additional training. The model actually performed worse when prompted with an example from the validation dataset
+ after training than before training. A deeper exploration of why this occurred is warranted, and other training/tuning methods beyond LoRA should be considered for future enhancement
+ of this model.
+
+ ## Training Method
+
+ The method of training/tuning for this model is the Parameter-Efficient Fine-Tuning (PEFT) method called Low-Rank Adaptation, or LoRA. LoRA is a fine-tuning approach that is well
+ suited to tuning a model for domain-specific tasks such as creating personal financial plans. LoRA is significantly more efficient than full fine-tuning, requiring fewer compute
+ resources, and is much more memory efficient because far fewer model weights are changed. In many cases a LoRA implementation yields results very similar to full fine-tuning without the
+ heavy computational expense inherent in full fine-tuning. This method was chosen given the time allocated for training this model, the limited compute resources due to competing
+ requests for GPU time on the University of Virginia's Rivanna High Performance Computing cluster, and the desire for results similar to full fine-tuning despite the limited
+ compute resources available. LoRA tuning hyperparameter values were selected through experimentation and can be found in one of the ipynb files attached to this repository and in
+ the summary below.
+
+ Hyperparameters
+ - LORA_R = 64
+ - LORA_ALPHA = 64
+ - LORA_DROPOUT = 0.05
+
+ Tuning/Training Settings
+ - learning_rate = 0.00001
+ - epochs = 5
+
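The efficiency claim above can be made concrete with back-of-the-envelope arithmetic. The 4096 hidden size below is an illustrative Llama-style dimension, not a measured value from this model; the point is only how the rank-64 adapter compares to updating the full matrix:

```python
# Rough sketch: trainable parameters for one d_out x d_in projection matrix,
# comparing full fine-tuning with a rank-64 LoRA adapter (update = B @ A).
LORA_R = 64
LORA_ALPHA = 64          # scaling factor alpha / r = 1.0 with these settings
d_in = d_out = 4096      # illustrative Llama-style hidden size

full_params = d_out * d_in               # every weight updated by full fine-tuning
lora_params = LORA_R * (d_in + d_out)    # A is (r x d_in), B is (d_out x r)

ratio = lora_params / full_params        # about 3% of the full matrix
```

With these assumed dimensions, LoRA trains roughly 3% of the weights per adapted projection, which is the memory and compute saving described above.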
+ Secondarily, this model makes use of few-shot prompting due to the aforementioned poor performance of LoRA when training on the goals dataset. It was found that few-shot
+ prompting improves the ability of the model to provide the desired response structure without degrading the model's performance, as was noted with the LoRA implementation regardless
+ of the hyperparameters selected. Example code showing how to implement the appropriate few-shot prompting is available in one of the provided ipynb files in this repository.
+
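A sketch of what the three-shot prompt assembly for the goals task could look like follows. The shot texts here are invented placeholders, not the actual examples from goals_dataset.csv or the repository's ipynb files:

```python
# Hypothetical three-shot prompt for the goals task; the real shots come from
# the repository's notebooks, and these question/answer pairs are placeholders.
SHOTS = [
    ("I want to save $1,200 for a vacation in 12 months.",
     "Goal type: short-term. Monthly savings: $100. Vehicle: high-yield savings account."),
    ("I want a $20,000 house down payment in 5 years.",
     "Goal type: medium-term. Monthly savings: $333. Vehicle: CDs or a money market fund."),
    ("I want to retire comfortably in 30 years.",
     "Goal type: long-term. Vehicle: tax-advantaged retirement accounts (401(k)/IRA)."),
]

def build_few_shot_prompt(shots, query: str) -> str:
    """Concatenate worked examples ahead of the new query so the model
    imitates the demonstrated response structure."""
    blocks = [f"User: {q}\nAssistant: {a}" for q, a in shots]
    blocks.append(f"User: {query}\nAssistant:")
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(SHOTS, "I want to save $5,000 for a car in 2 years.")
```

The trailing bare "Assistant:" leaves the model to complete the fourth turn in the structure the three shots demonstrate.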
+
+ ## Evaluation
+
+ | Model | GSM8K | MMLU | Budget Example | Goals Example |
+ |---|---|---|---|---|
+ | Fino-1 8B | 63.33 | 66.84 | Provides code but not in desired format; provides narrative (not desired) | Reasonable response, not desired structure |
+ | Llama-3.2-3B-Instruct | 50.00 | 61.11 | Provides code but not in desired format | Close to desired format but recommends inappropriate savings vehicle for long-term goal |
+ | Ministral-8B-Instruct-2410 | 66.66 | 64.50 | Provides code but not in desired format | Reasonable format but recommends inappropriate savings vehicle for long-term goal |
+ | FinPlan-1 | 53.33 | 65.73 | Provides code in desired format | Reasonable format, and reasonable savings/investment vehicles recommended |
+
+
+ The benchmarks chosen, GSM8K, MMLU and the two synthetic dataset examples, were selected to provide a view of the model's performance both in terms of its generalization
+ ability and its ability to perform the tasks it is trained to accomplish. As the underlying model that FinPlan-1 is based on, Fino-1 8B is a natural comparison model
+ to evaluate for benchmarking. Further, the Llama 3.2-3B Instruct model is a newer version of the model that underlies Fino-1 8B, albeit a smaller version parameter-wise. Given
+ this model's rather decent performance on the financial planning tasks, it serves as a good comparison for FinPlan-1. Finally, the Ministral 8B Instruct 2410 model is of comparable
+ size parameter-wise to FinPlan-1 and was originally considered as a potential base model for FinPlan-1, thus making it a good model for comparison. Since the tasks this model is
+ tuned to accomplish are non-standard and domain-specific, the benchmark for these tasks comes from the validation/hold-out split of the training dataset, and its evaluation is
+ somewhat subjective. For each of these models, the Budget and Goals examples were presented to the model in either a zero-shot prompt (budget) or a three-shot prompt (goals).
+ Only the trained FinPlan-1 model was able to provide the desired format for the Excel file for the budget task, while both Fino-1 8B and FinPlan-1 performed well on the goals
+ dataset. To measure generalizability and retention of reasoning skill, all four models were benchmarked on GSM8K (grade-school mathematics reasoning) as well as MMLU (general
+ reasoning). While the domain-specific LoRA tuning certainly led to a degradation in FinPlan-1's benchmark scores relative to its underlying model, Fino-1 8B, the drop is rather
+ small for MMLU, and GSM8K performance remains above Llama 3.2-3B Instruct.
+
+
 
 ## Uses
 
 
 
 [More Information Needed]
 
+ ## Model Card Authors
 
+ Timothy Austin Rodriguez
 
 ## Model Card Contact
 
+ tar3kh@virginia.edu