ThinkTim21 committed on Commit 5b2eee0 · verified · 1 Parent(s): 84f5979

Update README.md

Files changed (1): README.md (+62 -7)
README.md CHANGED
@@ -38,18 +38,73 @@ improved ability to accomplish the tasks of assisting individuals with budgeting
 
 
  - **Developed by:** Timothy Austin Rodriguez
- - **Funded by [optional]:** University of Virginia
  - **Training type:** LoRA - Few Shot Prompting (3)
  - **Language(s) (NLP):** Python
  - **License:** MIT
- - **Finetuned from model [optional]:** Fino1-8B [which is fine tuned from Llama 3.1 8B Instruct]
 
 ### Training Data
 
 This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to assist the underlying Fino-1 8B model
 with creating executable Python code that builds and exports budget spreadsheets to Microsoft Excel .xlsx format. This dataset (attached to this repository) is comprised
- of 3000 examples which were divided into a train/validation split of 2500 for training and 500 for validation. The code used to create this dataset including the seeds() can be
- located in the ipynb files attached to this repository.
 
 ## Uses
 
@@ -208,10 +263,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 [More Information Needed]
 
- ## Model Card Authors [optional]
 
- [More Information Needed]
 
 ## Model Card Contact
 
- [More Information Needed]
 
 
 
  - **Developed by:** Timothy Austin Rodriguez
+ - **Funded by:** University of Virginia
  - **Training type:** LoRA - Few Shot Prompting (3)
  - **Language(s) (NLP):** Python
  - **License:** MIT
+ - **Finetuned from model:** Fino1-8B [which is fine-tuned from Llama 3.1 8B Instruct]
 
 ### Training Data
 
 This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to assist the underlying Fino-1 8B model
 with creating executable Python code that builds and exports budget spreadsheets to Microsoft Excel .xlsx format. This dataset (attached to this repository) is comprised
+ of 3,000 examples, which were divided into a train/validation split of 2,500 for training and 500 for validation. The code used to create and randomize this dataset, including
+ the seeds (42 for randomization, 60 for creation), can be located in the ipynb files attached to this repository. This dataset is called budget_dataset.csv.
+
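As a rough illustration of the procedure described above: the actual generation code lives in the repository's ipynb files, and the category names and prompt wording below are invented placeholders, but the seeding and 2,500/500 split might be sketched like this:

```python
import random

# Illustrative sketch only: CATEGORIES and the prompt template are hypothetical;
# the real fields and templates are defined in the repository's ipynb files.
CATEGORIES = ["rent", "groceries", "transportation", "utilities", "savings"]

def make_example(rng: random.Random) -> dict:
    """Procedurally generate one prompt/response pair for the budget task."""
    income = rng.randrange(2000, 8001, 100)
    prompt = (f"Create a monthly budget spreadsheet for a ${income} income "
              f"covering {', '.join(CATEGORIES)} and export it to .xlsx.")
    response = "<python code that writes the budget to an .xlsx file>"
    return {"prompt": prompt, "response": response}

creation_rng = random.Random(60)      # seed 60 for creation, per the README
examples = [make_example(creation_rng) for _ in range(3000)]

random.Random(42).shuffle(examples)   # seed 42 for randomizing the split
train, val = examples[:2500], examples[2500:]
```

Fixing both seeds makes the dataset and the split reproducible from run to run.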
+ While not used for training this model, a secondary dataset, intended to improve the model's performance on short-, medium- and long-term goal planning, was developed
+ via procedural generation. This dataset was generated much like the first, through random procedural generation of 3,000 examples of prompts and responses. Random seeds and
+ train/validation split code can be located in the same ipynb file as the budget dataset. This dataset is called goals_dataset.csv. It was not used to train the final
+ model due to poor performance encountered when leveraging LoRA for additional training. The model actually performed worse when prompted with an example from the validation dataset
+ after training than before training. A deeper exploration of why this occurred is warranted, and other training/tuning methods beyond LoRA should be considered for future enhancement
+ of this model.
+
+ ## Training Method
+
+ The method of training/tuning for this model is the Parameter-Efficient Fine-Tuning (PEFT) method called Low-Rank Adaptation, or LoRA. LoRA is a fine-tuning approach that is well
+ suited to tuning a model for domain-specific tasks such as creating personal financial plans. LoRA is significantly more efficient than full fine-tuning, requiring fewer compute
+ resources, and is much more memory efficient because far fewer model weights are changed. In many cases a LoRA implementation yields results very similar to full fine-tuning without the
+ heavy computational expense inherent in full fine-tuning. This method was chosen given the time allocated for training this model, the limited compute resources due to competing
+ requests for GPU time on the University of Virginia's Rivanna High Performance Computing cluster, and the desire for results similar to full fine-tuning despite the limited
+ compute resources available. LoRA tuning hyperparameter values were selected through experimentation and can be found in one of the ipynb files attached to this repository and in
+ the summary below.
+
+ Hyperparameters
+ - LORA_R = 64
+ - LORA_ALPHA = 64
+ - LORA_DROPOUT = 0.05
+
+ Tuning/Training Settings
+ - learning_rate = 0.00001
+ - epochs = 5
+
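The efficiency claim above can be made concrete with back-of-the-envelope arithmetic. The 4096 hidden size below is an illustrative Llama-style dimension, not a measured value from this model; the point is only how the rank-64 adapter compares to updating the full matrix:

```python
# Rough sketch: trainable parameters for one d_out x d_in projection matrix,
# comparing full fine-tuning with a rank-64 LoRA adapter (update = B @ A).
LORA_R = 64
LORA_ALPHA = 64          # scaling factor alpha / r = 1.0 with these settings
d_in = d_out = 4096      # illustrative Llama-style hidden size

full_params = d_out * d_in               # every weight updated by full fine-tuning
lora_params = LORA_R * (d_in + d_out)    # A is (r x d_in), B is (d_out x r)

ratio = lora_params / full_params        # about 3% of the full matrix
```

With these assumed dimensions, LoRA trains roughly 3% of the weights per adapted projection, which is the memory and compute saving described above.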
+ Secondarily, this model makes use of few-shot prompting due to the aforementioned poor performance of LoRA when training on the goals dataset. It was found that few-shot
+ prompting improves the ability of the model to provide the desired response structure without degrading the model's performance, as was noted with the LoRA implementation regardless
+ of the hyperparameters selected. Example code showing how to implement the appropriate few-shot prompting is available in one of the provided ipynb files in this repository.
+
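A sketch of what the three-shot prompt assembly for the goals task could look like follows. The shot texts here are invented placeholders, not the actual examples from goals_dataset.csv or the repository's ipynb files:

```python
# Hypothetical three-shot prompt for the goals task; the real shots come from
# the repository's notebooks, and these question/answer pairs are placeholders.
SHOTS = [
    ("I want to save $1,200 for a vacation in 12 months.",
     "Goal type: short-term. Monthly savings: $100. Vehicle: high-yield savings account."),
    ("I want a $20,000 house down payment in 5 years.",
     "Goal type: medium-term. Monthly savings: $333. Vehicle: CDs or a money market fund."),
    ("I want to retire comfortably in 30 years.",
     "Goal type: long-term. Vehicle: tax-advantaged retirement accounts (401(k)/IRA)."),
]

def build_few_shot_prompt(shots, query: str) -> str:
    """Concatenate worked examples ahead of the new query so the model
    imitates the demonstrated response structure."""
    blocks = [f"User: {q}\nAssistant: {a}" for q, a in shots]
    blocks.append(f"User: {query}\nAssistant:")
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(SHOTS, "I want to save $5,000 for a car in 2 years.")
```

The trailing bare "Assistant:" leaves the model to complete the fourth turn in the structure the three shots demonstrate.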
+
+ ## Evaluation
+
+ | Model | GSM8K | MMLU | Budget Example | Goals Example |
+ |---|---|---|---|---|
+ | Fino-1 8B | 63.33 | 66.84 | Provides code but not in desired format; provides narrative (not desired) | Reasonable response, not desired structure |
+ | Llama-3.2-3B-Instruct | 50.00 | 61.11 | Provides code but not in desired format | Close to desired format but recommends inappropriate savings vehicle for long-term goal |
+ | Ministral-8B-Instruct-2410 | 66.66 | 64.50 | Provides code but not in desired format | Reasonable format but recommends inappropriate savings vehicle for long-term goal |
+ | FinPlan-1 | 53.33 | 65.73 | Provides code in desired format | Reasonable format, and reasonable savings/investment vehicles recommended |
+
+
+ The benchmarks chosen, GSM8K, MMLU and the two synthetic dataset examples, were selected to provide a view of the model's performance both in terms of its generalization
+ ability and its ability to perform the tasks it is trained to accomplish. As the underlying model that FinPlan-1 is based on, Fino-1 8B is a natural comparison model
+ to evaluate for benchmarking. Further, the Llama 3.2-3B Instruct model is a newer version of the model that underlies Fino-1 8B, albeit a smaller version parameter-wise. Given
+ this model's rather decent performance on the financial planning tasks, it serves as a good comparison for FinPlan-1. Finally, the Ministral 8B Instruct 2410 model is of comparable
+ size parameter-wise to FinPlan-1 and was originally considered as a potential base model for FinPlan-1, thus making it a good model for comparison. Since the tasks this model is
+ tuned to accomplish are non-standard and domain-specific, the benchmark for these tasks comes from the validation/hold-out split of the training dataset, and its evaluation is
+ somewhat subjective. For each of these models, the Budget and Goals examples were presented to the model in either a zero-shot prompt (budget) or a three-shot prompt (goals).
+ Only the trained FinPlan-1 model was able to provide the desired format for the Excel file for the budget task, while both Fino-1 8B and FinPlan-1 performed well on the goals
+ dataset. To measure generalizability and retention of reasoning skill, all four models were benchmarked on GSM8K (grade-school mathematics reasoning) as well as MMLU (general
+ reasoning). While the domain-specific LoRA tuning certainly led to a degradation in FinPlan-1's benchmark scores relative to its underlying model, Fino-1 8B, the drop is rather
+ small for MMLU, and GSM8K performance remains above Llama 3.2-3B Instruct.
+
+
 
 ## Uses
 
 
 
 [More Information Needed]
 
+ ## Model Card Authors
 
+ Timothy Austin Rodriguez
 
 ## Model Card Contact
 
+ tar3kh@virginia.edu