---
library_name: transformers
tags: []
---
# FinPlan-1

FinPlan-1 is an LLM trained to assist with the creation of basic personal financial plans for individuals. This model is built on the Fino-1 model, which is itself a version of Llama-3.1-8B-Instruct that was CoT fine-tuned to improve its financial reasoning ability.
## Model Details

### Model Description

#### Introduction
According to Bankrate’s 2025 Emergency Savings Report, only 41% of Americans would be able to use their personal savings to pay for a $1,000 emergency expense, with the rest “financing it with a credit card they’d pay off over time, reducing their spending on other things, taking out a personal loan, borrowing from family or friends or other methods.”

The financial health of Americans depends on a number of factors, but one important component is basic financial literacy and having a financial plan. Financial planning is one area where I think LLMs can be of assistance. This LLM is my attempt to further train and fine-tune a model already trained on financial reasoning tasks to assist individuals with two key aspects of financial planning:
- Assist with the creation of a budget spreadsheet to enable individuals to keep track of their finances and understand where their money is going.
- Provide assistance with planning for short, medium and long term goals including breaking those goals down into monthly savings targets, and suggesting broad investment vehicles to fit each goal's timeframe.
While current LLMs can perform these tasks to an extent, they are often inconsistent in their response structure, can struggle with breaking down basic mathematics, and frequently go beyond the basic task at hand, recommending inappropriate savings and investment vehicles for individual savings goals. The Fino-1 8B model is certainly well trained for corporate financial reasoning tasks, but its recommendations for savings and investment vehicles were often too aggressive for short-term goals, and it may recommend long-term savings vehicles that carry tax penalties if not used appropriately. This model uses LoRA on a procedurally generated budgeting dataset, as well as few-shot prompting with a separate dataset built around short-, medium-, and long-term goals, to enhance the ability of Fino-1 8B to accomplish these tasks.

The results of this training and prompting method are encouraging: the model consistently produces budget spreadsheets (through the generation of executable Python code) as well as fairly reliable savings plan assistance when few-shot prompting is used. These training methods do affect the model's performance on standard benchmarks like GSM8K and MMLU, resulting in drops on both tasks compared with the base model; however, this loss in generalization is offset by the model's improved ability to assist individuals with budgeting and fixed-term savings goals.
- Developed by: Timothy Austin Rodriguez
- Funded by: University of Virginia
- Training type: LoRA fine-tuning plus few-shot prompting (3 shots)
- Language(s) (NLP): English (generates Python code)
- License: MIT
- Finetuned from model: Fino1-8B (itself fine-tuned from Llama-3.1-8B-Instruct)
## Training Data

This model is trained on a procedurally generated synthetic dataset that provides structured prompts and responses to help the underlying Fino-1 8B model create executable Python code that builds a budget spreadsheet and exports it to Microsoft Excel .xlsx format. This dataset, attached to this repository as budget_dataset.csv, comprises 3,000 examples, divided into a train/validation split of 2,500 for training and 500 for validation. The code used to create and randomize this dataset, including the seeds (42 for randomization, 60 for creation), can be found in the ipynb files attached to this repository.
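The full generation code lives in the attached notebooks; the sketch below only illustrates the general procedural approach. The category names, income ranges, and field names here are made up for illustration and may differ from the real generator:

```python
import random

random.seed(60)  # creation seed used for the repository datasets

# Hypothetical expense categories; the notebook's actual categories may differ.
CATEGORIES = ["Rent", "Groceries", "Transport", "Utilities", "Savings"]

def make_budget_example():
    # Randomize a plausible monthly take-home income and per-category amounts
    income = random.randrange(2500, 8000, 100)
    amounts = {c: round(income * random.uniform(0.05, 0.30), 2) for c in CATEGORIES}
    question = (f"My monthly take-home pay is ${income}. "
                f"Create a budget spreadsheet covering {', '.join(CATEGORIES)}.")
    # The paired response would contain Python code that writes an .xlsx file.
    return {"question": question, "category_amounts": amounts}

rows = [make_budget_example() for _ in range(3)]
```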
While not used for training this model, a secondary dataset for improving the model's performance on short-, medium-, and long-term goal planning was also developed via procedural generation. Like the first, it consists of 3,000 randomly generated examples of prompts and responses; the random seeds and train/validation split code can be found in the same ipynb file as the budget dataset. This dataset, goals_dataset.csv, was not used to train the final model due to poor performance encountered when leveraging LoRA for additional training: the model actually performed worse when prompted with an example from the validation dataset after training than before. A deeper exploration of why this occurred is warranted, and training/tuning methods beyond LoRA should be considered for future enhancement of this model.
## Training Method

The training/tuning method for this model is the parameter-efficient fine-tuning (PEFT) technique called Low-Rank Adaptation, or LoRA. LoRA is well suited to tuning a model for domain-specific tasks such as creating personal financial plans. It is significantly more efficient than full fine-tuning, requiring fewer compute resources, and is much more memory efficient because far fewer model weights are changed. In many cases LoRA yields results very similar to full fine-tuning without the heavy computational expense. This method was chosen given the time allocated for training this model, the limited compute resources due to competing requests for GPU time on the University of Virginia's Rivanna High Performance Computing cluster, and the desire for results comparable to full fine-tuning despite those constraints. The LoRA hyperparameter values were selected through experimentation and can be found in one of the ipynb files attached to this repository and in the summary below.
### Hyperparameters
- LORA_R = 64
- LORA_ALPHA = 64
- LORA_DROPOUT = 0.05
### Tuning/Training Settings
- learning_rate = 0.00001
- epochs = 5
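These values can be wired into a PEFT `LoraConfig` and `TrainingArguments` roughly as below. This is a sketch, not the exact notebook configuration; in particular, `output_dir` is a placeholder and any arguments not listed in the summary above (e.g. target modules, batch size) are left at library defaults:

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# LoRA hyperparameters from the summary above
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

# Training settings from the summary above; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="finplan1-lora",
    learning_rate=0.00001,
    num_train_epochs=5,
)
```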
Secondarily, this model makes use of few-shot prompting due to the aforementioned poor performance of LoRA when training on the goals dataset. Few-shot prompting was found to improve the model's ability to produce the desired response structure without degrading its performance, as was observed with LoRA regardless of the hyperparameters selected. Example code for implementing the appropriate few-shot prompting is available in one of the provided ipynb files in this repository.
## Evaluation
| Model | GSM8K | MMLU | Budget Example | Goals Example |
|---|---|---|---|---|
| Fino-1 8B | 63.33 | 66.84 | Provides code but not in desired format, provides narrative (not desired) | Reasonable Response, not desired structure |
| Llama-3.2-3B-Instruct | 50.00 | 61.11 | Provides code but not in desired format. | Close to desired format but recommends inappropriate savings vehicle for long term goal |
| Ministral-8B-Instruct-2410 | 66.66 | 64.50 | Provides code but not in desired format. | Reasonable format but recommends inappropriate savings vehicle for long term goal |
| FinPlan-1 | 53.33 | 65.73 | Provides code in desired format | Reasonable format, and reasonable savings/investment vehicles recommended. |
The benchmarks chosen (GSM8K, MMLU, and the two synthetic dataset examples) were selected to provide a view of the model's performance, both in terms of its generalization ability and its ability to perform the tasks it is trained to accomplish. As the model FinPlan-1 is based on, Fino-1 8B is a natural comparison model. Further, Llama-3.2-3B-Instruct is a newer, albeit smaller, version of the model underlying Fino-1 8B; given its rather decent performance on the financial planning tasks, it serves as a good comparison for FinPlan-1. Finally, Ministral-8B-Instruct-2410 is of comparable parameter size to FinPlan-1 and was originally considered as a potential base model for FinPlan-1, making it another good comparison.

Since the tasks this model is tuned to accomplish are non-standard and domain specific, the benchmark for these tasks comes from the validation/hold-out split of the training dataset, and its evaluation is somewhat subjective. For each of these models, the budget and goals examples were presented in either a zero-shot prompt (budget) or a three-shot prompt (goals). Only the trained FinPlan-1 model was able to provide the desired format for the Excel file on the budget task, while both Fino-1 8B and FinPlan-1 performed well on the goals example. To measure generalizability and retention of reasoning skill, all four models were benchmarked on GSM8K (grade-school mathematical reasoning) and MMLU (general reasoning). While the domain-specific LoRA tuning certainly led to a degradation in FinPlan-1's benchmark scores relative to its underlying model, Fino-1 8B, the drop is rather small for MMLU, and GSM8K performance remains above Llama-3.2-3B-Instruct.
## Intended Usage

As described above, this model is intended to assist with the creation of simple financial plans for individuals, specifically the creation of a budget spreadsheet for tracking expenses and planning for short-, medium-, and long-term savings goals. While this model can be prompted on a wide range of other tasks, doing so is not recommended: it has been specifically fine-tuned for these two tasks, and performance on tasks outside that scope may be diminished.

See below for the basic code required to import the model from Hugging Face using torch. Note the tokenizer is pulled from the Fino-1 8B repository, as it was not changed from the base Fino-1 8B model.
```python
import os
os.environ['HF_HOME'] = "your/directory/here"

import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import Dataset  # datasets is Hugging Face's dataset package

# The tokenizer comes from the Fino-1 8B repository (unchanged during fine-tuning)
tokenizer = AutoTokenizer.from_pretrained("TheFinAI/Fino1-8B")
model = AutoModelForCausalLM.from_pretrained("ThinkTim21/FinPlan-1")

# Prepare the model and tokenizer
tokenizer.pad_token = tokenizer.eos_token           # set padding token to EOS token
model.config.pad_token_id = tokenizer.pad_token_id  # set the padding token for the model

budget = pd.read_csv("budget_dataset.csv")  # use the dataset attached to this repo
goals = pd.read_csv("goals_dataset.csv")    # use the dataset attached to this repo

# Wrap each question in the "Q: ... A: " template the model was trained on
budget['instruct_lora'] = budget.apply(lambda row: f"Q: {row['question']}\n\nA: ", axis=1)
goals['instruct_lora'] = goals.apply(lambda row: f"Q: {row['question']}\n\nA: ", axis=1)

# Shuffle (seed 42, matching the notebooks) and split 2500/500
budget = budget.sample(frac=1, random_state=42)
train_budget = Dataset.from_pandas(budget[:2500])
val_budget = Dataset.from_pandas(budget[2500:])
train_budget = train_budget.map(lambda samples: tokenizer(samples['instruct_lora']), batched=True)
val_budget = val_budget.map(lambda samples: tokenizer(samples['instruct_lora']), batched=True)

goals = goals.sample(frac=1, random_state=42)
train_goals = Dataset.from_pandas(goals[:2500])
val_goals = Dataset.from_pandas(goals[2500:])
train_goals = train_goals.map(lambda samples: tokenizer(samples['instruct_lora']), batched=True)
val_goals = val_goals.map(lambda samples: tokenizer(samples['instruct_lora']), batched=True)

# Generate a response for one validation example
formatted_prompt = f"Q: {val_goals[0]['question']}\n\nA: "
inputs = tokenizer.encode(formatted_prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=800,
                        pad_token_id=tokenizer.pad_token_id, do_sample=False)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
### Prompt Format

The prompt format varies between the budget task and the goals task.
For the budget task, the following zero-shot prompt method is recommended.
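A minimal sketch of the zero-shot budget prompt, using the same `Q: ... A: ` template the model was tuned on; the request text below is a hypothetical example, not a row from the actual dataset:

```python
def build_budget_prompt(question: str) -> str:
    # Zero-shot "Q: ... A: " template used during LoRA training
    return f"Q: {question}\n\nA: "

# Hypothetical budget request for illustration
prompt = build_budget_prompt(
    "My monthly take-home pay is $4,500. Create a budget spreadsheet "
    "that tracks rent, groceries, transport, and savings."
)
```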
For the goals task, I recommend using few-shot prompting: take the goals_dataset.csv file as your base, then append your preferred prompt in the same format to the few-shot examples derived from the goals dataset.
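A sketch of assembling the three-shot goals prompt. The example rows below are hypothetical stand-ins; in practice, draw the shots from the `question`/`response` columns of goals_dataset.csv:

```python
# Hypothetical stand-ins for three rows of goals_dataset.csv
few_shot_examples = [
    {"question": "I want to save $3,000 for a vacation in 1 year.",
     "response": "Save $250/month in a high-yield savings account."},
    {"question": "I want $15,000 for a car down payment in 4 years.",
     "response": "Save $312.50/month in a CD or money market fund."},
    {"question": "I want $500,000 for retirement in 30 years.",
     "response": "Invest a fixed monthly amount in a diversified index fund "
                 "inside a tax-advantaged account."},
]

def build_goals_prompt(examples, new_question: str) -> str:
    # Concatenate the shots, then append the new question in the same template
    shots = "".join(f"Q: {ex['question']}\n\nA: {ex['response']}\n\n" for ex in examples)
    return shots + f"Q: {new_question}\n\nA: "

prompt = build_goals_prompt(few_shot_examples,
                            "I want to save $10,000 for a wedding in 2 years.")
```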
### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1906.02243).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]
## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors

Timothy Austin Rodriguez