Abstract
Transactify is an LSTM-based model designed to predict the category of an online payment transaction from its description. By analyzing textual inputs like "Live concert stream on YouTube" or "Coffee at Starbucks," it classifies transactions into categories such as "Movies & Entertainment" or "Food & Dining." The model helps users track and organize their spending across sectors, supporting better financial insight and budgeting. Transactify is trained on a dataset of realistic transaction descriptions for improved accuracy and generalization.
Table of Contents
1. Data Collection:
The dataset consists of 5,000 transaction records generated with ChatGPT, each containing a transaction description and its corresponding category. Example entries include "Live concert stream on YouTube" (Movies & Entertainment) and "Coffee at Starbucks" (Food & Dining). The records cover a range of spending categories such as Lifestyle, Movies & Entertainment, and Food & Dining.
2. Data Preprocessing:
The preprocessing step applies several natural language processing (NLP) tasks to clean and prepare the text data for model training:
- Lowercasing all text.
- Removing digits and punctuation with regular expressions (regex).
- Tokenizing the cleaned text to convert it into a sequence of tokens.
- Applying texts_to_sequences to transform the tokenized words into numerical sequences.
- Using pad_sequences so that all sequences share the fixed length expected by the LSTM input.
- Label encoding the target categories to convert them into numerical labels.
After preprocessing, the data is split into training and testing sets to build and validate the model.
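The steps above can be sketched in plain Python. The real pipeline would use Keras' Tokenizer, texts_to_sequences, and pad_sequences plus scikit-learn's LabelEncoder; the toy helpers below only mirror their behavior (frequency-ranked word indices starting at 1, index 0 reserved for padding, left-padding by default):

```python
import re

def clean_text(text):
    """Lowercase and strip digits/punctuation, keeping letters and spaces."""
    return re.sub(r"[^a-z\s]", "", text.lower()).strip()

def fit_word_index(texts):
    """Assign each word an integer index, mirroring Keras' Tokenizer
    (most frequent word gets index 1; 0 is reserved for padding)."""
    counts = {}
    for t in texts:
        for w in t.split():
            counts[w] = counts.get(w, 0) + 1
    ordered = sorted(counts, key=lambda w: -counts[w])
    return {w: i + 1 for i, w in enumerate(ordered)}

def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer index."""
    return [[word_index[w] for w in t.split() if w in word_index] for t in texts]

def pad_sequences(seqs, maxlen):
    """Left-pad with zeros to a fixed length, like Keras' default 'pre' padding."""
    return [[0] * (maxlen - len(s)) + s[-maxlen:] for s in seqs]

descriptions = ["Coffee at Starbucks", "Live concert stream on YouTube"]
labels = ["Food & Dining", "Movies & Entertainment"]

cleaned = [clean_text(d) for d in descriptions]
word_index = fit_word_index(cleaned)
padded = pad_sequences(texts_to_sequences(cleaned, word_index), maxlen=6)

# Label encoding: map each category to an integer, like sklearn's LabelEncoder
# (which also sorts the classes alphabetically).
classes = sorted(set(labels))
y = [classes.index(label) for label in labels]
print(padded, y)
```

At this point `padded` and `y` are ready to be split into training and testing sets.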
3. Model Building:
- Embedding Layer: Converts tokenized transaction descriptions into dense vectors, capturing word semantics and relationships.
- LSTM Layer: Learns sequential patterns from the embedded text, helping the model capture context and relationships between words over time.
- Dropout Layer: Adds regularization by randomly dropping neurons during training, reducing overfitting and improving the model's generalization.
- Dense Layer with Softmax Activation: Outputs a probability distribution across categories, from which the predicted category for each transaction description is taken.
- Model Compilation: Compiled with the Adam optimizer for efficient learning, sparse categorical cross-entropy loss for multi-class classification, and accuracy as the evaluation metric.
- Model Training: The model is trained for 50 epochs with a batch size of 8, using a validation set to monitor performance during training.
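The architecture above can be sketched in Keras as follows. The vocabulary size, sequence length, embedding dimension, and category count are assumptions, not values from the write-up:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

VOCAB_SIZE = 5000   # assumed tokenizer vocabulary size
MAX_LEN = 20        # assumed padded sequence length
NUM_CLASSES = 8     # assumed number of spending categories

model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=64),  # word embeddings
    LSTM(64),                                        # sequential patterns
    Dropout(0.5),                                    # regularization
    Dense(NUM_CLASSES, activation="softmax"),        # category probabilities
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integer-encoded
    metrics=["accuracy"],
)
model.build(input_shape=(None, MAX_LEN))
model.summary()

# Training as described in the write-up (X_train/y_train come from the
# preprocessing step):
# model.fit(X_train, y_train, epochs=50, batch_size=8,
#           validation_data=(X_test, y_test))
```

Sparse categorical cross-entropy is the natural pairing here because the label encoder produces integer class labels rather than one-hot vectors.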
Saving the Model and Preprocessing Objects:
The trained model is saved as transactify.h5 for future use. The tokenizer and label encoder used during preprocessing are saved with joblib as tokenizer.joblib and label_encoder.joblib, respectively, ensuring they can be reused for consistent tokenization and label encoding when making predictions on new data.
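Persisting the preprocessing objects can be sketched with joblib's dump/load round trip. A plain dict stands in here for the fitted Tokenizer, and a list for the label encoder's classes; the real project would dump the fitted objects themselves (and call model.save("transactify.h5") for the network):

```python
import os
import tempfile

import joblib

# Stand-ins for the fitted preprocessing objects (assumptions for the demo).
tokenizer_state = {"coffee": 1, "at": 2, "starbucks": 3}
label_classes = ["Food & Dining", "Movies & Entertainment"]

outdir = tempfile.mkdtemp()
joblib.dump(tokenizer_state, os.path.join(outdir, "tokenizer.joblib"))
joblib.dump(label_classes, os.path.join(outdir, "label_encoder.joblib"))

# Later, reload them so new descriptions are encoded exactly as in training.
restored_tokenizer = joblib.load(os.path.join(outdir, "tokenizer.joblib"))
restored_classes = joblib.load(os.path.join(outdir, "label_encoder.joblib"))
print(restored_tokenizer["coffee"], restored_classes[0])
```

Reusing the saved objects at prediction time is what guarantees that a word maps to the same index, and a label to the same integer, as during training.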
4. Prediction:
Once trained, the model is used to predict the category of new transaction descriptions. The output provides the category label, enabling users to classify their spending based on transaction descriptions.
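Prediction reduces to running the saved preprocessing on the new description, calling model.predict, and mapping the argmax of the softmax row back to a label. The decoding half can be sketched with a hypothetical class list standing in for label_encoder.classes_:

```python
import numpy as np

# Hypothetical category ordering, a stand-in for label_encoder.classes_.
classes = ["Food & Dining", "Lifestyle", "Movies & Entertainment"]

def decode_prediction(probs, classes):
    """Pick the highest-probability category from a softmax output row."""
    return classes[int(np.argmax(probs))]

# In the full pipeline the row would come from something like:
#   seq = tokenizer.texts_to_sequences([clean_text(description)])
#   probs = model.predict(pad_sequences(seq, maxlen=MAX_LEN))[0]
probs = np.array([0.05, 0.10, 0.85])
print(decode_prediction(probs, classes))  # Movies & Entertainment
```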
5. Conclusion:
Transactify effectively categorizes transaction descriptions using LSTM networks. However, a larger and more diverse dataset is needed to improve the accuracy and reliability of its predictions. Expanding the dataset will help the model generalize across a wider range of spending behaviors and conditions, leading to more precise predictions and deeper insight into users' spending patterns. Future work should focus on collecting additional data to refine the model's performance and applicability in real-world scenarios.