---
metrics:
- f1
---

This model is a fine-tuned version of the BERT language model, adapted for multi-label classification in the financial regulatory domain. It is built on the pre-trained ProsusAI/finbert model, further fine-tuned on a diverse dataset of financial regulatory texts, which allows it to classify a text into multiple relevant categories simultaneously.

## Model Architecture

- **Base Model**: BERT
- **Pre-trained Model**: ProsusAI/finbert
- **Task**: Multi-label classification

## Intended Use

This model is intended for multi-label classification tasks related to the following categories:

- Regulatory
- Compliance
- Risks
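Because the task is multi-label, the model scores each category independently, so a single text can receive any subset of the three labels at once. The sketch below shows the usual decision step for a multi-label BERT head: a sigmoid per label and a fixed threshold. The 0.5 threshold is an assumption (the card does not state one), and the logit values are invented for illustration:

```python
import math

LABELS = ["Regulatory", "Compliance", "Risks"]

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Return every label whose sigmoid score clears the threshold."""
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Hypothetical logits for one input text: two labels fire at once.
print(predict_labels([2.1, 0.4, -1.3]))  # ['Regulatory', 'Compliance']
```

Note that, unlike a softmax classifier, nothing forces the labels to be mutually exclusive: all three, or none, can clear the threshold for the same text.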

## Performance

Performance metrics on the validation set:

- F1 Score: 0.8637
- ROC AUC: 0.9044
- Accuracy: 0.6155
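The gap between F1 and accuracy is expected for multi-label evaluation: F1 is typically computed over the individual per-label decisions, while accuracy in the exact-match (subset) sense counts a sample as correct only if all three labels are right at once. The stdlib sketch below illustrates both definitions; the prediction matrices are invented, and whether the reported scores used exactly these averaging choices is an assumption (the card does not say):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (sample, label) binary decisions."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += t and p              # predicted 1, truly 1
            fp += (not t) and p        # predicted 1, truly 0
            fn += t and (not p)        # predicted 0, truly 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented predictions over [Regulatory, Compliance, Risks]:
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(micro_f1(y_true, y_pred))              # ~0.833: most label decisions correct
print(exact_match_accuracy(y_true, y_pred))  # 0.5: only 2 of 4 rows fully correct
```

One wrong label per sample barely moves micro-F1 but fails the whole sample under exact match, which is why an F1 of 0.86 can coexist with an accuracy of 0.62.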

## Limitations and Ethical Considerations

- This model's performance may vary depending on the specific nature of the text data and the label distribution.
- The dataset is class-imbalanced, which can skew predictions toward the more frequent labels.

## Dataset Information

- **Training Dataset**: 6,562 samples
- **Validation Dataset**: 929 samples
- **Test Dataset**: 1,884 samples

## Training Details

- **Training Strategy**: Fine-tuning BERT with a randomly initialized classification head
- **Optimizer**: Adam
- **Learning Rate**: 1e-4
- **Batch Size**: 16
- **Number of Epochs**: 2
- **Evaluation Strategy**: Epoch
- **Weight Decay**: 0.01
- **Metric for Best Model**: F1 Score
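These hyperparameters map naturally onto the Hugging Face `Trainer` API, though whether the authors used `Trainer` at all is an assumption; the card only lists the values. A configuration sketch under that assumption, with `output_dir` as a placeholder and the datasets and `compute_metrics` function (which would need to return an `"f1"` key to match the best-model metric) left out:

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments

# problem_type switches the loss to BCEWithLogitsLoss: one sigmoid per label,
# as multi-label classification requires. Per the card, the classification
# head is randomly initialized for the three new labels.
model = AutoModelForSequenceClassification.from_pretrained(
    "ProsusAI/finbert",
    num_labels=3,  # Regulatory, Compliance, Risks
    problem_type="multi_label_classification",
)

args = TrainingArguments(
    output_dir="finbert-multilabel",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
```

These `args` would then be passed to `Trainer` together with the tokenized train and validation splits.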