---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: modernbert_fingpt_results
  results: []
datasets:
- FinGPT/fingpt-sentiment-train
---

# ModernBERT Fine-tuned for Financial Text Sentiment Analysis

This project fine-tunes the **ModernBERT** model on the **FinGPT** sentiment dataset for financial text sentiment analysis.

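Once the fine-tuned checkpoint is available, inference takes a few lines with the `transformers` pipeline. A minimal sketch, assuming a hypothetical repo id; substitute the actual published checkpoint:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with the published checkpoint.
classifier = pipeline(
    "text-classification",
    model="your-username/modernbert_fingpt_results",
)

print(classifier("The company beat earnings expectations and raised guidance."))
# e.g. [{'label': 'strong positive', 'score': 0.97}]  (illustrative output)
```
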
## Dataset & Model

- **Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Dataset**: [FinGPT/fingpt-sentiment-train](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)
- **Task**: Multi-class sentiment classification (9 categories)
- **Domain**: Financial text analysis

### ModernBERT

ModernBERT is a modernized bidirectional encoder-only Transformer (BERT-style) pre-trained on 2 trillion tokens of English and code, with a native context length of up to 8,192 tokens. It incorporates architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for efficient inference.

### FinGPT Sentiment Analysis Dataset

The training split contains 76,772 rows (17,919,695 tokens) of financial text labeled with sentiment.

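To inspect the dataset with the `datasets` library (the exact column layout is not described in this card; FinGPT datasets generally follow an instruction-tuning format):

```python
from datasets import load_dataset

# Pull the sentiment data from the Hugging Face Hub.
ds = load_dataset("FinGPT/fingpt-sentiment-train", split="train")

print(ds)      # row count and column names
print(ds[0])   # one example record
```
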
## Sentiment Categories

The model classifies text into 9 fine-grained sentiment levels:

| Label ID | Sentiment Category  | Description                        |
|----------|---------------------|------------------------------------|
| 0        | Strong Negative     | Very pessimistic                   |
| 1        | Moderately Negative | Somewhat pessimistic               |
| 2        | Mildly Negative     | Slightly pessimistic               |
| 3        | Negative            | General negative sentiment         |
| 4        | Neutral             | No clear positive or negative bias |
| 5        | Mildly Positive     | Slightly optimistic                |
| 6        | Moderately Positive | Somewhat optimistic                |
| 7        | Positive            | General positive sentiment         |
| 8        | Strong Positive     | Very optimistic                    |

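The table maps directly to the `id2label`/`label2id` dictionaries used to configure the classification head. A sketch; the exact label strings are assumptions read off the table:

```python
from transformers import AutoModelForSequenceClassification

# Label strings are illustrative; they mirror the category table above.
id2label = {
    0: "strong negative", 1: "moderately negative", 2: "mildly negative",
    3: "negative", 4: "neutral", 5: "mildly positive",
    6: "moderately positive", 7: "positive", 8: "strong positive",
}
label2id = {label: i for i, label in id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=9,
    id2label=id2label,
    label2id=label2id,
)
```
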
## Model Configuration

### Parameters

- **Max Sequence Length**: 512 tokens
- **Batch Size**: 16
- **Learning Rate**: 2e-5 with warmup
- **Epochs**: 3 with early stopping
- **Optimizer**: AdamW with weight decay (0.01)

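These settings map onto `transformers.TrainingArguments` roughly as follows. A sketch, not the card's actual training script: the warmup ratio and output directory are assumptions (the card only says "with warmup"), while the 500-step evaluation cadence is inferred from the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert_fingpt_results",
    learning_rate=2e-5,
    warmup_ratio=0.1,                # assumed; the card only says "with warmup"
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,               # AdamW weight decay
    fp16=True,                       # mixed-precision training
    eval_strategy="steps",           # `evaluation_strategy` in older transformers
    eval_steps=500,                  # matches the cadence in the results table
    save_strategy="steps",           # must match eval strategy for best-model loading
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
```
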
### Features

- **Early Stopping**: Prevents overfitting (patience=3)
- **Best Model Loading**: Automatically loads the best checkpoint
- **Mixed Precision**: FP16 training for speed
- **Stratified Splitting**: 80/20 train/validation split

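A sketch of how the stratified split and early stopping could be wired up. `texts`/`labels` stand in for the prepared dataset columns, and `model`, `training_args`, the tokenized `train_dataset`/`val_dataset`, and `compute_metrics` are assumed from the surrounding sketches:

```python
from sklearn.model_selection import train_test_split
from transformers import EarlyStoppingCallback, Trainer

# 80/20 split that preserves the label distribution in both halves.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # tokenized from the splits above
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,  # defined in the next section
    # Stop when the monitored metric fails to improve 3 evaluations in a row.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```
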
## Evaluation Metrics

- **Accuracy**: Overall classification accuracy
- **F1-Score**: Weighted F1-score across all classes
- **Precision**: Weighted precision
- **Recall**: Weighted recall
- **Confusion Matrix**: Visual analysis of classification performance
- **Classification Report**: Detailed per-class metrics

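The weighted metrics above suggest a `compute_metrics` function along these lines (a sketch using scikit-learn; the exact implementation is not given in the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging matches the F1/precision/recall reported above.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```
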
## Performance

### Training Time (on a T4 GPU)

- **Total Training**: ~30-45 minutes
- **Per Epoch**: ~10-15 minutes
- **Evaluation**: ~2-3 minutes

### Training Results

Final validation metrics (step 11,500, end of epoch 3):

- Loss: 0.3741
- Accuracy: 0.9043
- F1: 0.9026
- Precision: 0.9022
- Recall: 0.9043

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.9551 | 0.1302 | 500 | 0.8504 | 0.6769 | 0.6623 | 0.6589 | 0.6769 |
| 0.6639 | 0.2605 | 1000 | 0.7921 | 0.7162 | 0.6952 | 0.7444 | 0.7162 |
| 0.5221 | 0.3907 | 1500 | 0.5066 | 0.8134 | 0.8083 | 0.8147 | 0.8134 |
| 0.4415 | 0.5210 | 2000 | 0.4247 | 0.8381 | 0.8363 | 0.8410 | 0.8381 |
| 0.4276 | 0.6512 | 2500 | 0.3884 | 0.8594 | 0.8486 | 0.8484 | 0.8594 |
| 0.3767 | 0.7815 | 3000 | 0.3472 | 0.8756 | 0.8661 | 0.8689 | 0.8756 |
| 0.3281 | 0.9117 | 3500 | 0.3463 | 0.8754 | 0.8631 | 0.8611 | 0.8754 |
| 0.2419 | 1.0419 | 4000 | 0.3556 | 0.8883 | 0.8737 | 0.8728 | 0.8883 |
| 0.2859 | 1.1722 | 4500 | 0.3162 | 0.8922 | 0.8859 | 0.8829 | 0.8922 |
| 0.2260 | 1.3024 | 5000 | 0.3269 | 0.8914 | 0.8857 | 0.8851 | 0.8914 |
| 0.2378 | 1.4327 | 5500 | 0.3281 | 0.8903 | 0.8834 | 0.8881 | 0.8903 |
| 0.2654 | 1.5629 | 6000 | 0.3038 | 0.8938 | 0.8862 | 0.8896 | 0.8938 |
| 0.2319 | 1.6931 | 6500 | 0.3032 | 0.8993 | 0.8919 | 0.8905 | 0.8993 |
| 0.2116 | 1.8234 | 7000 | 0.3013 | 0.9023 | 0.8919 | 0.8937 | 0.9023 |
| 0.1922 | 1.9536 | 7500 | 0.2959 | 0.9017 | 0.8968 | 0.8941 | 0.9017 |
| 0.1536 | 2.0839 | 8000 | 0.3983 | 0.9009 | 0.8986 | 0.9000 | 0.9009 |
| 0.1438 | 2.2141 | 8500 | 0.3982 | 0.8990 | 0.8968 | 0.8954 | 0.8990 |
| 0.1329 | 2.3444 | 9000 | 0.3809 | 0.9021 | 0.8990 | 0.8968 | 0.9021 |
| 0.1175 | 2.4746 | 9500 | 0.3944 | 0.9019 | 0.8991 | 0.8977 | 0.9019 |
| 0.1634 | 2.6048 | 10000 | 0.3899 | 0.9043 | 0.8999 | 0.8989 | 0.9043 |
| 0.1049 | 2.7351 | 10500 | 0.4006 | 0.9037 | 0.9016 | 0.9009 | 0.9037 |
| 0.1247 | 2.8653 | 11000 | 0.3828 | 0.9053 | 0.9019 | 0.9006 | 0.9053 |
| 0.1511 | 2.9956 | 11500 | 0.3741 | 0.9043 | 0.9026 | 0.9022 | 0.9043 |

## Deployment Options

- **API Deployment**: Create a REST API using FastAPI (a sketch follows below)
- **Batch Processing**: Set up an automated sentiment analysis pipeline
- **Real-time Analysis**: Integrate with financial data streams

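For the FastAPI option, a minimal sketch (the endpoint path and model id are illustrative assumptions, not the card's actual deployment code):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Hypothetical checkpoint id -- substitute the actual fine-tuned model.
classifier = pipeline("text-classification",
                      model="your-username/modernbert_fingpt_results")

class SentimentRequest(BaseModel):
    text: str

@app.post("/sentiment")
def predict_sentiment(req: SentimentRequest):
    # Return the top label and its confidence score.
    result = classifier(req.text)[0]
    return {"label": result["label"], "score": result["score"]}
```

Serve with `uvicorn app:app` (assuming the file is saved as `app.py`).
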
## References

- [ModernBERT Paper](https://arxiv.org/abs/2412.13663)
- [FinGPT Project](https://github.com/AI4Finance-Foundation/FinGPT)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Financial Sentiment Analysis Survey](https://arxiv.org/abs/2212.14197)