---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: modernbert_fingpt_results
  results: []
datasets:
- FinGPT/fingpt-sentiment-train
---

# ModernBERT Fine-tuned for Financial Text Sentiment Analysis

This project fine-tunes the **ModernBERT** model on the **FinGPT** sentiment dataset for financial text sentiment analysis.

## Dataset & Model

- **Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Dataset**: [FinGPT/fingpt-sentiment-train](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)
- **Task**: Multi-class sentiment classification (9 categories)
- **Domain**: Financial text analysis

### ModernBERT

ModernBERT is a modernized bidirectional encoder-only Transformer (BERT-style) pre-trained on 2 trillion tokens of English and code data, with a native context length of up to 8,192 tokens. It leverages architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for efficient inference.
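As a minimal usage sketch, the snippet below shows how the model's 9 output scores map to sentiment names, plus (in comments) how the fine-tuned checkpoint could be called through the `transformers` pipeline. The local checkpoint path `./modernbert_fingpt_results` and the exact label strings are assumptions; the pure-Python argmax helper is illustrative, not part of the released code.

```python
# Hypothetical id-to-label table mirroring the 9 sentiment categories
# this model was fine-tuned on (exact strings are an assumption).
ID2LABEL = {
    0: "Strong Negative",
    1: "Moderately Negative",
    2: "Mildly Negative",
    3: "Negative",
    4: "Neutral",
    5: "Mildly Positive",
    6: "Moderately Positive",
    7: "Positive",
    8: "Strong Positive",
}

def logits_to_label(logits):
    """Map the model's 9 raw output scores to a sentiment name via argmax."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]

# With transformers installed and the fine-tuned weights available,
# inference would look roughly like (checkpoint path assumed):
# from transformers import pipeline
# clf = pipeline("text-classification", model="./modernbert_fingpt_results")
# clf("Quarterly revenue beat analyst expectations.")

print(logits_to_label([0.2, 0.1, 0.3, 0.5, 1.1, 0.4, 0.2, 2.7, 0.9]))
# -> Positive (index 7 has the highest score)
```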
### FinGPT Sentiment Analysis Dataset

Contains 76,772 rows (17,919,695 tokens).

## Sentiment Categories

The model classifies text into 9 fine-grained sentiment levels:

| Label ID | Sentiment Category | Description |
|----------|--------------------|-------------|
| 0 | Strong Negative | Very pessimistic |
| 1 | Moderately Negative | Somewhat pessimistic |
| 2 | Mildly Negative | Slightly pessimistic |
| 3 | Negative | General negative sentiment |
| 4 | Neutral | No clear positive or negative bias |
| 5 | Mildly Positive | Slightly optimistic |
| 6 | Moderately Positive | Somewhat optimistic |
| 7 | Positive | General positive sentiment |
| 8 | Strong Positive | Very optimistic |

## Model Configuration

### Parameters

- **Max Sequence Length**: 512 tokens
- **Batch Size**: 16
- **Learning Rate**: 2e-5 with warmup
- **Epochs**: 3 with early stopping
- **Optimizer**: AdamW with weight decay (0.01)

### Features

- **Early Stopping**: Prevents overfitting (patience=3)
- **Best Model Loading**: Automatically loads the best checkpoint
- **Mixed Precision**: FP16 training for speed
- **Stratified Splitting**: 80/20 train/validation split

## Evaluation Metrics

- **Accuracy**: Overall classification accuracy
- **F1-Score**: Weighted F1-score across all classes
- **Precision**: Weighted precision
- **Recall**: Weighted recall
- **Confusion Matrix**: Visual analysis of classification performance
- **Classification Report**: Detailed per-class metrics

## Performance

### Training Time (on T4 GPU)

- **Total Training**: ~30-45 minutes
- **Per Epoch**: ~10-15 minutes
- **Evaluation**: ~2-3 minutes

### Training Results (Actual)

Final validation metrics:

- Loss: 0.3741
- Accuracy: 0.9043
- F1: 0.9026
- Precision: 0.9022
- Recall: 0.9043

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.9551 | 0.1302 | 500 | 0.8504 | 0.6769 | 0.6623 | 0.6589 | 0.6769 |
| 0.6639 | 0.2605 | 1000 | 0.7921 | 0.7162 | 0.6952 | 0.7444 | 0.7162 |
| 0.5221 | 0.3907 | 1500 | 0.5066 | 0.8134 | 0.8083 | 0.8147 | 0.8134 |
| 0.4415 | 0.5210 | 2000 | 0.4247 | 0.8381 | 0.8363 | 0.8410 | 0.8381 |
| 0.4276 | 0.6512 | 2500 | 0.3884 | 0.8594 | 0.8486 | 0.8484 | 0.8594 |
| 0.3767 | 0.7815 | 3000 | 0.3472 | 0.8756 | 0.8661 | 0.8689 | 0.8756 |
| 0.3281 | 0.9117 | 3500 | 0.3463 | 0.8754 | 0.8631 | 0.8611 | 0.8754 |
| 0.2419 | 1.0419 | 4000 | 0.3556 | 0.8883 | 0.8737 | 0.8728 | 0.8883 |
| 0.2859 | 1.1722 | 4500 | 0.3162 | 0.8922 | 0.8859 | 0.8829 | 0.8922 |
| 0.226 | 1.3024 | 5000 | 0.3269 | 0.8914 | 0.8857 | 0.8851 | 0.8914 |
| 0.2378 | 1.4327 | 5500 | 0.3281 | 0.8903 | 0.8834 | 0.8881 | 0.8903 |
| 0.2654 | 1.5629 | 6000 | 0.3038 | 0.8938 | 0.8862 | 0.8896 | 0.8938 |
| 0.2319 | 1.6931 | 6500 | 0.3032 | 0.8993 | 0.8919 | 0.8905 | 0.8993 |
| 0.2116 | 1.8234 | 7000 | 0.3013 | 0.9023 | 0.8919 | 0.8937 | 0.9023 |
| 0.1922 | 1.9536 | 7500 | 0.2959 | 0.9017 | 0.8968 | 0.8941 | 0.9017 |
| 0.1536 | 2.0839 | 8000 | 0.3983 | 0.9009 | 0.8986 | 0.9000 | 0.9009 |
| 0.1438 | 2.2141 | 8500 | 0.3982 | 0.8990 | 0.8968 | 0.8954 | 0.8990 |
| 0.1329 | 2.3444 | 9000 | 0.3809 | 0.9021 | 0.8990 | 0.8968 | 0.9021 |
| 0.1175 | 2.4746 | 9500 | 0.3944 | 0.9019 | 0.8991 | 0.8977 | 0.9019 |
| 0.1634 | 2.6048 | 10000 | 0.3899 | 0.9043 | 0.8999 | 0.8989 | 0.9043 |
| 0.1049 | 2.7351 | 10500 | 0.4006 | 0.9037 | 0.9016 | 0.9009 | 0.9037 |
| 0.1247 | 2.8653 | 11000 | 0.3828 | 0.9053 | 0.9019 | 0.9006 | 0.9053 |
| 0.1511 | 2.9956 | 11500 | 0.3741 | 0.9043 | 0.9026 | 0.9022 | 0.9043 |

## Deployment Options

- **API Deployment**: Serve predictions through a REST API (e.g., FastAPI)
- **Batch Processing**: Set up an automated sentiment analysis pipeline
- **Real-time Analysis**: Integrate with financial data streams

## References

- [ModernBERT Paper](https://arxiv.org/abs/2412.13663)
- [FinGPT Project](https://github.com/AI4Finance-Foundation/FinGPT)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Financial Sentiment Analysis Survey](https://arxiv.org/abs/2212.14197)
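For reference, the hyperparameters listed under Model Configuration map onto a Hugging Face `Trainer` setup roughly as sketched below. This is a reconstruction under stated assumptions, not the exact training script: the warmup ratio, evaluation cadence, metric used for model selection, and the `train_ds`/`val_ds` dataset objects (tokenized with `max_length=512` from a stratified 80/20 split) are all assumptions.

```python
# Sketch of a fine-tuning configuration matching the Model Configuration
# section above. train_ds / val_ds stand in for the tokenized stratified
# 80/20 splits (an assumption; preparing them is not shown here).
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=9
)

args = TrainingArguments(
    output_dir="modernbert_fingpt_results",
    num_train_epochs=3,            # 3 epochs with early stopping
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,              # "with warmup" (exact schedule assumed)
    weight_decay=0.01,             # AdamW weight decay
    fp16=True,                     # mixed-precision training
    eval_strategy="steps",         # evaluated every 500 steps per the table
    eval_steps=500,
    save_steps=500,
    load_best_model_at_end=True,   # reload the best checkpoint
    metric_for_best_model="f1",    # selection metric assumed
)

train_ds = val_ds = None  # placeholders for the tokenized splits

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
# trainer.train()
```

Note that on older `transformers` versions the `eval_strategy` argument is spelled `evaluation_strategy`.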