---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: modernbert_fingpt_results
  results: []
datasets:
- FinGPT/fingpt-sentiment-train
---

# ModernBERT Fine-tuned for Financial Text Sentiment Analysis

This project fine-tunes the **ModernBERT** model on the **FinGPT** sentiment dataset for financial text sentiment analysis.

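Once the fine-tuned checkpoint is available, inference takes a few lines with the `transformers` pipeline. A minimal sketch, assuming a hypothetical repo id; substitute the actual published checkpoint:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with the published checkpoint.
classifier = pipeline(
    "text-classification",
    model="your-username/modernbert_fingpt_results",
)

print(classifier("The company beat earnings expectations and raised guidance."))
# e.g. [{'label': 'strong positive', 'score': 0.97}]  (illustrative output)
```
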
## Dataset & Model

- **Model**: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- **Dataset**: [FinGPT/fingpt-sentiment-train](https://huggingface.co/datasets/FinGPT/fingpt-sentiment-train)
- **Task**: Multi-class sentiment classification (9 categories)
- **Domain**: Financial text analysis

### ModernBERT

ModernBERT is a modernized bidirectional encoder-only Transformer (BERT-style) pre-trained on 2 trillion tokens of English and code, with a native context length of up to 8,192 tokens. It incorporates architectural improvements such as Rotary Positional Embeddings (RoPE) for long-context support, alternating local-global attention for efficiency on long inputs, and unpadding with Flash Attention for efficient inference.

### FinGPT Sentiment Analysis Dataset

The training split contains 76,772 rows (17,919,695 tokens) of financial text labeled with sentiment.

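To inspect the dataset with the `datasets` library (the exact column layout is not described in this card; FinGPT datasets generally follow an instruction-tuning format):

```python
from datasets import load_dataset

# Pull the sentiment data from the Hugging Face Hub.
ds = load_dataset("FinGPT/fingpt-sentiment-train", split="train")

print(ds)      # row count and column names
print(ds[0])   # one example record
```
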
## Sentiment Categories

The model classifies text into 9 fine-grained sentiment levels:

| Label ID | Sentiment Category  | Description                        |
|----------|---------------------|------------------------------------|
| 0        | Strong Negative     | Very pessimistic                   |
| 1        | Moderately Negative | Somewhat pessimistic               |
| 2        | Mildly Negative     | Slightly pessimistic               |
| 3        | Negative            | General negative sentiment         |
| 4        | Neutral             | No clear positive or negative bias |
| 5        | Mildly Positive     | Slightly optimistic                |
| 6        | Moderately Positive | Somewhat optimistic                |
| 7        | Positive            | General positive sentiment         |
| 8        | Strong Positive     | Very optimistic                    |

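The table maps directly to the `id2label`/`label2id` dictionaries used to configure the classification head. A sketch; the exact label strings are assumptions read off the table:

```python
from transformers import AutoModelForSequenceClassification

# Label strings are illustrative; they mirror the category table above.
id2label = {
    0: "strong negative", 1: "moderately negative", 2: "mildly negative",
    3: "negative", 4: "neutral", 5: "mildly positive",
    6: "moderately positive", 7: "positive", 8: "strong positive",
}
label2id = {label: i for i, label in id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=9,
    id2label=id2label,
    label2id=label2id,
)
```
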
## Model Configuration

### Parameters

- **Max Sequence Length**: 512 tokens
- **Batch Size**: 16
- **Learning Rate**: 2e-5 with warmup
- **Epochs**: 3 with early stopping
- **Optimizer**: AdamW with weight decay (0.01)

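These settings map onto `transformers.TrainingArguments` roughly as follows. A sketch, not the card's actual training script: the warmup ratio and output directory are assumptions (the card only says "with warmup"), while the 500-step evaluation cadence is inferred from the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert_fingpt_results",
    learning_rate=2e-5,
    warmup_ratio=0.1,                # assumed; the card only says "with warmup"
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,               # AdamW weight decay
    fp16=True,                       # mixed-precision training
    eval_strategy="steps",           # `evaluation_strategy` in older transformers
    eval_steps=500,                  # matches the cadence in the results table
    save_strategy="steps",           # must match eval strategy for best-model loading
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
```
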
### Features

- **Early Stopping**: Prevents overfitting (patience=3)
- **Best Model Loading**: Automatically loads the best checkpoint
- **Mixed Precision**: FP16 training for speed
- **Stratified Splitting**: 80/20 train/validation split

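A sketch of how the stratified split and early stopping could be wired up. `texts`/`labels` stand in for the prepared dataset columns, and `model`, `training_args`, the tokenized `train_dataset`/`val_dataset`, and `compute_metrics` are assumed from the surrounding sketches:

```python
from sklearn.model_selection import train_test_split
from transformers import EarlyStoppingCallback, Trainer

# 80/20 split that preserves the label distribution in both halves.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # tokenized from the splits above
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,  # defined in the next section
    # Stop when the monitored metric fails to improve 3 evaluations in a row.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```
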
## Evaluation Metrics

- **Accuracy**: Overall classification accuracy
- **F1-Score**: Weighted F1-score across all classes
- **Precision**: Weighted precision
- **Recall**: Weighted recall
- **Confusion Matrix**: Visual analysis of classification performance
- **Classification Report**: Detailed per-class metrics

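The weighted metrics above suggest a `compute_metrics` function along these lines (a sketch using scikit-learn; the exact implementation is not given in the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging matches the F1/precision/recall reported above.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```
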
## Performance

### Training Time (on a T4 GPU)

- **Total Training**: ~30-45 minutes
- **Per Epoch**: ~10-15 minutes
- **Evaluation**: ~2-3 minutes

### Training Results

Final validation metrics (step 11,500, end of epoch 3):

- Loss: 0.3741
- Accuracy: 0.9043
- F1: 0.9026
- Precision: 0.9022
- Recall: 0.9043

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.9551 | 0.1302 | 500 | 0.8504 | 0.6769 | 0.6623 | 0.6589 | 0.6769 |
| 0.6639 | 0.2605 | 1000 | 0.7921 | 0.7162 | 0.6952 | 0.7444 | 0.7162 |
| 0.5221 | 0.3907 | 1500 | 0.5066 | 0.8134 | 0.8083 | 0.8147 | 0.8134 |
| 0.4415 | 0.5210 | 2000 | 0.4247 | 0.8381 | 0.8363 | 0.8410 | 0.8381 |
| 0.4276 | 0.6512 | 2500 | 0.3884 | 0.8594 | 0.8486 | 0.8484 | 0.8594 |
| 0.3767 | 0.7815 | 3000 | 0.3472 | 0.8756 | 0.8661 | 0.8689 | 0.8756 |
| 0.3281 | 0.9117 | 3500 | 0.3463 | 0.8754 | 0.8631 | 0.8611 | 0.8754 |
| 0.2419 | 1.0419 | 4000 | 0.3556 | 0.8883 | 0.8737 | 0.8728 | 0.8883 |
| 0.2859 | 1.1722 | 4500 | 0.3162 | 0.8922 | 0.8859 | 0.8829 | 0.8922 |
| 0.2260 | 1.3024 | 5000 | 0.3269 | 0.8914 | 0.8857 | 0.8851 | 0.8914 |
| 0.2378 | 1.4327 | 5500 | 0.3281 | 0.8903 | 0.8834 | 0.8881 | 0.8903 |
| 0.2654 | 1.5629 | 6000 | 0.3038 | 0.8938 | 0.8862 | 0.8896 | 0.8938 |
| 0.2319 | 1.6931 | 6500 | 0.3032 | 0.8993 | 0.8919 | 0.8905 | 0.8993 |
| 0.2116 | 1.8234 | 7000 | 0.3013 | 0.9023 | 0.8919 | 0.8937 | 0.9023 |
| 0.1922 | 1.9536 | 7500 | 0.2959 | 0.9017 | 0.8968 | 0.8941 | 0.9017 |
| 0.1536 | 2.0839 | 8000 | 0.3983 | 0.9009 | 0.8986 | 0.9000 | 0.9009 |
| 0.1438 | 2.2141 | 8500 | 0.3982 | 0.8990 | 0.8968 | 0.8954 | 0.8990 |
| 0.1329 | 2.3444 | 9000 | 0.3809 | 0.9021 | 0.8990 | 0.8968 | 0.9021 |
| 0.1175 | 2.4746 | 9500 | 0.3944 | 0.9019 | 0.8991 | 0.8977 | 0.9019 |
| 0.1634 | 2.6048 | 10000 | 0.3899 | 0.9043 | 0.8999 | 0.8989 | 0.9043 |
| 0.1049 | 2.7351 | 10500 | 0.4006 | 0.9037 | 0.9016 | 0.9009 | 0.9037 |
| 0.1247 | 2.8653 | 11000 | 0.3828 | 0.9053 | 0.9019 | 0.9006 | 0.9053 |
| 0.1511 | 2.9956 | 11500 | 0.3741 | 0.9043 | 0.9026 | 0.9022 | 0.9043 |

## Deployment Options

- **API Deployment**: Create a REST API using FastAPI (a sketch follows below)
- **Batch Processing**: Set up an automated sentiment analysis pipeline
- **Real-time Analysis**: Integrate with financial data streams

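For the FastAPI option, a minimal sketch (the endpoint path and model id are illustrative assumptions, not the card's actual deployment code):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Hypothetical checkpoint id -- substitute the actual fine-tuned model.
classifier = pipeline("text-classification",
                      model="your-username/modernbert_fingpt_results")

class SentimentRequest(BaseModel):
    text: str

@app.post("/sentiment")
def predict_sentiment(req: SentimentRequest):
    # Return the top label and its confidence score.
    result = classifier(req.text)[0]
    return {"label": result["label"], "score": result["score"]}
```

Serve with `uvicorn app:app` (assuming the file is saved as `app.py`).
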
## References

- [ModernBERT Paper](https://arxiv.org/abs/2412.13663)
- [FinGPT Project](https://github.com/AI4Finance-Foundation/FinGPT)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Financial Sentiment Analysis Survey](https://arxiv.org/abs/2212.14197)