Hananguyen12 committed on
Commit 22e5ad6 · verified · 1 Parent(s): 9ca6636

Updated LAPEFT model with proper 3-class configuration

Files changed (2):
  1. README.md +47 -197
  2. config.json +9 -10
README.md CHANGED
@@ -6,55 +6,37 @@ tags:
  - financial-nlp
  - lora
  - peft
- - lexicon-augmented
  - bert
- - gated-fusion
  language:
  - en
  datasets:
  - financial-phrasebank
- metrics:
- - accuracy
- - f1
- library_name: transformers
- pipeline_tag: text-classification
  widget:
- - text: "The company reported excellent quarterly results with revenue growth exceeding expectations."
  example_title: "Positive Financial News"
- - text: "The stock price remains stable with no significant market movements expected."
- example_title: "Neutral Market Update"
- - text: "The company faces potential bankruptcy due to mounting debt and declining sales."
  example_title: "Negative Financial Outlook"
  ---

- # 🏦 LAPEFT: Lexicon-Augmented PEFT for Financial Sentiment Analysis
-
- A state-of-the-art financial sentiment analysis model that combines **BERT-base-uncased** with **LoRA (Low-Rank Adaptation)** and **lexicon-based features** using a novel **gated fusion mechanism**.

- ## 🌟 Key Features

- - 🎯 **High Accuracy**: 85-90% on financial sentiment tasks
- - ⚡ **Efficient**: 98% parameter reduction via LoRA (only 2-3M trainable parameters)
- - 🧠 **Smart Fusion**: Novel gated mechanism combining transformer + lexicon features
- - 💼 **Financial Domain**: Enhanced with Loughran-McDonald financial dictionary
- - 🚀 **Production Ready**: Optimized for real-world deployment

- ## 🏗️ Architecture

- ### Gated Fusion Innovation
- ```
- Input Text → BERT Encoder → Transformer Features ↘
-                                                    → Gated Fusion → Classification
- Input Text → Lexicon Analyzer → Lexicon Features  ↗
- ```
-
- The model intelligently learns when to trust:
- - **Transformer features**: For complex contextual understanding
- - **Lexicon features**: For domain-specific financial sentiment

- ## 🚀 Quick Start
-
- ### Option 1: Simple Pipeline (Recommended)

  ```python
  from transformers import pipeline
@@ -66,198 +48,66 @@ classifier = pipeline(
  )

  # Analyze sentiment
- text = "The company reported strong quarterly earnings with revenue exceeding expectations."
  result = classifier(text)
  print(result)
  # Output: [{'label': 'POSITIVE', 'score': 0.9234}]
  ```

- ### Option 2: Advanced Usage with LoRA

  ```python
  from transformers import BertTokenizer, BertForSequenceClassification
  from peft import PeftModel
- import torch

  # Load model components
- base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
  model = PeftModel.from_pretrained(base_model, "Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
- tokenizer = BertTokenizer.from_pretrained("Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
-
- def predict_sentiment(text):
-     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
-     with torch.no_grad():
-         outputs = model(**inputs)
-         probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
-         predicted_class = torch.argmax(probs, dim=-1)
-
-     labels = ["Negative", "Neutral", "Positive"]
-     return {
-         "sentiment": labels[predicted_class],
-         "confidence": probs.max().item(),
-         "all_scores": {
-             "negative": probs[0][0].item(),
-             "neutral": probs[0][1].item(),
-             "positive": probs[0][2].item()
-         }
-     }
-
- # Example
- result = predict_sentiment("The company's financial outlook appears promising.")
- print(result)
- ```
-
- ### Option 3: Batch Processing
-
- ```python
- def analyze_financial_texts(texts):
-     classifier = pipeline("text-classification", model="Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
-     results = classifier(texts)
-
-     # Format results
-     formatted_results = []
-     for text, result in zip(texts, results):
-         formatted_results.append({
-             "text": text,
-             "sentiment": result['label'],
-             "confidence": result['score']
-         })
-
-     return formatted_results
-
- # Example usage
- financial_news = [
-     "Revenue growth exceeded analyst expectations this quarter.",
-     "The company maintains a stable market position despite challenges.",
-     "Declining sales have raised serious concerns about future profitability."
- ]
-
- results = analyze_financial_texts(financial_news)
- for r in results:
-     print(f"Text: {r['text']}")
-     print(f"Sentiment: {r['sentiment']} (Confidence: {r['confidence']:.3f})")
-     print("-" * 50)
- ```
-
- ## 📊 Performance Metrics
-
- | Metric | Score |
- |--------|--------|
- | **Overall Accuracy** | **87.5%** |
- | **Macro F1-Score** | **0.864** |
- | **Weighted F1-Score** | **0.875** |
- | Negative F1 | 0.842 |
- | Neutral F1 | 0.891 |
- | Positive F1 | 0.859 |
- | **Parameters** | **2.3M** (98% reduction) |
-
- ## 🎯 Use Cases
-
- ### 📈 Financial Applications
- - **News Sentiment**: Analyze financial news and press releases
- - **Social Media**: Monitor Twitter/Reddit for market sentiment
- - **Research Reports**: Process analyst reports and earnings calls
- - **Risk Management**: Early warning system for negative sentiment

- ### 🔧 Integration Examples
- - **Trading Algorithms**: Sentiment as a trading signal
- - **Portfolio Management**: Monitor holdings sentiment
- - **Market Research**: Sector and company sentiment trends
- - **Compliance**: Screen communications for risk

- ## 💡 Real-World Examples

- ```python
- # Financial news analysis
- news_texts = [
-     "Apple Inc. reported record iPhone sales in Q4 2024",
-     "Tesla stock volatile amid production concerns",
-     "Microsoft Azure revenue growth slows but remains profitable",
-     "Amazon faces regulatory scrutiny over market practices"
- ]
-
- classifier = pipeline("text-classification", model="Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
- for text in news_texts:
-     result = classifier(text)[0]
-     print(f"📰 {text}")
-     print(f"💭 Sentiment: {result['label']} ({result['score']:.2%} confidence)")
-     print()
  ```

- ## 🔬 Technical Details
-
- ### Model Architecture
- - **Base**: BERT-base-uncased (768 hidden dimensions)
- - **LoRA Config**: rank=16, alpha=32, dropout=0.1
- - **Target Modules**: query, key, value, dense layers
- - **Fusion**: Gated mechanism with learnable weights
- - **Lexicon**: VADER + Loughran-McDonald financial dictionary
-
- ### Training Setup
- - **Dataset**: Financial PhraseBank + custom financial datasets
- - **Samples**: 3,000+ annotated financial sentences
- - **Hardware**: GPU-optimized with mixed precision (FP16)
- - **Optimization**: AdamW with cosine learning rate schedule
- - **Memory**: Gradient checkpointing + optimized batch processing
-
- ### Performance Optimizations
- - ⚡ **Fast Inference**: LoRA enables quick predictions
- - 💾 **Low Memory**: Efficient architecture for deployment
- - 🌐 **Scalable**: Ready for production environments
- - 📱 **Edge Compatible**: Lightweight enough for mobile/edge

- ## 📚 Model Comparison

- | Model | Accuracy | F1-Score | Trainable Params | Training Time |
- |-------|----------|----------|------------------|---------------|
- | BERT-base-uncased | 82.1% | 0.798 | 110M | 4+ hours |
- | FinBERT | 84.3% | 0.831 | 110M | 3+ hours |
- | **LAPEFT (Ours)** | **87.5%** | **0.864** | **2.3M** | **45 mins** |

- ## 🚨 Important Notes

- ### Best Performance Tips
- - **Text Length**: Optimal for 50-500 words
- - **Domain**: Designed specifically for financial content
- - **Language**: Optimized for English text
- - **Context**: Works best with complete sentences

- ### Limitations
- - Primarily trained on English financial text
  - Performance may vary on non-financial content
- - Requires tokenization for optimal results

- ## 🤝 Contributing
-
- We welcome contributions! Areas where you can help:
- - Testing on new financial datasets
- - Improving documentation and examples
- - Adding support for other languages
- - Performance optimizations
-
- ## 📖 Citation

  ```bibtex
  @misc{lapeft_financial_sentiment_2025,
- title={LAPEFT: Lexicon-Augmented PEFT for Financial Sentiment Analysis},
  author={Hananguyen12},
  year={2025},
- publisher={Hugging Face Hub},
  url={https://huggingface.co/Hananguyen12/LAPEFT-Financial-Sentiment-Analysis}
  }
  ```
-
- ## 📄 License
-
- Apache 2.0 License - See LICENSE file for details.
-
- ## 🎉 Acknowledgments
-
- - Hugging Face team for the transformers library
- - Microsoft for the LoRA technique
- - Loughran-McDonald for the financial sentiment lexicon
-
- ---
-
- *Built with ❤️ for the financial NLP community*
 
  - financial-nlp
  - lora
  - peft
  - bert
  language:
  - en
+ pipeline_tag: text-classification
+ library_name: transformers
  datasets:
  - financial-phrasebank
  widget:
+ - text: "The company reported excellent quarterly results with strong revenue growth."
  example_title: "Positive Financial News"
+ - text: "Market conditions remain stable with no significant changes expected."
+ example_title: "Neutral Market Update"
+ - text: "The company faces potential bankruptcy due to mounting debt."
  example_title: "Negative Financial Outlook"
  ---

+ # 🏦 LAPEFT: Financial Sentiment Analysis

+ A fine-tuned BERT model with LoRA for financial sentiment analysis. This model classifies financial text into three categories: Negative, Neutral, and Positive.

+ ## Model Details

+ - **Base Model**: bert-base-uncased
+ - **Fine-tuning**: LoRA (Low-Rank Adaptation)
+ - **Classes**: 3 (Negative, Neutral, Positive)
+ - **Domain**: Financial text analysis
+ - **Language**: English

+ ## Usage

+ ### Quick Start with Pipeline

  ```python
  from transformers import pipeline
  )

  # Analyze sentiment
+ text = "The company reported strong quarterly earnings."
  result = classifier(text)
  print(result)
  # Output: [{'label': 'POSITIVE', 'score': 0.9234}]
  ```
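
The pipeline above also accepts a list of texts for batch scoring. As a minimal sketch beyond what the commit adds, here is a pure-Python helper for pairing inputs with the pipeline's output; `summarize_results` and the mocked results are illustrative, not part of the repository, and assume each result is a `{'label', 'score'}` dict, the shape returned by `pipeline("text-classification")`:

```python
# Sketch: format batch results from a text-classification pipeline.
# The mock data below stands in for real pipeline output so the
# snippet runs without downloading the model.

def summarize_results(texts, results):
    """Pair each input text with its predicted label and confidence."""
    return [
        {"text": t, "sentiment": r["label"], "confidence": round(r["score"], 3)}
        for t, r in zip(texts, results)
    ]

texts = ["Revenue beat expectations.", "Debt concerns mount."]
mock_results = [
    {"label": "POSITIVE", "score": 0.93},
    {"label": "NEGATIVE", "score": 0.88},
]
for row in summarize_results(texts, mock_results):
    print(f"{row['sentiment']:>8}  ({row['confidence']:.3f})  {row['text']}")
```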
56
 
57
+ ### Advanced Usage
58
 
59
  ```python
60
  from transformers import BertTokenizer, BertForSequenceClassification
61
  from peft import PeftModel
 
62
 
63
  # Load model components
64
+ base_model = BertForSequenceClassification.from_pretrained(
65
+ "bert-base-uncased",
66
+ num_labels=3
67
+ )
68
  model = PeftModel.from_pretrained(base_model, "Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
69
+ tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
70
 
71
+ # Inference
72
+ text = "The quarterly results exceeded expectations."
73
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
 
 
74
 
75
+ with torch.no_grad():
76
+ outputs = model(**inputs)
77
+ predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
78
+ predicted_class = torch.argmax(predictions, dim=-1)
79
 
80
+ labels = ["NEGATIVE", "NEUTRAL", "POSITIVE"]
81
+ print(f"Predicted: {labels[predicted_class]}")
 
 
 
 
 
 
 
 
 
 
 
 
 
82
  ```
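
Rather than hardcoding a `labels` list, the class names can be read from the updated config's `id2label` map. The following is a small self-contained sketch of turning raw logits into a labeled prediction; it uses a pure-Python softmax so it runs without the model, and `decode_logits` is an illustrative helper, not part of the repository:

```python
import math

# id2label as set in the updated config.json
ID2LABEL = {0: "NEGATIVE", 1: "NEUTRAL", 2: "POSITIVE"}

def decode_logits(logits, id2label=ID2LABEL):
    """Softmax over raw logits, then pick the highest-scoring class."""
    exps = [math.exp(x - max(logits)) for x in logits]  # shifted for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"sentiment": id2label[best], "confidence": probs[best]}

# Example with made-up logits (in practice: outputs.logits[0].tolist())
print(decode_logits([0.1, 0.2, 2.0]))
```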

+ ## Model Performance

+ - Optimized for financial text analysis
+ - Efficient LoRA fine-tuning approach
+ - Suitable for real-time sentiment analysis

+ ## Use Cases

+ - Financial news sentiment analysis
+ - Social media monitoring for financial content
+ - Investment research and analysis
+ - Risk assessment based on sentiment

+ ## Limitations

+ - Trained primarily on English financial text
  - Performance may vary on non-financial content
+ - Best suited for sentences and short paragraphs

+ ## Citation

  ```bibtex
  @misc{lapeft_financial_sentiment_2025,
+ title={LAPEFT: Financial Sentiment Analysis with LoRA},
  author={Hananguyen12},
  year={2025},
+ publisher={Hugging Face},
  url={https://huggingface.co/Hananguyen12/LAPEFT-Financial-Sentiment-Analysis}
  }
  ```
config.json CHANGED
@@ -15,23 +15,22 @@
   "model_type": "bert",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
-  "num_labels": 3,
   "pad_token_id": 0,
   "position_embedding_type": "absolute",
-  "problem_type": "single_label_classification",
-  "torch_dtype": "float32",
-  "transformers_version": "4.36.0",
   "type_vocab_size": 2,
   "use_cache": true,
   "vocab_size": 30522,
   "id2label": {
-    "0": "negative",
-    "1": "neutral",
-    "2": "positive"
   },
   "label2id": {
-    "negative": 0,
-    "neutral": 1,
-    "positive": 2
   }
 }
 
   "model_type": "bert",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 0,
   "position_embedding_type": "absolute",
+  "transformers_version": "4.30.0",
   "type_vocab_size": 2,
   "use_cache": true,
   "vocab_size": 30522,
+  "num_labels": 3,
+  "problem_type": "single_label_classification",
   "id2label": {
+    "0": "NEGATIVE",
+    "1": "NEUTRAL",
+    "2": "POSITIVE"
   },
   "label2id": {
+    "NEGATIVE": 0,
+    "NEUTRAL": 1,
+    "POSITIVE": 2
   }
 }
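
The updated `id2label` and `label2id` maps should be exact inverses, with as many entries as `num_labels`. A quick pure-Python sanity check of the new values (a standalone sketch that needs no model download; the JSON fragment just repeats what the diff above sets):

```python
import json

# The label-related fields as written in the updated config.json
config_fragment = json.loads("""
{
  "num_labels": 3,
  "id2label": {"0": "NEGATIVE", "1": "NEUTRAL", "2": "POSITIVE"},
  "label2id": {"NEGATIVE": 0, "NEUTRAL": 1, "POSITIVE": 2}
}
""")

# JSON object keys are strings, so normalize id2label's keys to ints
id2label = {int(k): v for k, v in config_fragment["id2label"].items()}
label2id = config_fragment["label2id"]

# The two maps must be inverses, and their size must match num_labels
assert len(id2label) == config_fragment["num_labels"]
assert all(label2id[name] == idx for idx, name in id2label.items())
print("config label maps are consistent")
```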