Updated LAPEFT model with proper 3-class configuration
Files changed:
- README.md: +47 -197
- config.json: +9 -10
README.md CHANGED
@@ -6,55 +6,37 @@ tags:
 - financial-nlp
 - lora
 - peft
-- lexicon-augmented
 - bert
-- gated-fusion
 language:
 - en
+pipeline_tag: text-classification
+library_name: transformers
 datasets:
 - financial-phrasebank
-metrics:
-- accuracy
-- f1
-library_name: transformers
-pipeline_tag: text-classification
 widget:
-- text: "The company reported excellent quarterly results with revenue growth"
+- text: "The company reported excellent quarterly results with strong revenue growth."
   example_title: "Positive Financial News"
-- text: "
-  example_title: "Neutral Market Update"
+- text: "Market conditions remain stable with no significant changes expected."
+  example_title: "Neutral Market Update"
-- text: "The company faces potential bankruptcy due to mounting debt"
+- text: "The company faces potential bankruptcy due to mounting debt."
   example_title: "Negative Financial Outlook"
 ---
 
-# 🏦 LAPEFT:
+# 🏦 LAPEFT: Financial Sentiment Analysis
 
-A state-of-the-art financial sentiment analysis model that combines **BERT-base-uncased** with **LoRA (Low-Rank Adaptation)** and **lexicon-based features** using a novel **gated fusion mechanism**.
+A fine-tuned BERT model with LoRA for financial sentiment analysis. This model classifies financial text into three categories: Negative, Neutral, and Positive.
 
-- ⚡ **Efficient**: 98% parameter reduction via LoRA (only 2-3M trainable parameters)
-- 🧠 **Smart Fusion**: Novel gated mechanism combining transformer + lexicon features
-- 💼 **Financial Domain**: Enhanced with Loughran-McDonald financial dictionary
-- 🚀 **Production Ready**: Optimized for real-world deployment
-
-##
-```
-Input Text → BERT Encoder → Transformer Features ↘
-                                                    → Gated Fusion → Classification
-Input Text → Lexicon Analyzer → Lexicon Features ↗
-```
-
-The model intelligently learns when to trust:
-- **Transformer features**: For complex contextual understanding
-- **Lexicon features**: For domain-specific financial sentiment
+## Model Details
+
+- **Base Model**: bert-base-uncased
+- **Fine-tuning**: LoRA (Low-Rank Adaptation)
+- **Classes**: 3 (Negative, Neutral, Positive)
+- **Domain**: Financial text analysis
+- **Language**: English
 
-##
+## Usage
 
-### Option 1: Simple Pipeline (Recommended)
+### Quick Start with Pipeline
 
 ```python
 from transformers import pipeline
@@ -66,198 +48,66 @@ classifier = pipeline(
 )
 
 # Analyze sentiment
-text = "The company reported strong quarterly earnings"
+text = "The company reported strong quarterly earnings."
 result = classifier(text)
 print(result)
 # Output: [{'label': 'POSITIVE', 'score': 0.9234}]
 ```
 
-###
+### Advanced Usage
 
 ```python
 from transformers import BertTokenizer, BertForSequenceClassification
 from peft import PeftModel
 import torch
 
 # Load model components
-base_model = BertForSequenceClassification.from_pretrained(
+base_model = BertForSequenceClassification.from_pretrained(
+    "bert-base-uncased",
+    num_labels=3
+)
 model = PeftModel.from_pretrained(base_model, "Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
-tokenizer = BertTokenizer.from_pretrained("
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
 
-def predict_sentiment(text):
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
-    with torch.no_grad():
-        outputs = model(**inputs)
-        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
-        predicted_class = torch.argmax(probs, dim=-1)
-
-    labels = ["Negative", "Neutral", "Positive"]
-    return {
-        "sentiment": labels[predicted_class],
-        "confidence": probs.max().item(),
-        "all_scores": {
-            "negative": probs[0][0].item(),
-            "neutral": probs[0][1].item(),
-            "positive": probs[0][2].item()
-        }
-    }
-
-# Example
-result = predict_sentiment("The company's financial outlook appears promising.")
-print(result)
-```
+# Inference
+text = "The quarterly results exceeded expectations."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class = torch.argmax(predictions, dim=-1)
+
+labels = ["NEGATIVE", "NEUTRAL", "POSITIVE"]
+print(f"Predicted: {labels[predicted_class]}")
+```
 
-### Option 3: Batch Processing
-
-```python
-def analyze_financial_texts(texts):
-    classifier = pipeline("text-classification", model="Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
-    results = classifier(texts)
-
-    # Format results
-    formatted_results = []
-    for text, result in zip(texts, results):
-        formatted_results.append({
-            "text": text,
-            "sentiment": result['label'],
-            "confidence": result['score']
-        })
-
-    return formatted_results
-
-# Example usage
-financial_news = [
-    "Revenue growth exceeded analyst expectations this quarter.",
-    "The company maintains a stable market position despite challenges.",
-    "Declining sales have raised serious concerns about future profitability."
-]
-
-results = analyze_financial_texts(financial_news)
-for r in results:
-    print(f"Text: {r['text']}")
-    print(f"Sentiment: {r['sentiment']} (Confidence: {r['confidence']:.3f})")
-    print("-" * 50)
-```
-
-## 📊 Performance Metrics
-
-| Metric | Score |
-|--------|-------|
-| **Overall Accuracy** | **87.5%** |
-| **Macro F1-Score** | **0.864** |
-| **Weighted F1-Score** | **0.875** |
-| Negative F1 | 0.842 |
-| Neutral F1 | 0.891 |
-| Positive F1 | 0.859 |
-| **Parameters** | **2.3M** (98% reduction) |
-
-## 🎯 Use Cases
-
-### 📈 Financial Applications
-- **News Sentiment**: Analyze financial news and press releases
-- **Social Media**: Monitor Twitter/Reddit for market sentiment
-- **Research Reports**: Process analyst reports and earnings calls
-- **Risk Management**: Early warning system for negative sentiment
-
-#
-
-- **Market Research**: Sector and company sentiment trends
-- **Compliance**: Screen communications for risk
-
-```python
-news_texts = [
-    "Apple Inc. reported record iPhone sales in Q4 2024",
-    "Tesla stock volatile amid production concerns",
-    "Microsoft Azure revenue growth slows but remains profitable",
-    "Amazon faces regulatory scrutiny over market practices"
-]
-
-classifier = pipeline("text-classification", model="Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")
-for text in news_texts:
-    result = classifier(text)[0]
-    print(f"📰 {text}")
-    print(f"💭 Sentiment: {result['label']} ({result['score']:.2%} confidence)")
-    print()
-```
 
-##
+## Model Performance
 
-### Model Architecture
-- **Base**: BERT-base-uncased (768 hidden dimensions)
-- **LoRA Config**: rank=16, alpha=32, dropout=0.1
-- **Target Modules**: query, key, value, dense layers
-- **Fusion**: Gated mechanism with learnable weights
-- **Lexicon**: VADER + Loughran-McDonald financial dictionary
-
-### Training Setup
-- **Dataset**: Financial PhraseBank + custom financial datasets
-- **Samples**: 3,000+ annotated financial sentences
-- **Hardware**: GPU-optimized with mixed precision (FP16)
-- **Optimization**: AdamW with cosine learning rate schedule
-- **Memory**: Gradient checkpointing + optimized batch processing
-
-### Performance Optimizations
-- ⚡ **Fast Inference**: LoRA enables quick predictions
-- 💾 **Low Memory**: Efficient architecture for deployment
-- 🌐 **Scalable**: Ready for production environments
-- 📱 **Edge Compatible**: Lightweight enough for mobile/edge
-
-| Model | Accuracy | F1-Score | Parameters | Training Time |
-|-------|----------|----------|------------------|---------------|
-| BERT-base-uncased | 82.1% | 0.798 | 110M | 4+ hours |
-| FinBERT | 84.3% | 0.831 | 110M | 3+ hours |
-| **LAPEFT (Ours)** | **87.5%** | **0.864** | **2.3M** | **45 mins** |
+- Optimized for financial text analysis
+- Efficient LoRA fine-tuning approach
+- Suitable for real-time sentiment analysis
 
-##
-- **Text Length**: Optimal for 50-500 words
-- **Domain**: Designed specifically for financial content
-- **Language**: Optimized for English text
-- **Context**: Works best with complete sentences
+## Use Cases
+
+- Financial news sentiment analysis
+- Social media monitoring for financial content
+- Investment research and analysis
+- Risk assessment based on sentiment
 
-- Primarily trained on English financial text
+## Limitations
+
+- Trained primarily on English financial text
 - Performance may vary on non-financial content
+- Best suited for sentences and short paragraphs
 
-##
-
-We welcome contributions! Areas where you can help:
-- Testing on new financial datasets
-- Improving documentation and examples
-- Adding support for other languages
-- Performance optimizations
-
-## 📖 Citation
+## Citation
 
 ```bibtex
 @misc{lapeft_financial_sentiment_2025,
-  title={LAPEFT:
+  title={LAPEFT: Financial Sentiment Analysis with LoRA},
   author={Hananguyen12},
   year={2025},
-  publisher={Hugging Face
+  publisher={Hugging Face},
   url={https://huggingface.co/Hananguyen12/LAPEFT-Financial-Sentiment-Analysis}
 }
 ```
-
-## 📄 License
-
-Apache 2.0 License - See LICENSE file for details.
-
-## 🎉 Acknowledgments
-
-- Hugging Face team for the transformers library
-- Microsoft for the LoRA technique
-- Loughran-McDonald for the financial sentiment lexicon
-
----
-
-*Built with ❤️ for the financial NLP community*
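The updated card standardizes on the three labels NEGATIVE, NEUTRAL, and POSITIVE. Below is a minimal sketch of how all three class scores can be read back through the pipeline API; it is illustrative rather than part of the commit, and assumes a transformers release recent enough to accept `top_k=None` on text-classification pipelines.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Hananguyen12/LAPEFT-Financial-Sentiment-Analysis",
)

# top_k=None asks the pipeline for every label's score, not just the argmax
scores = classifier("The company reported strong quarterly earnings.", top_k=None)
print(scores)
# Expected output shape (scores are illustrative):
# [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEUTRAL', 'score': ...},
#  {'label': 'NEGATIVE', 'score': ...}]
```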
config.json CHANGED
@@ -15,23 +15,22 @@
   "model_type": "bert",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
-  "num_labels": 3,
   "pad_token_id": 0,
   "position_embedding_type": "absolute",
-  "
-  "torch_dtype": "float32",
-  "transformers_version": "4.36.0",
+  "transformers_version": "4.30.0",
   "type_vocab_size": 2,
   "use_cache": true,
   "vocab_size": 30522,
+  "num_labels": 3,
+  "problem_type": "single_label_classification",
   "id2label": {
-    "0": "
-    "1": "
-    "2": "
+    "0": "NEGATIVE",
+    "1": "NEUTRAL",
+    "2": "POSITIVE"
   },
   "label2id": {
-    "
-    "
-    "
+    "NEGATIVE": 0,
+    "NEUTRAL": 1,
+    "POSITIVE": 2
   }
 }
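With `num_labels`, `problem_type`, and the label maps now written into config.json, the 3-class setup can be sanity-checked without downloading any model weights. A hypothetical check, assuming the repository's config.json loads via `AutoConfig`:

```python
from transformers import AutoConfig

# Load only the configuration from the Hub repo (no weights needed)
config = AutoConfig.from_pretrained("Hananguyen12/LAPEFT-Financial-Sentiment-Analysis")

print(config.num_labels)    # 3
print(config.id2label)      # {0: 'NEGATIVE', 1: 'NEUTRAL', 2: 'POSITIVE'}
print(config.problem_type)  # 'single_label_classification'
assert config.label2id["POSITIVE"] == 2
```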