---
language: en
license: apache-2.0
tags:
- finance
- sentiment-analysis
- text-classification
- llama
- qlora
pipeline_tag: text-classification
library_name: transformers
---
# Llama Sentiment Classifier (QLoRA Fine-Tuned)
This repository provides a **LLaMA-based financial sentiment classifier** fine-tuned using **QLoRA** for **3-class sentiment classification** on finance-domain text. The model is designed for downstream applications including **sentiment-driven alpha signal generation** and market-neutral trading strategies.
---
## Model Summary
- **Backbone**: Meta-Llama-3-8B
- **Task**: Financial sentiment classification (3 classes)
- **Fine-tuning method**: **QLoRA** (4-bit quantization + LoRA adapters)
- **Training framework**: **SWIFT** (Scalable lightWeight Infrastructure for Fine-Tuning)
- **Architecture**: LlamaForSequenceClassification
---
## Labels
This model performs **single-label classification** into 3 discrete classes.
In the paper, the unified sentiment label space is:
- **-1** = Negative
- **0** = Neutral
- **1** = Positive
> Note: the Hugging Face config may expose these as `LABEL_0`, `LABEL_1`, and `LABEL_2`.
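If the hosted config exposes only generic `LABEL_*` names, a small mapping can restore the signed label space. The index order below is an assumption for illustration; verify it against `id2label` in this repo's `config.json` before relying on it.

```python
# Hypothetical mapping from default HF label ids to the paper's {-1, 0, 1}
# space. Check config.json's id2label field to confirm the ordering.
ID2SENTIMENT = {"LABEL_0": -1, "LABEL_1": 0, "LABEL_2": 1}

def to_signed_label(hf_label: str) -> int:
    """Convert a pipeline label string to the unified signed sentiment label."""
    return ID2SENTIMENT[hf_label]
```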
---
## Datasets
### Part 1 — Fine-tuning datasets (4 sources, available at Romeo777777/FinLlaMa_Training_Dataset)
Training uses 4 finance-domain sentiment datasets:
1. **Financial PhraseBank v1.0**
- 4,840 manually annotated sentences
- 3-class sentiment (positive/neutral/negative)
2. **NASDAQ News Sentiment**
- synthetic sentiment dataset generated using GPT-4o
- labeled into positive/neutral/negative
3. **Twitter Financial News Sentiment**
- 11,932 finance-related tweets
- original label encoding: 0 bearish, 1 bullish, 2 neutral
4. **FIQA2018 (FiQA)**
- financial opinion mining dataset
- sentiment score in [-1, 1] (continuous), then discretized
---
### Part 2 — Real-world news evaluation dataset
For downstream evaluation and trading strategy experiments, the paper uses a news + price dataset covering:
- **502 S&P 500 companies**
- **~77,000 news headlines**
- time range: **2024-01-01 to 2025-05-30**
---
## Preprocessing
### Label normalization (unified mapping)
All datasets are standardized into the unified label space: {-1, 0, 1}.
Mapping rules:
- **Financial PhraseBank**:
`"negative" → -1`, `"neutral" → 0`, `"positive" → 1`
- **FIQA2018** (score in [-1, 1]):
`< -0.2 → -1`, `> 0.2 → 1`, otherwise `0`
- **Twitter Financial Sentiment**:
`0 → -1`, `2 → 0`, `1 → 1`
- **NASDAQ News** (score 0–5):
`<= 1 → -1`, `>= 4 → 1`, otherwise `0`
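The mapping rules above can be sketched as a single normalization helper. This is an illustrative sketch, not the repo's actual preprocessing code; the dataset keys are made up for the example.

```python
def normalize_label(source: str, value) -> int:
    """Map a dataset-specific label to the unified {-1, 0, 1} space."""
    if source == "phrasebank":    # string labels
        return {"negative": -1, "neutral": 0, "positive": 1}[value]
    if source == "fiqa":          # continuous score in [-1, 1]
        if value < -0.2:
            return -1
        return 1 if value > 0.2 else 0
    if source == "twitter":       # 0 bearish, 1 bullish, 2 neutral
        return {0: -1, 2: 0, 1: 1}[value]
    if source == "nasdaq":        # integer score 0-5
        if value <= 1:
            return -1
        return 1 if value >= 4 else 0
    raise ValueError(f"unknown source: {source}")
```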
### Tokenization
Uses the official **LLaMA 3 tokenizer**.
---
## Training Method
### QLoRA fine-tuning
The model is fine-tuned with **QLoRA**, which keeps the backbone weights in 4-bit quantized form and trains LoRA adapters in higher precision.
The paper’s QLoRA setup includes NF4 quantization + double quantization for memory efficiency.
### SWIFT training framework
Fine-tuning is orchestrated using **SWIFT**, a modular training framework that simplifies adapter integration and efficient training for quantized models.
---
## Hyperparameters (paper)
- LoRA rank **r = 16**
- LoRA alpha **= 32**
- LoRA dropout **= 0.1**
- learning rate **1e-4**, optimizer **AdamW**
- batch size **8**, epochs **1**
- gradient accumulation **4**
- gradient checkpointing enabled
- cosine learning rate schedule
---
## Evaluation Results (paper)
On a test set of **9,064** finance news samples, the fine-tuned model reports:
- **Accuracy**: **92.18%**
- **Micro-F1**: **0.9218**
- **Macro-F1**: **0.5787**
Class-wise results are strong for **Positive** and **Neutral**, but weaker for **Negative** due to class imbalance.
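A toy example (synthetic numbers, not the paper's data) illustrates why micro-F1 can stay high while macro-F1 drops under class imbalance: misclassifying a rare class barely moves micro-F1 but zeroes out one of the per-class scores that macro-F1 averages.

```python
from sklearn.metrics import f1_score

# 10 samples: 6 positive, 3 neutral, 1 negative; the lone negative is missed.
y_true = [1] * 6 + [0] * 3 + [-1] * 1
y_pred = [1] * 6 + [0] * 3 + [0] * 1

micro = f1_score(y_true, y_pred, average="micro")  # dominated by frequent classes
macro = f1_score(y_true, y_pred, average="macro")  # penalized by the missed class
```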
---
## Usage
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Romeo777777/Llama_Sentiment_Classifier",
    tokenizer="Romeo777777/Llama_Sentiment_Classifier",
    top_k=None,  # return scores for all 3 classes (replaces deprecated return_all_scores=True)
)
print(clf("Bitcoin is pumping hard today!"))
```