---
language: en
license: apache-2.0
tags:
- finance
- sentiment-analysis
- text-classification
- llama
- qlora
pipeline_tag: text-classification
library_name: transformers
---

# Llama Sentiment Classifier (QLoRA Fine-Tuned)

This repository provides a **LLaMA-based financial sentiment classifier** fine-tuned using **QLoRA** for **3-class sentiment classification** on finance-domain text. The model is designed for downstream applications including **sentiment-driven alpha signal generation** and market-neutral trading strategies.

---

## Model Summary

- **Backbone**: Meta-LLaMA3-8B
- **Task**: Financial sentiment classification (3 classes)
- **Fine-tuning method**: **QLoRA** (4-bit quantization + LoRA adapters)
- **Training framework**: **SWIFT** (Scalable lightWeight Infrastructure for Fine-Tuning)
- **Architecture**: LlamaForSequenceClassification

---

## Labels

This model performs **single-label classification** into 3 discrete classes.

In the paper, the unified sentiment label space is:

- **-1** = Negative
- **0** = Neutral
- **1** = Positive

> Note: your Hugging Face config may show `LABEL_0`, `LABEL_1`, `LABEL_2`.
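If the hosted config reports generic `LABEL_*` names, a small translation table recovers the unified space. The correspondence below (`LABEL_0` → -1, `LABEL_1` → 0, `LABEL_2` → 1) is an assumption; confirm it against the `id2label` field in the repository's `config.json` before relying on it.

```python
# Assumed correspondence between generic config labels and the unified
# sentiment space; verify against id2label in config.json.
ID2SENTIMENT = {"LABEL_0": -1, "LABEL_1": 0, "LABEL_2": 1}

def to_unified_label(hf_label: str) -> int:
    """Translate a pipeline label string into the paper's {-1, 0, 1} space."""
    return ID2SENTIMENT[hf_label]
```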

---

## Datasets

### Part 1 — Fine-tuning datasets (4 sources), available at `Romeo777777/FinLlaMa_Training_Dataset`


Training uses 4 finance-domain sentiment datasets:

1. **Financial PhraseBank v1.0**
   - 4,840 manually annotated sentences
   - 3-class sentiment (positive/neutral/negative)

2. **NASDAQ News Sentiment**
   - synthetic sentiment dataset generated using GPT-4o
   - labeled into positive/neutral/negative

3. **Twitter Financial News Sentiment**
   - 11,932 finance-related tweets
   - original label encoding: 0 bearish, 1 bullish, 2 neutral

4. **FIQA2018 (FiQA)**
   - financial opinion mining dataset
   - sentiment score in [-1, 1] (continuous), then discretized

---

### Part 2 — Real-world news evaluation dataset

For downstream evaluation and trading strategy experiments, the paper uses a news + price dataset covering:

- **502 S&P 500 companies**
- **~77,000 news headlines**
- time range: **2024-01-01 to 2025-05-30**

---

## Preprocessing

### Label normalization (unified mapping)

All datasets are standardized into the unified label space {-1, 0, 1}.

Mapping rules:

- **Financial PhraseBank**:
  `"negative" → -1`, `"neutral" → 0`, `"positive" → 1`

- **FIQA2018** (score in [-1, 1]):
  `< -0.2 → -1`, `> 0.2 → 1`, otherwise `0`

- **Twitter Financial Sentiment**:
  `0 → -1`, `2 → 0`, `1 → 1`

- **NASDAQ News** (score in [0, 5]):
  `<= 1 → -1`, `>= 4 → 1`, otherwise `0`
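The mapping rules above can be sketched in plain Python, one normalizer per source dataset (function names are illustrative, not part of any released code):

```python
def normalize_phrasebank(label: str) -> int:
    """Financial PhraseBank: string labels to {-1, 0, 1}."""
    return {"negative": -1, "neutral": 0, "positive": 1}[label]

def normalize_fiqa(score: float) -> int:
    """FiQA: continuous score in [-1, 1], discretized with a +/-0.2 dead zone."""
    if score < -0.2:
        return -1
    if score > 0.2:
        return 1
    return 0

def normalize_twitter(label: int) -> int:
    """Twitter: original encoding 0=bearish, 1=bullish, 2=neutral."""
    return {0: -1, 1: 1, 2: 0}[label]

def normalize_nasdaq(score: float) -> int:
    """NASDAQ News: score in [0, 5], thresholded at 1 and 4."""
    if score <= 1:
        return -1
    if score >= 4:
        return 1
    return 0
```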

### Tokenization

Uses the official **LLaMA 3 tokenizer**.

---

## Training Method

### QLoRA fine-tuning

The model is fine-tuned with **QLoRA**, which keeps the backbone weights in 4-bit quantized form and trains LoRA adapters in higher precision.

The paper's QLoRA setup includes NF4 quantization + double quantization for memory efficiency.

### SWIFT training framework

Fine-tuning is orchestrated using **SWIFT**, a modular training framework that simplifies adapter integration and efficient training for quantized models.
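For readers reproducing the setup outside SWIFT, the NF4 + double-quantization configuration described above maps roughly onto the Hugging Face `BitsAndBytesConfig` API as follows. This is a sketch under stated assumptions (compute dtype is not given in the card), not the paper's actual SWIFT invocation:

```python
import torch
from transformers import BitsAndBytesConfig

# QLoRA-style quantization: 4-bit NF4 backbone weights with double
# quantization, as described in the training method above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # keep backbone weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_use_double_quant=True,         # double quantization for extra savings
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed higher-precision compute dtype
)
```

The config would then be passed as `quantization_config=bnb_config` when loading the backbone with `LlamaForSequenceClassification.from_pretrained`.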

---

## Hyperparameters (paper)

- LoRA rank **r = 16**
- LoRA alpha **= 32**
- LoRA dropout **= 0.1**
- learning rate **1e-4**, optimizer **AdamW**
- batch size **8**, epochs **1**
- gradient accumulation **4**
- gradient checkpointing enabled
- cosine learning rate schedule
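The hyperparameters above can be expressed with the `peft` and `transformers` APIs roughly as follows. This is a hedged sketch: the paper trains via SWIFT, and unlisted settings (target modules, output directory) are assumptions:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter settings matching the list above.
lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="SEQ_CLS",  # sequence classification head
)

# Optimization settings matching the list above.
training_args = TrainingArguments(
    output_dir="out",     # placeholder path
    learning_rate=1e-4,
    optim="adamw_torch",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    lr_scheduler_type="cosine",
)
```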

---

## Evaluation Results (paper)

On a test set of **9,064** finance news samples, the fine-tuned model reports:

- **Accuracy**: **92.18%**
- **Micro-F1**: **0.9218**
- **Macro-F1**: **0.5787**

Class-wise results are strong for the **Positive** and **Neutral** classes but weaker for **Negative**, reflecting class imbalance in the data.
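The gap between micro-F1 and macro-F1 is exactly what class imbalance produces: for single-label classification micro-F1 equals accuracy, while macro-F1 averages per-class F1 without weighting, so a poorly predicted minority class drags it down. A small self-contained illustration on toy data (not the paper's test set):

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for one class, computed from raw counts."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels=(-1, 0, 1)):
    """Unweighted mean of per-class F1: sensitive to minority classes."""
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)

def micro_f1(y_true, y_pred):
    """For single-label classification, micro-F1 reduces to accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy imbalanced sample: one Negative example, and it is misclassified.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, -1]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(micro_f1(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 2))  # 0.62
```

Nine of ten predictions are correct, so micro-F1 is high, but the Negative class scores an F1 of 0, pulling macro-F1 far below it, mirroring the 0.9218 vs 0.5787 split reported above.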

---

## Usage

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Romeo777777/Llama_Sentiment_Classifier",
    tokenizer="Romeo777777/Llama_Sentiment_Classifier",
    top_k=None,  # return scores for all 3 classes (replaces deprecated return_all_scores=True)
)

print(clf("Bitcoin is pumping hard today!"))
```