Spaces: mobadara/finbert-sentiment-api

mobadara committed 16 days ago

Commit 18eba4e (verified) · 1 parent: 6f3c1a9

Sync from GitHub via hub-sync

Files changed (4)

models/README.md +30 -0
models/finbert-sentiment-model/config.json +40 -0
models/finbert-sentiment-model/tokenizer.json +0 -0
models/finbert-sentiment-model/tokenizer_config.json +14 -0

models/README.md ADDED

	@@ -0,0 +1,30 @@

+# FinBERT Sentiment Analyzer (Fine-Tuned)
+## Model Description
+This is a fine-tuned version of `ProsusAI/finbert` that classifies the sentiment of financial news headlines into three categories: **Positive, Negative, and Neutral**.
+This model serves as the core inference engine for the FinBERT Sentiment Analyzer FastAPI backend.
+## Dataset & Class Imbalance Strategy
+The model was trained on a heavily cleaned and preprocessed version of the Financial PhraseBank dataset. During exploratory data analysis, a severe class imbalance was identified, with the **Neutral** class representing roughly 61% of the data.
+To prevent the model from collapsing into a majority-class predictor, we used a class-weighted training strategy:
+1. **Dynamic Class Weights:** Penalty weights were calculated using the balanced heuristic $w_i = N / (C \times n_i)$, where $N$ is the total number of samples, $C$ is the number of classes, and $n_i$ is the number of samples in class $i$.
+2. **Custom Loss Function:** A custom Hugging Face `Trainer` subclass passes these weights to a PyTorch `CrossEntropyLoss`, so that misclassifications of the minority (Positive/Negative) classes contribute more to the training loss than misclassifications of the Neutral class.
+## Evaluation Results
+The model was evaluated on a held-out test set (1,000 samples) pulled directly from the Hugging Face Hub and kept separate from the training data to avoid data leakage.
+* **Macro F1-Score:** `0.9394`
+* **Accuracy:** `0.9600`
+* **Validation Loss:** `0.1891`
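+*(Note: Macro F1-Score was prioritized over accuracy because accuracy can be inflated by the majority Neutral class, whereas Macro F1 weights all three classes equally and so reflects performance on the minority classes.)*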
+## Intended Use
+This model is intended to be loaded into a FastAPI application for real-time financial sentiment inference. The model weights (`.safetensors`) are hosted on the Hugging Face Hub under the repository name `finbert-finetuned`, while the tokenizer configuration and application logic reside in the associated GitHub repository.
+## Developer
+**Muyiwa J. Obadara**
+Data Scientist & AI Engineer
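
The class-weighting strategy described in the README above can be sketched in plain Python. This is a minimal illustration, not the author's training code: the commit does not include the `Trainer` subclass, so the function names `balanced_class_weights` and `weighted_cross_entropy` are hypothetical, and the label counts are made up (only the roughly 61% Neutral share comes from the README). The loss function mirrors the normalization PyTorch's `CrossEntropyLoss` applies when given `weight=` with `reduction="mean"`.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Return w_i = N / (C * n_i) for each class id i.

    Assumes every class id in range(num_classes) occurs at least once.
    """
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[i]) for i in range(num_classes)]


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean cross-entropy: sum(w_y * nll) / sum(w_y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the row of logits.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        nll = log_norm - row[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total


# Hypothetical label counts (NOT the real dataset split).
# Class ids follow config.json: 0=positive, 1=negative, 2=neutral.
labels = [0] * 270 + [1] * 120 + [2] * 610
weights = balanced_class_weights(labels, num_classes=3)

# One confident, correct neutral prediction and one missed negative headline.
loss = weighted_cross_entropy(
    logits=[[0.1, 0.2, 3.0], [0.3, 0.1, 2.5]],
    targets=[2, 1],
    weights=weights,
)
print([round(w, 4) for w in weights], round(loss, 4))
```

With balanced weights, each class contributes the same total weight ($n_i \times w_i = N / C$), which is why the rarer classes receive the larger per-sample penalty.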

models/finbert-sentiment-model/config.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "add_cross_attention": false,
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": null,
+  "classifier_dropout": null,
+  "dtype": "float32",
+  "eos_token_id": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "positive",
+    "1": "negative",
+    "2": "neutral"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "is_decoder": false,
+  "label2id": {
+    "negative": 1,
+    "neutral": 2,
+    "positive": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "tie_word_embeddings": true,
+  "transformers_version": "5.0.0",
+  "type_vocab_size": 2,
+  "use_cache": false,
+  "vocab_size": 30522
+}
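
The `id2label` map in this config is what turns the classifier's three output logits into a sentiment label. The sketch below shows that decoding step in plain Python, under stated assumptions: the logits are made up, `decode_prediction` is a hypothetical helper rather than anything in this repository, and only the `id2label` fragment of the config is reproduced. Note that JSON object keys are strings, so the predicted index has to be converted before the lookup.

```python
import json
import math

# Fragment of the config.json shown above; only the label map is needed here.
CONFIG_FRAGMENT = '{"id2label": {"0": "positive", "1": "negative", "2": "neutral"}}'


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label):
    """Pick the highest-probability class and look up its label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # JSON keys are strings, so the integer index is converted for the lookup.
    return {"label": id2label[str(best)], "score": probs[best]}


id2label = json.loads(CONFIG_FRAGMENT)["id2label"]
result = decode_prediction([2.1, -0.3, 0.4], id2label)  # made-up logits
print(result["label"], round(result["score"], 3))
```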

models/finbert-sentiment-model/tokenizer.json ADDED

The diff for this file is too large to render.

models/finbert-sentiment-model/tokenizer_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "backend": "tokenizers",
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
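
To make the effect of `do_lower_case`, `model_max_length`, and the special tokens concrete, here is a rough sketch of how a single headline is framed before it reaches the model. It is an assumption-laden stand-in: whitespace splitting replaces the real WordPiece tokenization, and `frame_single_sequence` is a hypothetical helper, not part of the tokenizer's API. What it does show accurately is the token budget: with `[CLS]` and `[SEP]` added, at most 510 content tokens fit within the 512-token limit.

```python
# Values copied from the tokenizer_config.json shown above.
CLS_TOKEN = "[CLS]"
SEP_TOKEN = "[SEP]"
MODEL_MAX_LENGTH = 512
DO_LOWER_CASE = True


def frame_single_sequence(text, max_length=MODEL_MAX_LENGTH):
    """Lowercase, split, truncate, and wrap the text in [CLS] ... [SEP].

    Whitespace splitting is only a stand-in for WordPiece tokenization.
    """
    if DO_LOWER_CASE:
        text = text.lower()
    pieces = text.split()
    # Reserve two positions for the [CLS] and [SEP] special tokens.
    budget = max_length - 2
    return [CLS_TOKEN] + pieces[:budget] + [SEP_TOKEN]


framed = frame_single_sequence("Operating Profit ROSE 12% year on year")
print(framed)
```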