mobadara committed
Commit 18eba4e · verified · 1 Parent(s): 6f3c1a9

Sync from GitHub via hub-sync

models/README.md ADDED
@@ -0,0 +1,30 @@
+
+ # FinBERT Sentiment Analyzer (Fine-Tuned)
+
+ ## Model Description
+ This is a fine-tuned version of `ProsusAI/finbert` designed specifically for classifying the sentiment of financial news headlines into three distinct categories: **Positive, Negative, and Neutral**.
+
+ This model serves as the core inference engine for the FinBERT Sentiment Analyzer FastAPI backend.
+
+ ## Dataset & Class Imbalance Strategy
+ The model was trained on a heavily cleaned and preprocessed version of the Financial PhraseBank dataset. During exploratory data analysis, a severe class imbalance was identified, with the **Neutral** class representing roughly 61% of the data.
+
+ To prevent the model from collapsing into a majority-class predictor, we implemented a custom MLOps training strategy:
+ 1. **Dynamic Class Weights:** Penalty weights were calculated using the balanced heuristic ($N / (C \times n_i)$).
+ 2. **Custom Loss Function:** A custom Hugging Face `Trainer` subclass was built to inject these weights directly into a PyTorch `CrossEntropyLoss` function during gradient descent, heavily penalizing misclassifications of the minority (Positive/Negative) classes.
+
+ ## Evaluation Results
+ The model was evaluated on a strictly segregated test set (1,000 samples) pulled directly from the Hugging Face Hub to ensure zero data leakage.
+
+ * **Macro F1-Score:** `0.9394`
+ * **Accuracy:** `0.9600`
+ * **Validation Loss:** `0.1891`
+
+ *(Note: Macro F1-Score was prioritized over standard accuracy to validate true performance across the minority classes).*
+
+ ## Intended Use
+ This model is intended to be loaded into a FastAPI application for real-time financial sentiment inference. The heavy weight files (`.safetensors`) are hosted on the Hugging Face Hub under the repository name `finbert-finetuned`, while the tokenizer configurations and application logic reside in the associated GitHub repository.
+
+ ## Developer
+ **Muyiwa J. Obadara**
+ Data Scientist & AI Engineer
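
Note on the class-imbalance strategy described in the README above: the balanced heuristic $N / (C \times n_i)$ and a class-weighted cross-entropy can be sketched in plain Python as follows. This is an illustration only, not the author's training code, which the README says uses a Hugging Face `Trainer` subclass with PyTorch `CrossEntropyLoss`. The label counts are hypothetical (chosen so that class 2, Neutral, is about 61% of the data), and the function names `balanced_class_weights` and `weighted_cross_entropy` are made up for this sketch. The loss is normalized by the sum of the target weights, which is how PyTorch's weighted `CrossEntropyLoss` behaves with its default mean reduction.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Return {class_id: N / (C * n_i)} for the given label list."""
    counts = Counter(labels)
    n_total = len(labels)      # N: number of samples
    n_classes = len(counts)    # C: number of distinct classes
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(logits, targets, weights):
    """Class-weighted cross-entropy, averaged over the target weights."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        nll = log_sum_exp - z[y]   # -log softmax(z)[y]
        total += weights[y] * nll
        norm += weights[y]
    return total / norm


# Hypothetical label counts (0 = positive, 1 = negative, 2 = neutral).
labels = [0] * 280 + [1] * 110 + [2] * 610
weights = balanced_class_weights(labels)

# Made-up logits for two samples, one per minority class.
loss = weighted_cross_entropy([[2.0, 0.1, 0.3], [0.2, 0.1, 1.5]], [0, 1], weights)
```

With these counts the rarest class receives the largest weight and the Neutral class the smallest, so an error on a minority-class sample contributes more to the loss than the same error on a Neutral sample.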
models/finbert-sentiment-model/config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "add_cross_attention": false,
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": null,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "eos_token_id": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "positive",
+     "1": "negative",
+     "2": "neutral"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "is_decoder": false,
+   "label2id": {
+     "negative": 1,
+     "neutral": 2,
+     "positive": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "tie_word_embeddings": true,
+   "transformers_version": "5.0.0",
+   "type_vocab_size": 2,
+   "use_cache": false,
+   "vocab_size": 30522
+ }
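
Note on `id2label` in the config above: a serving layer would typically turn the classifier's three output logits into a label and a confidence score through this mapping. A minimal sketch, assuming the logits arrive as a plain list of floats; the logit values and the helper names `softmax` and `decode_prediction` are made up for illustration and are not taken from the repository's FastAPI code.

```python
import json
import math

# id2label copied from the config.json hunk above. JSON object keys are strings.
CONFIG_SNIPPET = '{"id2label": {"0": "positive", "1": "negative", "2": "neutral"}}'


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[str(best)], probs[best]


id2label = json.loads(CONFIG_SNIPPET)["id2label"]

# Made-up logits in class-index order: positive, negative, neutral.
label, confidence = decode_prediction([0.2, 2.9, -1.0], id2label)
```

Reading the mapping from the config rather than hard-coding the label order keeps the decoding step correct if the class indices ever change.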
models/finbert-sentiment-model/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
models/finbert-sentiment-model/tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "backend": "tokenizers",
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
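
Note on the two config files above: because the model config and the tokenizer config are synced separately from the weights, a small consistency check at load time can catch mismatches early. The sketch below is an assumption about what such a check might look like, not something present in this commit. The function name `check_consistency` is hypothetical, and the dictionaries contain only the fields copied from the hunks above that the check needs.

```python
# Fields copied from models/finbert-sentiment-model/config.json above.
MODEL_CFG = {
    "id2label": {"0": "positive", "1": "negative", "2": "neutral"},
    "label2id": {"negative": 1, "neutral": 2, "positive": 0},
    "max_position_embeddings": 512,
}

# Field copied from models/finbert-sentiment-model/tokenizer_config.json above.
TOK_CFG = {"model_max_length": 512}


def check_consistency(model_cfg, tok_cfg):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # id2label (string keys in JSON) should be the exact inverse of label2id.
    inverse = {label: int(idx) for idx, label in model_cfg["id2label"].items()}
    if inverse != model_cfg["label2id"]:
        problems.append("id2label and label2id are not inverses of each other")

    # The tokenizer must not produce sequences longer than the model's
    # position-embedding table can represent.
    if tok_cfg["model_max_length"] > model_cfg["max_position_embeddings"]:
        problems.append("model_max_length exceeds max_position_embeddings")

    return problems


problems = check_consistency(MODEL_CFG, TOK_CFG)
```

For the values in this commit the check finds nothing to report: the two label maps are inverses, and both length limits are 512.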