MarkusHenriksson13 committed on
Commit
b1a6a81
·
verified ·
1 Parent(s): 8204bc3

Update README.md


Financial Sentiment Classifier using SBERT

Extended Description

Overview
Financial decisions depend on timely, accurate information. One way to read the market is to analyze the sentiment of financial news, articles, and social media posts, since positive or negative coverage can move stock prices, influence investment decisions, and shape broader market trends.

This model, the Financial Sentiment Classifier using SBERT, classifies financial news headlines into three sentiment categories: positive, negative, and neutral. It uses Sentence-BERT (SBERT), a transformer-based model, to turn each headline into a dense embedding. A RandomForestClassifier trained on labeled financial news headlines then predicts the sentiment from that embedding.

SBERT suits this task because it is trained for sentence-level semantic similarity, so headlines with similar meaning land close together in embedding space. This helps the classifier pick up on nuances of wording and context, such as how a headline frames a financial report or a stock's performance.

Intended Use
This model can be employed in a variety of financial applications, including but not limited to:

Automating sentiment analysis workflows: Automatically categorize financial headlines from news sources, social media, or corporate press releases.
Market prediction: Use sentiment as one input signal when modeling market movements or informing stock trading decisions.
Investor sentiment monitoring: Track sentiment over time to gauge how the market or the public perceives a particular financial entity or event.
Financial news aggregation: Classify incoming headlines in real time so that news aggregation platforms can filter positive, neutral, or negative content.
This model's flexibility makes it adaptable for real-time applications, including automated trading systems, financial monitoring tools, and market sentiment analysis platforms.
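
As an illustration of the investor sentiment monitoring use case, the sketch below aggregates per-headline predictions into a daily net sentiment score. The function name `daily_net_sentiment`, the (date, label) input format, and the score definition (share of positive headlines minus share of negative headlines) are assumptions made for this example; they are not part of the model.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def daily_net_sentiment(predictions: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each date to (positive - negative) / total headlines for that date.

    `predictions` holds (date, label) pairs, where label is one of
    "positive", "negative", or "neutral" (hypothetical input format).
    """
    counts_by_date: Dict[str, Counter] = defaultdict(Counter)
    for date, label in predictions:
        counts_by_date[date][label] += 1

    scores = {}
    for date, counts in counts_by_date.items():
        total = sum(counts.values())
        scores[date] = (counts["positive"] - counts["negative"]) / total
    return scores


# Made-up predictions for two days, for illustration only.
example = [
    ("2024-01-02", "positive"),
    ("2024-01-02", "positive"),
    ("2024-01-02", "negative"),
    ("2024-01-02", "neutral"),
    ("2024-01-03", "negative"),
    ("2024-01-03", "neutral"),
]
scores = daily_net_sentiment(example)
print(scores)
```

A score near +1 means mostly positive headlines on that day, and a score near -1 means mostly negative ones; neutral headlines pull the score toward 0.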

Model Details
The model comprises two major components:

Sentence-BERT (SBERT): A variant of BERT (Bidirectional Encoder Representations from Transformers) designed to produce high-quality sentence embeddings. Whereas standard BERT outputs token-level representations, SBERT produces one fixed-size embedding for an entire sentence or short passage, which makes it well suited to representing financial statements and market-relevant news. The model card metadata lists sentence-transformers/all-MiniLM-L6-v2 as the base model.

RandomForestClassifier: Once SBERT has converted the headlines into embeddings, a RandomForestClassifier performs the sentiment classification. A random forest is an ensemble method that combines the predictions of many decision trees, which makes it more robust than a single tree. Here, it predicts the sentiment of each headline from its SBERT embedding.

The classifier is trained on a labeled dataset of financial news headlines, each annotated with a sentiment label (positive, negative, or neutral), and learns the patterns in the embeddings that relate to sentiment in financial language.
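
The two components described above can be sketched as follows. This is a minimal illustration, not the author's training script: the helper names (`load_sbert_encoder`, `train_classifier`, `predict_sentiment`) and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions. The default model name is the base model listed in the model card metadata.

```python
from typing import Callable, List, Sequence

from sklearn.ensemble import RandomForestClassifier

# A callable that maps a list of sentences to a 2-D array of embeddings.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def load_sbert_encoder(
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
) -> Encoder:
    """Return the encode method of an SBERT model (downloads weights on first use)."""
    # Imported lazily so the training helpers below work with any encoder.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode


def train_classifier(
    headlines: Sequence[str],
    labels: Sequence[str],
    encode: Encoder,
    n_estimators: int = 100,  # assumed value, not taken from the model card
) -> RandomForestClassifier:
    """Embed the headlines and fit a random forest on the embeddings."""
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    clf.fit(encode(list(headlines)), list(labels))
    return clf


def predict_sentiment(
    clf: RandomForestClassifier, headlines: Sequence[str], encode: Encoder
) -> List[str]:
    """Embed new headlines and return one predicted label per headline."""
    return [str(label) for label in clf.predict(encode(list(headlines)))]
```

With the real encoder, usage would look like `encode = load_sbert_encoder()`, then `clf = train_classifier(train_headlines, train_labels, encode)` and `predict_sentiment(clf, new_headlines, encode)`, where `train_headlines`, `train_labels`, and `new_headlines` are your own lists. Passing the encoder as an argument keeps the SBERT model loaded once and lets you swap in a different embedding model.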

Data & Preprocessing
This model was trained on a custom dataset of financial news headlines (the model card metadata lists NickyNicky/Finance_sentiment_and_topic_classification_En as its dataset). It can be retrained on your own data to improve performance for specific use cases.

The preprocessing steps included:

Tokenizing the text: Breaking each headline into individual words or tokens.
Converting text to embeddings: Each headline was passed through the SBERT model, generating a dense vector (embedding) that captures the semantic meaning of the sentence.
Label Encoding: The sentiment labels (positive, negative, neutral) were encoded into numeric values (0, 1, 2) for the classifier to process.
To adapt the model to your own data, apply the same preprocessing: convert the new financial headlines into embeddings and feed them, together with their labels, into the RandomForest model.
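
The label encoding step can be sketched as below. The text lists the labels and the numeric values in the same order but does not state which label maps to which number, so the assignment here (positive=0, negative=1, neutral=2) is an assumption, as are the helper names.

```python
from typing import Dict, List

# Assumed assignment: the labels and codes are taken in the order they are
# listed in the text; the actual mapping used in training is not stated.
LABEL_TO_ID: Dict[str, int] = {"positive": 0, "negative": 1, "neutral": 2}
ID_TO_LABEL: Dict[int, str] = {i: label for label, i in LABEL_TO_ID.items()}


def encode_labels(labels: List[str]) -> List[int]:
    """Convert sentiment strings to integer codes, ignoring case and spacing."""
    return [LABEL_TO_ID[label.strip().lower()] for label in labels]


def decode_labels(ids: List[int]) -> List[str]:
    """Convert integer codes back to sentiment strings."""
    return [ID_TO_LABEL[i] for i in ids]


encoded = encode_labels(["Positive", "neutral", "negative"])
print(encoded)                 # [0, 2, 1]
print(decode_labels(encoded))  # ['positive', 'neutral', 'negative']
```

Note that scikit-learn's `LabelEncoder` assigns codes in sorted label order (negative=0, neutral=1, positive=2), so the mapping should be checked against the code that produced the trained classifier before decoding its predictions.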

Model Evaluation
The model has been evaluated using metrics such as:

Accuracy: The percentage of correctly classified headlines.
F1-score: The harmonic mean of precision and recall, providing a better measure of model performance when dealing with imbalanced data.
Confusion Matrix: Helps identify how well the model distinguishes between the different sentiment categories (positive, neutral, and negative).
On the test data, the model achieves an accuracy of X%, with an F1-score of X%. The confusion matrix shows that the model performs well, with a high number of true positives for positive, neutral, and negative sentiments.
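
For reference, the three metrics above can be computed from true and predicted labels as in the sketch below. It uses plain Python so that each definition is visible. The macro average for the F1-score is an assumption (the text does not say which averaging was used), and the toy labels are made up and unrelated to the model's actual results.

```python
from typing import List, Sequence

LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS
) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # true class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp  # predicted class i, true otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only.
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "neutral", "negative", "negative", "positive"]
cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm), macro_f1(cm))
```

In practice, `sklearn.metrics` provides `accuracy_score`, `f1_score`, and `confusion_matrix` for the same purpose.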

Files changed (1)
  1. README.md +66 -3
README.md CHANGED
@@ -1,3 +1,66 @@
- ---
- license: apache-2.0
- ---
+ # Financial Sentiment Classifier using SBERT
+
+ ## Extended Description
+
+ ### Overview
+ In the realm of financial decision-making, timely and accurate information is crucial. One of the key aspects of understanding the market and its behavior is analyzing the sentiment of financial news, articles, and social media posts. Positive or negative sentiment in financial news can have significant impacts on stock prices, investment decisions, and even market trends.
+
+ This model, the **Financial Sentiment Classifier using SBERT**, is designed to classify the sentiment of financial news headlines into three categories: **positive**, **negative**, and **neutral**. The model utilizes **Sentence-BERT (SBERT)**, a transformer-based model, which creates dense and rich embeddings from sentences. These embeddings are then passed through a **RandomForestClassifier**, which classifies the sentiment based on historical data of financial news headlines.
+
+ SBERT is an ideal choice for this task because it is specifically fine-tuned for sentence-level semantic understanding. This model is capable of capturing subtle nuances in language and context, such as understanding market sentiment related to financial reports or stock performance, making it an excellent choice for the financial domain.
+
+ ### Intended Use
+ This model can be employed in a variety of financial applications, including but not limited to:
+ - **Automating sentiment analysis workflows:** Automatically categorize financial headlines from news sources, social media, or corporate press releases.
+ - **Market prediction:** Use sentiment data to predict market movements, informing stock trading decisions.
+ - **Investor sentiment monitoring:** Track sentiment over time to gauge how the market or the public perceives a particular financial entity or event.
+ - **Financial news aggregation:** Classify news articles in real-time for news aggregation platforms to filter positive, neutral, or negative content.
+
+ This model's flexibility makes it adaptable for real-time applications, including automated trading systems, financial monitoring tools, and market sentiment analysis platforms.
+
+ ### Model Details
+ The model comprises two major components:
+ 1. **Sentence-BERT (SBERT)**: This is a specialized variant of the BERT (Bidirectional Encoder Representations from Transformers) model, designed to produce high-quality sentence embeddings. Unlike traditional BERT, which works on token-level representations, SBERT generates fixed-size embeddings for entire sentences or documents. This ability makes it a powerful tool for understanding the meaning of financial statements and market-relevant news.
+
+ 2. **RandomForestClassifier**: Once the sentences are transformed into embeddings using SBERT, the model uses a **RandomForestClassifier** to perform sentiment classification. The RandomForest model is a robust machine learning algorithm that combines the predictions of multiple decision trees to deliver accurate results. In this case, the classifier predicts the sentiment of the sentence based on the embeddings generated by SBERT.
+
+ The sentiment classification system is trained using a labeled dataset of financial news headlines, where each headline has been annotated with a sentiment label (positive, negative, or neutral). The model is fine-tuned to recognize patterns that relate to sentiment in financial language.
+
+ ### Data & Preprocessing
+ This model was trained on a custom dataset consisting of financial news headlines, though it can be fine-tuned with your own data to improve performance for specific use cases.
+
+ The preprocessing steps included:
+ - **Tokenizing the text**: Breaking each headline into individual words or tokens.
+ - **Converting text to embeddings**: Each headline was passed through the SBERT model, generating a dense vector (embedding) that captures the semantic meaning of the sentence.
+ - **Label Encoding**: The sentiment labels (positive, negative, neutral) were encoded into numeric values (0, 1, 2) for the classifier to process.
+
+ Additionally, for fine-tuning the model for your own data, the preprocessing step involves converting new financial headlines into embeddings and feeding them into the RandomForest model.
+
+ ### Model Evaluation
+ The model has been evaluated using metrics such as:
+ - **Accuracy**: The percentage of correctly classified headlines.
+ - **F1-score**: The harmonic mean of precision and recall, providing a better measure of model performance when dealing with imbalanced data.
+ - **Confusion Matrix**: Helps identify how well the model distinguishes between the different sentiment categories (positive, neutral, and negative).
+
+ On the test data, the model achieves an **accuracy of X%**, with an **F1-score of X%**. The confusion matrix shows that the model performs well, with a high number of true positives for positive, neutral, and negative sentiments.
+
+ ### Usage
+
+ To use the model, first install the necessary dependencies:
+
+ ```bash
+ pip install sentence-transformers scikit-learn
+
+ ```
+
+ license: apache-2.0
+ datasets:
+ - NickyNicky/Finance_sentiment_and_topic_classification_En
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - sentence-transformers/all-MiniLM-L6-v2
+ pipeline_tag: text-classification
+ ---