# FinBERT Sentiment Analysis on English/Quotes Dataset
## Overview
This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies a piece of text as negative, neutral, or positive.
## Model Details
- **Model Architecture:** FinBERT (BERT-based model for sentiment analysis)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers
## Usage
### Installation
```bash
pip install transformers torch
```
### Loading the Model
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Repository ID as given in this README. Hub IDs normally have the form
# "namespace/name", so adjust this string if the path does not resolve.
model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```
### Sentiment Classification Inference
```python
def predict_sentiment(text):
    """Return the predicted sentiment label for a single string."""
    inputs = tokenizer(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device
    with torch.no_grad():  # Gradients are not needed for inference
        outputs = model(**inputs)
    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()  # Index of the highest-scoring class
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]

# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”"
]
print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")

# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```
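
`predict_sentiment` returns only the top label. When a confidence score is also useful, the logits can be converted to probabilities with a softmax. The following is a minimal sketch in plain Python, assuming the same label order as `label_map`; the helper name `logits_to_probs` and the logit values are hypothetical and used only for illustration.

```python
import math

# Label order assumed to match label_map in predict_sentiment
LABELS = ["negative", "neutral", "positive"]

def logits_to_probs(logits):
    """Convert raw logits to probabilities with a numerically stable softmax."""
    shift = max(logits)  # Subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input (not real model output)
example_logits = [-1.2, 0.3, 2.1]
probs = logits_to_probs(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(LABELS[best], round(probs[best], 3))
```

On the tensor returned by the model, `torch.softmax(logits, dim=-1)` performs the same computation.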
## Evaluation Metrics: Accuracy & F1 Score
Accuracy and F1 score are the key evaluation metrics for sentiment analysis. The model achieves:
- **Accuracy:** 88%
- **F1 Score:** 0.85
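
For readers who want to compute the same kind of metrics on their own labeled data, here is a minimal sketch in plain Python. This README does not say how the F1 score was averaged across classes, so the macro average used here is an assumption. The toy labels are illustrative only and are not the data behind the figures reported here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (macro average, an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
labels = ["negative", "neutral", "positive"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

If scikit-learn is installed, `accuracy_score(y_true, y_pred)` and `f1_score(y_true, y_pred, average="macro")` from `sklearn.metrics` compute the same quantities.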
## Repository Structure
```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation
```
## Limitations
- The model may struggle with ambiguous phrases.
- Performance may vary across domains and terminology.
- The dataset primarily contains English text, so the model is less effective for multilingual applications.
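
One way to handle ambiguous inputs is to abstain when the top class probability is low instead of forcing a label. The sketch below assumes class probabilities are already available, for example from a softmax over the model's logits. The helper name `label_or_abstain`, the 0.6 threshold, and the probability values are all hypothetical and would need tuning on real data.

```python
def label_or_abstain(probs, labels, threshold=0.6):
    """Return the top label, or "uncertain" when its probability is below threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "uncertain"

labels = ["negative", "neutral", "positive"]

# Hypothetical probabilities: one confident case and one ambiguous case
confident = label_or_abstain([0.05, 0.10, 0.85], labels)
ambiguous = label_or_abstain([0.30, 0.38, 0.32], labels)
print(confident, ambiguous)
```

A higher threshold abstains more often and leaves fewer but more reliable labels.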
## Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.