# FinBERT Sentiment Analysis on English/Quotes Dataset

## 📌 Overview

This repository hosts a FinBERT model fine-tuned for sentiment analysis on the English/Quotes dataset. The model classifies text into one of three sentiment categories: negative, neutral, or positive.

## 🏗 Model Details

- **Model Architecture:** FinBERT (BERT-based model for sentiment analysis)
- **Task:** Sentiment Analysis
- **Dataset:** English/Quotes dataset
- **Fine-tuning Framework:** Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "Aventiq-AI/finbert-english/quotes"
model = BertForSequenceClassification.from_pretrained(model_name).to(device)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### Sentiment Classification Inference

```python
def predict_sentiment(text):
    # Tokenize to a fixed length of 128 tokens, truncating longer inputs
    inputs = tokenizer(
        text,
        padding="max_length",
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )
    inputs = {key: val.to(device) for key, val in inputs.items()}  # Move inputs to device

    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    prediction = torch.argmax(logits, dim=-1).item()

    # Map the predicted class index to its sentiment label
    label_map = {0: "negative", 1: "neutral", 2: "positive"}
    return label_map[prediction]


# Test on the original 5 quotes
original_quotes = [
    "“Be yourself; everyone else is already taken.”",
    "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
    "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
    "“So many books, so little time.”",
    "“A room without books is like a body without a soul.”",
]

print("Predictions for original quotes:")
for quote in original_quotes:
    pred = predict_sentiment(quote)
    print(f"Quote: {quote}\nPredicted Sentiment: {pred}\n")

# Test on a new example
new_quote = "Life is beautiful when you smile."
print("Prediction for new quote:")
print(f"Quote: {new_quote}\nPredicted Sentiment: {predict_sentiment(new_quote)}")
```

## 📊 Evaluation Metrics: Accuracy & F1 Score

Accuracy and F1 score are the key evaluation metrics for sentiment analysis. The model achieves:

- **Accuracy:** 88%
- **F1 Score:** 0.85

## 📂 Repository Structure

```
.
├── model/               # Contains the fine-tuned model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Model weights
└── README.md            # Model documentation
```

## ⚠️ Limitations

- The model may struggle with ambiguous phrases.
- Performance might vary across different jurisdictions and terminologies.
- The dataset primarily contains English text, making the model less effective for multilingual applications.

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
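
## 🧮 Computing Accuracy and F1 (Sketch)

The accuracy and F1 figures reported in the evaluation section can be reproduced in spirit with a few lines of plain Python. This is a minimal sketch, not the evaluation script used for this model: the helper names `accuracy` and `macro_f1` are hypothetical, the README does not say which F1 averaging was used (macro-averaging is assumed here), and the toy labels are made up purely for illustration.

```python
from collections import Counter

# Label names taken from the label_map in the inference example
LABELS = ["negative", "neutral", "positive"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only; these are NOT the model's real predictions
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]

print(f"Accuracy: {accuracy(gold, pred):.2f}")
print(f"Macro F1: {macro_f1(gold, pred):.2f}")
```

In practice, `pred` would be built by calling `predict_sentiment` on each text of a labeled held-out set. Macro-averaging weights all three classes equally, which matters when one sentiment dominates the data; a weighted or micro average would give different numbers.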