title: Sentiment Analysis
emoji: ๐ญ
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
Sentiment Analysis System
A hybrid sentiment analysis system combining RoBERTa transformer models with VADER rule-based analysis for accurate sentiment classification. Features include standalone evaluation scripts and a Telegram bot interface for real-time sentiment analysis.
Features
- Hybrid Model Architecture: Combines RoBERTa (deep learning) with VADER (rule-based) for improved accuracy
- Smart Confidence Weighting: Dynamic weight adjustment based on model confidence
- Test-Time Augmentation: Multiple text variations for robust predictions
- Telegram Bot Integration: Real-time sentiment analysis via Telegram
- Batch Evaluation: Comprehensive model evaluation on benchmark datasets
- Production Ready: Optimized for both accuracy and inference speed
Project Structure
Sentiment-Analysis/
├── Robert_hybrid_model.py # Core hybrid model with predict_sentiment()
├── Main Prototype Final.py # Standalone evaluation script
├── telegram_bot.py # Telegram bot interface
├── test.csv # Test dataset (IMDB format)
├── requirements.txt # Python dependencies
├── README.md # Documentation
├── LICENSE # MIT License
└── .gitignore # Git ignore rules
File Descriptions
| File | Purpose |
|---|---|
| Robert_hybrid_model.py | Core model implementation with predict_sentiment() function and evaluation logic |
| Main Prototype Final.py | Batch evaluation script with smart hybrid ensemble logic |
| telegram_bot.py | Telegram bot for interactive sentiment analysis |
| test.csv | IMDB movie review dataset for model evaluation |
Installation
1. Clone the Repository
git clone https://github.com/Techtitan-techy/Sentiment-Analysis.git
cd Sentiment-Analysis
2. Create Virtual Environment
# Windows
python -m venv venv
venv\Scripts\activate
# Linux/Mac
python3 -m venv venv
source venv/bin/activate
3. Install Dependencies
pip install -r requirements.txt
4. Download SpaCy Model
python -m spacy download en_core_web_sm
Model Architecture
Hybrid Ensemble Approach
The system uses an intelligent hybrid architecture that combines:
1. Deep Learning Component (60-100% weight)
- Model: textattack/roberta-base-imdb
- Architecture: RoBERTa-base fine-tuned on the IMDB dataset
- Features:
  - Transformer-based contextual understanding
  - Test-Time Augmentation (original + lowercase)
  - Confidence-based prediction
2. Rule-Based Component (0-40% weight)
- Model: VADER (Valence Aware Dictionary and sEntiment Reasoner)
- Features:
  - Lexicon-based sentiment scoring
  - Handles negations, intensifiers, emoticons
  - Fast inference for uncertain cases
3. Smart Weighting Strategy
# Dynamic confidence-based weighting
if dl_confidence > 0.90:
    final_prob = dl_prob_pos  # Trust DL model 100%
else:
    # Blend based on confidence
    dynamic_weight_dl = 0.60 + (0.40 * dl_confidence)
    dynamic_weight_rules = 1.0 - dynamic_weight_dl
    final_prob = (dynamic_weight_dl * dl_prob_pos) + (dynamic_weight_rules * rule_prob)
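The weighting rule above can be wrapped in a small function to make the arithmetic easy to check. This is a minimal sketch, not the code from Robert_hybrid_model.py: the function name, the parameter names, and the choice to derive dl_confidence as the probability of the predicted class are all assumptions made for illustration.

```python
def blend_probabilities(dl_prob_pos: float,
                        rule_prob_pos: float,
                        trust_threshold: float = 0.90,
                        base_weight: float = 0.60) -> float:
    """Blend two positive-class probabilities using confidence-based weights.

    Hypothetical helper: the defaults mirror the snippet above (0.90 and 0.60).
    """
    # Assumption: confidence is the probability of whichever class RoBERTa picked.
    dl_confidence = max(dl_prob_pos, 1.0 - dl_prob_pos)

    # Above the threshold, the deep learning prediction is used unchanged.
    if dl_confidence > trust_threshold:
        return dl_prob_pos

    # Otherwise the DL weight grows linearly with confidence, up to 1.0.
    weight_dl = base_weight + (1.0 - base_weight) * dl_confidence
    weight_rules = 1.0 - weight_dl
    return weight_dl * dl_prob_pos + weight_rules * rule_prob_pos


# Worked example: RoBERTa says 0.70 positive, the rule-based score says 0.20.
# weight_dl = 0.60 + 0.40 * 0.70 = 0.88, so the result is 0.88*0.70 + 0.12*0.20.
print(f"{blend_probabilities(0.70, 0.20):.3f}")  # 0.640
```

With a confident prediction such as 0.95, the function returns 0.95 regardless of the rule-based score.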
Preprocessing Pipeline
- Text Cleaning: Remove HTML tags (<br />, etc.)
- SpaCy Processing: Lemmatization with en_core_web_sm
- Tokenization: RoBERTa tokenizer with 512 max length
- Normalization: Lowercase variants for TTA
Usage
1. Standalone Evaluation
Run batch evaluation on the test dataset:
python "Main Prototype Final.py"
Output:
========================================
FINAL RESULTS (HYBRID)
========================================
Accuracy : 0.9340
Precision: 0.9312
Recall : 0.9368
F1 Score : 0.9340
========================================
2. Programmatic API
Use the hybrid model in your code:
from Robert_hybrid_model import predict_sentiment
# Analyze a single text
text = "This movie was absolutely fantastic! Best film of the year."
sentiment, confidence = predict_sentiment(text)
print(f"Sentiment: {sentiment}") # Output: Positive
print(f"Confidence: {confidence:.4f}") # Output: 0.9876
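For more than one text, a thin wrapper can collect the per-text results into a summary. The sketch below assumes only what the example above shows, namely that the predictor returns a (sentiment, confidence) pair. The names summarize and fake_predict are hypothetical, and the stub stands in for predict_sentiment so the snippet runs without downloading the model.

```python
from collections import Counter
from typing import Callable, Iterable

Prediction = tuple[str, float]  # (sentiment label, confidence)


def summarize(texts: Iterable[str],
              predict: Callable[[str], Prediction]) -> dict:
    """Run the predictor on each text and report label counts and mean confidence."""
    results = [predict(text) for text in texts]
    counts = Counter(label for label, _ in results)
    mean_confidence = (
        sum(conf for _, conf in results) / len(results) if results else 0.0
    )
    return {"counts": dict(counts), "mean_confidence": mean_confidence}


def fake_predict(text: str) -> Prediction:
    """Stand-in predictor for illustration only; not the real model."""
    return ("Positive", 0.9) if "good" in text.lower() else ("Negative", 0.8)


print(summarize(["Good movie", "Bad movie"], fake_predict)["counts"])
# {'Positive': 1, 'Negative': 1}
```

To use the real model, pass predict_sentiment imported from Robert_hybrid_model in place of fake_predict.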
3. Telegram Bot
Start the bot for interactive analysis:
# Set your bot token (get it from @BotFather)
# Linux/Mac
export TELEGRAM_BOT_TOKEN="your_token_here"
# Windows (cmd)
set TELEGRAM_BOT_TOKEN=your_token_here
# Run the bot
python telegram_bot.py
Bot Commands:
- /start - Initialize the bot
- Send any text - Get sentiment analysis with confidence score
Example Interaction:
User: "I love this product! It exceeded my expectations."
Bot: Sentiment: Positive
Confidence Score: 0.9654
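The two pieces of bot logic that do not depend on the Telegram library are reading the token and formatting the reply. The sketch below shows one way to do both; read_bot_token and format_reply are hypothetical names, and telegram_bot.py may structure this differently.

```python
import os
from typing import Mapping


def read_bot_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the bot token from TELEGRAM_BOT_TOKEN, failing early if it is unset."""
    token = env.get("TELEGRAM_BOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("TELEGRAM_BOT_TOKEN is not set")
    return token


def format_reply(sentiment: str, confidence: float) -> str:
    """Build the two-line reply shown in the example interaction."""
    return f"Sentiment: {sentiment}\nConfidence Score: {confidence:.4f}"


print(format_reply("Positive", 0.9654))
# Sentiment: Positive
# Confidence Score: 0.9654
```

Failing early on a missing token gives a clearer error than letting the Telegram client reject an empty string at startup.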
Configuration
Model Parameters (Robert_hybrid_model.py)
# Toggle hybrid mode
USE_HYBRID_ENSEMBLE = True # Set False for RoBERTa-only
# Model selection
MODEL_NAME = "textattack/roberta-base-imdb"
# Weighting scheme
WEIGHT_DL = 0.90 # Deep learning weight (when hybrid disabled)
WEIGHT_RULES = 0.10 # Rule-based weight (when hybrid disabled)
Evaluation Settings (Main Prototype Final.py)
# Confidence threshold for DL trust
dl_confidence > 0.70 # Trust DL if confidence exceeds 70%
# Dynamic weighting
dynamic_weight_dl = 0.80 + (0.20 * dl_confidence)
Performance Metrics
Evaluated on IMDB test dataset (500 samples):
| Metric | Hybrid Model | RoBERTa-only |
|---|---|---|
| Accuracy | 93.4% | 91.2% |
| Precision | 93.1% | 90.8% |
| Recall | 93.7% | 91.6% |
| F1 Score | 93.4% | 91.2% |
| Inference Speed | ~120ms/text | ~100ms/text |
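The F1 rows follow from the precision and recall rows, since F1 is their harmonic mean. The short check below recomputes the hybrid F1 from the four-decimal values printed by the evaluation script; the function name f1_score is chosen for illustration and is not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hybrid model: precision 0.9312, recall 0.9368 (from the sample output).
print(f"{f1_score(0.9312, 0.9368):.4f}")  # 0.9340
```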
Key Advantages
✅ Hybrid > Base Model: +2.2 percentage points accuracy (93.4% vs 91.2%)
✅ Handles Edge Cases: Better performance on ambiguous texts
✅ Balanced Performance: High recall (93.7%) without sacrificing precision (93.1%)
✅ Production Ready: Fast inference (~120 ms per text) with robust predictions
Dataset
The project uses the IMDB Movie Reviews dataset:
- Source: Hugging Face datasets library (imdb)
- Format: Binary sentiment (Positive/Negative)
- Test Size: 500 samples (shuffled, seed=42)
- Included: test.csv contains preprocessed data
CSV Format
text,label
"This movie was excellent!",1
"Terrible film, waste of time.",0
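A file in this format can be read and validated with the standard csv module. The sketch below assumes that label 1 means Positive and 0 means Negative, which matches the two sample rows but is not stated explicitly; load_rows and LABEL_NAMES are hypothetical names.

```python
import csv
import io

# Assumption inferred from the sample rows: 1 = Positive, 0 = Negative.
LABEL_NAMES = {0: "Negative", 1: "Positive"}


def load_rows(handle) -> list[tuple[str, int]]:
    """Read (text, label) pairs from a CSV with a text,label header row."""
    rows = []
    for record in csv.DictReader(handle):
        label = int(record["label"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label}")
        rows.append((record["text"], label))
    return rows


# In-memory sample so the snippet runs without test.csv on disk.
sample = io.StringIO(
    'text,label\n'
    '"This movie was excellent!",1\n'
    '"Terrible film, waste of time.",0\n'
)
for text, label in load_rows(sample):
    print(LABEL_NAMES[label], "-", text)
```

For the real file, replace the in-memory sample with open("test.csv", newline="", encoding="utf-8").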
Development
Running Tests
# Evaluate hybrid model
python Robert_hybrid_model.py
# Evaluate main prototype
python "Main Prototype Final.py"
# Test Telegram bot locally
python telegram_bot.py
Adding Custom Dataset
Replace the dataset loading in Robert_hybrid_model.py:
# Current
dataset = load_dataset("imdb")
# Custom CSV (same text,label columns as test.csv)
import pandas as pd
from datasets import Dataset
df = pd.read_csv("your_data.csv")
test_dataset = Dataset.from_pandas(df)
Dependencies
torch>=2.0.0 # PyTorch deep learning framework
transformers>=4.20.0 # Hugging Face transformers (RoBERTa)
nltk>=3.8 # Natural Language Toolkit (VADER)
spacy>=3.5.0 # Industrial NLP (preprocessing)
scikit-learn>=1.2.0 # Evaluation metrics
datasets>=2.10.0 # Dataset loading utilities
pandas>=1.5.0 # Data manipulation
python-telegram-bot # Telegram bot API
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Support
For issues, questions, or suggestions:
- Open an Issue: GitHub Issues
- Email: Create an issue for contact information
- Documentation: Refer to code comments and this README
Citation
If you use this project in your research or work, please cite:
@software{sentiment_analysis_hybrid,
title = {Hybrid RoBERTa-VADER Sentiment Analysis System},
author = {Techtitan-techy},
year = {2026},
url = {https://github.com/Techtitan-techy/Sentiment-Analysis}
}
Future Enhancements
- Multi-class sentiment (Positive/Negative/Neutral)
- Web interface with Flask/FastAPI
- Real-time news article scraping
- Fine-tuning on news-specific datasets
- Docker containerization
- REST API deployment
- Batch processing optimization
Made with ❤️ by Techtitan-techy