title: Sentiment Analysis
emoji: 🎭
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false

Sentiment Analysis System

A hybrid sentiment analysis system combining RoBERTa transformer models with VADER rule-based analysis for accurate sentiment classification. Features include standalone evaluation scripts and a Telegram bot interface for real-time sentiment analysis.

🚀 Features

Hybrid Model Architecture: Combines RoBERTa (deep learning) with VADER (rule-based) for improved accuracy
Smart Confidence Weighting: Dynamic weight adjustment based on model confidence
Test-Time Augmentation: Predicts on multiple text variants (original and lowercased) for more robust results
Telegram Bot Integration: Real-time sentiment analysis via Telegram
Batch Evaluation: Comprehensive model evaluation on benchmark datasets
Production Ready: Optimized for both accuracy and inference speed

📁 Project Structure

Sentiment-Analysis/
├── Robert_hybrid_model.py      # Core hybrid model with predict_sentiment()
├── Main Prototype Final.py     # Standalone evaluation script
├── telegram_bot.py             # Telegram bot interface
├── test.csv                    # Test dataset (IMDB format)
├── requirements.txt            # Python dependencies
├── README.md                   # Documentation
├── LICENSE                     # MIT License
└── .gitignore                  # Git ignore rules

File Descriptions

File	Purpose
`Robert_hybrid_model.py`	Core model implementation with `predict_sentiment()` function and evaluation logic
`Main Prototype Final.py`	Batch evaluation script with smart hybrid ensemble logic
`telegram_bot.py`	Telegram bot for interactive sentiment analysis
`test.csv`	IMDB movie review dataset for model evaluation

🔧 Installation

1. Clone the Repository

git clone https://github.com/Techtitan-techy/Sentiment-Analysis.git
cd Sentiment-Analysis

2. Create Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Download spaCy Model

python -m spacy download en_core_web_sm

📊 Model Architecture

Hybrid Ensemble Approach

The system uses an intelligent hybrid architecture that combines:

1. Deep Learning Component (60-100% weight)

Model: textattack/roberta-base-imdb
Architecture: RoBERTa-base fine-tuned on IMDB dataset
Features:
- Transformer-based contextual understanding
- Test-Time Augmentation (original + lowercase)
- Confidence-based prediction

2. Rule-Based Component (0-40% weight)

Model: VADER (Valence Aware Dictionary and sEntiment Reasoner)
Features:
- Lexicon-based sentiment scoring
- Handles negations, intensifiers, emoticons
- Fast inference for uncertain cases
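To be blended with RoBERTa's probability, the VADER output has to sit on the same 0 to 1 scale. The sketch below shows one way to do that and is not necessarily what Robert_hybrid_model.py does: it assumes NLTK's SentimentIntensityAnalyzer and a linear rescaling of the compound score (which ranges from -1 to 1). Both function names are hypothetical.

```python
def compound_to_prob(compound: float) -> float:
    """Linearly rescale a VADER compound score from [-1, 1] to [0, 1]."""
    clipped = max(-1.0, min(1.0, compound))  # guard against out-of-range input
    return (clipped + 1.0) / 2.0


def vader_positive_prob(text: str) -> float:
    """Score text with VADER and return a pseudo-probability of 'positive'.

    Requires a one-time nltk.download("vader_lexicon").
    """
    # Imported lazily so compound_to_prob works without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return compound_to_prob(analyzer.polarity_scores(text)["compound"])
```

A neutral compound score of 0.0 maps to 0.5, so a text VADER cannot judge does not pull the blended result in either direction.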

3. Smart Weighting Strategy

# Dynamic confidence-based weighting
if dl_confidence > 0.90:
    final_prob = dl_prob_pos  # Trust DL model 100%
else:
    # Blend based on confidence
    dynamic_weight_dl = 0.60 + (0.40 * dl_confidence)
    dynamic_weight_rules = 1.0 - dynamic_weight_dl
    final_prob = (dynamic_weight_dl * dl_prob_pos) + (dynamic_weight_rules * rule_prob)
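The fragment above can be wrapped in a self-contained function for experimentation. This is a minimal sketch, not the repository's code: the function name and its parameters are hypothetical, and it assumes that the DL confidence is the probability of RoBERTa's predicted class, i.e. max(p, 1 - p).

```python
def blend_probabilities(dl_prob_pos, rule_prob_pos,
                        trust_threshold=0.90, base_weight=0.60):
    """Blend RoBERTa and VADER positive-class probabilities.

    Assumption: confidence is the probability of the class RoBERTa predicts.
    """
    dl_confidence = max(dl_prob_pos, 1.0 - dl_prob_pos)
    if dl_confidence > trust_threshold:
        return dl_prob_pos  # trust the DL model completely

    # Otherwise the DL weight grows linearly with its confidence.
    weight_dl = base_weight + (1.0 - base_weight) * dl_confidence
    weight_rules = 1.0 - weight_dl
    return weight_dl * dl_prob_pos + weight_rules * rule_prob_pos
```

For example, `blend_probabilities(0.95, 0.20)` returns 0.95 unchanged, while `blend_probabilities(0.70, 0.90)` uses a DL weight of 0.88 and yields roughly 0.724. Under the confidence assumption above, confidence never drops below 0.5, so the DL weight in the blending branch stays between 0.80 and 0.96.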

Preprocessing Pipeline

Text Cleaning: Remove HTML tags (<br />, etc.)
spaCy Processing: Lemmatization with en_core_web_sm
Tokenization: RoBERTa tokenizer with a maximum length of 512 tokens
Normalization: Lowercase variants for TTA
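The cleaning and normalization steps can be illustrated with the standard library alone. The sketch below is an assumption about how those two steps might look, with hypothetical function names; lemmatization and tokenization are left out because they need the spaCy and RoBERTa models to be downloaded.

```python
import re

_TAG_RE = re.compile(r"<[^>]+>")  # matches tags such as <br /> or <p>


def clean_text(text: str) -> str:
    """Replace HTML tags with spaces, then collapse runs of whitespace."""
    without_tags = _TAG_RE.sub(" ", text)
    return " ".join(without_tags.split())


def tta_variants(text: str) -> list[str]:
    """Return the text plus its lowercase form, skipping exact duplicates."""
    variants = [text]
    lowered = text.lower()
    if lowered != text:
        variants.append(lowered)
    return variants
```

Each variant would then be scored by the model separately; how the per-variant predictions are combined is not specified here.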

🎯 Usage

1. Standalone Evaluation

Run batch evaluation on the test dataset:

python "Main Prototype Final.py"

Output:

========================================
FINAL RESULTS (HYBRID)
========================================
Accuracy : 0.9340
Precision: 0.9312
Recall   : 0.9368
F1 Score : 0.9340
========================================
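For reference, the four reported metrics follow from the confusion counts. The project lists scikit-learn for evaluation; the dependency-free sketch below only spells out the definitions and is not the script's own code. The function names are hypothetical, and labels are assumed to be 1 for positive and 0 for negative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```

As a consistency check, `f1_from(0.9312, 0.9368)` evaluates to about 0.9340, which agrees with the F1 score in the output above.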

2. Programmatic API

Use the hybrid model in your code:

from Robert_hybrid_model import predict_sentiment

# Analyze a single text
text = "This movie was absolutely fantastic! Best film of the year."
sentiment, confidence = predict_sentiment(text)

print(f"Sentiment: {sentiment}")      # Output: Positive
print(f"Confidence: {confidence:.4f}") # Output: 0.9876

3. Telegram Bot

Start the bot for interactive analysis:

# Set your bot token (get from @BotFather)
export TELEGRAM_BOT_TOKEN="your_token_here"  # Linux/Mac
set TELEGRAM_BOT_TOKEN=your_token_here       # Windows

# Run the bot
python telegram_bot.py

Bot Commands:

/start - Initialize the bot
Send any text - Get sentiment analysis with confidence score

Example Interaction:

User: "I love this product! It exceeded my expectations."
Bot:  Sentiment: Positive
      Confidence Score: 0.9654
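Two small pieces of the bot can be sketched without the Telegram library: reading the token from the environment and formatting the reply shown above. The helper names are hypothetical and telegram_bot.py may be organized differently.

```python
import os


def get_bot_token(env=os.environ) -> str:
    """Read TELEGRAM_BOT_TOKEN and fail early with a clear message if missing."""
    token = env.get("TELEGRAM_BOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "TELEGRAM_BOT_TOKEN is not set; get a token from @BotFather."
        )
    return token


def format_reply(sentiment: str, confidence: float) -> str:
    """Build the two-line reply with the confidence to four decimals."""
    return f"Sentiment: {sentiment}\nConfidence Score: {confidence:.4f}"
```

Failing at startup when the token is missing is usually easier to diagnose than an authentication error raised later by the Telegram API.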

⚙️ Configuration

Model Parameters (Robert_hybrid_model.py)

# Toggle hybrid mode
USE_HYBRID_ENSEMBLE = True  # Set False for RoBERTa-only

# Model selection
MODEL_NAME = "textattack/roberta-base-imdb"

# Weighting scheme
WEIGHT_DL = 0.90      # Deep learning weight (when hybrid disabled)
WEIGHT_RULES = 0.10   # Rule-based weight (when hybrid disabled)

Evaluation Settings (Main Prototype Final.py)

# Confidence threshold for DL trust
dl_confidence > 0.70  # Trust DL if confidence exceeds 70%

# Dynamic weighting
dynamic_weight_dl = 0.80 + (0.20 * dl_confidence)

📈 Performance Metrics

Evaluated on IMDB test dataset (500 samples):

Metric	Hybrid Model	RoBERTa-only
Accuracy	93.4%	91.2%
Precision	93.1%	90.8%
Recall	93.7%	91.6%
F1 Score	93.4%	91.2%
Inference Speed	~120ms/text	~100ms/text

Key Advantages

✅ Hybrid > Base Model: +2.2 percentage points of accuracy over RoBERTa-only (93.4% vs. 91.2%)
✅ Handles Edge Cases: Better performance on ambiguous texts
✅ Balanced Performance: High recall without sacrificing precision
✅ Production Ready: Fast inference with robust predictions

🔬 Dataset

The project uses the IMDB Movie Reviews dataset:

Source: Hugging Face datasets library (imdb)
Format: Binary sentiment (Positive/Negative)
Test Size: 500 samples (shuffled, seed=42)
Included: test.csv contains preprocessed data

CSV Format

text,label
"This movie was excellent!",1
"Terrible film, waste of time.",0
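A file in this format can be read and validated with Python's csv module, which handles the quoted commas in review text. This is a minimal sketch with a hypothetical function name; it assumes that label 1 means positive and 0 means negative, as the sample rows suggest.

```python
import csv
import io


def load_reviews(csv_file):
    """Parse text,label rows into (text, label) tuples, validating labels."""
    reader = csv.DictReader(csv_file)
    rows = []
    for row in reader:
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"line {reader.line_num}: label must be 0 or 1")
        rows.append((row["text"], label))
    return rows


# In-memory sample mirroring the format above; for a real file use
# open("test.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    'text,label\n'
    '"This movie was excellent!",1\n'
    '"Terrible film, waste of time.",0\n'
)
reviews = load_reviews(sample)
```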

🛠️ Development

Running Tests

# Evaluate hybrid model
python Robert_hybrid_model.py

# Evaluate main prototype
python "Main Prototype Final.py"

# Test Telegram bot locally
python telegram_bot.py

Adding Custom Dataset

Replace the dataset loading in Robert_hybrid_model.py:

# Current
dataset = load_dataset("imdb")

# Custom CSV (expects the text,label columns described above)
import pandas as pd
from datasets import Dataset

df = pd.read_csv("your_data.csv")
test_dataset = Dataset.from_pandas(df)

📦 Dependencies

torch>=2.0.0              # PyTorch deep learning framework
transformers>=4.20.0      # Hugging Face transformers (RoBERTa)
nltk>=3.8                # Natural Language Toolkit (VADER)
spacy>=3.5.0             # Industrial NLP (preprocessing)
scikit-learn>=1.2.0      # Evaluation metrics
datasets>=2.10.0         # Dataset loading utilities
pandas>=1.5.0            # Data manipulation
python-telegram-bot      # Telegram bot API

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

📞 Support

For issues, questions, or suggestions:

Open an Issue: GitHub Issues
Email: No address is published; open an issue to request contact information
Documentation: Refer to code comments and this README

🎓 Citation

If you use this project in your research or work, please cite:

@software{sentiment_analysis_hybrid,
  title = {Hybrid RoBERTa-VADER Sentiment Analysis System},
  author = {Techtitan-techy},
  year = {2026},
  url = {https://github.com/Techtitan-techy/Sentiment-Analysis}
}

🔮 Future Enhancements

Multi-class sentiment (Positive/Negative/Neutral)
Web interface with Flask/FastAPI
Real-time news article scraping
Fine-tuning on news-specific datasets
Docker containerization
REST API deployment
Batch processing optimization

Made with ❤️ by Techtitan-techy