entropy25 commited on
Commit
3670fc5
·
verified ·
1 Parent(s): 797f9f3

Upload 10 files

Browse files
Files changed (10) hide show
  1. Dockerfile +34 -0
  2. README.md +512 -10
  3. analyzer.py +218 -0
  4. app.py +284 -0
  5. config.py +33 -0
  6. docker-compose.yml +42 -0
  7. models.py +49 -0
  8. requirements.txt +13 -0
  9. utils.py +157 -0
  10. visualizer.py +184 -0
Dockerfile ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Install system dependencies (curl is also required by the HEALTHCHECK below)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    software-properties-common \
    git \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Bind Gradio to all interfaces so the app is reachable through the published
# port even with a plain `docker run -p 7860:7860` (docker-compose already sets
# these variables, but a standalone container would otherwise listen only on
# 127.0.0.1 inside the container and the port mapping would not work).
ENV GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_SERVER_PORT=7860

# Expose port
EXPOSE 7860

# Health check
HEALTHCHECK CMD curl --fail http://localhost:7860 || exit 1

# Run the application
CMD ["python", "app.py"]
README.md CHANGED
@@ -1,10 +1,512 @@
1
- ---
2
- title: Sentiment Analyzer3
3
- emoji: 🐨
4
- colorFrom: blue
5
- colorTo: green
6
- sdk: docker
7
- pinned: false
8
- ---
9
-
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ title: Sentiment Analysis docker
2
+
3
+ emoji: 📊
4
+
5
+ colorFrom: gray
6
+
7
+ colorTo: gray
8
+
9
+ sdk: gradio
10
+
11
+ sdk_version: 5.34.1
12
+
13
+ app_file: app.py
14
+
15
+ pinned: false
16
+
17
+ license: mit
18
+
19
+ short_description: sentiment-analysis
20
+
21
+
22
+
23
+ \# 🎬 AI Movie Sentiment Analyzer
24
+
25
+
26
+
27
+ A sophisticated sentiment analysis application for movie reviews using advanced deep learning techniques with BERT, LIME, and SHAP explanations.
28
+
29
+
30
+
31
+ \## Features
32
+
33
+
34
+
35
+ \- \*\*Fast Sentiment Analysis\*\*: Quick movie review sentiment classification
36
+
37
+ \- \*\*Advanced Explanations\*\*: LIME and SHAP-based word importance analysis
38
+
39
+ \- \*\*Batch Processing\*\*: Analyze multiple reviews simultaneously
40
+
41
+ \- \*\*Interactive Visualizations\*\*: Charts, gauges, word clouds, and heatmaps
42
+
43
+ \- \*\*History Tracking\*\*: Keep track of all analyses with trend visualization
44
+
45
+ \- \*\*Data Export\*\*: Export results in CSV and JSON formats
46
+
47
+ \- \*\*File Upload Support\*\*: Process CSV and text files
48
+
49
+ \- \*\*Multiple Themes\*\*: Customizable color themes for visualizations
50
+
51
+
52
+
53
+ \## Project Structure
54
+
55
+
56
+
57
+ ```
58
+
59
+ sentiment\\\_analyzer/
60
+
61
+ ├── config.py # Configuration management
62
+
63
+ ├── models.py # Model loading and management
64
+
65
+ ├── analyzer.py # Core sentiment analysis logic
66
+
67
+ ├── visualizer.py # Visualization components
68
+
69
+ ├── utils.py # Utility functions and data handling
70
+
71
+ ├── app.py # Gradio interface and main application
72
+
73
+ ├── requirements.txt # Python dependencies
74
+
75
+ ├── Dockerfile # Docker container configuration
76
+
77
+ ├── docker-compose.yml # Docker Compose setup
78
+
79
+ └── README.md # Project documentation
80
+
81
+ ```
82
+
83
+
84
+
85
+ \## Installation
86
+
87
+
88
+
89
+ \### Local Installation
90
+
91
+
92
+
93
+ 1\. \*\*Clone the repository\*\*
94
+
95
+   ```bash
96
+
97
+   git clone <repository-url>
98
+
99
+   cd sentiment\_analyzer
100
+
101
+   ```
102
+
103
+
104
+
105
+ 2\. \*\*Create virtual environment\*\*
106
+
107
+   ```bash
108
+
109
+   python -m venv venv
110
+
111
+   source venv/bin/activate # On Windows: venv\\Scripts\\activate
112
+
113
+   ```
114
+
115
+
116
+
117
+ 3\. \*\*Install dependencies\*\*
118
+
119
+   ```bash
120
+
121
+   pip install -r requirements.txt
122
+
123
+   ```
124
+
125
+
126
+
127
+ 4\. \*\*Run the application\*\*
128
+
129
+   ```bash
130
+
131
+   python app.py
132
+
133
+   ```
134
+
135
+
136
+
137
+ \### Docker Installation
138
+
139
+
140
+
141
+ 1\. \*\*Using Docker Compose (Recommended)\*\*
142
+
143
+   ```bash
144
+
145
+   docker-compose up --build
146
+
147
+   ```
148
+
149
+
150
+
151
+ 2\. \*\*Using Docker directly\*\*
152
+
153
+   ```bash
154
+
155
+   docker build -t sentiment-analyzer .
156
+
157
+   docker run -p 7860:7860 sentiment-analyzer
158
+
159
+   ```
160
+
161
+
162
+
163
+ \## Usage
164
+
165
+
166
+
167
+ \### Web Interface
168
+
169
+
170
+
171
+ 1\. Open your browser and navigate to `http://localhost:7860`
172
+
173
+ 2\. Choose from three main tabs:
174
+
175
+   - \*\*Quick Analysis\*\*: Fast sentiment analysis with basic visualizations
176
+
177
+   - \*\*Advanced Analysis\*\*: Deep analysis with LIME/SHAP explanations
178
+
179
+   - \*\*Batch Analysis\*\*: Process multiple reviews at once
180
+
181
+
182
+
183
+ \### API Usage
184
+
185
+
186
+
187
+ The application can be extended to provide API endpoints for programmatic access.
188
+
189
+
190
+
191
+ \## Configuration
192
+
193
+
194
+
195
+ Modify `config.py` to customize:
196
+
197
+
198
+
199
+ \- \*\*Model Settings\*\*: Batch sizes, text length limits
200
+
201
+ \- \*\*Visualization\*\*: Figure sizes, color themes
202
+
203
+ \- \*\*Processing\*\*: Cache sizes, stop words
204
+
205
+ \- \*\*History\*\*: Maximum history size
206
+
207
+
208
+
209
+ \## Model Information
210
+
211
+
212
+
213
+ \- \*\*Base Model\*\*: BERT (entropy25/sentimentanalysis)
214
+
215
+ \- \*\*Classes\*\*: Positive, Negative
216
+
217
+ \- \*\*Explanation Methods\*\*: LIME, SHAP
218
+
219
+ \- \*\*Supported Languages\*\*: English
220
+
221
+
222
+
223
+ \## Features Detail
224
+
225
+
226
+
227
+ \### Quick Analysis
228
+
229
+ \- Fast sentiment classification
230
+
231
+ \- Confidence scoring
232
+
233
+ \- Probability visualization
234
+
235
+ \- Word cloud generation
236
+
237
+
238
+
239
+ \### Advanced Analysis
240
+
241
+ \- LIME-based word importance
242
+
243
+ \- SHAP value calculation
244
+
245
+ \- Interactive heatmap visualization
246
+
247
+ \- Detailed explanations
248
+
249
+
250
+
251
+ \### Batch Processing
252
+
253
+ \- CSV/TXT file upload
254
+
255
+ \- Bulk sentiment analysis
256
+
257
+ \- Comprehensive result visualization
258
+
259
+ \- Progress tracking
260
+
261
+
262
+
263
+ \### History \& Export
264
+
265
+ \- Analysis history tracking
266
+
267
+ \- Trend visualization
268
+
269
+ \- CSV/JSON export
270
+
271
+ \- Data persistence
272
+
273
+
274
+
275
+ \## Performance
276
+
277
+
278
+
279
+ \- \*\*GPU Support\*\*: Automatic CUDA detection
280
+
281
+ \- \*\*Memory Management\*\*: Efficient batch processing
282
+
283
+ \- \*\*Caching\*\*: LRU cache for text processing
284
+
285
+ \- \*\*Resource Optimization\*\*: Context managers for memory cleanup
286
+
287
+
288
+
289
+ \## Dependencies
290
+
291
+
292
+
293
+ \### Core Dependencies
294
+
295
+ \- `torch`: Deep learning framework
296
+
297
+ \- `transformers`: BERT model implementation
298
+
299
+ \- `gradio`: Web interface framework
300
+
301
+
302
+
303
+ \### Analysis \& Visualization
304
+
305
+ \- `lime`: Local interpretable model explanations
306
+
307
+ \- `shap`: Shapley additive explanations
308
+
309
+ \- `matplotlib`: Plotting and visualization
310
+
311
+ \- `wordcloud`: Word cloud generation
312
+
313
+
314
+
315
+ \### Data Processing
316
+
317
+ \- `pandas`: Data manipulation
318
+
319
+ \- `numpy`: Numerical computing
320
+
321
+
322
+
323
+ \## Development
324
+
325
+
326
+
327
+ \### Adding New Features
328
+
329
+
330
+
331
+ 1\. \*\*New Analyzers\*\*: Add to `analyzer.py`
332
+
333
+ 2\. \*\*Visualizations\*\*: Extend `visualizer.py`
334
+
335
+ 3\. \*\*UI Components\*\*: Modify `app.py`
336
+
337
+ 4\. \*\*Configuration\*\*: Update `config.py`
338
+
339
+
340
+
341
+ \### Testing
342
+
343
+
344
+
345
+ ```bash
346
+
347
+ \\# Run tests (if implemented)
348
+
349
+ python -m pytest tests/
350
+
351
+
352
+
353
+ \\# Manual testing
354
+
355
+ python -c "from analyzer import SentimentEngine; engine = SentimentEngine(); print(engine.analyze_single_fast('Great movie!'))"
356
+
357
+ ```
358
+
359
+
360
+
361
+ \## Deployment
362
+
363
+
364
+
365
+ \### Production Deployment
366
+
367
+
368
+
369
+ 1\. \*\*Environment Variables\*\*
370
+
371
+   ```bash
372
+
373
+   export GRADIO\_SERVER\_NAME=0.0.0.0
374
+
375
+   export GRADIO\_SERVER\_PORT=7860
376
+
377
+   ```
378
+
379
+
380
+
381
+ 2\. \*\*Resource Requirements\*\*
382
+
383
+   - CPU: 2+ cores recommended
384
+
385
+   - RAM: 4GB+ recommended
386
+
387
+   - GPU: Optional (CUDA support)
388
+
389
+
390
+
391
+ 3\. \*\*Monitoring\*\*
392
+
393
+   - Health checks included in Docker setup
394
+
395
+   - Logging configured for production use
396
+
397
+
398
+
399
+ \## Troubleshooting
400
+
401
+
402
+
403
+ \### Common Issues
404
+
405
+
406
+
407
+ 1\. \*\*CUDA Out of Memory\*\*
408
+
409
+   - Reduce `BATCH_PROCESSING_SIZE` in config
410
+
411
+   - Use CPU-only mode
412
+
413
+
414
+
415
+ 2\. \*\*Model Loading Errors\*\*
416
+
417
+   - Check internet connection
418
+
419
+   - Verify Hugging Face model availability
420
+
421
+
422
+
423
+ 3\. \*\*File Processing Issues\*\*
424
+
425
+   - Ensure proper file encoding (UTF-8 recommended)
426
+
427
+   - Check CSV format and column structure
428
+
429
+
430
+
431
+ \### Performance Optimization
432
+
433
+
434
+
435
+ \- Use GPU if available
436
+
437
+ \- Adjust batch sizes based on available memory
438
+
439
+ \- Enable caching for repeated analyses
440
+
441
+ \- Use Docker for consistent performance
442
+
443
+
444
+
445
+ \## Contributing
446
+
447
+
448
+
449
+ 1\. Fork the repository
450
+
451
+ 2\. Create a feature branch
452
+
453
+ 3\. Make your changes
454
+
455
+ 4\. Add tests if applicable
456
+
457
+ 5\. Submit a pull request
458
+
459
+
460
+
461
+ \## License
462
+
463
+
464
+
465
+ This project is licensed under the MIT License - see the LICENSE file for details.
466
+
467
+
468
+
469
+ \## Acknowledgments
470
+
471
+
472
+
473
+ \- Hugging Face for BERT model hosting
474
+
475
+ \- LIME and SHAP libraries for explainable AI
476
+
477
+ \- Gradio for the intuitive web interface
478
+
479
+ \- The open-source community for various dependencies
480
+
481
+
482
+
483
+ \## Support
484
+
485
+
486
+
487
+ For issues and questions:
488
+
489
+ 1\. Check the troubleshooting section
490
+
491
+ 2\. Review existing GitHub issues
492
+
493
+ 3\. Create a new issue with detailed information
494
+
495
+
496
+
497
+ \## Changelog
498
+
499
+
500
+
501
+ \### v1.0.0
502
+
503
+ \- Initial release with core functionality
504
+
505
+ \- BERT-based sentiment analysis
506
+
507
+ \- LIME and SHAP explanations
508
+
509
+ \- Gradio web interface
510
+
511
+ \- Docker support
512
+
analyzer.py ADDED
@@ -0,0 +1,218 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch
2
+ import re
3
+ import logging
4
+ from typing import List, Dict, Tuple
5
+ from functools import lru_cache
6
+ from lime.lime_text import LimeTextExplainer
7
+
8
+ from config import config
9
+ from models import ModelManager
10
+ from utils import handle_errors
11
+
12
+
13
+ logger = logging.getLogger(__name__)
14
+
15
+
16
class TextProcessor:
    """Lightweight, cached text-normalisation helpers."""

    @staticmethod
    @lru_cache(maxsize=config.CACHE_SIZE)
    def clean_text(text: str) -> Tuple[str, ...]:
        """Lowercase *text*, keep alphanumeric words of 3+ characters,
        and drop configured stop words.

        Returns a tuple (immutable and hashable) so results can live in
        the lru_cache.
        """
        tokens = re.findall(r'\b\w{3,}\b', text.lower())
        kept = [token for token in tokens if token not in config.STOP_WORDS]
        return tuple(kept)
24
+
25
+
26
class SentimentEngine:
    """Sentiment-analysis engine combining a BERT classifier with two
    explanation strategies.

    - LIME: perturbation-based explanations via ``lime.lime_text``.
    - "SHAP": a leave-one-word-out occlusion score (see
      ``extract_key_words_shap``) — it does not use the shap library.

    Probability index convention throughout: ``probs[0]`` = Negative,
    ``probs[1]`` = Positive.
    """
    def __init__(self):
        # ModelManager lazily loads tokenizer/model on first property access.
        self.model_manager = ModelManager()
        self.lime_explainer = LimeTextExplainer(class_names=['Negative', 'Positive'])
        # NOTE(review): never assigned or read elsewhere in this class —
        # appears to be an unused placeholder.
        self.shap_explainer = None

    def predict_proba(self, texts):
        """Return softmax class probabilities, shape (n_texts, 2).

        Accepts a single string or a list of strings; LIME calls this with
        batches of perturbed samples.
        """
        if isinstance(texts, str):
            texts = [texts]

        inputs = self.model_manager.tokenizer(
            texts, return_tensors="pt", padding=True,
            truncation=True, max_length=config.MAX_TEXT_LENGTH
        ).to(self.model_manager.device)

        with torch.no_grad():
            outputs = self.model_manager.model(**inputs)
            probs = torch.nn.functional.softmax(outputs.logits, dim=-1).cpu().numpy()

        return probs

    @handle_errors(default_return={'sentiment': 'Unknown', 'confidence': 0.0})
    def analyze_single_fast(self, text: str) -> Dict:
        """Classify one text quickly, without keyword extraction.

        Returns a dict with 'sentiment', 'confidence', 'pos_prob',
        'neg_prob'. Blank input raises ValueError, which the
        handle_errors decorator converts to the default return.
        """
        if not text.strip():
            raise ValueError("Empty text")

        probs = self.predict_proba([text])[0]
        sentiment = "Positive" if probs[1] > probs[0] else "Negative"

        return {
            'sentiment': sentiment,
            'confidence': float(probs.max()),
            'pos_prob': float(probs[1]),
            'neg_prob': float(probs[0])
        }

    def extract_key_words_lime(self, text: str, top_k: int = 10) -> List[Tuple[str, float]]:
        """Return the top-k influential words per LIME as (word, |weight|).

        NOTE(review): abs() discards the direction of influence, so these
        scores cannot distinguish positive from negative contributions.
        Returns [] on any explainer failure (logged).
        """
        try:
            explanation = self.lime_explainer.explain_instance(
                text, self.predict_proba, num_features=top_k, num_samples=200
            )

            word_scores = []
            for word, score in explanation.as_list():
                if len(word.strip()) >= config.MIN_WORD_LENGTH:
                    word_scores.append((word.strip().lower(), abs(score)))

            word_scores.sort(key=lambda x: x[1], reverse=True)
            return word_scores[:top_k]

        except Exception as e:
            logger.error(f"LIME extraction failed: {e}")
            return []

    def extract_key_words_shap(self, text: str, top_k: int = 10) -> List[Tuple[str, float]]:
        """Return the top-k influential words by leave-one-word-out occlusion.

        Despite the name, this does not use the shap library: each word's
        importance is |P(pos | full text) - P(pos | text minus that word)|.
        Costs one model forward pass per word, so it is O(n_words) in
        inference calls. Returns [] on failure (logged).
        """
        try:
            # Simple SHAP-style implementation using model predictions only.
            words = text.split()
            word_scores = []

            # Get baseline prediction
            baseline_prob = self.predict_proba([text])[0][1]  # Positive probability

            # Calculate importance by removing each word
            for i, word in enumerate(words):
                # Create text without this word
                modified_words = words[:i] + words[i+1:]
                modified_text = ' '.join(modified_words)

                if modified_text.strip():
                    modified_prob = self.predict_proba([modified_text])[0][1]
                    importance = abs(baseline_prob - modified_prob)

                    clean_word = re.sub(r'[^\w]', '', word.lower())
                    if len(clean_word) >= config.MIN_WORD_LENGTH:
                        word_scores.append((clean_word, importance))

            # Remove duplicates (keep the max score per word) and sort
            unique_scores = {}
            for word, score in word_scores:
                if word in unique_scores:
                    unique_scores[word] = max(unique_scores[word], score)
                else:
                    unique_scores[word] = score

            sorted_scores = sorted(unique_scores.items(), key=lambda x: x[1], reverse=True)
            return sorted_scores[:top_k]

        except Exception as e:
            logger.error(f"SHAP extraction failed: {e}")
            return []

    def create_heatmap_html(self, text: str, word_scores: Dict[str, float]) -> str:
        """Render *text* as inline HTML, shading each word by its score.

        Positive scores shade green, negative shade red, unknown/zero stay
        transparent. Words are matched against word_scores by their
        punctuation-stripped, lowercased form.

        NOTE(review): callers pass LIME scores already run through abs(),
        so the red (negative) branch is unreachable in practice — confirm
        whether signed scores were intended here.
        """
        words = text.split()
        html_parts = ['<div style="font-family: Arial; font-size: 16px; line-height: 1.6;">']

        if word_scores:
            max_score = max(abs(score) for score in word_scores.values())
            min_score = min(word_scores.values())
        else:
            max_score = min_score = 0

        for word in words:
            clean_word = re.sub(r'[^\w]', '', word.lower())
            score = word_scores.get(clean_word, 0)

            if score > 0:
                # Scale green intensity by score relative to the strongest word.
                intensity = min(255, int(180 * (score / max_score) if max_score > 0 else 0))
                color = f"rgba(0, {intensity}, 0, 0.3)"
            elif score < 0:
                intensity = min(255, int(180 * (abs(score) / abs(min_score)) if min_score < 0 else 0))
                color = f"rgba({intensity}, 0, 0, 0.3)"
            else:
                color = "transparent"

            html_parts.append(
                f'<span style="background-color: {color}; padding: 2px; margin: 1px; '
                f'border-radius: 3px;" title="Score: {score:.3f}">{word}</span> '
            )

        html_parts.append('</div>')
        return ''.join(html_parts)

    @handle_errors(default_return={'sentiment': 'Unknown', 'confidence': 0.0, 'lime_words': [], 'shap_words': [], 'heatmap_html': ''})
    def analyze_single_advanced(self, text: str) -> Dict:
        """Classify one text and attach LIME/SHAP keywords plus an HTML heatmap.

        Extends the analyze_single_fast result dict with 'lime_words',
        'shap_words' and 'heatmap_html' (heatmap built from LIME scores).
        """
        if not text.strip():
            raise ValueError("Empty text")

        probs = self.predict_proba([text])[0]
        sentiment = "Positive" if probs[1] > probs[0] else "Negative"

        # Extract key words using both LIME and SHAP
        lime_words = self.extract_key_words_lime(text)
        shap_words = self.extract_key_words_shap(text)

        # Create heatmap HTML using LIME results
        word_scores_dict = dict(lime_words)
        heatmap_html = self.create_heatmap_html(text, word_scores_dict)

        return {
            'sentiment': sentiment,
            'confidence': float(probs.max()),
            'pos_prob': float(probs[1]),
            'neg_prob': float(probs[0]),
            'lime_words': lime_words,
            'shap_words': shap_words,
            'heatmap_html': heatmap_html
        }

    @handle_errors(default_return=[])
    def analyze_batch(self, texts: List[str], progress_callback=None) -> List[Dict]:
        """Classify many texts in mini-batches of config.BATCH_PROCESSING_SIZE.

        Silently truncates the input to config.BATCH_SIZE_LIMIT texts.
        progress_callback, if given, receives a fraction in (0, 1] after
        each mini-batch. Each result carries a 50-char 'text' preview plus
        the untruncated 'full_text'.
        """
        if len(texts) > config.BATCH_SIZE_LIMIT:
            texts = texts[:config.BATCH_SIZE_LIMIT]

        results = []
        batch_size = config.BATCH_PROCESSING_SIZE

        for i in range(0, len(texts), batch_size):
            batch = texts[i:i+batch_size]

            if progress_callback:
                progress_callback((i + len(batch)) / len(texts))

            inputs = self.model_manager.tokenizer(
                batch, return_tensors="pt", padding=True,
                truncation=True, max_length=config.MAX_TEXT_LENGTH
            ).to(self.model_manager.device)

            with torch.no_grad():
                outputs = self.model_manager.model(**inputs)
                probs = torch.nn.functional.softmax(outputs.logits, dim=-1).cpu().numpy()

            for text, prob in zip(batch, probs):
                sentiment = "Positive" if prob[1] > prob[0] else "Negative"

                results.append({
                    'text': text[:50] + '...' if len(text) > 50 else text,
                    'full_text': text,
                    'sentiment': sentiment,
                    'confidence': float(prob.max()),
                    'pos_prob': float(prob[1]),
                    'neg_prob': float(prob[0])
                })

        return results
app.py ADDED
@@ -0,0 +1,284 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import gradio as gr
2
+ import numpy as np
3
+ import logging
4
+ from collections import Counter
5
+
6
+ from config import config
7
+ from analyzer import SentimentEngine
8
+ from visualizer import PlotFactory, ThemeContext
9
+ from utils import HistoryManager, DataHandler, handle_errors, managed_figure
10
+
11
+
12
class SentimentApp:
    """Main application orchestrator.

    Wires the inference engine, the history store and the data handler to
    the Gradio callbacks defined below. Every callback is wrapped in
    @handle_errors, so UI actions never raise; failures surface as the
    decorator's default return values instead.
    """

    def __init__(self):
        self.engine = SentimentEngine()
        self.history = HistoryManager()
        self.data_handler = DataHandler()

        # Sample reviews for the UI's click-to-fill gr.Examples widgets
        # (each entry is a one-element list: one value per input component).
        self.examples = [
            ["While the film's visual effects were undeniably impressive, the story lacked emotional weight, and the pacing felt inconsistent throughout."],
            ["An extraordinary achievement in filmmaking — the direction was masterful, the script was sharp, and every performance added depth and realism."],
            ["Despite a promising start, the film quickly devolved into a series of clichés, with weak character development and an ending that felt rushed and unearned."],
            ["A beautifully crafted story with heartfelt moments and a soundtrack that perfectly captured the emotional tone of each scene."],
            ["The movie was far too long, with unnecessary subplots and dull dialogue that made it difficult to stay engaged until the end."]
        ]

    @handle_errors(default_return=("Please enter text", None, None, None))
    def analyze_single_fast(self, text: str, theme: str = 'default'):
        """Quick-analysis callback.

        Returns (result text, probability bar plot, confidence gauge,
        word-cloud plot) for the Quick Analysis tab.
        """
        if not text.strip():
            return "Please enter text", None, None, None

        result = self.engine.analyze_single_fast(text)

        # Record in history: truncated preview plus full text plus scores.
        self.history.add({
            'text': text[:100],
            'full_text': text,
            **result
        })

        theme_ctx = ThemeContext(theme)
        # Order matters: [negative, positive] is what create_sentiment_bars expects.
        probs = np.array([result['neg_prob'], result['pos_prob']])

        prob_plot = PlotFactory.create_sentiment_bars(probs, theme_ctx)
        gauge_plot = PlotFactory.create_confidence_gauge(result['confidence'], result['sentiment'], theme_ctx)
        cloud_plot = PlotFactory.create_wordcloud(text, result['sentiment'], theme_ctx)

        result_text = f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.3f})"

        return result_text, prob_plot, gauge_plot, cloud_plot

    @handle_errors(default_return=("Please enter text", None, None, None))
    def analyze_single_advanced(self, text: str, theme: str = 'default'):
        """Advanced-analysis callback with LIME and SHAP explanations.

        Returns (result text, LIME keyword chart, SHAP keyword chart,
        heatmap HTML) for the Advanced Analysis tab.
        """
        if not text.strip():
            return "Please enter text", None, None, None

        result = self.engine.analyze_single_advanced(text)

        self.history.add({
            'text': text[:100],
            'full_text': text,
            **result
        })

        theme_ctx = ThemeContext(theme)

        lime_plot = PlotFactory.create_lime_keyword_chart(result['lime_words'], result['sentiment'], theme_ctx)
        shap_plot = PlotFactory.create_shap_keyword_chart(result['shap_words'], result['sentiment'], theme_ctx)

        # Summarise the five strongest words from each explainer in the text box.
        lime_words_str = ", ".join([f"{word}({score:.3f})" for word, score in result['lime_words'][:5]])
        shap_words_str = ", ".join([f"{word}({score:.3f})" for word, score in result['shap_words'][:5]])

        result_text = (f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.3f})\n"
                       f"LIME Key Words: {lime_words_str}\n"
                       f"SHAP Key Words: {shap_words_str}")

        return result_text, lime_plot, shap_plot, result['heatmap_html']

    @handle_errors(default_return=None)
    def analyze_batch(self, reviews: str, progress=None):
        """Batch callback: expects one review per line, at least two lines.

        Returns a summary plot, or None for empty/too-small input.
        """
        if not reviews.strip():
            return None

        texts = [r.strip() for r in reviews.split('\n') if r.strip()]
        if len(texts) < 2:
            return None

        results = self.engine.analyze_batch(texts, progress)

        # Batch results are added to history individually.
        for result in results:
            self.history.add(result)

        theme_ctx = ThemeContext('default')
        return PlotFactory.create_batch_analysis(results, theme_ctx)

    @handle_errors(default_return=(None, "No history available"))
    def plot_history(self, theme: str = 'default'):
        """Render sentiment/confidence trends over the stored history.

        Returns (matplotlib figure or None, status string). Needs at
        least two recorded analyses to draw a trend.
        """
        history = self.history.get_all()
        if len(history) < 2:
            return None, f"Need at least 2 analyses for trends. Current: {len(history)}"

        theme_ctx = ThemeContext(theme)

        # NOTE(review): the figure is returned from inside the managed_figure
        # context — confirm managed_figure does not close the figure on
        # __exit__, otherwise Gradio receives a dead figure.
        with managed_figure(figsize=(12, 8)) as fig:
            gs = fig.add_gridspec(2, 1, hspace=0.3)

            indices = list(range(len(history)))
            pos_probs = [item['pos_prob'] for item in history]
            confs = [item['confidence'] for item in history]

            # Sentiment trend: points colored by side of the 0.5 line.
            ax1 = fig.add_subplot(gs[0, 0])
            colors = [theme_ctx.colors['pos'] if p > 0.5 else theme_ctx.colors['neg']
                      for p in pos_probs]
            ax1.scatter(indices, pos_probs, c=colors, alpha=0.7, s=60)
            ax1.plot(indices, pos_probs, alpha=0.5, linewidth=2)
            ax1.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
            ax1.set_title('Sentiment History')
            ax1.set_ylabel('Positive Probability')
            ax1.grid(True, alpha=0.3)

            # Confidence trend
            ax2 = fig.add_subplot(gs[1, 0])
            ax2.bar(indices, confs, alpha=0.7, color='lightblue', edgecolor='navy')
            ax2.set_title('Confidence Over Time')
            ax2.set_xlabel('Analysis Number')
            ax2.set_ylabel('Confidence')
            ax2.grid(True, alpha=0.3)

            fig.tight_layout()
            return fig, f"History: {len(history)} analyses"
136
+
137
+
138
def create_interface():
    """Build the Gradio Blocks UI and wire it to a fresh SentimentApp.

    Four tabs: Quick Analysis, Advanced Analysis (LIME/SHAP), Batch
    Analysis (file upload or pasted lines), and History & Export.
    Returns the un-launched gr.Blocks object.
    """
    app = SentimentApp()

    with gr.Blocks(theme=gr.themes.Soft(), title="Movie Sentiment Analyzer") as demo:
        gr.Markdown("# 🎬 AI Movie Sentiment Analyzer")
        gr.Markdown("Fast sentiment analysis with advanced deep learning explanations")

        with gr.Tab("Quick Analysis"):
            with gr.Row():
                with gr.Column():
                    text_input = gr.Textbox(
                        label="Movie Review",
                        placeholder="Enter your movie review...",
                        lines=5
                    )
                    with gr.Row():
                        analyze_btn = gr.Button("Analyze", variant="primary")
                        theme_selector = gr.Dropdown(
                            choices=list(config.THEMES.keys()),
                            value="default",
                            label="Theme"
                        )

                    gr.Examples(
                        examples=app.examples,
                        inputs=text_input
                    )

                with gr.Column():
                    result_output = gr.Textbox(label="Result", lines=3)

            with gr.Row():
                prob_plot = gr.Plot(label="Probabilities")
                gauge_plot = gr.Plot(label="Confidence")

            with gr.Row():
                wordcloud_plot = gr.Plot(label="Word Cloud")

        with gr.Tab("Advanced Analysis"):
            with gr.Row():
                with gr.Column():
                    adv_text_input = gr.Textbox(
                        label="Movie Review",
                        placeholder="Enter your movie review for deep analysis...",
                        lines=5
                    )
                    with gr.Row():
                        adv_analyze_btn = gr.Button("Deep Analyze", variant="primary")
                        adv_theme_selector = gr.Dropdown(
                            choices=list(config.THEMES.keys()),
                            value="default",
                            label="Theme"
                        )

                    gr.Examples(
                        examples=app.examples,
                        inputs=adv_text_input
                    )

                with gr.Column():
                    adv_result_output = gr.Textbox(label="Analysis Result", lines=4)

            with gr.Row():
                lime_plot = gr.Plot(label="LIME: Key Contributing Words")
                shap_plot = gr.Plot(label="SHAP: Key Contributing Words")

            with gr.Row():
                heatmap_output = gr.HTML(label="Word Importance Heatmap (LIME-based)")

        with gr.Tab("Batch Analysis"):
            with gr.Row():
                with gr.Column():
                    file_upload = gr.File(label="Upload File", file_types=[".csv", ".txt"])
                    batch_input = gr.Textbox(
                        label="Reviews (one per line)",
                        lines=8
                    )

                with gr.Column():
                    load_btn = gr.Button("Load File")
                    batch_btn = gr.Button("Analyze Batch", variant="primary")

            batch_plot = gr.Plot(label="Batch Results")

        with gr.Tab("History & Export"):
            with gr.Row():
                refresh_btn = gr.Button("Refresh")
                clear_btn = gr.Button("Clear", variant="stop")

            with gr.Row():
                csv_btn = gr.Button("Export CSV")
                json_btn = gr.Button("Export JSON")

            history_status = gr.Textbox(label="Status")
            history_plot = gr.Plot(label="History Trends")
            csv_file = gr.File(label="CSV Download", visible=True)
            json_file = gr.File(label="JSON Download", visible=True)

        # Event bindings for Quick Analysis
        analyze_btn.click(
            app.analyze_single_fast,
            inputs=[text_input, theme_selector],
            outputs=[result_output, prob_plot, gauge_plot, wordcloud_plot]
        )

        # Event bindings for Advanced Analysis
        adv_analyze_btn.click(
            app.analyze_single_advanced,
            inputs=[adv_text_input, adv_theme_selector],
            outputs=[adv_result_output, lime_plot, shap_plot, heatmap_output]
        )

        # Event bindings for Batch Analysis
        load_btn.click(app.data_handler.process_file, inputs=file_upload, outputs=batch_input)
        batch_btn.click(app.analyze_batch, inputs=batch_input, outputs=batch_plot)

        # Event bindings for History & Export.
        # NOTE(review): refresh reads theme_selector from the Quick Analysis
        # tab, not a selector on this tab — confirm that is intended.
        refresh_btn.click(
            lambda theme: app.plot_history(theme),
            inputs=theme_selector,
            outputs=[history_plot, history_status]
        )

        # NOTE(review): relies on HistoryManager.clear() returning the number
        # of removed entries — verify against utils.py.
        clear_btn.click(
            lambda: f"Cleared {app.history.clear()} entries",
            outputs=history_status
        )

        # export_data is expected to return (file path, status message).
        csv_btn.click(
            lambda: app.data_handler.export_data(app.history.get_all(), 'csv'),
            outputs=[csv_file, history_status]
        )

        json_btn.click(
            lambda: app.data_handler.export_data(app.history.get_all(), 'json'),
            outputs=[json_file, history_status]
        )

    return demo
278
+
279
+
280
# Application Entry Point
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    demo = create_interface()
    # NOTE(review): share=True opens a public Gradio tunnel; for the Docker
    # deployment described in the README (localhost:7860) this is unnecessary
    # and requires outbound internet access — confirm it is intended.
    demo.launch(share=True)
config.py ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
from dataclasses import dataclass
from typing import ClassVar, Dict, Tuple
3
+
4
+
5
@dataclass
class Config:
    """Central application configuration.

    A single shared instance (``config`` below) is imported throughout the
    app. Per-instance settings are dataclass fields; the shared constant
    tables are annotated ``ClassVar`` so the dataclass machinery explicitly
    leaves them as class attributes (previously they were unannotated,
    which had the same effect but only implicitly).
    """

    # Analysis limits
    MAX_HISTORY_SIZE: int = 1000       # entries retained by HistoryManager
    BATCH_SIZE_LIMIT: int = 50         # hard cap on texts per batch request
    MAX_TEXT_LENGTH: int = 512         # tokenizer truncation length
    MIN_WORD_LENGTH: int = 2           # shortest word kept in keyword charts
    CACHE_SIZE: int = 128              # lru_cache size for text cleaning
    BATCH_PROCESSING_SIZE: int = 8     # texts per model forward pass

    # Visualization settings (inches, matplotlib figsize)
    FIGURE_SIZE_SINGLE: Tuple[int, int] = (8, 5)
    FIGURE_SIZE_BATCH: Tuple[int, int] = (12, 8)
    WORDCLOUD_SIZE: Tuple[int, int] = (10, 5)

    # Color themes for plots: positive/negative hex colors per theme name.
    THEMES: ClassVar[Dict[str, Dict[str, str]]] = {
        'default': {'pos': '#4ecdc4', 'neg': '#ff6b6b'},
        'ocean': {'pos': '#0077be', 'neg': '#ff6b35'},
        'forest': {'pos': '#228b22', 'neg': '#dc143c'},
        'sunset': {'pos': '#ff8c00', 'neg': '#8b0000'}
    }

    # Words ignored during keyword extraction / word clouds.
    STOP_WORDS: ClassVar[set] = {
        'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to',
        'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were', 'be',
        'been', 'have', 'has', 'had', 'will', 'would', 'could', 'should'
    }


# Shared singleton imported by the rest of the application.
config = Config()
docker-compose.yml ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# Compose definition for the sentiment-analyzer Gradio app.
version: '3.8'  # NOTE(review): the top-level `version` key is obsolete and ignored by Compose v2

services:
  sentiment-analyzer:
    build: .
    ports:
      - "7860:7860"  # Gradio UI
    environment:
      - PYTHONPATH=/app
      - GRADIO_SERVER_NAME=0.0.0.0  # listen on all interfaces inside the container
      - GRADIO_SERVER_PORT=7860
    volumes:
      - ./data:/app/data  # persisted datasets
      - ./logs:/app/logs  # persisted logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s  # grace period for first-boot model download
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G

  # Optional: Add Redis for caching (uncomment if needed)
  # redis:
  #   image: redis:7-alpine
  #   ports:
  #     - "6379:6379"
  #   command: redis-server --appendonly yes
  #   volumes:
  #     - redis_data:/data
  #   restart: unless-stopped

# volumes:
#   redis_data:
models.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import torch
2
+ import logging
3
+ from transformers import BertTokenizer, BertForSequenceClassification
4
+
5
+
6
+ logger = logging.getLogger(__name__)
7
+
8
+
9
class ModelManager:
    """Singleton that defers model/tokenizer loading until first use.

    All state lives on the class, so every ModelManager() call returns
    the same instance and shares one loaded model.
    """

    _instance = None
    _model = None
    _tokenizer = None
    _device = None

    def __new__(cls):
        # Create the shared instance on first call, reuse it afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    @property
    def device(self):
        """Torch device, resolved lazily (CUDA when available)."""
        if self._device is None:
            self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        return self._device

    @property
    def model(self):
        """Sequence-classification model; triggers a one-time load."""
        if self._model is None:
            self._load_model()
        return self._model

    @property
    def tokenizer(self):
        """Tokenizer paired with the model; loads both on first access."""
        if self._tokenizer is None:
            self._load_model()
        return self._tokenizer

    def _load_model(self):
        """Fetch tokenizer and model from the hub and move the model onto the device."""
        try:
            self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            self._tokenizer = BertTokenizer.from_pretrained("entropy25/sentimentanalysis")
            self._model = BertForSequenceClassification.from_pretrained("entropy25/sentimentanalysis")
            self._model.to(self._device)
            logger.info(f"Model loaded on {self._device}")
        except Exception as exc:
            logger.error(f"Model loading failed: {exc}")
            raise
requirements.txt ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# Model inference
torch>=1.9.0
transformers>=4.20.0
# Web UI
gradio>=3.45.0
# Visualization
matplotlib>=3.5.0
numpy>=1.21.0
wordcloud>=1.9.0
pandas>=1.3.0
# Explainability
lime>=0.2.0
shap>=0.41.0
scikit-learn>=1.0.0
# Misc utilities
Pillow>=8.3.0
requests>=2.25.0
tqdm>=4.62.0
utils.py ADDED
@@ -0,0 +1,157 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import matplotlib.pyplot as plt
2
+ import pandas as pd
3
+ import csv
4
+ import json
5
+ import tempfile
6
+ import gc
7
+ import logging
8
+ from datetime import datetime
9
+ from functools import wraps
10
+ from contextlib import contextmanager
11
+ from typing import List, Dict, Optional, Tuple, Any, Callable
12
+
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
+ # Decorators and Context Managers
18
def handle_errors(default_return=None):
    """Decorator factory: log any exception and return a fallback value.

    When *default_return* is None, failures instead yield the string
    "Error: <message>" so callers always receive something displayable.
    """
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                logger.error(f"{func.__name__} failed: {exc}")
                if default_return is not None:
                    return default_return
                return f"Error: {exc}"
        return wrapper
    return decorator
30
+
31
+
32
@contextmanager
def managed_figure(*args, **kwargs):
    """Context manager for matplotlib figures to prevent memory leaks.

    Yields a freshly created figure; on exit (even when the body raises)
    the figure is closed and a GC pass is forced so pyplot's global
    figure registry does not accumulate open figures.
    """
    fig = plt.figure(*args, **kwargs)
    try:
        yield fig
    finally:
        # Always release the figure, then collect, so repeated plotting
        # in a long-running server keeps a flat memory profile.
        plt.close(fig)
        gc.collect()
41
+
42
+
43
class HistoryManager:
    """In-memory, size-capped store of past analysis results."""

    def __init__(self):
        self._history: List[Dict] = []

    def add(self, entry: Dict):
        """Store *entry* stamped with the current time, trimming old items."""
        from config import config  # local import avoids a circular dependency
        stamped = dict(entry)
        stamped['timestamp'] = datetime.now().isoformat()
        self._history.append(stamped)
        # Keep only the newest MAX_HISTORY_SIZE entries.
        excess = len(self._history) - config.MAX_HISTORY_SIZE
        if excess > 0:
            del self._history[:excess]

    def get_all(self) -> List[Dict]:
        """Return a shallow copy of all stored entries."""
        return list(self._history)

    def clear(self) -> int:
        """Drop every entry and report how many were removed."""
        removed = len(self._history)
        del self._history[:]
        return removed

    def size(self) -> int:
        """Current number of stored entries."""
        return len(self._history)
64
+
65
+
66
class DataHandler:
    """Handles all data operations: exporting history, reading uploads."""

    @staticmethod
    @handle_errors(default_return=(None, "Export failed"))
    def export_data(data: List[Dict], format_type: str) -> Tuple[Optional[str], str]:
        """Export *data* to a temporary CSV or JSON file.

        Args:
            data: history entries (dicts with timestamp/text/sentiment/
                confidence/pos_prob/neg_prob keys; missing keys default).
            format_type: 'csv' or 'json'.

        Returns:
            (file path, status message), or (None, message) when there is
            nothing to export or the format is not supported.
        """
        if not data:
            return None, "No data to export"
        if format_type not in ('csv', 'json'):
            # Fix: an unknown format previously created an empty temp file
            # and reported "Exported N entries" anyway.
            return None, f"Unsupported export format: {format_type}"

        temp_file = tempfile.NamedTemporaryFile(mode='w', delete=False,
                                                suffix=f'.{format_type}', encoding='utf-8')
        try:
            if format_type == 'csv':
                writer = csv.writer(temp_file)
                writer.writerow(['Timestamp', 'Text', 'Sentiment', 'Confidence', 'Pos_Prob', 'Neg_Prob'])
                for entry in data:
                    writer.writerow([
                        entry.get('timestamp', ''),
                        entry.get('text', ''),
                        entry.get('sentiment', ''),
                        f"{entry.get('confidence', 0):.4f}",
                        f"{entry.get('pos_prob', 0):.4f}",
                        f"{entry.get('neg_prob', 0):.4f}"
                    ])
            else:  # json
                json.dump(data, temp_file, indent=2, ensure_ascii=False)
        finally:
            # Close even when writing fails so the handle is not leaked.
            temp_file.close()
        return temp_file.name, f"Exported {len(data)} entries"

    @staticmethod
    @handle_errors(default_return="")
    def process_file(file) -> str:
        """Extract newline-separated review texts from an uploaded file.

        CSV uploads: tries several encodings, heuristically selects the
        column that looks most like free text, and keeps rows longer than
        10 characters. Other uploads are returned verbatim after trying
        several encodings. Returns an "Error: ..." string on failure.
        """
        if not file:
            return ""

        try:
            file_path = file.name

            if file_path.endswith('.csv'):
                for encoding in ['utf-8', 'latin-1', 'cp1252', 'iso-8859-1']:
                    try:
                        df = pd.read_csv(file_path, encoding=encoding)

                        # A column qualifies as "text" when >70% of its first
                        # 10 non-null values are strings longer than 10 chars.
                        text_columns = []
                        for col in df.columns:
                            sample_values = df[col].dropna().head(10)
                            if len(sample_values) > 0:
                                text_count = sum(1 for val in sample_values
                                                 if isinstance(val, str) and len(str(val).strip()) > 10)
                                if text_count > len(sample_values) * 0.7:
                                    text_columns.append(col)

                        # Fall back to the first column when nothing qualifies.
                        selected_column = text_columns[0] if text_columns else df.columns[0]

                        reviews = df[selected_column].dropna().astype(str).tolist()

                        cleaned_reviews = []
                        for review in reviews:
                            review = review.strip()
                            # Drop short rows and pandas' stringified NaNs.
                            if len(review) > 10 and review.lower() != 'nan':
                                cleaned_reviews.append(review)

                        if cleaned_reviews:
                            logger.info(f"Successfully read {len(cleaned_reviews)} reviews from CSV")
                            return '\n'.join(cleaned_reviews)

                    except Exception:
                        # Wrong encoding or parse failure: try the next one.
                        continue

                return "Error: Could not read CSV file. Please check the file format and encoding."

            else:
                for encoding in ['utf-8', 'latin-1', 'cp1252']:
                    try:
                        with open(file_path, 'r', encoding=encoding) as f:
                            content = f.read().strip()
                        if content:
                            return content
                    except Exception:
                        continue

                return "Error: Could not read text file. Please check the file encoding."

        except Exception as e:
            logger.error(f"File processing error: {e}")
            return f"Error processing file: {str(e)}"
visualizer.py ADDED
@@ -0,0 +1,184 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import matplotlib.pyplot as plt
2
+ import numpy as np
3
+ from wordcloud import WordCloud
4
+ from collections import Counter
5
+ from typing import List, Dict, Tuple, Optional
6
+ import gc
7
+
8
+ from config import config
9
+ from utils import handle_errors, managed_figure
10
+
11
+
12
class ThemeContext:
    """Resolve a theme name into its positive/negative color pair."""

    def __init__(self, theme: str = 'default'):
        self.theme = theme
        # Unknown theme names silently fall back to the default palette.
        fallback = config.THEMES['default']
        self.colors = config.THEMES.get(theme, fallback)
17
+
18
+
19
class PlotFactory:
    """Factory for creating plots with proper memory management.

    Every figure is built inside ``managed_figure`` so pyplot's registry
    releases it after it is handed back; every public method is wrapped
    in ``handle_errors`` and returns None instead of raising.
    """

    @staticmethod
    @handle_errors(default_return=None)
    def create_sentiment_bars(probs: np.ndarray, theme: ThemeContext) -> plt.Figure:
        """Create sentiment probability bars.

        Args:
            probs: probabilities ordered [negative, positive].
            theme: color theme context.
        """
        with managed_figure(figsize=config.FIGURE_SIZE_SINGLE) as fig:
            ax = fig.add_subplot(111)
            labels = ["Negative", "Positive"]
            colors = [theme.colors['neg'], theme.colors['pos']]

            bars = ax.bar(labels, probs, color=colors, alpha=0.8)
            ax.set_title("Sentiment Probabilities", fontweight='bold')
            ax.set_ylabel("Probability")
            ax.set_ylim(0, 1)

            # Annotate each bar with its exact probability.
            for bar, prob in zip(bars, probs):
                ax.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.02,
                        f'{prob:.3f}', ha='center', va='bottom', fontweight='bold')

            fig.tight_layout()
            return fig

    @staticmethod
    @handle_errors(default_return=None)
    def create_confidence_gauge(confidence: float, sentiment: str, theme: ThemeContext) -> plt.Figure:
        """Create a semi-circular confidence gauge with a needle."""
        with managed_figure(figsize=config.FIGURE_SIZE_SINGLE) as fig:
            ax = fig.add_subplot(111)

            # Half-disc background: negative hue on the left, positive on the right.
            theta = np.linspace(0, np.pi, 100)
            colors = [theme.colors['neg'] if i < 50 else theme.colors['pos'] for i in range(100)]

            for i in range(len(theta)-1):
                ax.fill_between([theta[i], theta[i+1]], [0, 0], [0.8, 0.8],
                                color=colors[i], alpha=0.7)

            # Needle: midpoint +/- up to 0.4*pi scaled by confidence.
            pos = np.pi * (0.5 + (0.4 if sentiment == 'Positive' else -0.4) * confidence)
            ax.plot([pos, pos], [0, 0.6], 'k-', linewidth=6)
            ax.plot(pos, 0.6, 'ko', markersize=10)

            ax.set_xlim(0, np.pi)
            ax.set_ylim(0, 1)
            ax.set_title(f'{sentiment} - Confidence: {confidence:.3f}', fontweight='bold')
            ax.set_xticks([0, np.pi/2, np.pi])
            ax.set_xticklabels(['Negative', 'Neutral', 'Positive'])
            ax.axis('off')

            fig.tight_layout()
            return fig

    @staticmethod
    def _keyword_chart(word_scores: List[Tuple[str, float]], sentiment: str,
                       theme: ThemeContext, xlabel: str, title: str) -> plt.Figure:
        """Shared horizontal bar chart used by the LIME and SHAP views."""
        with managed_figure(figsize=config.FIGURE_SIZE_SINGLE) as fig:
            ax = fig.add_subplot(111)

            words = [word for word, score in word_scores]
            scores = [score for word, score in word_scores]
            color = theme.colors['pos'] if sentiment == 'Positive' else theme.colors['neg']

            bars = ax.barh(range(len(words)), scores, color=color, alpha=0.7)
            ax.set_yticks(range(len(words)))
            ax.set_yticklabels(words)
            ax.set_xlabel(xlabel)
            ax.set_title(title, fontweight='bold')

            # Print each score just past the end of its bar.
            for bar, score in zip(bars, scores):
                ax.text(bar.get_width() + 0.001, bar.get_y() + bar.get_height()/2.,
                        f'{score:.3f}', ha='left', va='center', fontsize=9)

            ax.invert_yaxis()  # highest-ranked word on top
            ax.grid(axis='x', alpha=0.3)
            fig.tight_layout()
            return fig

    @staticmethod
    @handle_errors(default_return=None)
    def create_lime_keyword_chart(lime_words: List[Tuple[str, float]], sentiment: str, theme: ThemeContext) -> Optional[plt.Figure]:
        """Create horizontal bar chart for LIME key contributing words."""
        if not lime_words:
            return None
        return PlotFactory._keyword_chart(lime_words, sentiment, theme,
                                          'LIME Attention Weight',
                                          f'LIME: Top Contributing Words ({sentiment})')

    @staticmethod
    @handle_errors(default_return=None)
    def create_shap_keyword_chart(shap_words: List[Tuple[str, float]], sentiment: str, theme: ThemeContext) -> Optional[plt.Figure]:
        """Create horizontal bar chart for SHAP key contributing words."""
        if not shap_words:
            return None
        return PlotFactory._keyword_chart(shap_words, sentiment, theme,
                                          'SHAP Value',
                                          f'SHAP: Top Contributing Words ({sentiment})')

    @staticmethod
    @handle_errors(default_return=None)
    def create_wordcloud(text: str, sentiment: str, theme: ThemeContext) -> Optional[plt.Figure]:
        """Create word cloud; returns None for texts shorter than 3 words."""
        if len(text.split()) < 3:
            return None

        colormap = 'Greens' if sentiment == 'Positive' else 'Reds'
        wc = WordCloud(width=800, height=400, background_color='white',
                       colormap=colormap, max_words=30).generate(text)

        with managed_figure(figsize=config.WORDCLOUD_SIZE) as fig:
            ax = fig.add_subplot(111)
            ax.imshow(wc, interpolation='bilinear')
            ax.axis('off')
            ax.set_title(f'{sentiment} Word Cloud', fontweight='bold')
            fig.tight_layout()
            return fig

    @staticmethod
    @handle_errors(default_return=None)
    def create_batch_analysis(results: List[Dict], theme: ThemeContext) -> plt.Figure:
        """Create comprehensive batch visualization (pie, histogram, scatter)."""
        with managed_figure(figsize=config.FIGURE_SIZE_BATCH) as fig:
            gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)

            # Sentiment distribution. Colors are keyed by label: previously
            # the first slice always got the positive color, even when it
            # represented 'Negative'.
            ax1 = fig.add_subplot(gs[0, 0])
            sent_counts = Counter([r['sentiment'] for r in results])
            color_by_label = {'Positive': theme.colors['pos'], 'Negative': theme.colors['neg']}
            pie_colors = [color_by_label.get(label, 'gray') for label in sent_counts.keys()]
            ax1.pie(sent_counts.values(), labels=sent_counts.keys(),
                    autopct='%1.1f%%', colors=pie_colors)
            ax1.set_title('Sentiment Distribution')

            # Confidence histogram
            ax2 = fig.add_subplot(gs[0, 1])
            confs = [r['confidence'] for r in results]
            ax2.hist(confs, bins=8, alpha=0.7, color='skyblue', edgecolor='black')
            ax2.set_title('Confidence Distribution')
            ax2.set_xlabel('Confidence')

            # Positive probability per review, in input order.
            ax3 = fig.add_subplot(gs[1, :])
            pos_probs = [r['pos_prob'] for r in results]
            indices = range(len(results))
            colors_scatter = [theme.colors['pos'] if r['sentiment'] == 'Positive'
                              else theme.colors['neg'] for r in results]
            ax3.scatter(indices, pos_probs, c=colors_scatter, alpha=0.7, s=60)
            ax3.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)  # decision boundary
            ax3.set_title('Sentiment Progression')
            ax3.set_xlabel('Review Index')
            ax3.set_ylabel('Positive Probability')

            return fig