Spaces:

spamultrapromax
/

BrandScanAI

Running

App Files Files Community

BrandScanAI / README.md

Arun21102003

Deployment preparation (removed binary files)

90fe073 22 days ago

preview code

raw

history blame contribute delete

7.72 kB

A newer version of the Streamlit SDK is available: 1.58.0

Upgrade

metadata

title: BrandScanAI
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false

BrandScanAI: Open-Source Brand Monitoring with LLM Analysis

Overview

BrandScanAI is a comprehensive brand monitoring system that combines web search, content extraction, and AI-powered sentiment analysis to track brand mentions across the internet. Built with Streamlit and powered by open-source LLMs, it provides real-time insights into brand perception and media coverage.

Open-Source LLM APIs Explored

Primary Implementation: Groq + Llama Models

Model: Llama 3.1 8B Instant (via Groq API)
Tradeoffs:
- Speed: ⚡ Extremely fast inference (sub-second response times)
- Accuracy: 🎯 Good for sentiment analysis and structured extraction
- Documentation: 📚 Excellent Groq documentation with clear examples
- Cost: 💰 Very affordable ($0.27/1M tokens for Llama 3.1 8B)
- Limitations: Smaller context window compared to larger models

Alternative Models Considered

Llama 3.3 70B: Higher accuracy but slower inference and higher cost
Code Llama: Specialized for code analysis but less suitable for general text
Mistral 7B: Good balance but Groq's Llama 3.1 8B proved more reliable

Technical Challenges & Solutions

1. Web Crawling Challenges

Anti-bot measures: Implemented respectful delays (0.5s) and proper User-Agent headers
Content extraction: Used Trafilatura for robust article extraction vs. basic BeautifulSoup
Rate limiting: Graceful error handling with informative user feedback
Dynamic content: Limited JavaScript-heavy sites, focused on static content

2. LLM Querying Issues

JSON parsing errors: Enforced response_format={"type": "json_object"} in API calls
Inconsistent outputs: Implemented structured prompts with explicit JSON schema
Context length: Limited article content to 1000 characters for analysis
API reliability: Added retry logic and fallback error responses

3. Context Extraction Problems

Noise removal: Trafilatura effectively strips ads, navigation, and boilerplate
Metadata extraction: Combined Trafilatura metadata with BeautifulSoup fallback
Content quality: Implemented content length validation before analysis

Scalability & Robustness Improvements

Production-Ready Enhancements

Database Integration: SQLite/PostgreSQL for persistent storage and historical analysis
Queue System: Celery/Redis for background processing of large batches
Caching Layer: Redis for API response caching and rate limit management
Monitoring: Prometheus/Grafana for system health and performance tracking
Load Balancing: Multiple worker processes for concurrent analysis
Error Recovery: Retry mechanisms with exponential backoff
API Rate Limiting: Intelligent request throttling across multiple providers

Architecture Improvements

Microservices: Separate services for search, scraping, and analysis
Message Queues: Asynchronous processing for large-scale monitoring
CDN Integration: Cached content delivery for faster responses
Multi-region Deployment: Geographic distribution for global brand monitoring

LLM Comparison: Llama 3.1 8B vs Llama 3.3 70B

Test Case: Brand Sentiment Analysis

Input: "OpenAI's new GPT-4 model shows impressive capabilities but raises concerns about AI safety and job displacement."

Llama 3.1 8B Response:

{
  "explicit_mentions": [{
    "mention": "OpenAI's new GPT-4 model",
    "sentiment": "positive",
    "explanation": "Shows impressive capabilities"
  }],
  "indirect_mentions": [],
  "overall_sentiment": "neutral"
}

Llama 3.3 70B Response:

{
  "explicit_mentions": [{
    "mention": "OpenAI's new GPT-4 model",
    "sentiment": "positive", 
    "explanation": "Shows impressive capabilities"
  }],
  "indirect_mentions": [{
    "reference": "AI safety and job displacement",
    "sentiment": "negative",
    "explanation": "Raises concerns about negative impacts"
  }],
  "overall_sentiment": "neutral"
}

Key Differences:

3.3 70B: More nuanced analysis, catches indirect negative mentions
3.1 8B: Faster but misses subtle context and indirect references
Trade-off: 70B provides better accuracy but 3x slower and 10x more expensive

Setup & Installation

Prerequisites

Python 3.11+
API keys for Groq and SerpAPI

Installation

# Clone repository
git clone https://github.com/yourusername/brandscan-ai.git
cd brandscan-ai

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

Environment Variables

# Required API Keys
GROQ_API_KEY=your_groq_api_key_here
SERPAPI_API_KEY=your_serpapi_key_here

# Database (optional - defaults to SQLite)
DATABASE_URL=sqlite:///./brandscan.db

API Key Setup

Groq API: Visit console.groq.com → Sign up → Get API key
SerpAPI: Visit serpapi.com → Sign up → Get API key

Running the Application

# Start the Streamlit app
streamlit run app.py

# Access at http://localhost:8501

Deploying to Hugging Face Spaces

Create a Space: Go to huggingface.co/new-space.
Configure:
- Name: brandscan-ai
- SDK: Streamlit
- Privacy: Public (or Private)
Upload Files: Upload all project files (except .venv, .env, and brandscan.db). The .hfignore file will handle this if you use Git.
Set Secrets: Go to Settings -> Variables and secrets -> New secret:
- GROQ_API_KEY: Your Groq API key
- SERPAPI_API_KEY: Your SerpAPI key
- DATABASE_URL: sqlite:///./brandscan.db (Note: SQLite is not persistent on HF Spaces. For persistence, use an external PostgreSQL DB or HF Datasets).
Wait for Build: Hugging Face will automatically build and deploy your app.

Usage

Configure Search: Enter search query and brand names
Select Engines: Choose from Google, Bing, DuckDuckGo
Run Analysis: Click "Start Batch Analysis"
View Results: Explore mentions, sentiment, and context
Export Data: Download CSV reports for further analysis

Dependencies

Core Libraries

streamlit>=1.50.0 - Web application framework
groq>=0.32.0 - Groq API client
serpapi>=0.1.5 - Google Search API
trafilatura>=2.0.0 - Web content extraction
sqlalchemy>=2.0.44 - Database ORM
pandas>=2.3.3 - Data manipulation
plotly>=6.3.1 - Interactive visualizations

Optional Dependencies

psycopg2-binary - PostgreSQL support
PyMySQL - MySQL support
duckduckgo-search - DuckDuckGo search
beautifulsoup4 - HTML parsing fallback

Features

🔍 Multi-Engine Search: Google, Bing, DuckDuckGo
🤖 AI-Powered Analysis: Sentiment analysis with context
📊 Interactive Dashboard: Real-time analytics and visualizations
💾 Data Export: CSV reports and database storage
⏰ Scheduled Monitoring: Automated recurring analysis
🕸️ Co-Mention Network: Brand relationship visualization
📈 Historical Tracking: Trend analysis over time

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Support

For issues and questions:

Create an issue on GitHub
Check the documentation
Review the troubleshooting guide