Spaces:
Running
Running
A newer version of the Streamlit SDK is available: 1.58.0
metadata
title: BrandScanAI
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
app_file: app.py
pinned: false
BrandScanAI: Open-Source Brand Monitoring with LLM Analysis
Overview
BrandScanAI is a comprehensive brand monitoring system that combines web search, content extraction, and AI-powered sentiment analysis to track brand mentions across the internet. Built with Streamlit and powered by open-source LLMs, it provides real-time insights into brand perception and media coverage.
Open-Source LLM APIs Explored
Primary Implementation: Groq + Llama Models
- Model: Llama 3.1 8B Instant (via Groq API)
- Tradeoffs:
- Speed: β‘ Extremely fast inference (sub-second response times)
- Accuracy: π― Good for sentiment analysis and structured extraction
- Documentation: π Excellent Groq documentation with clear examples
- Cost: π° Very affordable ($0.27/1M tokens for Llama 3.1 8B)
- Limitations: Smaller context window compared to larger models
Alternative Models Considered
- Llama 3.3 70B: Higher accuracy but slower inference and higher cost
- Code Llama: Specialized for code analysis but less suitable for general text
- Mistral 7B: Good balance but Groq's Llama 3.1 8B proved more reliable
Technical Challenges & Solutions
1. Web Crawling Challenges
- Anti-bot measures: Implemented respectful delays (0.5s) and proper User-Agent headers
- Content extraction: Used Trafilatura for robust article extraction vs. basic BeautifulSoup
- Rate limiting: Graceful error handling with informative user feedback
- Dynamic content: Limited JavaScript-heavy sites, focused on static content
2. LLM Querying Issues
- JSON parsing errors: Enforced
response_format={"type": "json_object"}in API calls - Inconsistent outputs: Implemented structured prompts with explicit JSON schema
- Context length: Limited article content to 1000 characters for analysis
- API reliability: Added retry logic and fallback error responses
3. Context Extraction Problems
- Noise removal: Trafilatura effectively strips ads, navigation, and boilerplate
- Metadata extraction: Combined Trafilatura metadata with BeautifulSoup fallback
- Content quality: Implemented content length validation before analysis
Scalability & Robustness Improvements
Production-Ready Enhancements
- Database Integration: SQLite/PostgreSQL for persistent storage and historical analysis
- Queue System: Celery/Redis for background processing of large batches
- Caching Layer: Redis for API response caching and rate limit management
- Monitoring: Prometheus/Grafana for system health and performance tracking
- Load Balancing: Multiple worker processes for concurrent analysis
- Error Recovery: Retry mechanisms with exponential backoff
- API Rate Limiting: Intelligent request throttling across multiple providers
Architecture Improvements
- Microservices: Separate services for search, scraping, and analysis
- Message Queues: Asynchronous processing for large-scale monitoring
- CDN Integration: Cached content delivery for faster responses
- Multi-region Deployment: Geographic distribution for global brand monitoring
LLM Comparison: Llama 3.1 8B vs Llama 3.3 70B
Test Case: Brand Sentiment Analysis
Input: "OpenAI's new GPT-4 model shows impressive capabilities but raises concerns about AI safety and job displacement."
Llama 3.1 8B Response:
{
"explicit_mentions": [{
"mention": "OpenAI's new GPT-4 model",
"sentiment": "positive",
"explanation": "Shows impressive capabilities"
}],
"indirect_mentions": [],
"overall_sentiment": "neutral"
}
Llama 3.3 70B Response:
{
"explicit_mentions": [{
"mention": "OpenAI's new GPT-4 model",
"sentiment": "positive",
"explanation": "Shows impressive capabilities"
}],
"indirect_mentions": [{
"reference": "AI safety and job displacement",
"sentiment": "negative",
"explanation": "Raises concerns about negative impacts"
}],
"overall_sentiment": "neutral"
}
Key Differences:
- 3.3 70B: More nuanced analysis, catches indirect negative mentions
- 3.1 8B: Faster but misses subtle context and indirect references
- Trade-off: 70B provides better accuracy but 3x slower and 10x more expensive
Setup & Installation
Prerequisites
- Python 3.11+
- API keys for Groq and SerpAPI
Installation
# Clone repository
git clone https://github.com/yourusername/brandscan-ai.git
cd brandscan-ai
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
Environment Variables
# Required API Keys
GROQ_API_KEY=your_groq_api_key_here
SERPAPI_API_KEY=your_serpapi_key_here
# Database (optional - defaults to SQLite)
DATABASE_URL=sqlite:///./brandscan.db
API Key Setup
- Groq API: Visit console.groq.com β Sign up β Get API key
- SerpAPI: Visit serpapi.com β Sign up β Get API key
Running the Application
# Start the Streamlit app
streamlit run app.py
# Access at http://localhost:8501
Deploying to Hugging Face Spaces
- Create a Space: Go to huggingface.co/new-space.
- Configure:
- Name:
brandscan-ai - SDK: Streamlit
- Privacy: Public (or Private)
- Name:
- Upload Files: Upload all project files (except
.venv,.env, andbrandscan.db). The.hfignorefile will handle this if you use Git. - Set Secrets: Go to Settings -> Variables and secrets -> New secret:
GROQ_API_KEY: Your Groq API keySERPAPI_API_KEY: Your SerpAPI keyDATABASE_URL:sqlite:///./brandscan.db(Note: SQLite is not persistent on HF Spaces. For persistence, use an external PostgreSQL DB or HF Datasets).
- Wait for Build: Hugging Face will automatically build and deploy your app.
Usage
- Configure Search: Enter search query and brand names
- Select Engines: Choose from Google, Bing, DuckDuckGo
- Run Analysis: Click "Start Batch Analysis"
- View Results: Explore mentions, sentiment, and context
- Export Data: Download CSV reports for further analysis
Dependencies
Core Libraries
streamlit>=1.50.0- Web application frameworkgroq>=0.32.0- Groq API clientserpapi>=0.1.5- Google Search APItrafilatura>=2.0.0- Web content extractionsqlalchemy>=2.0.44- Database ORMpandas>=2.3.3- Data manipulationplotly>=6.3.1- Interactive visualizations
Optional Dependencies
psycopg2-binary- PostgreSQL supportPyMySQL- MySQL supportduckduckgo-search- DuckDuckGo searchbeautifulsoup4- HTML parsing fallback
Features
- π Multi-Engine Search: Google, Bing, DuckDuckGo
- π€ AI-Powered Analysis: Sentiment analysis with context
- π Interactive Dashboard: Real-time analytics and visualizations
- πΎ Data Export: CSV reports and database storage
- β° Scheduled Monitoring: Automated recurring analysis
- πΈοΈ Co-Mention Network: Brand relationship visualization
- π Historical Tracking: Trend analysis over time
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
For issues and questions:
- Create an issue on GitHub
- Check the documentation
- Review the troubleshooting guide