Spaces:
Running
Running
| title: BrandScanAI | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: streamlit | |
| sdk_version: 1.31.0 | |
| app_file: app.py | |
| pinned: false | |
| # BrandScanAI: Open-Source Brand Monitoring with LLM Analysis | |
| ## Overview | |
| BrandScanAI is a comprehensive brand monitoring system that combines web search, content extraction, and AI-powered sentiment analysis to track brand mentions across the internet. Built with Streamlit and powered by open-source LLMs, it provides real-time insights into brand perception and media coverage. | |
| ## Open-Source LLM APIs Explored | |
| ### Primary Implementation: Groq + Llama Models | |
| - **Model**: Llama 3.1 8B Instant (via Groq API) | |
| - **Tradeoffs**: | |
| - **Speed**: β‘ Extremely fast inference (sub-second response times) | |
| - **Accuracy**: π― Good for sentiment analysis and structured extraction | |
| - **Documentation**: π Excellent Groq documentation with clear examples | |
| - **Cost**: π° Very affordable ($0.27/1M tokens for Llama 3.1 8B) | |
| - **Limitations**: Smaller context window compared to larger models | |
| ### Alternative Models Considered | |
| - **Llama 3.3 70B**: Higher accuracy but slower inference and higher cost | |
| - **Code Llama**: Specialized for code analysis but less suitable for general text | |
| - **Mistral 7B**: Good balance but Groq's Llama 3.1 8B proved more reliable | |
| ## Technical Challenges & Solutions | |
| ### 1. Web Crawling Challenges | |
| - **Anti-bot measures**: Implemented respectful delays (0.5s) and proper User-Agent headers | |
| - **Content extraction**: Used Trafilatura for robust article extraction vs. basic BeautifulSoup | |
| - **Rate limiting**: Graceful error handling with informative user feedback | |
| - **Dynamic content**: Limited JavaScript-heavy sites, focused on static content | |
| ### 2. LLM Querying Issues | |
| - **JSON parsing errors**: Enforced `response_format={"type": "json_object"}` in API calls | |
| - **Inconsistent outputs**: Implemented structured prompts with explicit JSON schema | |
| - **Context length**: Limited article content to 1000 characters for analysis | |
| - **API reliability**: Added retry logic and fallback error responses | |
| ### 3. Context Extraction Problems | |
| - **Noise removal**: Trafilatura effectively strips ads, navigation, and boilerplate | |
| - **Metadata extraction**: Combined Trafilatura metadata with BeautifulSoup fallback | |
| - **Content quality**: Implemented content length validation before analysis | |
| ## Scalability & Robustness Improvements | |
| ### Production-Ready Enhancements | |
| 1. **Database Integration**: SQLite/PostgreSQL for persistent storage and historical analysis | |
| 2. **Queue System**: Celery/Redis for background processing of large batches | |
| 3. **Caching Layer**: Redis for API response caching and rate limit management | |
| 4. **Monitoring**: Prometheus/Grafana for system health and performance tracking | |
| 5. **Load Balancing**: Multiple worker processes for concurrent analysis | |
| 6. **Error Recovery**: Retry mechanisms with exponential backoff | |
| 7. **API Rate Limiting**: Intelligent request throttling across multiple providers | |
| ### Architecture Improvements | |
| - **Microservices**: Separate services for search, scraping, and analysis | |
| - **Message Queues**: Asynchronous processing for large-scale monitoring | |
| - **CDN Integration**: Cached content delivery for faster responses | |
| - **Multi-region Deployment**: Geographic distribution for global brand monitoring | |
| ## LLM Comparison: Llama 3.1 8B vs Llama 3.3 70B | |
| ### Test Case: Brand Sentiment Analysis | |
| **Input**: "OpenAI's new GPT-4 model shows impressive capabilities but raises concerns about AI safety and job displacement." | |
| ### Llama 3.1 8B Response: | |
| ```json | |
| { | |
| "explicit_mentions": [{ | |
| "mention": "OpenAI's new GPT-4 model", | |
| "sentiment": "positive", | |
| "explanation": "Shows impressive capabilities" | |
| }], | |
| "indirect_mentions": [], | |
| "overall_sentiment": "neutral" | |
| } | |
| ``` | |
| ### Llama 3.3 70B Response: | |
| ```json | |
| { | |
| "explicit_mentions": [{ | |
| "mention": "OpenAI's new GPT-4 model", | |
| "sentiment": "positive", | |
| "explanation": "Shows impressive capabilities" | |
| }], | |
| "indirect_mentions": [{ | |
| "reference": "AI safety and job displacement", | |
| "sentiment": "negative", | |
| "explanation": "Raises concerns about negative impacts" | |
| }], | |
| "overall_sentiment": "neutral" | |
| } | |
| ``` | |
| **Key Differences**: | |
| - **3.3 70B**: More nuanced analysis, catches indirect negative mentions | |
| - **3.1 8B**: Faster but misses subtle context and indirect references | |
| - **Trade-off**: 70B provides better accuracy but 3x slower and 10x more expensive | |
| ## Setup & Installation | |
| ### Prerequisites | |
| - Python 3.11+ | |
| - API keys for Groq and SerpAPI | |
| ### Installation | |
| ```bash | |
| # Clone repository | |
| git clone https://github.com/yourusername/brandscan-ai.git | |
| cd brandscan-ai | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set up environment variables | |
| cp .env.example .env | |
| # Edit .env with your API keys | |
| ``` | |
| ### Environment Variables | |
| ```bash | |
| # Required API Keys | |
| GROQ_API_KEY=your_groq_api_key_here | |
| SERPAPI_API_KEY=your_serpapi_key_here | |
| # Database (optional - defaults to SQLite) | |
| DATABASE_URL=sqlite:///./brandscan.db | |
| ``` | |
| ### API Key Setup | |
| 1. **Groq API**: Visit [console.groq.com](https://console.groq.com) β Sign up β Get API key | |
| 2. **SerpAPI**: Visit [serpapi.com](https://serpapi.com) β Sign up β Get API key | |
| ### Running the Application | |
| ```bash | |
| # Start the Streamlit app | |
| streamlit run app.py | |
| # Access at http://localhost:8501 | |
| ``` | |
| ## Deploying to Hugging Face Spaces | |
| 1. **Create a Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space). | |
| 2. **Configure**: | |
| - **Name**: `brandscan-ai` | |
| - **SDK**: Streamlit | |
| - **Privacy**: Public (or Private) | |
| 3. **Upload Files**: Upload all project files (except `.venv`, `.env`, and `brandscan.db`). The `.hfignore` file will handle this if you use Git. | |
| 4. **Set Secrets**: Go to **Settings** -> **Variables and secrets** -> **New secret**: | |
| - `GROQ_API_KEY`: Your Groq API key | |
| - `SERPAPI_API_KEY`: Your SerpAPI key | |
| - `DATABASE_URL`: `sqlite:///./brandscan.db` (Note: SQLite is not persistent on HF Spaces. For persistence, use an external PostgreSQL DB or HF Datasets). | |
| 5. **Wait for Build**: Hugging Face will automatically build and deploy your app. | |
| ## Usage | |
| 1. **Configure Search**: Enter search query and brand names | |
| 2. **Select Engines**: Choose from Google, Bing, DuckDuckGo | |
| 3. **Run Analysis**: Click "Start Batch Analysis" | |
| 4. **View Results**: Explore mentions, sentiment, and context | |
| 5. **Export Data**: Download CSV reports for further analysis | |
| ## Dependencies | |
| ### Core Libraries | |
| - `streamlit>=1.50.0` - Web application framework | |
| - `groq>=0.32.0` - Groq API client | |
| - `serpapi>=0.1.5` - Google Search API | |
| - `trafilatura>=2.0.0` - Web content extraction | |
| - `sqlalchemy>=2.0.44` - Database ORM | |
| - `pandas>=2.3.3` - Data manipulation | |
| - `plotly>=6.3.1` - Interactive visualizations | |
| ### Optional Dependencies | |
| - `psycopg2-binary` - PostgreSQL support | |
| - `PyMySQL` - MySQL support | |
| - `duckduckgo-search` - DuckDuckGo search | |
| - `beautifulsoup4` - HTML parsing fallback | |
| ## Features | |
| - π **Multi-Engine Search**: Google, Bing, DuckDuckGo | |
| - π€ **AI-Powered Analysis**: Sentiment analysis with context | |
| - π **Interactive Dashboard**: Real-time analytics and visualizations | |
| - πΎ **Data Export**: CSV reports and database storage | |
| - β° **Scheduled Monitoring**: Automated recurring analysis | |
| - πΈοΈ **Co-Mention Network**: Brand relationship visualization | |
| - π **Historical Tracking**: Trend analysis over time | |
| ## License | |
| MIT License - see LICENSE file for details. | |
| ## Contributing | |
| 1. Fork the repository | |
| 2. Create a feature branch | |
| 3. Make your changes | |
| 4. Add tests if applicable | |
| 5. Submit a pull request | |
| ## Support | |
| For issues and questions: | |
| - Create an issue on GitHub | |
| - Check the documentation | |
| - Review the troubleshooting guide | |