# Cryptocurrency Data Aggregator - Complete Rewrite


A production-ready cryptocurrency data aggregation application with AI-powered analysis, real-time data collection, and an interactive Gradio dashboard.


## Features


### Core Capabilities
- **Real-time Price Tracking**: Monitor top 100 cryptocurrencies with live updates
- **AI-Powered Sentiment Analysis**: Using HuggingFace models for news sentiment
- **Market Analysis**: Technical indicators (MA, RSI), trend detection, predictions
- **News Aggregation**: RSS feeds from CoinDesk, Cointelegraph, and other outlets, plus Reddit
- **Interactive Dashboard**: 6-tab Gradio interface with auto-refresh
- **SQLite Database**: Persistent storage with full CRUD operations
- **No API Keys Required**: Uses only free data sources


### Data Sources (All Free, No Authentication)
- **CoinGecko API**: Market data, prices, rankings
- **CoinCap API**: Backup price data source
- **Binance Public API**: Real-time trading data
- **Alternative.me**: Fear & Greed Index
- **RSS Feeds**: CoinDesk, Cointelegraph, Bitcoin Magazine, Decrypt, Bitcoinist
- **Reddit**: r/cryptocurrency, r/bitcoin, r/ethtrader, r/cryptomarkets
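
For reference, here is a minimal sketch of pulling the top coins from CoinGecko's free `/coins/markets` endpoint. The field names follow CoinGecko's v3 response format; `parse_markets` and `fetch_top_coins` are illustrative helpers, not the app's exact code:

```python
import json
import urllib.request

# Free endpoint: top 100 coins by market cap, priced in USD
COINGECKO_URL = (
    "https://api.coingecko.com/api/v3/coins/markets"
    "?vs_currency=usd&order=market_cap_desc&per_page=100&page=1"
)

def parse_markets(payload):
    """Keep only the fields the aggregator stores."""
    return [
        {
            "symbol": coin["id"],
            "name": coin["name"],
            "price_usd": coin["current_price"],
            "market_cap": coin["market_cap"],
            "rank": coin["market_cap_rank"],
        }
        for coin in payload
    ]

def fetch_top_coins():
    """Fetch and parse the live market listing."""
    with urllib.request.urlopen(COINGECKO_URL, timeout=10) as resp:
        return parse_markets(json.load(resp))
```

CoinCap and Binance expose similarly keyed public endpoints, which is what makes the no-API-key design workable.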


### AI Models (HuggingFace - Local Inference)
- **cardiffnlp/twitter-roberta-base-sentiment-latest**: Social media sentiment
- **ProsusAI/finbert**: Financial news sentiment
- **facebook/bart-large-cnn**: News summarization


## Project Structure


```
crypto-dt-source/
├── config.py                 # Configuration constants
├── database.py               # SQLite database with CRUD operations
├── collectors.py             # Data collection from all sources
├── ai_models.py              # HuggingFace model integration
├── utils.py                  # Helper functions and utilities
├── app.py                    # Main Gradio application
├── requirements.txt          # Python dependencies
├── README.md                 # This file
├── data/
│   ├── database/             # SQLite database files
│   └── backups/              # Database backups
└── logs/
    └── crypto_aggregator.log # Application logs
```


## Installation


### Prerequisites
- Python 3.8 or higher
- 4GB+ RAM (for AI models)
- Internet connection


### Step 1: Clone Repository
```bash
git clone <repository-url>
cd crypto-dt-source
```


### Step 2: Install Dependencies
```bash
pip install -r requirements.txt
```


This will install:
- Gradio (web interface)
- Pandas, NumPy (data processing)
- Transformers, PyTorch (AI models)
- Plotly (charts)
- BeautifulSoup4, Feedparser (web scraping)
- And more...


### Step 3: Run Application
```bash
python app.py
```


The application will:
1. Initialize the SQLite database
2. Load AI models (first run may take 2-3 minutes)
3. Start background data collection
4. Launch the Gradio interface


Access the dashboard at: **http://localhost:7860**


## Gradio Dashboard


### Tab 1: Live Dashboard 📊
- Top 100 cryptocurrencies with real-time prices
- Columns: Rank, Name, Symbol, Price, 24h Change, Volume, Market Cap
- Auto-refresh every 30 seconds
- Search and filter functionality
- Color-coded price changes (green/red)


### Tab 2: Historical Charts 📈
- Select any cryptocurrency
- Choose timeframe: 1d, 7d, 30d, 90d, 1y, All
- Interactive Plotly charts with:
  - Price line chart
  - Volume bars
  - MA(7) and MA(30) overlays
  - RSI indicator
- Export charts as PNG
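
The MA and RSI overlays need no charting library to compute; a pure-Python sketch (illustrative, the app's own implementation may differ):

```python
def moving_average(prices, window):
    """Simple moving average over the trailing `window` prices."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def rsi(prices, period=14):
    """Relative Strength Index using simple averages of gains/losses."""
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(d, 0) for d in deltas[:period]]
    losses = [max(-d, 0) for d in deltas[:period]]
    avg_gain = sum(gains) / period
    avg_loss = sum(losses) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: maximally overbought
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)
```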


### Tab 3: News & Sentiment 📰
- Latest cryptocurrency news from 9+ sources
- Filter by sentiment: All, Positive, Neutral, Negative
- Filter by coin: BTC, ETH, etc.
- Each article shows:
  - Title (clickable link)
  - Source and date
  - AI-generated sentiment score
  - Summary
  - Related coins
- Market sentiment gauge (0-100 scale)
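
The gauge maps sentiment scores from the models' [-1, 1] range onto the 0-100 scale; a minimal sketch of that mapping (`sentiment_gauge` is an illustrative name, not necessarily the app's function):

```python
def sentiment_gauge(score):
    """Map a sentiment score in [-1, 1] onto a 0-100 gauge reading."""
    clamped = max(-1.0, min(1.0, score))  # guard against out-of-range scores
    return round((clamped + 1) * 50)
```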


### Tab 4: AI Analysis 🤖
- Select cryptocurrency
- Generate AI-powered analysis:
  - Current trend (Bullish/Bearish/Neutral)
  - Support/Resistance levels
  - Technical indicators (RSI, MA7, MA30)
  - 24-72h prediction
  - Confidence score
- Analysis saved to database for history
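
One plausible way to derive the Bullish/Bearish/Neutral label is a short-versus-long moving-average comparison; this is a hypothetical rule for illustration, not necessarily the app's exact logic:

```python
def classify_trend(prices, short=7, long=30, band=0.01):
    """Label the trend by comparing short- and long-window average prices.

    `band` is a +/-1% dead zone so nearly-equal averages read as Neutral.
    """
    short_ma = sum(prices[-short:]) / short
    long_ma = sum(prices[-long:]) / min(long, len(prices))
    if short_ma > long_ma * (1 + band):
        return "Bullish"
    if short_ma < long_ma * (1 - band):
        return "Bearish"
    return "Neutral"
```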


### Tab 5: Database Explorer 🗄️
- Pre-built SQL queries:
  - Top 10 gainers in last 24h
  - All positive-sentiment news
  - Price history for any coin
  - Database statistics
- Custom SQL query support (read-only for security)
- Export results to CSV


### Tab 6: Data Sources Status 🔍
- Real-time status monitoring:
  - CoinGecko API ✓
  - CoinCap API ✓
  - Binance API ✓
  - RSS feeds (5 sources) ✓
  - Reddit endpoints (4 subreddits) ✓
  - Database connection ✓
- Shows: Status (🟢/🔴), Last Update, Error Count
- Manual refresh and data collection controls
- Error log viewer


## Database Schema


### `prices` Table
- `id`: Primary key
- `symbol`: Coin symbol (e.g., "bitcoin")
- `name`: Full name (e.g., "Bitcoin")
- `price_usd`: Current price in USD
- `volume_24h`: 24-hour trading volume
- `market_cap`: Market capitalization
- `percent_change_1h`, `percent_change_24h`, `percent_change_7d`: Price changes
- `rank`: Market cap rank
- `timestamp`: Record timestamp
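
A minimal sketch of the corresponding DDL, matching the fields above (the shipped `database.py` may add indexes or constraints; `rank` is bracket-quoted because it is an SQL keyword):

```python
import sqlite3

PRICES_DDL = """
CREATE TABLE IF NOT EXISTS prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol TEXT NOT NULL,
    name TEXT,
    price_usd REAL,
    volume_24h REAL,
    market_cap REAL,
    percent_change_1h REAL,
    percent_change_24h REAL,
    percent_change_7d REAL,
    [rank] INTEGER,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
)
"""

# Exercise the schema against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(PRICES_DDL)
conn.execute(
    "INSERT INTO prices (symbol, name, price_usd, [rank]) VALUES (?, ?, ?, ?)",
    ("bitcoin", "Bitcoin", 50000.0, 1),
)
row = conn.execute("SELECT symbol, price_usd, [rank] FROM prices").fetchone()
```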


### `news` Table
- `id`: Primary key
- `title`: News article title
- `summary`: AI-generated summary
- `url`: Article URL (unique)
- `source`: Source name (e.g., "CoinDesk")
- `sentiment_score`: Float (-1 to 1)
- `sentiment_label`: Label (positive/negative/neutral)
- `related_coins`: JSON array of coin symbols
- `published_date`: Original publication date
- `timestamp`: Record timestamp


### `market_analysis` Table
- `id`: Primary key
- `symbol`: Coin symbol
- `timeframe`: Analysis period
- `trend`: Trend direction (Bullish/Bearish/Neutral)
- `support_level`, `resistance_level`: Price levels
- `prediction`: Text prediction
- `confidence`: Confidence score (0-1)
- `timestamp`: Analysis timestamp


### `user_queries` Table
- `id`: Primary key
- `query`: SQL query or search term
- `result_count`: Number of results
- `timestamp`: Query timestamp


## Configuration


Edit `config.py` to customize:


```python
# Data collection intervals
COLLECTION_INTERVALS = {
    "price_data": 300,       # 5 minutes
    "news_data": 1800,       # 30 minutes
    "sentiment_data": 1800   # 30 minutes
}

# Number of coins to track
TOP_COINS_LIMIT = 100

# Gradio settings
GRADIO_SERVER_PORT = 7860
AUTO_REFRESH_INTERVAL = 30  # seconds

# Cache settings
CACHE_TTL = 300  # 5 minutes
CACHE_MAX_SIZE = 1000

# Logging
LOG_LEVEL = "INFO"
LOG_FILE = "logs/crypto_aggregator.log"
```


## API Usage Examples


### Collect Data Manually
```python
from collectors import collect_price_data, collect_news_data

# Collect latest prices
success, count = collect_price_data()
print(f"Collected {count} prices")

# Collect news
count = collect_news_data()
print(f"Collected {count} articles")
```


### Query Database
```python
from database import get_database

db = get_database()

# Get latest prices
prices = db.get_latest_prices(limit=10)

# Get news by coin
news = db.get_news_by_coin("bitcoin", limit=5)

# Get top gainers
gainers = db.get_top_gainers(limit=10)
```


### AI Analysis
```python
from ai_models import analyze_sentiment, analyze_market_trend
from database import get_database

# Analyze sentiment
result = analyze_sentiment("Bitcoin hits new all-time high!")
print(result)  # {'label': 'positive', 'score': 0.95, 'confidence': 0.92}

# Analyze market trend
db = get_database()
history = db.get_price_history("bitcoin", hours=168)
analysis = analyze_market_trend(history)
print(analysis)  # {'trend': 'Bullish', 'support_level': 50000, ...}
```


## Error Handling & Resilience


### Fallback Mechanisms
- If CoinGecko fails → CoinCap is used
- If both APIs fail → cached database data is used
- If AI models fail to load → keyword-based sentiment analysis
- All network requests have timeout and retry logic
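
The fallback order above can be expressed as a generic try-in-order helper; `first_success` is an illustrative name, not a function from the codebase:

```python
def first_success(*sources):
    """Return the result of the first source callable that succeeds.

    Call as: first_success(fetch_coingecko, fetch_coincap, read_cached_prices)
    """
    last_error = None
    for source in sources:
        try:
            return source()
        except Exception as exc:
            last_error = exc  # remember why this source failed, try the next
    raise RuntimeError("all sources failed") from last_error
```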


### Data Validation
- Price bounds checking (MIN_PRICE to MAX_PRICE)
- Volume and market cap validation
- Duplicate prevention (unique URLs for news)
- SQL injection prevention (read-only queries only)
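
A sketch of the read-only guard and price bounds check described above; the keyword list and the `MIN_PRICE`/`MAX_PRICE` values here are illustrative (the real bounds live in `config.py`):

```python
import re

MIN_PRICE, MAX_PRICE = 1e-12, 1e9  # placeholder bounds

# Word-boundary matching so column names like `updated_at` are not rejected
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|pragma|attach|vacuum)\b"
)

def is_read_only(sql):
    """Accept only SELECT statements containing no write/DDL keywords."""
    lowered = sql.strip().lower()
    return lowered.startswith("select") and not WRITE_KEYWORDS.search(lowered)

def is_valid_price(price):
    """Reject prices outside plausible bounds (or of the wrong type)."""
    return isinstance(price, (int, float)) and MIN_PRICE <= price <= MAX_PRICE
```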


### Logging
All operations are logged to `logs/crypto_aggregator.log`:
- Info: Successful operations, data collection
- Warning: API failures, retries
- Error: Database errors, critical failures
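
One way to wire up such a logger (a sketch of what `utils.py` might do; the real module may add rotation or console output):

```python
import logging

def get_logger(log_file="logs/crypto_aggregator.log", level="INFO"):
    """Build the application logger described above."""
    logger = logging.getLogger("crypto_aggregator")
    logger.setLevel(getattr(logging, level))
    if not logger.handlers:  # avoid duplicate handlers on repeat calls
        # delay=True postpones opening the file until the first record
        handler = logging.FileHandler(log_file, delay=True)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```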


## Performance Optimization


- **Async/Await**: All network requests use aiohttp
- **Connection Pooling**: Reused HTTP connections
- **Caching**: In-memory cache with 5-minute TTL
- **Batch Inserts**: Records are written to the database in batches of up to 100
- **Indexed Queries**: Database indexes on symbol, timestamp, sentiment
- **Lazy Loading**: AI models load only when first used
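
A minimal TTL cache in the spirit of the one described above (`TTLCache` here is an illustrative sketch, not the app's class; the default TTL mirrors `config.py`):

```python
import time

CACHE_TTL = 300  # 5 minutes, as in config.py

class TTLCache:
    """Dict-backed cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=CACHE_TTL):
        self.ttl = ttl
        self._store = {}

    def set(self, key, value):
        # monotonic clock: immune to system clock adjustments
        self._store[key] = (time.monotonic(), value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and miss
            return None
        return value
```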


## Troubleshooting


### Issue: Models won't load
**Solution**: Ensure you have 4GB+ RAM. Models download on first run (2-3 min).


### Issue: No data appearing
**Solution**: Wait 5 minutes for the initial data collection, or click the "Refresh" buttons.


### Issue: Port 7860 already in use
**Solution**: Change `GRADIO_SERVER_PORT` in `config.py` or kill the existing process.


### Issue: Database locked
**Solution**: Only one process can write at a time. Close other instances.


### Issue: RSS feeds failing
**Solution**: Some feeds may be temporarily down. Check Tab 6 for status.


## Development


### Running Tests
```bash
# Test data collection
python collectors.py

# Test AI models
python ai_models.py

# Test utilities
python utils.py

# Test database
python database.py
```


### Adding New Data Sources


Edit `collectors.py`:
```python
def collect_new_source():
    try:
        # safe_api_call (project helper) handles timeouts and retries
        response = safe_api_call("https://api.example.com/data")
        # Parse and save the data here
        return True
    except Exception as e:
        logger.error(f"Error collecting new source: {e}")
        return False
```


Add it to the scheduler in `collectors.py`. Note that `threading.Timer` fires only once, so the callback must reschedule itself to keep collecting periodically:
```python
# In schedule_data_collection()
def run_new_source():
    collect_new_source()
    threading.Timer(interval, run_new_source).start()  # reschedule

threading.Timer(interval, run_new_source).start()
```


## Validation Checklist


- [x] All 8 files complete
- [x] No TODO or FIXME comments
- [x] No placeholder functions
- [x] All imports in requirements.txt
- [x] Database schema matches specification
- [x] All 6 Gradio tabs implemented
- [x] All 3 AI models integrated
- [x] All 5+ data sources configured
- [x] Error handling in every network call
- [x] Logging for all major operations
- [x] No API keys in code
- [x] Comments in English
- [x] PEP 8 compliant


## License


MIT License - free to use, modify, and distribute.


## Support


For issues or questions:
- Check the logs: `logs/crypto_aggregator.log`
- Review error messages in Tab 6
- Ensure all dependencies are installed: `pip install -r requirements.txt`


## Credits


- **Data Sources**: CoinGecko, CoinCap, Binance, Alternative.me, CoinDesk, Cointelegraph, Reddit
- **AI Models**: HuggingFace (Cardiff NLP, ProsusAI, Facebook)
- **Framework**: Gradio


---


**Made with ❤️ for the Crypto Community**