Spaces:
Build error
Deploy Whale_Arbitrum on HF Spaces
- .env +11 -0
- README.md +135 -0
- app.py +719 -0
- modules/__init__.py +1 -0
- modules/__pycache__/__init__.cpython-312.pyc +0 -0
- modules/__pycache__/api_client.cpython-312.pyc +0 -0
- modules/__pycache__/crew_system.cpython-312.pyc +0 -0
- modules/__pycache__/crew_tools.cpython-312.pyc +0 -0
- modules/__pycache__/data_processor.cpython-312.pyc +0 -0
- modules/__pycache__/detection.cpython-312.pyc +0 -0
- modules/__pycache__/visualizer.cpython-312.pyc +0 -0
- modules/api_client.py +768 -0
- modules/crew_system.py +1117 -0
- modules/crew_tools.py +362 -0
- modules/data_processor.py +1425 -0
- modules/detection.py +684 -0
- modules/tools.py +373 -0
- modules/visualizer.py +638 -0
- requirements.txt +12 -0
- test_api.py +205 -0
.env
ADDED
@@ -0,0 +1,11 @@
+# Your current API key appears to be having issues
+# Please replace it with your own key from https://arbiscan.io/myapikey
+# Uncomment one of the API keys below or add your own
+ARBISCAN_API_KEY=4YEN1UTUEZ8I8ZBWSZW5NH6ZDFYEUVKQ5U
+# ARBISCAN_API_KEY=HVZC2W3IZWCGJWS8QDBZ56D1GZZNDJMZ25
+
+# Gemini API key for price data
+GEMINI_API_KEY=AIzaSyCyble5D3dlgPxDXWLlaZmu8hOM_nt-V6M
+
+# OpenAI API key for CrewAI functionality
+OPENAI_API_KEY=your-openai-api-key
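These `KEY=VALUE` lines are what `python-dotenv`'s `load_dotenv()` reads into the process environment before the app builds its API clients. As a rough illustration of those parsing semantics only (the app itself should keep using `python-dotenv`), a minimal stdlib-only sketch:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader sketch: one KEY=VALUE per line.

    Blank lines and '#' comments are skipped, and existing environment
    variables are not overwritten (matching load_dotenv's default).
    """
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first '=' only, so values may themselves contain '='
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

This is why the commented-out `# ARBISCAN_API_KEY=...` line above is inert: it is dropped by the comment check before any key is set.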
README.md
ADDED
@@ -0,0 +1,135 @@
+# Whale Wallet AI - Market Manipulation Detection
+
+A powerful Streamlit-based tool that tracks large holders ("whales") on the Arbitrum network to uncover potential market manipulation tactics.
+
+## 1. Prerequisites & Setup
+
+### 1.1. Python & Dependencies
+- Ensure you have Python 3.8+ installed.
+- Install required packages via:
+```bash
+pip install -r requirements.txt
+```
+
+### 1.2. API Keys
+You need API keys to fetch on-chain data and real-time prices:
+- **ARBISCAN_API_KEY**: For fetching Arbitrum transaction data
+- **GEMINI_API_KEY**: For retrieving live token prices
+- **OPENAI_API_KEY**: For powering the CrewAI agents
+
+Save these in a file named `.env` at the project root:
+```env
+ARBISCAN_API_KEY=your_arbiscan_key
+GEMINI_API_KEY=your_gemini_key
+OPENAI_API_KEY=your_openai_key
+```
+Note: Sample API keys are provided in the default `.env` file, but you should replace them with your own for production use.
+
+### 1.3. Run the App
+Launch the web interface with:
+```bash
+streamlit run app.py
+```
+
+## 2. Core Features & How to Use Them
+
+### 2.1 Track Large Buy/Sell Transactions
+
+**What it does:**
+Monitors on-chain transfers exceeding a configurable threshold (e.g., 1,000 tokens or $100K) for any wallet or contract you specify.
+
+**How to use:**
+1. In the sidebar, enter one or more wallet addresses
+2. Set your minimum token or USD value filter
+3. Click **Track Transactions**
+4. The dashboard will list incoming/outgoing transfers above the threshold.
+
+### 2.2 Identify Trading Patterns of Whale Wallets
+
+**What it does:**
+Uses time-series clustering and sequence analysis to surface recurring behaviors (e.g., cyclical dumping, accumulation bursts).
+
+**How to use:**
+1. Select a wallet address
+2. Choose a time period (e.g., last 7 days)
+3. Click **Analyze Patterns**
+4. View a summary of detected clusters and drill down into individual events.
+
+### 2.3 Analyze Impact of Whale Transactions on Token Prices
+
+**What it does:**
+Correlates large trades against minute-by-minute price ticks to quantify slippage, price spikes, or dumps.
+
+**How to use:**
+1. Enable **Price Impact** analysis in settings
+2. Specify lookback/lookahead windows (e.g., 5 minutes)
+3. Click **Run Impact Analysis**
+4. See interactive line charts and slippage metrics.
+
+### 2.4 Detect Potential Market Manipulation Techniques
+
+**What it does:**
+Automatically flags suspicious behaviors such as:
+- **Pump-and-Dump:** Rapid buys followed by coordinated sell-offs
+- **Wash Trading:** Self-trading across multiple addresses
+- **Spoofing:** Large orders placed then canceled
+
+**How to use:**
+1. Toggle **Manipulation Detection** on
+2. Adjust the sensitivity slider (Low/Medium/High)
+3. Click **Detect**
+4. Examine the **Alerts** panel for flagged events.
+
+### 2.5 Generate Reports & Visualizations
+
+**What it does:**
+Compiles whale activity into PDF/CSV summaries and interactive charts.
+
+**How to use:**
+1. Select **Export** in the top menu
+2. Choose **CSV**, **PDF**, or **PNG**
+3. Specify the time range and wallets to include
+4. Click **Download**
+5. The saved file will appear in your browser's download folder.
+
+## 3. Advanced Features: CrewAI Integration
+
+This application leverages CrewAI to provide advanced analysis through specialized AI agents:
+
+- **Blockchain Data Collector**: Extracts and organizes on-chain data
+- **Price Impact Analyst**: Correlates trading activity with price movements
+- **Trading Pattern Detector**: Identifies recurring behavioral patterns
+- **Market Manipulation Investigator**: Detects potential market abuse
+- **Insights Reporter**: Transforms data into actionable intelligence
+
+## 4. Project Structure
+
+```
+/Whale_Arbitrum/
+├── app.py                   # Main Streamlit application entry point
+├── requirements.txt         # Dependencies and package versions
+├── .env                     # API keys and environment variables
+├── modules/
+│   ├── api_client.py        # Arbiscan and Gemini API clients
+│   ├── data_processor.py    # Data processing and analysis
+│   ├── detection.py         # Market manipulation detection algorithms
+│   ├── visualizer.py        # Visualization and report generation
+│   └── crew_system.py       # CrewAI agentic system
+```
+
+## 5. Use Cases
+
+- **Regulatory Compliance & Fraud Detection**
+  Auditors and regulators can monitor DeFi markets for wash trades and suspicious dumps.
+
+- **Investment Strategy Optimization**
+  Traders gain insight into institutional flows and can calibrate entry/exit points.
+
+- **Market Research & Analysis**
+  Researchers study whale behavior to gauge token health and potential volatility.
+
+- **DeFi Protocol Security Monitoring**
+  Protocol teams receive alerts on large dumps that may destabilize liquidity pools.
+
+- **Token Project Risk Assessment**
+  Token issuers review top-holder actions to flag governance or distribution issues.
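The wash-trading check described in §2.4 of the README comes down to spotting transfers whose counterparties collapse to the same entity. A toy sketch of that idea (hypothetical field names; the app's actual, more involved logic lives in `modules/detection.py`):

```python
def flag_self_trades(transfers):
    """Return tx hashes where sender and receiver are the same address.

    This is the crudest wash-trading signal; real detection would also
    cluster addresses that are distinct but commonly controlled.
    """
    return [
        t["hash"]
        for t in transfers
        if t["from"].lower() == t["to"].lower()
    ]

txs = [
    {"hash": "0xaaa", "from": "0xABC", "to": "0xabc"},  # self-transfer
    {"hash": "0xbbb", "from": "0xABC", "to": "0xdef"},
]
flag_self_trades(txs)  # → ["0xaaa"]
```

Lowercasing both sides matters because hex addresses are case-insensitive (EIP-55 checksums vary the casing of the same address).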
app.py
ADDED
@@ -0,0 +1,719 @@
+import streamlit as st
+import pandas as pd
+import numpy as np
+import plotly.express as px
+import plotly.graph_objects as go
+import os
+import json
+import logging
+import time
+from datetime import datetime, timedelta
+from typing import Dict, List, Optional, Union, Any
+from dotenv import load_dotenv
+
+# Configure logging - Reduce verbosity and improve performance
+logging.basicConfig(
+    level=logging.WARNING,  # Only show warnings and errors by default
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+
+# Create a custom filter to suppress repetitive Gemini API errors
+class SuppressRepetitiveErrors(logging.Filter):
+    def __init__(self):
+        super().__init__()
+        self.error_counts = {}
+        self.max_errors = 3  # Show at most 3 instances of each error
+
+    def filter(self, record):
+        if record.levelno < logging.WARNING:
+            return True
+
+        # If it's a Gemini API error for non-existent tokens, suppress it after a few occurrences
+        if 'Error fetching historical prices from Gemini API' in record.getMessage():
+            key = 'gemini_api_error'
+            self.error_counts[key] = self.error_counts.get(key, 0) + 1
+
+            # Only allow the first few errors through
+            return self.error_counts[key] <= self.max_errors
+
+        return True
+
+# Apply the filter
+logging.getLogger().addFilter(SuppressRepetitiveErrors())
+
+from modules.api_client import ArbiscanClient, GeminiClient
+from modules.data_processor import DataProcessor
+from modules.visualizer import Visualizer
+from modules.detection import ManipulationDetector
+
+# Load environment variables
+load_dotenv()
+
+# Set page configuration
+st.set_page_config(
+    page_title="Whale Wallet AI - Market Manipulation Detection",
+    page_icon="🐳",
+    layout="wide",
+    initial_sidebar_state="expanded"
+)
+
+# Add custom CSS
+st.markdown("""
+<style>
+    .main-header {
+        font-size: 2.5rem;
+        color: #1E88E5;
+        text-align: center;
+        margin-bottom: 1rem;
+    }
+    .sub-header {
+        font-size: 1.5rem;
+        color: #424242;
+        margin-bottom: 1rem;
+    }
+    .info-text {
+        background-color: #E3F2FD;
+        padding: 1rem;
+        border-radius: 0.5rem;
+        margin-bottom: 1rem;
+    }
+    .stButton>button {
+        width: 100%;
+    }
+</style>
+""", unsafe_allow_html=True)
+
+# Initialize Streamlit session state for persisting data between tab navigation
+if 'transactions_data' not in st.session_state:
+    st.session_state.transactions_data = pd.DataFrame()
+
+if 'patterns_data' not in st.session_state:
+    st.session_state.patterns_data = None
+
+if 'price_impact_data' not in st.session_state:
+    st.session_state.price_impact_data = None
+
+# Performance metrics tracking
+if 'performance_metrics' not in st.session_state:
+    st.session_state.performance_metrics = {
+        'api_calls': 0,
+        'data_processing_time': 0,
+        'visualization_time': 0,
+        'last_refresh': None
+    }
+
+# Function to track performance
+def track_timing(category: str):
+    def timing_decorator(func):
+        def wrapper(*args, **kwargs):
+            start_time = time.time()
+            result = func(*args, **kwargs)
+            elapsed = time.time() - start_time
+
+            if category in st.session_state.performance_metrics:
+                st.session_state.performance_metrics[category] += elapsed
+            else:
+                st.session_state.performance_metrics[category] = elapsed
+
+            return result
+        return wrapper
+    return timing_decorator
+
+if 'alerts_data' not in st.session_state:
+    st.session_state.alerts_data = None
+
+# Initialize API clients
+arbiscan_client = ArbiscanClient(os.getenv("ARBISCAN_API_KEY"))
+# Set debug mode to False to reduce log output
+arbiscan_client.verbose_debug = False
+gemini_client = GeminiClient(os.getenv("GEMINI_API_KEY"))
+
+# Initialize data processor and visualizer
+data_processor = DataProcessor()
+visualizer = Visualizer()
+
+# Apply performance tracking to key instance methods after initialization
+original_fetch_whale = arbiscan_client.fetch_whale_transactions
+arbiscan_client.fetch_whale_transactions = track_timing('api_calls')(original_fetch_whale)
+
+original_identify_patterns = data_processor.identify_patterns
+data_processor.identify_patterns = track_timing('data_processing_time')(original_identify_patterns)
+
+original_analyze_price_impact = data_processor.analyze_price_impact
+data_processor.analyze_price_impact = track_timing('data_processing_time')(original_analyze_price_impact)
+detection = ManipulationDetector()
+
+# Initialize crew system (for AI-assisted analysis)
+try:
+    from modules.crew_system import WhaleAnalysisCrewSystem
+    crew_system = WhaleAnalysisCrewSystem(arbiscan_client, gemini_client, data_processor)
+    CREW_ENABLED = True
+    logging.info("CrewAI system loaded successfully")
+except Exception as e:
+    CREW_ENABLED = False
+    logging.error(f"Failed to load CrewAI system: {str(e)}")
+    st.sidebar.error("CrewAI features are disabled due to an error.")
+
+# Sidebar for inputs
+st.sidebar.header("Configuration")
+
+# Wallet tracking section
+st.sidebar.subheader("Track Wallets")
+wallet_addresses = st.sidebar.text_area(
+    "Enter wallet addresses (one per line)",
+    placeholder="0x1234abcd...\n0xabcd1234..."
+)
+
+threshold_type = st.sidebar.radio(
+    "Threshold Type",
+    ["Token Amount", "USD Value"]
+)
+
+if threshold_type == "Token Amount":
+    threshold_value = st.sidebar.number_input("Minimum Token Amount", min_value=0.0, value=1000.0)
+    token_symbol = st.sidebar.text_input("Token Symbol", placeholder="ETH")
+else:
+    threshold_value = st.sidebar.number_input("Minimum USD Value", min_value=0.0, value=100000.0)
+
+# Time period selection
+st.sidebar.subheader("Time Period")
+time_period = st.sidebar.selectbox(
+    "Select Time Period",
+    ["Last 24 hours", "Last 7 days", "Last 30 days", "Custom"]
+)
+
+if time_period == "Custom":
+    start_date = st.sidebar.date_input("Start Date", datetime.now() - timedelta(days=7))
+    end_date = st.sidebar.date_input("End Date", datetime.now())
+else:
+    # Calculate dates based on selection
+    end_date = datetime.now()
+    if time_period == "Last 24 hours":
+        start_date = end_date - timedelta(days=1)
+    elif time_period == "Last 7 days":
+        start_date = end_date - timedelta(days=7)
+    else:  # Last 30 days
+        start_date = end_date - timedelta(days=30)
+
+# Manipulation detection settings
+st.sidebar.subheader("Manipulation Detection")
+enable_manipulation_detection = st.sidebar.toggle("Enable Manipulation Detection", value=True)
+if enable_manipulation_detection:
+    sensitivity = st.sidebar.select_slider(
+        "Detection Sensitivity",
+        options=["Low", "Medium", "High"],
+        value="Medium"
+    )
+
+# Price impact analysis settings
+st.sidebar.subheader("Price Impact Analysis")
+enable_price_impact = st.sidebar.toggle("Enable Price Impact Analysis", value=True)
+if enable_price_impact:
+    lookback_minutes = st.sidebar.slider("Lookback (minutes)", 1, 60, 5)
+    lookahead_minutes = st.sidebar.slider("Lookahead (minutes)", 1, 60, 5)
+
+# Action buttons
+track_button = st.sidebar.button("Track Transactions", type="primary")
+pattern_button = st.sidebar.button("Analyze Patterns")
+if enable_manipulation_detection:
+    detect_button = st.sidebar.button("Detect Manipulation")
+
+# Main content area
+tab1, tab2, tab3, tab4, tab5 = st.tabs([
+    "Transactions", "Patterns", "Price Impact", "Alerts", "Reports"
+])
+
+with tab1:
+    st.header("Whale Transactions")
+    if track_button and wallet_addresses:
+        with st.spinner("Fetching whale transactions..."):
+            # Function to track whale transactions
+            def track_whale_transactions(wallets, start_date, end_date, threshold_value, threshold_type, token_symbol=None):
+                # Direct API call since CrewAI is temporarily disabled
+                try:
+                    min_token_amount = None
+                    min_usd_value = None
+                    if threshold_type == "Token Amount":
+                        min_token_amount = threshold_value
+                    else:
+                        min_usd_value = threshold_value
+
+                    # Limit pagination to prevent excessive API calls
+                    max_pages = 5
+                    transactions = arbiscan_client.fetch_whale_transactions(
+                        addresses=wallets,
+                        min_token_amount=min_token_amount,
+                        max_pages=max_pages,
+                        min_usd_value=min_usd_value
+                    )
+
+                    if transactions.empty:
+                        st.warning("No transactions found for the specified addresses")
+
+                    return transactions
+                except Exception as e:
+                    st.error(f"Error fetching transactions: {str(e)}")
+                    return pd.DataFrame()
+
+            wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
+
+            # Use cached data or fetch new if not available
+            if st.session_state.transactions_data is None or track_button:
+                with st.spinner("Fetching transactions..."):
+                    transactions = track_whale_transactions(
+                        wallets=wallet_list,
+                        start_date=start_date,
+                        end_date=end_date,
+                        threshold_value=threshold_value,
+                        threshold_type=threshold_type,
+                        token_symbol=token_symbol
+                    )
+                    # Store in session state
+                    st.session_state.transactions_data = transactions
+            else:
+                transactions = st.session_state.transactions_data
+
+            if not transactions.empty:
+                st.success(f"Found {len(transactions)} transactions matching your criteria")
+
+                # Display transactions
+                if len(transactions) > 0:
+                    st.dataframe(transactions, use_container_width=True)
+
+                    # Add download button
+                    csv = transactions.to_csv(index=False).encode('utf-8')
+                    st.download_button(
+                        "Download Transactions CSV",
+                        csv,
+                        "whale_transactions.csv",
+                        "text/csv",
+                        key='download-csv'
+                    )
+
+                    # Volume by day chart
+                    st.subheader("Transaction Volume by Day")
+                    try:
+                        st.plotly_chart(visualizer.plot_volume_by_day(transactions), use_container_width=True)
+                    except Exception as e:
+                        st.error(f"Error generating volume chart: {str(e)}")
+
+                    # Transaction flow visualization
+                    st.subheader("Transaction Flow")
+                    try:
+                        flow_chart = visualizer.plot_transaction_flow(transactions)
+                        st.plotly_chart(flow_chart, use_container_width=True)
+                    except Exception as e:
+                        st.error(f"Error generating flow chart: {str(e)}")
+            else:
+                st.warning("No transactions found matching your criteria. Try adjusting the parameters.")
+    else:
+        st.info("Enter wallet addresses and click 'Track Transactions' to view whale activity")
+
+with tab2:
+    st.header("Trading Patterns")
+    if track_button and wallet_addresses:
+        with st.spinner("Analyzing trading patterns..."):
+            # Function to analyze trading patterns
+            def analyze_trading_patterns(wallets, start_date, end_date):
+                # Direct analysis
+                try:
+                    transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
+                    if transactions_df.empty:
+                        st.warning("No transactions found for the specified addresses")
+                        return []
+
+                    return data_processor.identify_patterns(transactions_df)
+                except Exception as e:
+                    st.error(f"Error analyzing trading patterns: {str(e)}")
+                    return []
+
+            wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
+
+            # Use cached data or fetch new if not available
+            if st.session_state.patterns_data is None or track_button:
+                with st.spinner("Analyzing trading patterns..."):
+                    patterns = analyze_trading_patterns(
+                        wallets=wallet_list,
+                        start_date=start_date,
+                        end_date=end_date
+                    )
+                    # Store in session state
+                    st.session_state.patterns_data = patterns
+            else:
+                patterns = st.session_state.patterns_data
+
+            if patterns:
+                for i, pattern in enumerate(patterns):
+                    pattern_card = st.container()
+                    with pattern_card:
+                        # Pattern header with name and risk profile
+                        header_cols = st.columns([3, 1])
+                        with header_cols[0]:
+                            st.subheader(f"Pattern {i+1}: {pattern['name']}")
+                        with header_cols[1]:
+                            risk_color = "green"
+                            if pattern.get('risk_profile') == "Medium":
+                                risk_color = "orange"
+                            elif pattern.get('risk_profile') in ["High", "Very High"]:
+                                risk_color = "red"
+                            st.markdown(f"<h5 style='color:{risk_color};'>Risk: {pattern.get('risk_profile', 'Unknown')}</h5>", unsafe_allow_html=True)
+
+                        # Pattern description and details
+                        st.markdown(f"**Description:** {pattern['description']}")
+
+                        # Additional strategy information
+                        if 'strategy' in pattern:
+                            st.markdown(f"**Strategy:** {pattern['strategy']}")
+
+                        # Time insight
+                        if 'time_insight' in pattern:
+                            st.info(pattern['time_insight'])
+
+                        # Metrics
+                        metric_cols = st.columns(3)
+                        with metric_cols[0]:
+                            st.markdown(f"**Occurrences:** {pattern['occurrence_count']} instances")
+                        with metric_cols[1]:
+                            st.markdown(f"**Confidence:** {pattern.get('confidence', 0):.2f}")
+                        with metric_cols[2]:
+                            st.markdown(f"**Volume:** {pattern.get('volume_metric', 'N/A')}")
+
+                        # Display main chart first
+                        if 'charts' in pattern and 'main' in pattern['charts']:
+                            st.plotly_chart(pattern['charts']['main'], use_container_width=True)
+                        elif 'chart_data' in pattern and pattern['chart_data'] is not None:  # Fallback for old format
+                            st.plotly_chart(pattern['chart_data'], use_container_width=True)
+
+                        # Create two columns for additional charts
+                        if 'charts' in pattern and len(pattern['charts']) > 1:
+                            charts_col1, charts_col2 = st.columns(2)
+
+                            # Hourly distribution chart
+                            if 'hourly_distribution' in pattern['charts']:
+                                with charts_col1:
+                                    st.plotly_chart(pattern['charts']['hourly_distribution'], use_container_width=True)
+
+                            # Value distribution chart
+                            if 'value_distribution' in pattern['charts']:
+                                with charts_col2:
+                                    st.plotly_chart(pattern['charts']['value_distribution'], use_container_width=True)
+
+                        # Advanced metrics in expander
+                        if 'metrics' in pattern and pattern['metrics']:
+                            with st.expander("Detailed Metrics"):
+                                metrics_table = []
+                                for k, v in pattern['metrics'].items():
+                                    if v is not None:
+                                        if isinstance(v, float):
+                                            metrics_table.append([k.replace('_', ' ').title(), f"{v:.4f}"])
+                                        else:
+                                            metrics_table.append([k.replace('_', ' ').title(), v])
+
+                                if metrics_table:
+                                    st.table(pd.DataFrame(metrics_table, columns=["Metric", "Value"]))
+
+                        # Display example transactions
+                        if 'examples' in pattern and not pattern['examples'].empty:
+                            with st.expander("Example Transactions"):
+                                # Format the dataframe for better display
+                                display_df = pattern['examples'].copy()
+                                # Convert timestamp to readable format if needed
+                                if 'timeStamp' in display_df.columns and not pd.api.types.is_datetime64_any_dtype(display_df['timeStamp']):
+                                    display_df['timeStamp'] = pd.to_datetime(display_df['timeStamp'], unit='s')
+
+                                st.dataframe(display_df, use_container_width=True)
+
+                        st.markdown("---")
+            else:
+                st.info("No significant trading patterns detected. Try expanding the date range or adding more addresses.")
+    else:
+        st.info("Track transactions to analyze trading patterns")
+
+with tab3:
+    st.header("Price Impact Analysis")
+    if enable_price_impact and track_button and wallet_addresses:
+        with st.spinner("Analyzing price impact..."):
+            # Function to analyze price impact
+            def analyze_price_impact(wallets, start_date, end_date, lookback_minutes, lookahead_minutes):
+                # Direct analysis
+                transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
+                # Get token from first transaction
+                if not transactions_df.empty:
+                    token_symbol = transactions_df.iloc[0].get('tokenSymbol', 'ETH')
+                    # For each transaction, get price impact
+                    price_impacts = {}
+                    progress_bar = st.progress(0)
+                    for idx, row in transactions_df.iterrows():
+                        progress = int((idx + 1) / len(transactions_df) * 100)
+                        progress_bar.progress(progress, text=f"Analyzing transaction {idx+1} of {len(transactions_df)}")
+                        if 'timeStamp' in row:
+                            try:
+                                tx_time = datetime.fromtimestamp(int(row['timeStamp']))
+                                impact_data = gemini_client.get_price_impact(
+                                    symbol=f"{token_symbol}USD",
+                                    transaction_time=tx_time,
+                                    lookback_minutes=lookback_minutes,
+                                    lookahead_minutes=lookahead_minutes
+                                )
+                                price_impacts[row['hash']] = impact_data
+                            except Exception as e:
+                                st.warning(f"Could not get price data for transaction: {str(e)}")
+
+                    progress_bar.empty()
+                    if price_impacts:
+                        return data_processor.analyze_price_impact(transactions_df, price_impacts)
+
+                # Create an empty chart for the default case
+                empty_fig = go.Figure()
+                empty_fig.update_layout(
+                    title="No Price Impact Data Available",
+                    xaxis_title="Time",
+                    yaxis_title="Price Impact (%)",
+                    height=400,
+                    template="plotly_white"
+                )
+                empty_fig.add_annotation(
+                    text="No transactions found with price impact data",
+                    showarrow=False,
+                    font=dict(size=14)
+                )
+
+                return {
+                    "avg_impact_pct": 0,
+                    "max_impact_pct": 0,
+                    "min_impact_pct": 0,
+                    "significant_moves_count": 0,
+                    "total_transactions": 0,
+                    "transactions_with_impact": pd.DataFrame(),
+                    "charts": {
+                        "main_chart": empty_fig,
|
| 490 |
+
"impact_distribution": empty_fig,
|
| 491 |
+
"cumulative_impact": empty_fig,
|
| 492 |
+
"hourly_impact": empty_fig
|
| 493 |
+
},
|
| 494 |
+
"insights": [],
|
| 495 |
+
"impact_summary": "No price impact data available"
|
| 496 |
+
}
|
| 497 |
+
|
| 498 |
+
wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
|
| 499 |
+
|
| 500 |
+
# Use cached data or fetch new if not available
|
| 501 |
+
if st.session_state.price_impact_data is None or track_button:
|
| 502 |
+
with st.spinner("Analyzing price impact..."):
|
| 503 |
+
impact_analysis = analyze_price_impact(
|
| 504 |
+
wallets=wallet_list,
|
| 505 |
+
start_date=start_date,
|
| 506 |
+
end_date=end_date,
|
| 507 |
+
lookback_minutes=lookback_minutes,
|
| 508 |
+
lookahead_minutes=lookahead_minutes
|
| 509 |
+
)
|
| 510 |
+
# Store in session state
|
| 511 |
+
st.session_state.price_impact_data = impact_analysis
|
| 512 |
+
else:
|
| 513 |
+
impact_analysis = st.session_state.price_impact_data
|
| 514 |
+
|
| 515 |
+
if impact_analysis:
|
| 516 |
+
# Display impact summary
|
| 517 |
+
if 'impact_summary' in impact_analysis:
|
| 518 |
+
st.info(impact_analysis['impact_summary'])
|
| 519 |
+
|
| 520 |
+
# Summary metrics in two rows
|
| 521 |
+
metrics_row1 = st.columns(4)
|
| 522 |
+
with metrics_row1[0]:
|
| 523 |
+
st.metric("Avg. Price Impact (%)", f"{impact_analysis.get('avg_impact_pct', 0):.2f}%")
|
| 524 |
+
with metrics_row1[1]:
|
| 525 |
+
st.metric("Max Impact (%)", f"{impact_analysis.get('max_impact_pct', 0):.2f}%")
|
| 526 |
+
with metrics_row1[2]:
|
| 527 |
+
st.metric("Min Impact (%)", f"{impact_analysis.get('min_impact_pct', 0):.2f}%")
|
| 528 |
+
with metrics_row1[3]:
|
| 529 |
+
st.metric("Std Dev (%)", f"{impact_analysis.get('std_impact_pct', 0):.2f}%")
|
| 530 |
+
|
| 531 |
+
metrics_row2 = st.columns(4)
|
| 532 |
+
with metrics_row2[0]:
|
| 533 |
+
st.metric("Significant Moves", impact_analysis.get('significant_moves_count', 0))
|
| 534 |
+
with metrics_row2[1]:
|
| 535 |
+
st.metric("High Impact Moves", impact_analysis.get('high_impact_moves_count', 0))
|
| 536 |
+
with metrics_row2[2]:
|
| 537 |
+
st.metric("Positive/Negative", f"{impact_analysis.get('positive_impacts_count', 0)}/{impact_analysis.get('negative_impacts_count', 0)}")
|
| 538 |
+
with metrics_row2[3]:
|
| 539 |
+
st.metric("Total Transactions", impact_analysis.get('total_transactions', 0))
|
| 540 |
+
|
| 541 |
+
# Display insights if available
|
| 542 |
+
if 'insights' in impact_analysis and impact_analysis['insights']:
|
| 543 |
+
st.subheader("Key Insights")
|
| 544 |
+
for insight in impact_analysis['insights']:
|
| 545 |
+
st.markdown(f"**{insight['title']}**: {insight['description']}")
|
| 546 |
+
|
| 547 |
+
# Display the main chart
|
| 548 |
+
if 'charts' in impact_analysis and 'main_chart' in impact_analysis['charts']:
|
| 549 |
+
st.subheader("Price Impact Over Time")
|
| 550 |
+
st.plotly_chart(impact_analysis['charts']['main_chart'], use_container_width=True)
|
| 551 |
+
|
| 552 |
+
# Create two columns for secondary charts
|
| 553 |
+
col1, col2 = st.columns(2)
|
| 554 |
+
|
| 555 |
+
# Distribution chart
|
| 556 |
+
if 'charts' in impact_analysis and 'impact_distribution' in impact_analysis['charts']:
|
| 557 |
+
with col1:
|
| 558 |
+
st.plotly_chart(impact_analysis['charts']['impact_distribution'], use_container_width=True)
|
| 559 |
+
|
| 560 |
+
# Cumulative impact chart
|
| 561 |
+
if 'charts' in impact_analysis and 'cumulative_impact' in impact_analysis['charts']:
|
| 562 |
+
with col2:
|
| 563 |
+
st.plotly_chart(impact_analysis['charts']['cumulative_impact'], use_container_width=True)
|
| 564 |
+
|
| 565 |
+
# Hourly impact chart
|
| 566 |
+
if 'charts' in impact_analysis and 'hourly_impact' in impact_analysis['charts']:
|
| 567 |
+
st.plotly_chart(impact_analysis['charts']['hourly_impact'], use_container_width=True)
|
| 568 |
+
|
| 569 |
+
# Detailed transactions with impact
|
| 570 |
+
if not impact_analysis['transactions_with_impact'].empty:
|
| 571 |
+
st.subheader("Transactions with Price Impact")
|
| 572 |
+
# Convert numeric columns to have 2 decimal places for better display
|
| 573 |
+
display_df = impact_analysis['transactions_with_impact'].copy()
|
| 574 |
+
for col in ['impact_pct', 'pre_price', 'post_price', 'cumulative_impact']:
|
| 575 |
+
if col in display_df.columns:
|
| 576 |
+
display_df[col] = display_df[col].apply(lambda x: f"{float(x):.2f}%" if pd.notnull(x) else "N/A")
|
| 577 |
+
|
| 578 |
+
st.dataframe(display_df, use_container_width=True)
|
| 579 |
+
else:
|
| 580 |
+
st.info("No transaction-specific price impact data available")
|
| 581 |
+
else:
|
| 582 |
+
st.info("No price impact data available for the given parameters")
|
| 583 |
+
else:
|
| 584 |
+
st.info("Enable Price Impact Analysis and track transactions to see price effects")
|
| 585 |
+
|
| 586 |
+
with tab4:
|
| 587 |
+
st.header("Manipulation Alerts")
|
| 588 |
+
if enable_manipulation_detection and detect_button and wallet_addresses:
|
| 589 |
+
with st.spinner("Detecting potential manipulation..."):
|
| 590 |
+
wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
|
| 591 |
+
|
| 592 |
+
# Function to detect manipulation
|
| 593 |
+
def detect_manipulation(wallets, start_date, end_date, sensitivity):
|
| 594 |
+
try:
|
| 595 |
+
transactions_df = arbiscan_client.fetch_whale_transactions(addresses=wallets, max_pages=5)
|
| 596 |
+
if transactions_df.empty:
|
| 597 |
+
st.warning("No transactions found for the specified addresses")
|
| 598 |
+
return []
|
| 599 |
+
|
| 600 |
+
pump_dump = detection.detect_pump_and_dump(transactions_df, sensitivity)
|
| 601 |
+
wash_trades = detection.detect_wash_trading(transactions_df, wallets, sensitivity)
|
| 602 |
+
return pump_dump + wash_trades
|
| 603 |
+
except Exception as e:
|
| 604 |
+
st.error(f"Error detecting manipulation: {str(e)}")
|
| 605 |
+
return []
|
| 606 |
+
|
| 607 |
+
alerts = detect_manipulation(
|
| 608 |
+
wallets=wallet_list,
|
| 609 |
+
start_date=start_date,
|
| 610 |
+
end_date=end_date,
|
| 611 |
+
sensitivity=sensitivity
|
| 612 |
+
)
|
| 613 |
+
|
| 614 |
+
if alerts:
|
| 615 |
+
for i, alert in enumerate(alerts):
|
| 616 |
+
alert_color = "red" if alert['risk_level'] == "High" else "orange" if alert['risk_level'] == "Medium" else "blue"
|
| 617 |
+
|
| 618 |
+
with st.expander(f" {alert['type']} - Risk: {alert['risk_level']}", expanded=i==0):
|
| 619 |
+
st.markdown(f"<h4 style='color:{alert_color}'>{alert['title']}</h4>", unsafe_allow_html=True)
|
| 620 |
+
st.write(f"**Description:** {alert['description']}")
|
| 621 |
+
st.write(f"**Detection Time:** {alert['detection_time']}")
|
| 622 |
+
st.write(f"**Involved Addresses:** {', '.join(alert['addresses'])}")
|
| 623 |
+
|
| 624 |
+
# Display evidence
|
| 625 |
+
if 'evidence' in alert and alert['evidence'] is not None and not (isinstance(alert['evidence'], pd.DataFrame) and alert['evidence'].empty):
|
| 626 |
+
st.subheader("Evidence")
|
| 627 |
+
try:
|
| 628 |
+
evidence_df = alert['evidence']
|
| 629 |
+
if isinstance(evidence_df, str):
|
| 630 |
+
# Try to convert from JSON string if needed
|
| 631 |
+
evidence_df = pd.read_json(evidence_df)
|
| 632 |
+
st.dataframe(evidence_df, use_container_width=True)
|
| 633 |
+
except Exception as e:
|
| 634 |
+
st.error(f"Error displaying evidence: {str(e)}")
|
| 635 |
+
|
| 636 |
+
# Display chart if available
|
| 637 |
+
if 'chart' in alert and alert['chart'] is not None:
|
| 638 |
+
try:
|
| 639 |
+
st.plotly_chart(alert['chart'], use_container_width=True)
|
| 640 |
+
except Exception as e:
|
| 641 |
+
st.error(f"Error displaying chart: {str(e)}")
|
| 642 |
+
else:
|
| 643 |
+
st.success("No manipulation tactics detected for the given parameters")
|
| 644 |
+
else:
|
| 645 |
+
st.info("Enable Manipulation Detection and click 'Detect Manipulation' to scan for suspicious activity")
|
| 646 |
+
|
| 647 |
+
with tab5:
|
| 648 |
+
st.header("Reports & Visualizations")
|
| 649 |
+
|
| 650 |
+
# Report type selection
|
| 651 |
+
report_type = st.selectbox(
|
| 652 |
+
"Select Report Type",
|
| 653 |
+
["Transaction Summary", "Pattern Analysis", "Price Impact", "Manipulation Detection", "Complete Analysis"]
|
| 654 |
+
)
|
| 655 |
+
|
| 656 |
+
# Export format
|
| 657 |
+
export_format = st.radio(
|
| 658 |
+
"Export Format",
|
| 659 |
+
["CSV", "PDF", "PNG"],
|
| 660 |
+
horizontal=True
|
| 661 |
+
)
|
| 662 |
+
|
| 663 |
+
# Generate report button
|
| 664 |
+
if st.button("Generate Report"):
|
| 665 |
+
if wallet_addresses:
|
| 666 |
+
with st.spinner("Generating report..."):
|
| 667 |
+
wallet_list = [addr.strip() for addr in wallet_addresses.split("\n") if addr.strip()]
|
| 668 |
+
|
| 669 |
+
if CREW_ENABLED and crew_system is not None:
|
| 670 |
+
try:
|
| 671 |
+
with st.spinner("Generating AI analysis report..."):
|
| 672 |
+
# Check if crew_system has llm attribute defined
|
| 673 |
+
if not hasattr(crew_system, 'llm') or crew_system.llm is None:
|
| 674 |
+
raise ValueError("LLM not initialized in crew system")
|
| 675 |
+
|
| 676 |
+
report = crew_system.generate_market_manipulation_report(wallet_addresses=wallet_list)
|
| 677 |
+
st.markdown(f"## AI Analysis Report")
|
| 678 |
+
st.markdown(report['content'])
|
| 679 |
+
|
| 680 |
+
if 'charts' in report and report['charts']:
|
| 681 |
+
for i, chart in enumerate(report['charts']):
|
| 682 |
+
st.plotly_chart(chart, use_container_width=True)
|
| 683 |
+
except Exception as e:
|
| 684 |
+
st.error(f"CrewAI report generation failed: {str(e)}")
|
| 685 |
+
st.warning("Using direct analysis instead")
|
| 686 |
+
|
| 687 |
+
# Fallback to direct analysis
|
| 688 |
+
with st.spinner("Generating basic analysis..."):
|
| 689 |
+
insights = detection.generate_manipulation_insights(transactions=st.session_state.transactions_data)
|
| 690 |
+
st.markdown(f"## Potential Manipulation Insights")
|
| 691 |
+
|
| 692 |
+
for insight in insights:
|
| 693 |
+
st.markdown(f"**{insight['title']}**\n{insight['description']}")
|
| 694 |
+
else:
|
| 695 |
+
st.error("Failed to generate report: CrewAI is not enabled")
|
| 696 |
+
else:
|
| 697 |
+
st.error("Please enter wallet addresses to generate a report")
|
| 698 |
+
|
| 699 |
+
# Footer with instructions
|
| 700 |
+
st.markdown("---")
|
| 701 |
+
with st.expander("How to Use"):
|
| 702 |
+
st.markdown("""
|
| 703 |
+
### Typical Workflow
|
| 704 |
+
|
| 705 |
+
1. **Input wallet addresses** in the sidebar - these are the whale wallets you want to track
|
| 706 |
+
2. **Set the minimum threshold** for transaction size (token amount or USD value)
|
| 707 |
+
3. **Select time period** for analysis
|
| 708 |
+
4. **Click 'Track Transactions'** to see large transfers for these wallets
|
| 709 |
+
5. **Enable additional analysis** like pattern recognition or manipulation detection
|
| 710 |
+
6. **Export reports** for further analysis or record-keeping
|
| 711 |
+
|
| 712 |
+
### API Keys
|
| 713 |
+
|
| 714 |
+
This app requires two API keys to function properly:
|
| 715 |
+
- **ARBISCAN_API_KEY** - For accessing Arbitrum blockchain data
|
| 716 |
+
- **GEMINI_API_KEY** - For real-time token price data
|
| 717 |
+
|
| 718 |
+
These should be stored in a `.env` file in the project root.
|
| 719 |
+
""")
|
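The API-keys note above can be satisfied with a `.env` file in the project root. A minimal sketch with placeholder values (never commit real keys; the variable names match those read by the app, the values here are illustrative):

```shell
# .env (project root): placeholder values only
# Obtain an Arbiscan key from https://arbiscan.io/myapikey
ARBISCAN_API_KEY=your-arbiscan-api-key
GEMINI_API_KEY=your-gemini-api-key
OPENAI_API_KEY=your-openai-api-key
```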
modules/__init__.py ADDED
@@ -0,0 +1 @@
modules/__pycache__/__init__.cpython-312.pyc ADDED: Binary file (157 Bytes)
modules/__pycache__/api_client.cpython-312.pyc ADDED: Binary file (30.1 kB)
modules/__pycache__/crew_system.cpython-312.pyc ADDED: Binary file (36.2 kB)
modules/__pycache__/crew_tools.cpython-312.pyc ADDED: Binary file (18.3 kB)
modules/__pycache__/data_processor.cpython-312.pyc ADDED: Binary file (44.1 kB)
modules/__pycache__/detection.cpython-312.pyc ADDED: Binary file (22.3 kB)
modules/__pycache__/visualizer.cpython-312.pyc ADDED: Binary file (23.2 kB)
modules/api_client.py ADDED
@@ -0,0 +1,768 @@
import requests
import json
import time
import logging
from datetime import datetime
import pandas as pd
from typing import Dict, List, Optional, Union, Any

class ArbiscanClient:
    """
    Client to interact with the Arbiscan API for fetching on-chain data from Arbitrum
    """

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.arbiscan.io/api"
        self.rate_limit_delay = 0.2  # Delay between API calls to avoid rate limiting (200ms)

        # Add caching to improve performance
        self._transaction_cache = {}
        self._last_api_call_time = 0

        # Configure debug logging - set to True for verbose output, False for minimal output
        self.verbose_debug = False

    def _make_request(self, params: Dict[str, str]) -> Dict[str, Any]:
        """
        Make a request to the Arbiscan API with rate limiting
        """
        params["apikey"] = self.api_key

        # Implement rate limiting
        current_time = time.time()
        time_since_last_call = current_time - self._last_api_call_time
        if time_since_last_call < self.rate_limit_delay:
            time.sleep(self.rate_limit_delay - time_since_last_call)
        self._last_api_call_time = time.time()

        try:
            # Log the request details but only in verbose mode
            if self.verbose_debug:
                debug_params = params.copy()
                debug_params.pop("apikey", None)
                logging.debug(f"API Request: {self.base_url}")
                logging.debug(f"Params: {json.dumps(debug_params, indent=2)}")

            response = requests.get(self.base_url, params=params)

            # Print response status and URL only in verbose mode
            if self.verbose_debug:
                logging.debug(f"Response Status: {response.status_code}")
                logging.debug(f"Full URL: {response.url.replace(self.api_key, 'API_KEY_REDACTED')}")

            response.raise_for_status()

            # Parse the JSON response
            json_data = response.json()

            # Log the response structure but only in verbose mode
            if self.verbose_debug:
                result_preview = str(json_data.get('result', ''))[:100] + '...' if len(str(json_data.get('result', ''))) > 100 else str(json_data.get('result', ''))
                logging.debug(f"Response Status: {json_data.get('status')}")
                logging.debug(f"Response Message: {json_data.get('message', 'No message')}")
                logging.debug(f"Result Preview: {result_preview}")

            # Check for API-level errors in the response
            status = json_data.get('status')
            message = json_data.get('message', 'No message')
            if status == '0' and message != 'No transactions found':
                logging.warning(f"API Error: {message}")

            return json_data

        except requests.exceptions.HTTPError as e:
            logging.error(f"HTTP Error in API Request: {e.response.status_code}")
            raise

        except requests.exceptions.ConnectionError as e:
            logging.error(f"Connection Error in API Request: {str(e)}")
            raise

        except requests.exceptions.Timeout as e:
            logging.error(f"Timeout in API Request: {str(e)}")
            raise

        except requests.exceptions.RequestException as e:
            logging.error(f"API Request failed: {str(e)}")
            print(f"ERROR - URL: {self.base_url}")
            print(f"ERROR - Method: {params.get('module')}/{params.get('action')}")
            return {"status": "0", "message": f"Error: {str(e)}", "result": []}
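The inter-call throttling in `_make_request` reduces to a small amount of bookkeeping: remember when the last call was made and sleep only for the remainder of the delay window. A minimal standalone sketch (the `RateLimiter` class name and the timings below are illustrative, not part of this repository):

```python
import time

class RateLimiter:
    """Enforces a minimum delay between successive calls (same logic as _make_request)."""

    def __init__(self, min_delay: float = 0.2):
        self.min_delay = min_delay
        self._last_call = 0.0

    def wait(self) -> None:
        # Sleep only for the unexpired portion of the delay window
        elapsed = time.time() - self._last_call
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)
        self._last_call = time.time()

limiter = RateLimiter(min_delay=0.05)
start = time.time()
for _ in range(3):
    limiter.wait()
elapsed = time.time() - start
# First call passes immediately; the next two each wait ~0.05s
print(elapsed >= 0.09)
```

Sleeping only for the remainder (rather than a fixed delay before every request) means callers that are already slow pay no extra latency.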
def get_eth_balance(self, address: str) -> float:
|
| 93 |
+
"""
|
| 94 |
+
Get the ETH balance of an address
|
| 95 |
+
|
| 96 |
+
Args:
|
| 97 |
+
address: Wallet address
|
| 98 |
+
|
| 99 |
+
Returns:
|
| 100 |
+
ETH balance as a float
|
| 101 |
+
"""
|
| 102 |
+
params = {
|
| 103 |
+
"module": "account",
|
| 104 |
+
"action": "balance",
|
| 105 |
+
"address": address,
|
| 106 |
+
"tag": "latest"
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
result = self._make_request(params)
|
| 110 |
+
|
| 111 |
+
if result.get("status") == "1":
|
| 112 |
+
# Convert wei to ETH
|
| 113 |
+
wei_balance = int(result.get("result", "0"))
|
| 114 |
+
eth_balance = wei_balance / 10**18
|
| 115 |
+
return eth_balance
|
| 116 |
+
else:
|
| 117 |
+
return 0.0
|
| 118 |
+
|
| 119 |
+
def get_token_balance(self, address: str, token_address: str) -> float:
|
| 120 |
+
"""
|
| 121 |
+
Get the token balance of an address for a specific token
|
| 122 |
+
|
| 123 |
+
Args:
|
| 124 |
+
address: Wallet address
|
| 125 |
+
token_address: Token contract address
|
| 126 |
+
|
| 127 |
+
Returns:
|
| 128 |
+
Token balance as a float
|
| 129 |
+
"""
|
| 130 |
+
params = {
|
| 131 |
+
"module": "account",
|
| 132 |
+
"action": "tokenbalance",
|
| 133 |
+
"address": address,
|
| 134 |
+
"contractaddress": token_address,
|
| 135 |
+
"tag": "latest"
|
| 136 |
+
}
|
| 137 |
+
|
| 138 |
+
result = self._make_request(params)
|
| 139 |
+
|
| 140 |
+
if result.get("status") == "1":
|
| 141 |
+
# Get token decimals and convert to proper amount
|
| 142 |
+
decimals = self.get_token_decimals(token_address)
|
| 143 |
+
raw_balance = int(result.get("result", "0"))
|
| 144 |
+
token_balance = raw_balance / 10**decimals
|
| 145 |
+
return token_balance
|
| 146 |
+
else:
|
| 147 |
+
return 0.0
|
| 148 |
+
|
| 149 |
+
def get_token_decimals(self, token_address: str) -> int:
|
| 150 |
+
"""
|
| 151 |
+
Get the number of decimals for a token
|
| 152 |
+
|
| 153 |
+
Args:
|
| 154 |
+
token_address: Token contract address
|
| 155 |
+
|
| 156 |
+
Returns:
|
| 157 |
+
Number of decimals (default: 18)
|
| 158 |
+
"""
|
| 159 |
+
params = {
|
| 160 |
+
"module": "token",
|
| 161 |
+
"action": "getToken",
|
| 162 |
+
"contractaddress": token_address
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
result = self._make_request(params)
|
| 166 |
+
|
| 167 |
+
if result.get("status") == "1":
|
| 168 |
+
token_info = result.get("result", {})
|
| 169 |
+
return int(token_info.get("divisor", "18"))
|
| 170 |
+
else:
|
| 171 |
+
# Default to 18 decimals (most ERC-20 tokens)
|
| 172 |
+
return 18
|
| 173 |
+
|
| 174 |
+
def get_token_transfers(self,
|
| 175 |
+
address: str,
|
| 176 |
+
contract_address: Optional[str] = None,
|
| 177 |
+
start_block: int = 0,
|
| 178 |
+
end_block: int = 99999999,
|
| 179 |
+
page: int = 1,
|
| 180 |
+
offset: int = 100,
|
| 181 |
+
sort: str = "desc") -> List[Dict[str, Any]]:
|
| 182 |
+
"""
|
| 183 |
+
Get token transfers for an address
|
| 184 |
+
|
| 185 |
+
Args:
|
| 186 |
+
address: Wallet address
|
| 187 |
+
contract_address: Optional token contract address to filter by
|
| 188 |
+
start_block: Starting block number
|
| 189 |
+
end_block: Ending block number
|
| 190 |
+
page: Page number
|
| 191 |
+
offset: Number of results per page
|
| 192 |
+
sort: Sort order ("asc" or "desc")
|
| 193 |
+
|
| 194 |
+
Returns:
|
| 195 |
+
List of token transfers
|
| 196 |
+
"""
|
| 197 |
+
params = {
|
| 198 |
+
"module": "account",
|
| 199 |
+
"action": "tokentx",
|
| 200 |
+
"address": address,
|
| 201 |
+
"startblock": str(start_block),
|
| 202 |
+
"endblock": str(end_block),
|
| 203 |
+
"page": str(page),
|
| 204 |
+
"offset": str(offset),
|
| 205 |
+
"sort": sort
|
| 206 |
+
}
|
| 207 |
+
|
| 208 |
+
# Add contract address if specified
|
| 209 |
+
if contract_address:
|
| 210 |
+
params["contractaddress"] = contract_address
|
| 211 |
+
|
| 212 |
+
result = self._make_request(params)
|
| 213 |
+
|
| 214 |
+
if result.get("status") == "1":
|
| 215 |
+
return result.get("result", [])
|
| 216 |
+
else:
|
| 217 |
+
message = result.get("message", "Unknown error")
|
| 218 |
+
if "No transactions found" in message:
|
| 219 |
+
return []
|
| 220 |
+
else:
|
| 221 |
+
logging.warning(f"Error fetching token transfers: {message}")
|
| 222 |
+
return []
|
| 223 |
+
|
| 224 |
+
def fetch_all_token_transfers(self,
|
| 225 |
+
address: str,
|
| 226 |
+
contract_address: Optional[str] = None,
|
| 227 |
+
start_block: int = 0,
|
| 228 |
+
end_block: int = 99999999,
|
| 229 |
+
max_pages: int = 10) -> List[Dict[str, Any]]:
|
| 230 |
+
"""
|
| 231 |
+
Fetch all token transfers for an address, paginating through results
|
| 232 |
+
|
| 233 |
+
Args:
|
| 234 |
+
address: Wallet address
|
| 235 |
+
contract_address: Optional token contract address to filter by
|
| 236 |
+
start_block: Starting block number
|
| 237 |
+
end_block: Ending block number
|
| 238 |
+
max_pages: Maximum number of pages to fetch
|
| 239 |
+
|
| 240 |
+
Returns:
|
| 241 |
+
List of all token transfers
|
| 242 |
+
"""
|
| 243 |
+
all_transfers = []
|
| 244 |
+
offset = 100 # Results per page (API limit)
|
| 245 |
+
|
| 246 |
+
for page in range(1, max_pages + 1):
|
| 247 |
+
try:
|
| 248 |
+
transfers = self.get_token_transfers(
|
| 249 |
+
address=address,
|
| 250 |
+
contract_address=contract_address,
|
| 251 |
+
start_block=start_block,
|
| 252 |
+
end_block=end_block,
|
| 253 |
+
page=page,
|
| 254 |
+
offset=offset
|
| 255 |
+
)
|
| 256 |
+
|
| 257 |
+
# No more transfers, break the loop
|
| 258 |
+
if not transfers:
|
| 259 |
+
break
|
| 260 |
+
|
| 261 |
+
all_transfers.extend(transfers)
|
| 262 |
+
|
| 263 |
+
# If we got fewer results than the offset, we've reached the end
|
| 264 |
+
if len(transfers) < offset:
|
| 265 |
+
break
|
| 266 |
+
|
| 267 |
+
except Exception as e:
|
| 268 |
+
logging.error(f"Error fetching page {page} of token transfers: {str(e)}")
|
| 269 |
+
break
|
| 270 |
+
|
| 271 |
+
return all_transfers
|
| 272 |
+
|
| 273 |
+
def fetch_whale_transactions(self,
|
| 274 |
+
addresses: List[str],
|
| 275 |
+
token_address: Optional[str] = None,
|
| 276 |
+
min_token_amount: Optional[float] = None,
|
| 277 |
+
min_usd_value: Optional[float] = None,
|
| 278 |
+
start_block: int = 0,
|
| 279 |
+
end_block: int = 99999999,
|
| 280 |
+
max_pages: int = 10) -> pd.DataFrame:
|
| 281 |
+
"""
|
| 282 |
+
Fetch whale transactions for a list of addresses
|
| 283 |
+
|
| 284 |
+
Args:
|
| 285 |
+
addresses: List of wallet addresses
|
| 286 |
+
token_address: Optional token contract address to filter by
|
| 287 |
+
min_token_amount: Minimum token amount to be considered a whale transaction
|
| 288 |
+
min_usd_value: Minimum USD value to be considered a whale transaction
|
| 289 |
+
start_block: Starting block number
|
| 290 |
+
end_block: Ending block number
|
| 291 |
+
max_pages: Maximum number of pages to fetch per address (default: 10)
|
| 292 |
+
|
| 293 |
+
Returns:
|
| 294 |
+
DataFrame of whale transactions
|
| 295 |
+
"""
|
| 296 |
+
try:
|
| 297 |
+
# Create a cache key based on parameters
|
| 298 |
+
cache_key = f"{','.join(addresses)}_{token_address}_{min_token_amount}_{min_usd_value}_{start_block}_{end_block}_{max_pages}"
|
| 299 |
+
|
| 300 |
+
# Check if we have cached results
|
| 301 |
+
if cache_key in self._transaction_cache:
|
| 302 |
+
modules/api_client.py (continued):

```python
                logging.info(f"Using cached transactions for {len(addresses)} addresses")
                return self._transaction_cache[cache_key]

            all_transfers = []

            logging.info(f"Fetching whale transactions for {len(addresses)} addresses")
            logging.info(f"Token address filter: {token_address if token_address else 'None'}")
            logging.info(f"Min token amount: {min_token_amount}")
            logging.info(f"Min USD value: {min_usd_value}")

            for i, address in enumerate(addresses):
                try:
                    logging.info(f"Processing address {i+1}/{len(addresses)}: {address}")

                    # Create address-specific cache key
                    addr_cache_key = f"{address}_{token_address}_{start_block}_{end_block}_{max_pages}"

                    # Check if we have cached results for this specific address
                    if addr_cache_key in self._transaction_cache:
                        transfers = self._transaction_cache[addr_cache_key]
                        logging.info(f"Using cached {len(transfers)} transfers for address {address}")
                    else:
                        transfers = self.fetch_all_token_transfers(
                            address=address,
                            contract_address=token_address,
                            start_block=start_block,
                            end_block=end_block,
                            max_pages=max_pages
                        )
                        logging.info(f"Found {len(transfers)} transfers for address {address}")
                        # Cache the results for this address
                        self._transaction_cache[addr_cache_key] = transfers

                    all_transfers.extend(transfers)
                except Exception as e:
                    logging.error(f"Failed to fetch transactions for address {address}: {str(e)}")
                    continue

            logging.info(f"Total transfers found: {len(all_transfers)}")

            if not all_transfers:
                logging.warning("No whale transactions found for the specified addresses")
                return pd.DataFrame()

            # Convert to DataFrame
            logging.info("Converting transfers to DataFrame")
            df = pd.DataFrame(all_transfers)

            # Log the column names
            logging.info(f"DataFrame created with {len(df)} rows and {len(df.columns)} columns")
            logging.info(f"Columns: {', '.join(df.columns[:5])}...")

            # Apply token amount filter if specified
            if min_token_amount is not None:
                logging.info(f"Applying min token amount filter: {min_token_amount}")
                # Convert raw values to token amounts, then filter
                df['tokenAmount'] = df['value'].astype(float) / (10 ** df['tokenDecimal'].astype(int))
                df = df[df['tokenAmount'] >= min_token_amount]
                logging.info(f"After token amount filtering: {len(df)}/{len(all_transfers)} rows remain")

            # Apply USD value filter if specified (this would require price data)
            if min_usd_value is not None and 'tokenAmount' in df.columns:
                logging.info("USD value filtering is not implemented yet")
                # This would require token price data, which we don't have yet
                # df = df[df['usd_value'] >= min_usd_value]

            # Convert timestamp to datetime
            if 'timeStamp' in df.columns:
                logging.info("Converting timestamp to datetime")
                try:
                    df['timeStamp'] = pd.to_datetime(df['timeStamp'].astype(float), unit='s')
                except Exception as e:
                    logging.error(f"Error converting timestamp: {str(e)}")

            logging.info(f"Final DataFrame has {len(df)} rows")

            # Cache the final result
            self._transaction_cache[cache_key] = df

            return df

        except Exception as e:
            logging.error(f"Error fetching whale transactions: {str(e)}")
            return pd.DataFrame()
```
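The `min_token_amount` filter above divides the raw ERC-20 `value` string by `10 ** tokenDecimal` to recover human-readable token amounts before comparing against the threshold. A standalone sketch of that normalization (the sample rows are fabricated; the column names mirror Arbiscan's `tokentx` response):

```python
import pandas as pd

# Transfers shaped like Arbiscan's tokentx response: `value` is a raw integer
# string scaled by 10**tokenDecimal. These rows are illustrative only.
transfers = [
    {"hash": "0xaaa", "value": "5000000000000000000", "tokenDecimal": "18"},  # 5.0 tokens
    {"hash": "0xbbb", "value": "250000000", "tokenDecimal": "6"},             # 250.0 tokens
    {"hash": "0xccc", "value": "1000000000000000000", "tokenDecimal": "18"},  # 1.0 token
]

df = pd.DataFrame(transfers)

# Normalize raw values into token amounts, then filter on a threshold.
df["tokenAmount"] = df["value"].astype(float) / (10 ** df["tokenDecimal"].astype(int))
large = df[df["tokenAmount"] >= 2.0]

print(large["hash"].tolist())  # → ['0xaaa', '0xbbb']
```

One caveat of the float cast: `value` arrives as a decimal string, and doubles only represent integers exactly up to 2**53, so very large raw balances can lose precision; parsing with `decimal.Decimal` would be the lossless alternative.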
```python
    def get_internal_transactions(self,
                                  address: str,
                                  start_block: int = 0,
                                  end_block: int = 99999999,
                                  page: int = 1,
                                  offset: int = 100,
                                  sort: str = "desc") -> List[Dict[str, Any]]:
        """
        Get internal transactions for an address

        Args:
            address: Wallet address
            start_block: Starting block number
            end_block: Ending block number
            page: Page number
            offset: Number of results per page
            sort: Sort order ("asc" or "desc")

        Returns:
            List of internal transactions
        """
        params = {
            "module": "account",
            "action": "txlistinternal",
            "address": address,
            "startblock": str(start_block),
            "endblock": str(end_block),
            "page": str(page),
            "offset": str(offset),
            "sort": sort
        }

        result = self._make_request(params)

        if result.get("status") == "1":
            return result.get("result", [])
        else:
            message = result.get("message", "Unknown error")
            if "No transactions found" in message:
                return []
            else:
                logging.warning(f"Error fetching internal transactions: {message}")
                return []
```
```python
class GeminiClient:
    """
    Client to interact with the Gemini API for fetching token prices
    """

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.gemini.com/v1"
        # Cache prices to avoid repetitive API calls
        self._price_cache = {}
        # Track API errors to avoid flooding logs
        self._error_count = {}
        self._last_api_call = 0  # For rate limiting

    def get_current_price(self, symbol: str) -> Optional[float]:
        """
        Get the current price of a token

        Args:
            symbol: Token symbol (e.g., "ETHUSD")

        Returns:
            Current price as a float or None if not found
        """
        try:
            url = f"{self.base_url}/pubticker/{symbol}"
            response = requests.get(url)
            response.raise_for_status()
            data = response.json()
            return float(data.get("last", 0))
        except requests.exceptions.RequestException as e:
            logging.error(f"Error fetching price from Gemini API: {e}")
            return None

    def get_historical_prices(self,
                              symbol: str,
                              start_time: datetime,
                              end_time: datetime) -> Optional[pd.DataFrame]:
        """
        Get historical prices for a token within a time range

        Args:
            symbol: Token symbol (e.g., "ETHUSD")
            start_time: Start datetime
            end_time: End datetime

        Returns:
            DataFrame of historical prices with timestamps
        """
        # Implement simple rate limiting
        current_time = time.time()
        if current_time - self._last_api_call < 0.05:  # 50ms minimum between calls
            time.sleep(0.05)
        self._last_api_call = current_time

        # Create a cache key based on the parameters
        cache_key = f"{symbol}_{int(start_time.timestamp())}_{int(end_time.timestamp())}"

        # Check if we already have this data cached
        if cache_key in self._price_cache:
            return self._price_cache[cache_key]

        # Define the error key before the try block so both exception
        # handlers below can reference it safely
        error_key = f"error_{symbol}"

        try:
            # Convert datetime to milliseconds
            start_ms = int(start_time.timestamp() * 1000)
            end_ms = int(end_time.timestamp() * 1000)

            url = f"{self.base_url}/trades/{symbol}"
            params = {
                "limit_trades": 500,
                "timestamp": start_ms
            }

            # Check if we've seen too many errors for this symbol
            if self._error_count.get(error_key, 0) > 10:
                # If we've already had too many errors for this symbol, don't try again
                return None

            response = requests.get(url, params=params)
            response.raise_for_status()
            trades = response.json()

            # Reset error count on success
            self._error_count[error_key] = 0

            # Filter trades within the time range
            filtered_trades = [
                trade for trade in trades
                if start_ms <= trade.get("timestampms", 0) <= end_ms
            ]

            if not filtered_trades:
                # Cache negative result to avoid future lookups
                self._price_cache[cache_key] = None
                return None

            # Convert to DataFrame
            df = pd.DataFrame(filtered_trades)

            # Convert timestamp to datetime
            df['timestamp'] = pd.to_datetime(df['timestampms'], unit='ms')

            # Select and rename columns
            result_df = df[['timestamp', 'price', 'amount']].copy()
            result_df.columns = ['Timestamp', 'Price', 'Amount']

            # Convert price to float
            result_df['Price'] = result_df['Price'].astype(float)

            # Cache the result
            self._price_cache[cache_key] = result_df
            return result_df

        except requests.exceptions.HTTPError as e:
            # Handle HTTP errors more efficiently
            self._error_count[error_key] = self._error_count.get(error_key, 0) + 1

            # Only log the first few occurrences of each error
            if self._error_count[error_key] <= 3:
                logging.warning(f"HTTP error fetching price for {symbol}: {e.response.status_code}")
            return None

        except Exception as e:
            # For other errors, use a similar approach
            self._error_count[error_key] = self._error_count.get(error_key, 0) + 1

            if self._error_count[error_key] <= 3:
                logging.error(f"Error fetching prices for {symbol}: {str(e)}")
            return None

    def get_price_at_time(self,
                          symbol: str,
                          timestamp: datetime) -> Optional[float]:
        """
        Get the approximate price of a token at a specific time

        Args:
            symbol: Token symbol (e.g., "ETHUSD")
            timestamp: Target datetime

        Returns:
            Price at the specified time as a float or None if not found
        """
        # Look for prices 5 minutes before and after the target time
        start_time = timestamp - pd.Timedelta(minutes=5)
        end_time = timestamp + pd.Timedelta(minutes=5)

        prices_df = self.get_historical_prices(symbol, start_time, end_time)

        if prices_df is None or prices_df.empty:
            return None

        # Find the closest price
        prices_df['time_diff'] = abs(prices_df['Timestamp'] - timestamp)
        closest_price = prices_df.loc[prices_df['time_diff'].idxmin(), 'Price']

        return closest_price

    def get_price_impact(self,
                         symbol: str,
                         transaction_time: datetime,
                         lookback_minutes: int = 5,
                         lookahead_minutes: int = 5) -> Dict[str, Any]:
        """
        Analyze the price impact before and after a transaction

        Args:
            symbol: Token symbol (e.g., "ETHUSD")
            transaction_time: Transaction datetime
            lookback_minutes: Minutes to look back before the transaction
            lookahead_minutes: Minutes to look ahead after the transaction

        Returns:
            Dictionary with price impact metrics
        """
        start_time = transaction_time - pd.Timedelta(minutes=lookback_minutes)
        end_time = transaction_time + pd.Timedelta(minutes=lookahead_minutes)

        prices_df = self.get_historical_prices(symbol, start_time, end_time)

        if prices_df is None or prices_df.empty:
            return {
                "pre_price": None,
                "post_price": None,
                "impact_pct": None,
                "prices_df": None
            }

        # Find pre and post transaction prices
        pre_prices = prices_df[prices_df['Timestamp'] < transaction_time]
        post_prices = prices_df[prices_df['Timestamp'] >= transaction_time]

        pre_price = pre_prices['Price'].iloc[-1] if not pre_prices.empty else None
        post_price = post_prices['Price'].iloc[0] if not post_prices.empty else None

        # Calculate impact percentage
        impact_pct = None
        if pre_price is not None and post_price is not None:
            impact_pct = ((post_price - pre_price) / pre_price) * 100

        return {
            "pre_price": pre_price,
            "post_price": post_price,
            "impact_pct": impact_pct,
            "prices_df": prices_df
        }

    def fetch_historical_prices(self, token_symbol: str, timestamp) -> Dict[str, Any]:
        """Fetch historical price data for a token at a specific timestamp

        Args:
            token_symbol: Token symbol (e.g., "ETH")
            timestamp: Timestamp (can be int, float, datetime, or pandas Timestamp)

        Returns:
            Dictionary with price data
        """
        # Convert timestamp to integer if it's not already
        timestamp_value = 0
        try:
            # Handle different timestamp types
            if isinstance(timestamp, (int, float)):
                timestamp_value = int(timestamp)
            elif isinstance(timestamp, pd.Timestamp):
                timestamp_value = int(timestamp.timestamp())
            elif isinstance(timestamp, datetime):
                timestamp_value = int(timestamp.timestamp())
            elif isinstance(timestamp, str):
                # Try to parse string as timestamp
                dt = pd.to_datetime(timestamp)
                timestamp_value = int(dt.timestamp())
            else:
                # Default to current time if invalid type
                logging.warning(f"Invalid timestamp type: {type(timestamp)}, using current time")
                timestamp_value = int(time.time())
        except Exception as e:
            logging.warning(f"Error converting timestamp {timestamp}: {str(e)}, using current time")
            timestamp_value = int(time.time())

        # Check cache first
        cache_key = f"{token_symbol}_{timestamp_value}"
        if cache_key in self._price_cache:
            return self._price_cache[cache_key]

        # Implement rate limiting
        current_time = time.time()
        if current_time - self._last_api_call < 0.05:  # 50ms minimum between calls
            time.sleep(0.05)
        self._last_api_call = current_time

        # Check error count for this symbol
        error_key = f"error_{token_symbol}"
        if self._error_count.get(error_key, 0) > 10:
            # Too many errors, return cached failure
            return {
                'symbol': token_symbol,
                'timestamp': timestamp_value,
                'price': None,
                'status': 'error',
                'error': 'Too many previous errors'
            }

        try:
            url = f"{self.base_url}/trades/{token_symbol}USD"
            params = {
                'limit_trades': 500,
                'timestamp': timestamp_value * 1000  # Convert to milliseconds
            }

            response = requests.get(url, params=params)
            response.raise_for_status()
            data = response.json()

            # Reset error count on success
            self._error_count[error_key] = 0

            # Calculate average price from recent trades
            if data:
                prices = [float(trade['price']) for trade in data]
                avg_price = sum(prices) / len(prices)
                result = {
                    'symbol': token_symbol,
                    'timestamp': timestamp_value,
                    'price': avg_price,
                    'status': 'success'
                }
                # Cache success
                self._price_cache[cache_key] = result
                return result
            else:
                result = {
                    'symbol': token_symbol,
                    'timestamp': timestamp_value,
                    'price': None,
                    'status': 'no_data'
                }
                # Cache no data
                self._price_cache[cache_key] = result
                return result

        except requests.exceptions.HTTPError as e:
            # Handle HTTP errors efficiently
            self._error_count[error_key] = self._error_count.get(error_key, 0) + 1

            # Only log first few occurrences
            if self._error_count[error_key] <= 3:
                logging.warning(f"HTTP error fetching price for {token_symbol}: {e.response.status_code}")
            elif self._error_count[error_key] == 10:
                logging.warning(f"Suppressing further logs for {token_symbol} errors")

            result = {
                'symbol': token_symbol,
                'timestamp': timestamp_value,
                'price': None,
                'status': 'error',
                'error': f"HTTP {e.response.status_code}"
            }
            self._price_cache[cache_key] = result
            return result

        except Exception as e:
            # For other errors
            self._error_count[error_key] = self._error_count.get(error_key, 0) + 1

            if self._error_count[error_key] <= 3:
                logging.error(f"Error fetching prices for {token_symbol}: {str(e)}")

            result = {
                'symbol': token_symbol,
                'timestamp': timestamp_value,
                'price': None,
                'status': 'error',
                'error': str(e)
            }
            self._price_cache[cache_key] = result
            return result
```
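Three defenses recur throughout `GeminiClient`: a minimum spacing between HTTP calls, caching of both successes and failures, and a per-symbol error counter that stops retrying once a threshold is crossed. The counter behaves like a simple circuit breaker; a reduced sketch of just that part (with `MAX_ERRORS` lowered to 3 and `fetch_fn` standing in for the real HTTP call):

```python
class PriceFetcher:
    """Stop calling an endpoint for a symbol after repeated failures,
    and cache successful results to avoid repeat lookups."""

    MAX_ERRORS = 3

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn  # stand-in for the real HTTP request
        self.cache = {}
        self.error_count = {}

    def get(self, symbol):
        if symbol in self.cache:
            return self.cache[symbol]
        if self.error_count.get(symbol, 0) >= self.MAX_ERRORS:
            return None  # circuit open: don't even attempt the call
        try:
            price = self.fetch_fn(symbol)
            self.error_count[symbol] = 0  # reset on success
            self.cache[symbol] = price
            return price
        except Exception:
            self.error_count[symbol] = self.error_count.get(symbol, 0) + 1
            return None

calls = {"n": 0}

def always_fails(symbol):
    calls["n"] += 1
    raise RuntimeError("upstream down")

fetcher = PriceFetcher(always_fails)
for _ in range(10):
    fetcher.get("ETHUSD")

print(calls["n"])  # → 3: after MAX_ERRORS failures the breaker stops calling
```

The production class extends this with negative caching (failed lookups are cached too), which trades staleness for far fewer upstream calls when many transactions reference the same symbol and timestamp.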
modules/crew_system.py
ADDED
@@ -0,0 +1,1117 @@
````python
import os
import logging
from typing import Dict, List, Optional, Union, Any, Tuple
import pandas as pd
from datetime import datetime, timedelta
import io
import base64

from crewai import Agent, Task, Crew, Process
from langchain.tools import BaseTool
from langchain.chat_models import ChatOpenAI

from modules.api_client import ArbiscanClient, GeminiClient
from modules.data_processor import DataProcessor
from modules.crew_tools import (
    ArbiscanGetTokenTransfersTool,
    ArbiscanGetNormalTransactionsTool,
    ArbiscanGetInternalTransactionsTool,
    ArbiscanFetchWhaleTransactionsTool,
    GeminiGetCurrentPriceTool,
    GeminiGetHistoricalPricesTool,
    DataProcessorIdentifyPatternsTool,
    DataProcessorDetectAnomalousTransactionsTool,
    set_global_clients
)


class WhaleAnalysisCrewSystem:
    """
    CrewAI system for analyzing whale wallet activity and detecting market manipulation
    """

    def __init__(self, arbiscan_client: ArbiscanClient, gemini_client: GeminiClient, data_processor: DataProcessor):
        self.arbiscan_client = arbiscan_client
        self.gemini_client = gemini_client
        self.data_processor = data_processor

        # Initialize LLM
        try:
            self.llm = ChatOpenAI(
                model="gpt-4",
                temperature=0.2,
                api_key=os.getenv("OPENAI_API_KEY")
            )
        except Exception as e:
            logging.warning(f"Could not initialize LLM: {str(e)}")
            self.llm = None

        # Use a factory method to safely create tool instances
        self.setup_tools()

    def setup_tools(self):
        """Setup LangChain tools for the whale analysis crew"""
        try:
            # Setup clients
            arbiscan_client = ArbiscanClient(api_key=os.getenv("ARBISCAN_API_KEY"))
            gemini_client = GeminiClient(api_key=os.getenv("GEMINI_API_KEY"))
            data_processor = DataProcessor()

            # Set global clients first
            set_global_clients(
                arbiscan_client=arbiscan_client,
                gemini_client=gemini_client,
                data_processor=data_processor
            )

            # Create tools (no need to pass clients, they'll use globals)
            self.arbiscan_tools = [
                self._create_tool(ArbiscanGetTokenTransfersTool),
                self._create_tool(ArbiscanGetNormalTransactionsTool),
                self._create_tool(ArbiscanGetInternalTransactionsTool),
                self._create_tool(ArbiscanFetchWhaleTransactionsTool)
            ]

            self.gemini_tools = [
                self._create_tool(GeminiGetCurrentPriceTool),
                self._create_tool(GeminiGetHistoricalPricesTool)
            ]

            self.data_processor_tools = [
                self._create_tool(DataProcessorIdentifyPatternsTool),
                self._create_tool(DataProcessorDetectAnomalousTransactionsTool)
            ]

            logging.info(f"Successfully created {len(self.arbiscan_tools + self.gemini_tools + self.data_processor_tools)} tools")

        except Exception as e:
            logging.error(f"Error setting up tools: {str(e)}")
            raise Exception(f"Error setting up tools: {str(e)}")

    def _create_tool(self, tool_class, *args, **kwargs):
        """Factory method to safely create a tool with proper error handling"""
        try:
            tool = tool_class(*args, **kwargs)
            return tool
        except Exception as e:
            logging.error(f"Failed to create tool {tool_class.__name__}: {str(e)}")
            raise Exception(f"Failed to create tool {tool_class.__name__}: {str(e)}")

    def create_agents(self):
        """Create the agents for the crew"""

        # Data Collection Agent
        data_collector = Agent(
            role="Blockchain Data Collector",
            goal="Collect comprehensive whale transaction data from the blockchain",
            backstory="""You are a blockchain analytics expert specialized in extracting and
            organizing on-chain data from the Arbitrum network. You have deep knowledge of blockchain
            transaction structures and can efficiently query APIs to gather relevant whale activity.""",
            verbose=True,
            allow_delegation=True,
            tools=self.arbiscan_tools,
            llm=self.llm
        )

        # Price Analysis Agent
        price_analyst = Agent(
            role="Price Impact Analyst",
            goal="Analyze how whale transactions impact token prices",
            backstory="""You are a quantitative market analyst with expertise in correlating
            trading activity with price movements. You specialize in detecting how large trades
            influence market dynamics, and can identify unusual price patterns.""",
            verbose=True,
            allow_delegation=True,
            tools=self.gemini_tools,
            llm=self.llm
        )

        # Pattern Detection Agent
        pattern_detector = Agent(
            role="Trading Pattern Detector",
            goal="Identify recurring behavior patterns in whale trading activity",
            backstory="""You are a data scientist specialized in time-series analysis and behavioral
            pattern recognition. You excel at spotting cyclical behaviors, correlation patterns, and
            anomalous trading activities across multiple addresses.""",
            verbose=True,
            allow_delegation=True,
            tools=self.data_processor_tools,
            llm=self.llm
        )

        # Manipulation Detector Agent
        manipulation_detector = Agent(
            role="Market Manipulation Investigator",
            goal="Detect potential market manipulation in whale activity",
            backstory="""You are a financial forensics expert who has studied market manipulation
            techniques for years. You can identify pump-and-dump schemes, wash trading, spoofing,
            and other deceptive practices used by whale traders to manipulate market prices.""",
            verbose=True,
            allow_delegation=True,
            tools=self.data_processor_tools,
            llm=self.llm
        )

        # Report Generator Agent
        report_generator = Agent(
            role="Insights Reporter",
            goal="Create comprehensive, actionable reports on whale activity",
            backstory="""You are a financial data storyteller who excels at transforming complex
            blockchain data into clear, insightful narratives. You can distill technical findings
            into actionable intelligence for different audiences.""",
            verbose=True,
            allow_delegation=True,
            tools=[],
            llm=self.llm
        )

        return {
            "data_collector": data_collector,
            "price_analyst": price_analyst,
            "pattern_detector": pattern_detector,
            "manipulation_detector": manipulation_detector,
            "report_generator": report_generator
        }

    def track_large_transactions(self,
                                 wallets: List[str],
                                 start_date: datetime,
                                 end_date: datetime,
                                 threshold_value: float,
                                 threshold_type: str,
                                 token_symbol: Optional[str] = None) -> pd.DataFrame:
        """
        Track large buy/sell transactions for specified wallets

        Args:
            wallets: List of wallet addresses to track
            start_date: Start date for analysis
            end_date: End date for analysis
            threshold_value: Minimum value for transaction tracking
            threshold_type: Type of threshold ("Token Amount" or "USD Value")
            token_symbol: Symbol of token to track (only required if threshold_type is "Token Amount")

        Returns:
            DataFrame of large transactions
        """
        agents = self.create_agents()

        # Define tasks
        data_collection_task = Task(
            description=f"""
            Collect all transactions for the following wallets: {', '.join(wallets)}
            between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.

            Filter for transactions {'of ' + token_symbol if token_symbol else ''} with a
            {'token amount greater than ' + str(threshold_value) if threshold_type == 'Token Amount'
             else 'USD value greater than $' + str(threshold_value)}.

            Return the data in a well-structured format with timestamp, transaction hash,
            sender, recipient, token symbol, and amount.
            """,
            agent=agents["data_collector"],
            expected_output="""
            A comprehensive dataset of all large transactions for the specified wallets,
            properly filtered according to the threshold criteria.
            """
        )

        # Create and run the crew
        crew = Crew(
            agents=[agents["data_collector"]],
            tasks=[data_collection_task],
            verbose=2,
            process=Process.sequential
        )

        result = crew.kickoff()

        # Process the result
        import json
        import re
        try:
            # Try to extract JSON from the result
            json_match = re.search(r'```json\n([\s\S]*?)\n```', result)

            if json_match:
                json_str = json_match.group(1)
                transactions_data = json.loads(json_str)

                if isinstance(transactions_data, list):
                    return pd.DataFrame(transactions_data)
                else:
                    return pd.DataFrame()
            else:
                # Try to parse the entire result as JSON
                transactions_data = json.loads(result)

                if isinstance(transactions_data, list):
                    return pd.DataFrame(transactions_data)
````
|
| 251 |
+
else:
|
| 252 |
+
return pd.DataFrame()
|
| 253 |
+
except:
|
| 254 |
+
# Fallback to querying the API directly
|
| 255 |
+
token_address = None # Would need a mapping of symbol to address
|
| 256 |
+
|
| 257 |
+
transactions_df = self.arbiscan_client.fetch_whale_transactions(
|
| 258 |
+
addresses=wallets,
|
| 259 |
+
token_address=token_address,
|
| 260 |
+
min_token_amount=threshold_value if threshold_type == "Token Amount" else None,
|
| 261 |
+
min_usd_value=threshold_value if threshold_type == "USD Value" else None
|
| 262 |
+
)
|
| 263 |
+
|
| 264 |
+
return transactions_df
|
| 265 |
+
|
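Each analysis method below repeats the same extract-or-fallback parsing of the crew's raw output: look for a fenced json block first, otherwise try the whole string. A standalone sketch of that logic (the helper name `extract_json_payload` is illustrative, not part of this module):

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # a literal triple backtick, built up to keep this example readable

def extract_json_payload(result: str) -> Optional[Any]:
    """Parse JSON from an LLM reply, preferring a fenced json block if present."""
    match = re.search(FENCE + r"json\n([\s\S]*?)\n" + FENCE, result)
    candidate = match.group(1) if match else result
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

reply = FENCE + 'json\n[{"hash": "0xabc", "tokenAmount": 1200}]\n' + FENCE
print(extract_json_payload(reply))  # [{'hash': '0xabc', 'tokenAmount': 1200}]
```

Returning `None` on failure (instead of raising) lets callers fall through to the direct-API path, which is the shape each method here follows.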
    def identify_trading_patterns(self,
                                  wallets: List[str],
                                  start_date: datetime,
                                  end_date: datetime) -> List[Dict[str, Any]]:
        """
        Identify trading patterns for specified wallets

        Args:
            wallets: List of wallet addresses to analyze
            start_date: Start date for analysis
            end_date: End date for analysis

        Returns:
            List of identified patterns
        """
        agents = self.create_agents()

        # Define tasks
        data_collection_task = Task(
            description=f"""
            Collect all transactions for the following wallets: {', '.join(wallets)}
            between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.

            Include all token transfers, regardless of size.
            """,
            agent=agents["data_collector"],
            expected_output="""
            A comprehensive dataset of all transactions for the specified wallets.
            """
        )

        pattern_analysis_task = Task(
            description="""
            Analyze the transaction data to identify recurring trading patterns.
            Look for:
            1. Cyclical buying/selling behaviors
            2. Time-of-day patterns
            3. Accumulation/distribution phases
            4. Coordinated movements across multiple addresses

            Cluster similar behaviors and describe each pattern identified.
            """,
            agent=agents["pattern_detector"],
            expected_output="""
            A detailed analysis of trading patterns with:
            - Pattern name/type
            - Description of behavior
            - Frequency and confidence level
            - Example transactions showing the pattern
            """,
            context=[data_collection_task]
        )

        # Create and run the crew
        crew = Crew(
            agents=[agents["data_collector"], agents["pattern_detector"]],
            tasks=[data_collection_task, pattern_analysis_task],
            verbose=2,
            process=Process.sequential
        )

        result = crew.kickoff()

        # Process the result
        import json
        try:
            # Try to extract JSON from the result
            import re
            json_match = re.search(r'```json\n([\s\S]*?)\n```', result)

            if json_match:
                json_str = json_match.group(1)
                patterns_data = json.loads(json_str)

                # Convert the patterns to the expected format
                return self._convert_patterns_to_visual_format(patterns_data)
            else:
                # Fallback to a simple pattern analysis
                # First, get transaction data directly
                all_transactions = []

                for wallet in wallets:
                    transfers = self.arbiscan_client.fetch_all_token_transfers(
                        address=wallet
                    )
                    all_transactions.extend(transfers)

                if not all_transactions:
                    return []

                transactions_df = pd.DataFrame(all_transactions)

                # Use data processor to identify patterns
                patterns = self.data_processor.identify_patterns(transactions_df)

                return patterns
        except Exception as e:
            print(f"Error processing patterns: {str(e)}")
            return []

    def _convert_patterns_to_visual_format(self, patterns_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Convert pattern data from agents to visual format with charts

        Args:
            patterns_data: Pattern data from agents

        Returns:
            List of patterns with visualizations
        """
        visual_patterns = []

        for pattern in patterns_data:
            # Create chart
            if 'examples' in pattern and pattern['examples']:
                examples_data = []

                # Check if examples is a JSON string
                if isinstance(pattern['examples'], str):
                    try:
                        examples_data = pd.read_json(pattern['examples'])
                    except Exception:
                        examples_data = pd.DataFrame()
                else:
                    examples_data = pd.DataFrame(pattern['examples'])

                # Create visualization
                if not examples_data.empty:
                    import plotly.express as px

                    # Check for timestamp column
                    if 'Timestamp' in examples_data.columns:
                        time_col = 'Timestamp'
                    elif 'timeStamp' in examples_data.columns:
                        time_col = 'timeStamp'
                    else:
                        time_col = None

                    # Check for amount column
                    if 'Amount' in examples_data.columns:
                        amount_col = 'Amount'
                    elif 'tokenAmount' in examples_data.columns:
                        amount_col = 'tokenAmount'
                    elif 'value' in examples_data.columns:
                        amount_col = 'value'
                    else:
                        amount_col = None

                    if time_col and amount_col:
                        # Create time series chart
                        fig = px.line(
                            examples_data,
                            x=time_col,
                            y=amount_col,
                            title=f"Pattern: {pattern['name']}"
                        )
                    else:
                        fig = None
                else:
                    fig = None
            else:
                fig = None
                examples_data = pd.DataFrame()

            # Create visual pattern object
            visual_pattern = {
                "name": pattern.get("name", "Unknown Pattern"),
                "description": pattern.get("description", ""),
                "confidence": pattern.get("confidence", 0.5),
                "occurrence_count": pattern.get("occurrence_count", 0),
                "chart_data": fig,
                "examples": examples_data
            }

            visual_patterns.append(visual_pattern)

        return visual_patterns

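The column-resolution if/elif chains above (`Timestamp` vs `timeStamp`, `Amount` vs `tokenAmount` vs `value`) all implement the same "first matching column" idea. A small helper sketch, purely illustrative and not used by the module:

```python
from typing import Iterable, Optional

def first_present(columns: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate column name that exists, else None."""
    present = set(columns)
    return next((c for c in candidates if c in present), None)

# Mirrors the timestamp/amount column checks above:
print(first_present(["timeStamp", "value"], ["Timestamp", "timeStamp"]))  # timeStamp
print(first_present(["foo"], ["Amount", "tokenAmount", "value"]))         # None
```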
    def analyze_price_impact(self,
                             wallets: List[str],
                             start_date: datetime,
                             end_date: datetime,
                             lookback_minutes: int = 5,
                             lookahead_minutes: int = 5) -> Dict[str, Any]:
        """
        Analyze the impact of whale transactions on token prices

        Args:
            wallets: List of wallet addresses to analyze
            start_date: Start date for analysis
            end_date: End date for analysis
            lookback_minutes: Minutes to look back before transactions
            lookahead_minutes: Minutes to look ahead after transactions

        Returns:
            Dictionary with price impact analysis
        """
        agents = self.create_agents()

        # Define tasks
        data_collection_task = Task(
            description=f"""
            Collect all transactions for the following wallets: {', '.join(wallets)}
            between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.

            Focus on large transactions that might impact price.
            """,
            agent=agents["data_collector"],
            expected_output="""
            A comprehensive dataset of all significant transactions for the specified wallets.
            """
        )

        price_impact_task = Task(
            description=f"""
            Analyze the price impact of the whale transactions.
            For each transaction:
            1. Fetch price data for {lookback_minutes} minutes before and {lookahead_minutes} minutes after the transaction
            2. Calculate the percentage price change
            3. Identify transactions that caused significant price moves

            Summarize the overall price impact statistics and highlight notable instances.
            """,
            agent=agents["price_analyst"],
            expected_output="""
            A detailed analysis of price impacts with:
            - Average price impact percentage
            - Maximum price impact (positive and negative)
            - Count of significant price moves
            - List of transactions with their corresponding price impacts
            """,
            context=[data_collection_task]
        )

        # Create and run the crew
        crew = Crew(
            agents=[agents["data_collector"], agents["price_analyst"]],
            tasks=[data_collection_task, price_impact_task],
            verbose=2,
            process=Process.sequential
        )

        result = crew.kickoff()

        # Process the result
        import json
        try:
            # Try to extract JSON from the result
            import re
            json_match = re.search(r'```json\n([\s\S]*?)\n```', result)

            if json_match:
                json_str = json_match.group(1)
                impact_data = json.loads(json_str)

                # Convert the impact data to visual format
                return self._convert_impact_to_visual_format(impact_data)
            else:
                # Fallback to direct calculation
                # First, get transaction data
                all_transactions = []

                for wallet in wallets:
                    transfers = self.arbiscan_client.fetch_all_token_transfers(
                        address=wallet
                    )
                    all_transactions.extend(transfers)

                if not all_transactions:
                    return {}

                transactions_df = pd.DataFrame(all_transactions)

                # Calculate price impact for each transaction
                price_data = {}

                for idx, row in transactions_df.iterrows():
                    tx_hash = row.get('hash', '')

                    if not tx_hash:
                        continue

                    # Get symbol
                    symbol = row.get('tokenSymbol', '')
                    if not symbol:
                        continue

                    # Get timestamp
                    timestamp = row.get('timeStamp', 0)
                    if not timestamp:
                        continue

                    # Convert timestamp to datetime
                    if isinstance(timestamp, (int, float)):
                        tx_time = datetime.fromtimestamp(int(timestamp))
                    else:
                        tx_time = timestamp

                    # Get price impact
                    symbol_usd = f"{symbol}USD"
                    impact = self.gemini_client.get_price_impact(
                        symbol=symbol_usd,
                        transaction_time=tx_time,
                        lookback_minutes=lookback_minutes,
                        lookahead_minutes=lookahead_minutes
                    )

                    price_data[tx_hash] = impact

                # Use data processor to analyze price impact
                impact_analysis = self.data_processor.analyze_price_impact(
                    transactions_df=transactions_df,
                    price_data=price_data
                )

                return impact_analysis
        except Exception as e:
            print(f"Error processing price impact: {str(e)}")
            return {}

    def _convert_impact_to_visual_format(self, impact_data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Convert price impact data to visual format with charts

        Args:
            impact_data: Price impact data

        Returns:
            Dictionary with price impact analysis and visualizations
        """
        # Convert transactions_with_impact to DataFrame if it's a string
        if 'transactions_with_impact' in impact_data and isinstance(impact_data['transactions_with_impact'], str):
            try:
                transactions_df = pd.read_json(impact_data['transactions_with_impact'])
            except Exception:
                transactions_df = pd.DataFrame()
        elif 'transactions_with_impact' in impact_data and isinstance(impact_data['transactions_with_impact'], list):
            transactions_df = pd.DataFrame(impact_data['transactions_with_impact'])
        else:
            transactions_df = pd.DataFrame()

        # Create impact chart
        if not transactions_df.empty and 'impact_pct' in transactions_df.columns and 'Timestamp' in transactions_df.columns:
            import plotly.graph_objects as go

            fig = go.Figure()

            fig.add_trace(go.Scatter(
                x=transactions_df['Timestamp'],
                y=transactions_df['impact_pct'],
                mode='markers+lines',
                name='Price Impact (%)',
                marker=dict(
                    size=10,
                    color=transactions_df['impact_pct'],
                    colorscale='RdBu',
                    cmin=-max(abs(transactions_df['impact_pct'])) if len(transactions_df) > 0 else -1,
                    cmax=max(abs(transactions_df['impact_pct'])) if len(transactions_df) > 0 else 1,
                    colorbar=dict(title='Impact %'),
                    symbol='circle'
                )
            ))

            fig.update_layout(
                title='Price Impact of Whale Transactions',
                xaxis_title='Timestamp',
                yaxis_title='Price Impact (%)',
                hovermode='closest'
            )

            # Add zero line
            fig.add_hline(y=0, line_dash="dash", line_color="gray")
        else:
            fig = None

        # Create visual impact analysis
        visual_impact = {
            'avg_impact_pct': impact_data.get('avg_impact_pct', 0),
            'max_impact_pct': impact_data.get('max_impact_pct', 0),
            'min_impact_pct': impact_data.get('min_impact_pct', 0),
            'significant_moves_count': impact_data.get('significant_moves_count', 0),
            'total_transactions': impact_data.get('total_transactions', 0),
            'impact_chart': fig,
            'transactions_with_impact': transactions_df
        }

        return visual_impact

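The fallback path above normalizes raw Arbiscan `timeStamp` fields and measures the percentage move around each transaction. A minimal standalone sketch of those two steps (helper names are illustrative; this sketch uses UTC, whereas the module itself uses local time via `datetime.fromtimestamp`):

```python
from datetime import datetime, timezone
from typing import Union

def to_tx_time(ts: Union[int, float, str, datetime]) -> datetime:
    """Normalize an epoch-seconds timeStamp (int/float/str) to a UTC datetime."""
    if isinstance(ts, datetime):
        return ts
    return datetime.fromtimestamp(int(ts), tz=timezone.utc)

def impact_pct(pre_price: float, post_price: float) -> float:
    """Percentage price change from just before to just after a transaction."""
    return (post_price - pre_price) / pre_price * 100.0

print(to_tx_time("1700000000").isoformat())
print(round(impact_pct(2.00, 2.05), 6))  # 2.5
```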
    def detect_manipulation(self,
                            wallets: List[str],
                            start_date: datetime,
                            end_date: datetime,
                            sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential market manipulation by whale wallets

        Args:
            wallets: List of wallet addresses to analyze
            start_date: Start date for analysis
            end_date: End date for analysis
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of manipulation alerts
        """
        agents = self.create_agents()

        # Define tasks
        data_collection_task = Task(
            description=f"""
            Collect all transactions for the following wallets: {', '.join(wallets)}
            between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.

            Include all token transfers and also fetch price data if available.
            """,
            agent=agents["data_collector"],
            expected_output="""
            A comprehensive dataset of all transactions for the specified wallets.
            """
        )

        price_impact_task = Task(
            description="""
            Analyze the price impact of the whale transactions.
            For each significant transaction, fetch and analyze price data around the transaction time.
            """,
            agent=agents["price_analyst"],
            expected_output="""
            Price impact data for the transactions.
            """,
            context=[data_collection_task]
        )

        manipulation_detection_task = Task(
            description=f"""
            Detect potential market manipulation patterns in the transaction data with sensitivity level: {sensitivity}.
            Look for:
            1. Pump-and-Dump: Rapid buys followed by coordinated sell-offs
            2. Wash Trading: Self-trading across multiple addresses
            3. Spoofing: Large orders placed then canceled (if detectable)
            4. Momentum Ignition: Creating sharp price moves to trigger other participants' momentum-based trading

            For each potential manipulation, provide:
            - Type of manipulation
            - Involved addresses
            - Risk level (High, Medium, Low)
            - Description of the suspicious behavior
            - Evidence (transactions showing the pattern)
            """,
            agent=agents["manipulation_detector"],
            expected_output="""
            A detailed list of potential manipulation incidents with supporting evidence.
            """,
            context=[data_collection_task, price_impact_task]
        )

        # Create and run the crew
        crew = Crew(
            agents=[
                agents["data_collector"],
                agents["price_analyst"],
                agents["manipulation_detector"]
            ],
            tasks=[
                data_collection_task,
                price_impact_task,
                manipulation_detection_task
            ],
            verbose=2,
            process=Process.sequential
        )

        result = crew.kickoff()

        # Process the result
        import json
        try:
            # Try to extract JSON from the result
            import re
            json_match = re.search(r'```json\n([\s\S]*?)\n```', result)

            if json_match:
                json_str = json_match.group(1)
                alerts_data = json.loads(json_str)

                # Convert the alerts to visual format
                return self._convert_alerts_to_visual_format(alerts_data)
            else:
                # Fallback to direct detection
                # First, get transaction data
                all_transactions = []

                for wallet in wallets:
                    transfers = self.arbiscan_client.fetch_all_token_transfers(
                        address=wallet
                    )
                    all_transactions.extend(transfers)

                if not all_transactions:
                    return []

                transactions_df = pd.DataFrame(all_transactions)

                # Calculate price impact for each transaction
                price_data = {}

                for idx, row in transactions_df.iterrows():
                    tx_hash = row.get('hash', '')

                    if not tx_hash:
                        continue

                    # Get symbol
                    symbol = row.get('tokenSymbol', '')
                    if not symbol:
                        continue

                    # Get timestamp
                    timestamp = row.get('timeStamp', 0)
                    if not timestamp:
                        continue

                    # Convert timestamp to datetime
                    if isinstance(timestamp, (int, float)):
                        tx_time = datetime.fromtimestamp(int(timestamp))
                    else:
                        tx_time = timestamp

                    # Get price impact
                    symbol_usd = f"{symbol}USD"
                    impact = self.gemini_client.get_price_impact(
                        symbol=symbol_usd,
                        transaction_time=tx_time,
                        lookback_minutes=5,
                        lookahead_minutes=5
                    )

                    price_data[tx_hash] = impact

                # Detect wash trading
                wash_trading_alerts = self.data_processor.detect_wash_trading(
                    transactions_df=transactions_df,
                    addresses=wallets,
                    sensitivity=sensitivity
                )

                # Detect pump and dump
                pump_and_dump_alerts = self.data_processor.detect_pump_and_dump(
                    transactions_df=transactions_df,
                    price_data=price_data,
                    sensitivity=sensitivity
                )

                # Combine alerts
                all_alerts = wash_trading_alerts + pump_and_dump_alerts

                return all_alerts
        except Exception as e:
            print(f"Error detecting manipulation: {str(e)}")
            return []

    def _convert_alerts_to_visual_format(self, alerts_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """
        Convert manipulation alerts data to visual format with charts

        Args:
            alerts_data: Alerts data from agents

        Returns:
            List of alerts with visualizations
        """
        visual_alerts = []

        for alert in alerts_data:
            # Create chart based on alert type
            if 'evidence' in alert and alert['evidence']:
                evidence_data = []

                # Check if evidence is a JSON string
                if isinstance(alert['evidence'], str):
                    try:
                        evidence_data = pd.read_json(alert['evidence'])
                    except Exception:
                        evidence_data = pd.DataFrame()
                else:
                    evidence_data = pd.DataFrame(alert['evidence'])

                # Create visualization based on alert type
                if not evidence_data.empty:
                    import plotly.graph_objects as go
                    import plotly.express as px

                    # Check for timestamp column
                    if 'Timestamp' in evidence_data.columns:
                        time_col = 'Timestamp'
                    elif 'timeStamp' in evidence_data.columns:
                        time_col = 'timeStamp'
                    elif 'timestamp' in evidence_data.columns:
                        time_col = 'timestamp'
                    else:
                        time_col = None

                    # Different visualizations based on alert type
                    if alert.get('type') == 'Wash Trading' and time_col:
                        # Create scatter plot of wash trading
                        fig = px.scatter(
                            evidence_data,
                            x=time_col,
                            y=evidence_data.get('Amount', evidence_data.get('tokenAmount', evidence_data.get('value', 0))),
                            color=evidence_data.get('From', evidence_data.get('from', 'Unknown')),
                            title=f"Wash Trading Evidence: {alert.get('title', '')}"
                        )
                    elif alert.get('type') == 'Pump and Dump' and time_col and 'pre_price' in evidence_data.columns:
                        # Create price line for pump and dump
                        fig = go.Figure()

                        # Plot price line
                        fig.add_trace(go.Scatter(
                            x=evidence_data[time_col],
                            y=evidence_data['pre_price'],
                            mode='lines+markers',
                            name='Price Before Transaction',
                            line=dict(color='blue')
                        ))

                        fig.add_trace(go.Scatter(
                            x=evidence_data[time_col],
                            y=evidence_data['post_price'],
                            mode='lines+markers',
                            name='Price After Transaction',
                            line=dict(color='red')
                        ))

                        fig.update_layout(
                            title=f"Pump and Dump Evidence: {alert.get('title', '')}",
                            xaxis_title='Time',
                            yaxis_title='Price',
                            hovermode='closest'
                        )
                    elif alert.get('type') == 'Momentum Ignition' and time_col and 'impact_pct' in evidence_data.columns:
                        # Create impact scatter for momentum ignition
                        fig = px.scatter(
                            evidence_data,
                            x=time_col,
                            y='impact_pct',
                            size=abs(evidence_data['impact_pct']),
                            color='impact_pct',
                            color_continuous_scale='RdBu',
                            title=f"Momentum Ignition Evidence: {alert.get('title', '')}"
                        )
                    else:
                        # Generic timeline view
                        if time_col:
                            fig = px.timeline(
                                evidence_data,
                                x_start=time_col,
                                x_end=time_col,
                                y=evidence_data.get('From', evidence_data.get('from', 'Unknown')),
                                color=alert.get('risk_level', 'Medium'),
                                title=f"Alert Evidence: {alert.get('title', '')}"
                            )
                        else:
                            fig = None
                else:
                    fig = None
            else:
                fig = None
                evidence_data = pd.DataFrame()

            # Create visual alert object
            visual_alert = {
                "type": alert.get("type", "Unknown"),
                "addresses": alert.get("addresses", []),
                "risk_level": alert.get("risk_level", "Medium"),
                "description": alert.get("description", ""),
                "detection_time": alert.get("detection_time", datetime.now().strftime("%Y-%m-%d %H:%M:%S")),
                "title": alert.get("title", "Alert"),
                "evidence": evidence_data,
                "chart": fig
            }

            visual_alerts.append(visual_alert)

        return visual_alerts

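The fallback path above delegates wash-trading detection to `data_processor.detect_wash_trading`, whose implementation lives in `modules/data_processor.py`. As a rough intuition only, a toy round-trip heuristic (field names, window, and the `round_trip_pairs` helper are all illustrative, not the module's actual algorithm):

```python
from itertools import combinations

def round_trip_pairs(transfers, window_s=3600):
    """Flag address pairs with opposite-direction transfers close in time
    (a crude wash-trading signal; the time window is an arbitrary example)."""
    addresses = {t["from"] for t in transfers} | {t["to"] for t in transfers}
    flagged = set()
    for a, b in combinations(addresses, 2):
        ab = [t for t in transfers if t["from"] == a and t["to"] == b]
        ba = [t for t in transfers if t["from"] == b and t["to"] == a]
        if any(abs(x["timeStamp"] - y["timeStamp"]) <= window_s for x in ab for y in ba):
            flagged.add(tuple(sorted((a, b))))
    return flagged

txs = [
    {"from": "0xA", "to": "0xB", "timeStamp": 1000},
    {"from": "0xB", "to": "0xA", "timeStamp": 1600},
    {"from": "0xA", "to": "0xC", "timeStamp": 2000},
]
print(round_trip_pairs(txs))  # {('0xA', '0xB')}
```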
    def generate_report(self,
                        wallets: List[str],
                        start_date: datetime,
                        end_date: datetime,
                        report_type: str = "Transaction Summary",
                        export_format: str = "PDF") -> Dict[str, Any]:
        """
        Generate a report of whale activity

        Args:
            wallets: List of wallet addresses to include in the report
            start_date: Start date for report period
            end_date: End date for report period
            report_type: Type of report to generate
            export_format: Format for the report (CSV, PDF, PNG)

        Returns:
            Dictionary with report data
        """
        from modules.visualizer import Visualizer
        visualizer = Visualizer()

        agents = self.create_agents()

        # Define tasks
        data_collection_task = Task(
            description=f"""
            Collect all transactions for the following wallets: {', '.join(wallets)}
            between {start_date.strftime('%Y-%m-%d')} and {end_date.strftime('%Y-%m-%d')}.
            """,
            agent=agents["data_collector"],
            expected_output="""
            A comprehensive dataset of all transactions for the specified wallets.
            """
        )

        report_task = Task(
            description=f"""
            Generate a {report_type} report in {export_format} format.
            The report should include:
            1. Executive summary of wallet activity
            2. Transaction analysis
            3. Pattern identification (if applicable)
            4. Price impact analysis (if applicable)
            5. Manipulation detection (if applicable)

            Organize the information clearly and provide actionable insights.
            """,
            agent=agents["report_generator"],
            expected_output=f"""
            A complete {export_format} report with all relevant analyses.
            """,
            context=[data_collection_task]
        )

        # Create and run the crew
        crew = Crew(
            agents=[agents["data_collector"], agents["report_generator"]],
            tasks=[data_collection_task, report_task],
            verbose=2,
            process=Process.sequential
        )

        result = crew.kickoff()

        # Process the result - for reports, we'll use our visualizer directly
        # First, get transaction data
        all_transactions = []

        for wallet in wallets:
            transfers = self.arbiscan_client.fetch_all_token_transfers(
                address=wallet
            )
            all_transactions.extend(transfers)

        if not all_transactions:
            return {
                "filename": f"no_data_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.{export_format.lower()}",
                "content": ""
            }

        transactions_df = pd.DataFrame(all_transactions)

        # Generate the report based on format
        filename = f"whale_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        if export_format == "CSV":
            content = visualizer.generate_csv_report(
                transactions_df=transactions_df,
                report_type=report_type
            )
            filename += ".csv"

            return {
                "filename": filename,
                "content": content
            }

        elif export_format == "PDF":
            # For PDF we need to get more data
            # Run pattern detection
            patterns = self.identify_trading_patterns(
                wallets=wallets,
                start_date=start_date,
                end_date=end_date
            )

            # Run price impact analysis
            price_impact = self.analyze_price_impact(
                wallets=wallets,
                start_date=start_date,
                end_date=end_date
            )

            # Run manipulation detection
            alerts = self.detect_manipulation(
                wallets=wallets,
                start_date=start_date,
                end_date=end_date
            )

            content = visualizer.generate_pdf_report(
                transactions_df=transactions_df,
                patterns=patterns,
                price_impact=price_impact,
                alerts=alerts,
                title=f"Whale Analysis Report: {report_type}",
                start_date=start_date,
                end_date=end_date
            )
            filename += ".pdf"

            return {
                "filename": filename,
                "content": content
            }

        elif export_format == "PNG":
            # For PNG we'll create a chart based on report type
            if report_type == "Transaction Summary":
                fig = visualizer.create_transaction_timeline(transactions_df)
            elif report_type == "Pattern Analysis":
                fig = visualizer.create_volume_chart(transactions_df)
            elif report_type == "Price Impact":
                # Run price impact analysis first
                price_impact = self.analyze_price_impact(
                    wallets=wallets,
                    start_date=start_date,
                    end_date=end_date
                )
                fig = price_impact.get('impact_chart', visualizer.create_transaction_timeline(transactions_df))
            else:  # "Manipulation Detection" or "Complete Analysis"
                fig = visualizer.create_network_graph(transactions_df)

            content = visualizer.generate_png_chart(fig)
            filename += ".png"

            return {
                "filename": filename,
                "content": content
            }

        else:
            return {
                "filename": f"unsupported_format_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt",
|
| 1116 |
+
"content": "Unsupported export format requested."
|
| 1117 |
+
}
|
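Every branch above returns the same `{"filename", "content"}` dict, so callers can persist any report the same way. A minimal sketch of that consumer side, assuming a text (CSV) payload; the `save_report` helper and the sample `report` dict are hypothetical, not part of the module:

```python
from datetime import datetime

def save_report(report: dict, mode: str = "w") -> str:
    """Persist a {'filename', 'content'} report dict as returned by
    the report generator. Binary formats (PDF/PNG bytes) would need
    mode='wb'; CSV content is plain text."""
    with open(report["filename"], mode) as f:
        f.write(report["content"])
    return report["filename"]

# Hypothetical CSV-style report payload:
report = {
    "filename": f"whale_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
    "content": "hash,from,to,value\n0xabc,0x1,0x2,100\n",
}
```

In a Streamlit/Gradio app the same dict could instead feed a download button directly, without touching disk.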
modules/crew_tools.py (ADDED, +362 lines)
"""
Properly implemented tools for the WhaleAnalysisCrewSystem
"""

import json
import pandas as pd
from datetime import datetime
from typing import Any, Dict, List, Optional, Type
from pydantic import BaseModel, Field
import logging

from modules.api_client import ArbiscanClient, GeminiClient
from modules.data_processor import DataProcessor
from langchain.tools import BaseTool


class GetTokenTransfersInput(BaseModel):
    """Input for the get_token_transfers tool."""
    address: str = Field(..., description="Wallet address to query")
    contract_address: Optional[str] = Field(None, description="Optional token contract address to filter by")


# Global clients that will be used by all tools
_GLOBAL_ARBISCAN_CLIENT = None
_GLOBAL_GEMINI_CLIENT = None
_GLOBAL_DATA_PROCESSOR = None

def set_global_clients(arbiscan_client=None, gemini_client=None, data_processor=None):
    """Set global client instances that will be used by all tools"""
    global _GLOBAL_ARBISCAN_CLIENT, _GLOBAL_GEMINI_CLIENT, _GLOBAL_DATA_PROCESSOR
    if arbiscan_client:
        _GLOBAL_ARBISCAN_CLIENT = arbiscan_client
    if gemini_client:
        _GLOBAL_GEMINI_CLIENT = gemini_client
    if data_processor:
        _GLOBAL_DATA_PROCESSOR = data_processor


class ArbiscanGetTokenTransfersTool(BaseTool):
    """Tool for fetching token transfers from Arbiscan."""
    name = "arbiscan_get_token_transfers"
    description = "Get ERC-20 token transfers for a specific address"
    args_schema: Type[BaseModel] = GetTokenTransfersInput

    def __init__(self, arbiscan_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if arbiscan_client:
            set_global_clients(arbiscan_client=arbiscan_client)

    def _run(self, address: str, contract_address: Optional[str] = None) -> str:
        global _GLOBAL_ARBISCAN_CLIENT

        if not _GLOBAL_ARBISCAN_CLIENT:
            return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})

        try:
            transfers = _GLOBAL_ARBISCAN_CLIENT.get_token_transfers(
                address=address,
                contract_address=contract_address
            )
            return json.dumps(transfers)
        except Exception as e:
            logging.error(f"Error in ArbiscanGetTokenTransfersTool: {str(e)}")
            return json.dumps({"error": str(e)})


class GetNormalTransactionsInput(BaseModel):
    """Input for the get_normal_transactions tool."""
    address: str = Field(..., description="Wallet address to query")
    # Declared here so they match _run's signature; LangChain validates
    # tool arguments against this schema.
    startblock: int = Field(0, description="First block to search")
    endblock: int = Field(99999999, description="Last block to search")
    page: int = Field(1, description="Page number")
    offset: int = Field(10, description="Results per page")


class ArbiscanGetNormalTransactionsTool(BaseTool):
    """Tool for fetching normal transactions from Arbiscan."""
    name = "arbiscan_get_normal_transactions"
    description = "Get normal transactions (ETH/ARB transfers) for a specific address"
    args_schema: Type[BaseModel] = GetNormalTransactionsInput

    def __init__(self, arbiscan_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if arbiscan_client:
            set_global_clients(arbiscan_client=arbiscan_client)

    def _run(self, address: str, startblock: int = 0, endblock: int = 99999999, page: int = 1, offset: int = 10) -> str:
        global _GLOBAL_ARBISCAN_CLIENT

        if not _GLOBAL_ARBISCAN_CLIENT:
            return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})

        try:
            txs = _GLOBAL_ARBISCAN_CLIENT.get_normal_transactions(
                address=address,
                start_block=startblock,
                end_block=endblock,
                page=page,
                offset=offset
            )
            return json.dumps(txs)
        except Exception as e:
            logging.error(f"Error in ArbiscanGetNormalTransactionsTool: {str(e)}")
            return json.dumps({"error": str(e)})


class GetInternalTransactionsInput(BaseModel):
    """Input for the get_internal_transactions tool."""
    address: str = Field(..., description="Wallet address to query")
    startblock: int = Field(0, description="First block to search")
    endblock: int = Field(99999999, description="Last block to search")
    page: int = Field(1, description="Page number")
    offset: int = Field(10, description="Results per page")


class ArbiscanGetInternalTransactionsTool(BaseTool):
    """Tool for fetching internal transactions from Arbiscan."""
    name = "arbiscan_get_internal_transactions"
    description = "Get internal transactions for a specific address"
    args_schema: Type[BaseModel] = GetInternalTransactionsInput

    def __init__(self, arbiscan_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if arbiscan_client:
            set_global_clients(arbiscan_client=arbiscan_client)

    def _run(self, address: str, startblock: int = 0, endblock: int = 99999999, page: int = 1, offset: int = 10) -> str:
        global _GLOBAL_ARBISCAN_CLIENT

        if not _GLOBAL_ARBISCAN_CLIENT:
            return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})

        try:
            txs = _GLOBAL_ARBISCAN_CLIENT.get_internal_transactions(
                address=address,
                start_block=startblock,
                end_block=endblock,
                page=page,
                offset=offset
            )
            return json.dumps(txs)
        except Exception as e:
            logging.error(f"Error in ArbiscanGetInternalTransactionsTool: {str(e)}")
            return json.dumps({"error": str(e)})


class FetchWhaleTransactionsInput(BaseModel):
    """Input for the fetch_whale_transactions tool."""
    addresses: List[str] = Field(..., description="List of wallet addresses to query")
    token_address: Optional[str] = Field(None, description="Optional token contract address to filter by")
    min_token_amount: Optional[float] = Field(None, description="Minimum token amount")
    min_usd_value: Optional[float] = Field(None, description="Minimum USD value")


class ArbiscanFetchWhaleTransactionsTool(BaseTool):
    """Tool for fetching whale transactions from Arbiscan."""
    name = "arbiscan_fetch_whale_transactions"
    description = "Fetch whale transactions for a list of addresses"
    args_schema: Type[BaseModel] = FetchWhaleTransactionsInput

    def __init__(self, arbiscan_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if arbiscan_client:
            set_global_clients(arbiscan_client=arbiscan_client)

    def _run(self, addresses: List[str], token_address: Optional[str] = None,
             min_token_amount: Optional[float] = None, min_usd_value: Optional[float] = None) -> str:
        global _GLOBAL_ARBISCAN_CLIENT

        if not _GLOBAL_ARBISCAN_CLIENT:
            return json.dumps({"error": "Arbiscan client not initialized. Please set global client first."})

        try:
            transactions_df = _GLOBAL_ARBISCAN_CLIENT.fetch_whale_transactions(
                addresses=addresses,
                token_address=token_address,
                min_token_amount=min_token_amount,
                min_usd_value=min_usd_value,
                max_pages=5  # Limit to 5 pages to prevent excessive API calls
            )
            return transactions_df.to_json(orient="records")
        except Exception as e:
            logging.error(f"Error in ArbiscanFetchWhaleTransactionsTool: {str(e)}")
            return json.dumps({"error": str(e)})


class GetCurrentPriceInput(BaseModel):
    """Input for the get_current_price tool."""
    symbol: str = Field(..., description="Token symbol (e.g., 'ETHUSD')")


class GeminiGetCurrentPriceTool(BaseTool):
    """Tool for getting current token price from Gemini."""
    name = "gemini_get_current_price"
    description = "Get the current price of a token"
    args_schema: Type[BaseModel] = GetCurrentPriceInput

    def __init__(self, gemini_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if gemini_client:
            set_global_clients(gemini_client=gemini_client)

    def _run(self, symbol: str) -> str:
        global _GLOBAL_GEMINI_CLIENT

        if not _GLOBAL_GEMINI_CLIENT:
            return json.dumps({"error": "Gemini client not initialized. Please set global client first."})

        try:
            price = _GLOBAL_GEMINI_CLIENT.get_current_price(symbol)
            return json.dumps({"symbol": symbol, "price": price})
        except Exception as e:
            logging.error(f"Error in GeminiGetCurrentPriceTool: {str(e)}")
            return json.dumps({"error": str(e)})


class GetHistoricalPricesInput(BaseModel):
    """Input for the get_historical_prices tool."""
    symbol: str = Field(..., description="Token symbol (e.g., 'ETHUSD')")
    # Optional here to match _run, which tolerates missing bounds
    start_time: Optional[str] = Field(None, description="Start datetime in ISO format")
    end_time: Optional[str] = Field(None, description="End datetime in ISO format")
    interval: str = Field("15m", description="Candle interval (e.g., '15m')")


class GeminiGetHistoricalPricesTool(BaseTool):
    """Tool for getting historical token prices from Gemini."""
    name = "gemini_get_historical_prices"
    description = "Get historical prices for a token within a time range"
    args_schema: Type[BaseModel] = GetHistoricalPricesInput

    def __init__(self, gemini_client=None):
        super().__init__()
        # Store reference to client if provided, otherwise use the global instance
        if gemini_client:
            set_global_clients(gemini_client=gemini_client)

    def _run(
        self,
        symbol: str,
        start_time: Optional[str] = None,
        end_time: Optional[str] = None,
        interval: str = "15m"
    ) -> str:
        global _GLOBAL_GEMINI_CLIENT

        if not _GLOBAL_GEMINI_CLIENT:
            return json.dumps({"error": "Gemini client not initialized. Please set global client first."})

        try:
            # Convert string times to datetime if provided
            start_dt = None
            end_dt = None

            if start_time:
                start_dt = datetime.fromisoformat(start_time)
            if end_time:
                end_dt = datetime.fromisoformat(end_time)

            prices = _GLOBAL_GEMINI_CLIENT.get_historical_prices(
                symbol=symbol,
                start_time=start_dt,
                end_time=end_dt,
                interval=interval
            )

            return json.dumps(prices)
        except Exception as e:
            logging.error(f"Error in GeminiGetHistoricalPricesTool: {str(e)}")
            return json.dumps({"error": str(e)})


class IdentifyPatternsInput(BaseModel):
    """Input for the identify_patterns tool."""
    transactions_json: str = Field(..., description="JSON string of transactions")
    n_clusters: int = Field(3, description="Number of clusters for K-Means")


class DataProcessorIdentifyPatternsTool(BaseTool):
    """Tool for identifying trading patterns using the DataProcessor."""
    name = "data_processor_identify_patterns"
    description = "Identify trading patterns in a set of transactions"
    args_schema: Type[BaseModel] = IdentifyPatternsInput

    def __init__(self, data_processor=None):
        super().__init__()
        # Store reference to processor if provided, otherwise use the global instance
        if data_processor:
            set_global_clients(data_processor=data_processor)

    def _run(self, transactions_json: str, n_clusters: int = 3) -> str:
        global _GLOBAL_DATA_PROCESSOR

        if not _GLOBAL_DATA_PROCESSOR:
            return json.dumps({"error": "Data processor not initialized. Please set global processor first."})

        try:
            # Parse the JSON payload (the schema passes a string) and build a DataFrame
            transactions = json.loads(transactions_json) if isinstance(transactions_json, str) else transactions_json
            transactions_df = pd.DataFrame(transactions)

            # Ensure required columns exist
            required_columns = ['timeStamp', 'hash', 'from', 'to', 'value', 'tokenSymbol']
            for col in required_columns:
                if col not in transactions_df.columns:
                    return json.dumps({
                        "error": f"Missing required column: {col}",
                        "available_columns": list(transactions_df.columns)
                    })

            # Run pattern identification
            patterns = _GLOBAL_DATA_PROCESSOR.identify_patterns(
                transactions_df=transactions_df,
                n_clusters=n_clusters
            )

            return json.dumps(patterns)
        except Exception as e:
            logging.error(f"Error in DataProcessorIdentifyPatternsTool: {str(e)}")
            return json.dumps({"error": str(e)})


class DetectAnomalousTransactionsInput(BaseModel):
    """Input for the detect_anomalous_transactions tool."""
    transactions_json: str = Field(..., description="JSON string of transactions")
    sensitivity: str = Field("Medium", description="Detection sensitivity ('Low', 'Medium', 'High')")


class DataProcessorDetectAnomalousTransactionsTool(BaseTool):
    """Tool for detecting anomalous transactions using the DataProcessor."""
    name = "data_processor_detect_anomalies"
    description = "Detect anomalous transactions in a dataset"
    args_schema: Type[BaseModel] = DetectAnomalousTransactionsInput

    def __init__(self, data_processor=None):
        super().__init__()
        # Store reference to processor if provided, otherwise use the global instance
        if data_processor:
            set_global_clients(data_processor=data_processor)

    def _run(self, transactions_json: str, sensitivity: str = "Medium") -> str:
        global _GLOBAL_DATA_PROCESSOR

        if not _GLOBAL_DATA_PROCESSOR:
            return json.dumps({"error": "Data processor not initialized. Please set global processor first."})

        try:
            # Parse the JSON payload (the schema passes a string) and build a DataFrame
            transactions = json.loads(transactions_json) if isinstance(transactions_json, str) else transactions_json
            transactions_df = pd.DataFrame(transactions)

            # Ensure required columns exist
            required_columns = ['timeStamp', 'hash', 'from', 'to', 'value', 'tokenSymbol']
            for col in required_columns:
                if col not in transactions_df.columns:
                    return json.dumps({
                        "error": f"Missing required column: {col}",
                        "available_columns": list(transactions_df.columns)
                    })

            # Run anomaly detection
            anomalies = _GLOBAL_DATA_PROCESSOR.detect_anomalous_transactions(
                transactions_df=transactions_df,
                sensitivity=sensitivity
            )

            return json.dumps(anomalies)
        except Exception as e:
            logging.error(f"Error in DataProcessorDetectAnomalousTransactionsTool: {str(e)}")
            return json.dumps({"error": str(e)})
modules/data_processor.py (ADDED, +1425 lines; first 105 shown)
| 1 |
+
import pandas as pd
|
| 2 |
+
import numpy as np
|
| 3 |
+
from datetime import datetime, timedelta
|
| 4 |
+
from typing import Dict, List, Optional, Union, Any, Tuple
|
| 5 |
+
from sklearn.cluster import KMeans, DBSCAN
|
| 6 |
+
from sklearn.preprocessing import StandardScaler
|
| 7 |
+
import plotly.graph_objects as go
|
| 8 |
+
import plotly.express as px
|
| 9 |
+
import logging
|
| 10 |
+
import time
|
| 11 |
+
|
| 12 |
+
class DataProcessor:
|
| 13 |
+
"""
|
| 14 |
+
Process and analyze transaction data from blockchain APIs
|
| 15 |
+
"""
|
| 16 |
+
|
| 17 |
+
def __init__(self):
|
| 18 |
+
pass
|
| 19 |
+
|
| 20 |
+
    def aggregate_transactions(self,
                               transactions_df: pd.DataFrame,
                               time_window: str = 'D') -> pd.DataFrame:
        """
        Aggregate transactions by time window.

        Args:
            transactions_df: DataFrame of transactions
            time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)

        Returns:
            Aggregated DataFrame with transaction counts and volumes
        """
        if transactions_df.empty:
            return pd.DataFrame()

        # Ensure a timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure an amount column exists
        if 'Amount' in transactions_df.columns:
            amount_col = 'Amount'
        elif 'tokenAmount' in transactions_df.columns:
            amount_col = 'tokenAmount'
        elif 'value' in transactions_df.columns:
            # Adjust raw values for token decimals if 'tokenDecimal' exists
            if 'tokenDecimal' in transactions_df.columns:
                transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
                amount_col = 'adjustedValue'
            else:
                amount_col = 'value'
        else:
            raise ValueError("Amount column not found in transactions DataFrame")

        # Resample by time window
        transactions_df = transactions_df.copy()
        try:
            transactions_df.set_index(pd.DatetimeIndex(transactions_df[timestamp_col]), inplace=True)
        except Exception as e:
            print(f"Error setting DatetimeIndex: {str(e)}")
            # Fall back to a synthetic hourly index
            transactions_df['safe_timestamp'] = pd.date_range(
                start='2025-01-01',
                periods=len(transactions_df),
                freq='H'
            )
            transactions_df.set_index('safe_timestamp', inplace=True)

        # Identify buy vs sell transactions based on 'from' and 'to' addresses
        if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
            from_col, to_col = 'From', 'To'
        elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
            from_col, to_col = 'from', 'to'
        else:
            # If direction cannot be determined, aggregate total volume only
            agg_df = transactions_df.resample(time_window).agg({
                amount_col: 'sum',
                timestamp_col: 'count'
            })
            agg_df.columns = ['Volume', 'Count']
            return agg_df.reset_index()

        # Calculate net flow for each wallet address (positive = inflow, negative = outflow)
        wallet_addresses = set(transactions_df[from_col].unique()) | set(transactions_df[to_col].unique())

        results = []
        for wallet in wallet_addresses:
            wallet_df = transactions_df.copy()

            # Mark transactions as inflow or outflow
            wallet_df['Direction'] = 'Unknown'
            wallet_df.loc[wallet_df[to_col] == wallet, 'Direction'] = 'In'
            wallet_df.loc[wallet_df[from_col] == wallet, 'Direction'] = 'Out'

            # Calculate net flow
            wallet_df['NetFlow'] = wallet_df[amount_col]
            wallet_df.loc[wallet_df['Direction'] == 'Out', 'NetFlow'] = -wallet_df.loc[wallet_df['Direction'] == 'Out', amount_col]

            # Aggregate by time window
            wallet_agg = wallet_df.resample(time_window).agg({
                'NetFlow': 'sum',
                timestamp_col: 'count'
            })
            wallet_agg.columns = ['NetFlow', 'Count']
            wallet_agg['Wallet'] = wallet

            results.append(wallet_agg.reset_index())

        if not results:
            return pd.DataFrame()

        combined_df = pd.concat(results, ignore_index=True)
        return combined_df
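The time-window aggregation above boils down to a pandas `resample` over a `DatetimeIndex`. A minimal standalone sketch of the same idea (column names mirror the Arbiscan token-transfer schema used in this module; the values are invented):

```python
import pandas as pd

# Hypothetical mini transfer log mirroring the Arbiscan token-transfer schema.
df = pd.DataFrame({
    "timeStamp": pd.to_datetime([1700000000, 1700003600, 1700090000], unit="s"),
    "value": ["1000000000000000000", "2000000000000000000", "500000000000000000"],
    "tokenDecimal": [18, 18, 18],
})

# Adjust raw integer values for token decimals, as aggregate_transactions does.
df["adjustedValue"] = df["value"].astype(float) / (10 ** df["tokenDecimal"].astype(int))

# Resample to daily buckets: total volume and transaction count per day.
agg = df.set_index("timeStamp").resample("D").agg(
    Volume=("adjustedValue", "sum"),
    Count=("adjustedValue", "count"),
).reset_index()
print(agg)
```

The first two transfers fall on the same UTC day, so they collapse into one row with `Volume == 3.0` and `Count == 2`; the third lands in the next daily bucket.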
    # Cache for pattern identification to avoid repeating expensive calculations
    _pattern_cache = {}

    def identify_patterns(self,
                          transactions_df: pd.DataFrame,
                          n_clusters: int = 3) -> List[Dict[str, Any]]:
        """
        Identify trading patterns using clustering algorithms.

        Args:
            transactions_df: DataFrame of transactions
            n_clusters: Number of clusters to identify

        Returns:
            List of pattern dictionaries containing name, description, and confidence
        """
        # Check for empty data early to avoid unnecessary processing
        if transactions_df.empty:
            return []

        # Build a cache key from the column set, row count, and cluster count
        try:
            cache_key = f"{hash(tuple(transactions_df.columns))}_{len(transactions_df)}_{n_clusters}"
            if cache_key in self._pattern_cache:
                return self._pattern_cache[cache_key]
        except Exception:
            # If hashing fails, proceed without caching
            cache_key = None

        try:
            # Use a reference instead of a deep copy to reduce memory usage
            df = transactions_df

            # Find the timestamp column
            timestamp_cols = ['Timestamp', 'timeStamp']
            timestamp_col = next((col for col in timestamp_cols if col in df.columns), None)

            if timestamp_col:
                # Convert the timestamp only if needed
                if not pd.api.types.is_datetime64_any_dtype(df[timestamp_col]):
                    try:
                        if df[timestamp_col].dtype == 'object':
                            df[timestamp_col] = pd.to_datetime(df[timestamp_col], errors='coerce')
                        else:
                            df[timestamp_col] = pd.to_datetime(df[timestamp_col], unit='s', errors='coerce')
                    except Exception:
                        # Fall back to a synthetic hourly range
                        df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
                        timestamp_col = 'dummy_timestamp'
            else:
                # No timestamp column at all: create a synthetic one
                df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
                timestamp_col = 'dummy_timestamp'

            # Floor timestamps to the hour (vectorized)
            df['hour'] = df[timestamp_col].dt.floor('H')

            # Find the address columns
            if 'From' in df.columns and 'To' in df.columns:
                from_col, to_col = 'From', 'To'
            elif 'from' in df.columns and 'to' in df.columns:
                from_col, to_col = 'from', 'to'
            else:
                # Create dummy addresses only if necessary
                df['from'] = [f'0x{i:040x}' for i in range(len(df))]
                df['to'] = [f'0x{(i + 1):040x}' for i in range(len(df))]
                from_col, to_col = 'from', 'to'

            # Determine the amount column; prefer pre-adjusted columns,
            # then adjust raw 'value' for decimals if possible
            amount_cols = ['Amount', 'tokenAmount', 'adjustedValue']
            amount_col = next((col for col in amount_cols if col in df.columns), None)

            if not amount_col:
                if 'value' in df.columns and 'tokenDecimal' in df.columns:
                    # Vectorized decimal adjustment
                    try:
                        df['value_numeric'] = pd.to_numeric(df['value'], errors='coerce')
                        df['tokenDecimal_numeric'] = pd.to_numeric(df['tokenDecimal'], errors='coerce').fillna(18)
                        df['adjustedValue'] = df['value_numeric'] / (10 ** df['tokenDecimal_numeric'])
                        amount_col = 'adjustedValue'
                    except Exception as e:
                        logging.warning(f"Error converting values: {e}")
                        df['dummy_amount'] = 1.0
                        amount_col = 'dummy_amount'
                elif 'value' in df.columns:
                    amount_col = 'value'
                else:
                    # Fall back to dummy values
                    df['dummy_amount'] = 1.0
                    amount_col = 'dummy_amount'

            # Make sure the amount column is numeric
            try:
                if amount_col in df.columns:
                    df[f"{amount_col}_numeric"] = pd.to_numeric(df[amount_col], errors='coerce').fillna(0)
                    amount_col = f"{amount_col}_numeric"
            except Exception:
                # If conversion fails, fall back to a dummy numeric column
                df['safe_amount'] = 1.0
                amount_col = 'safe_amount'

            # Aggregate transaction counts per hour
            agg_df = df.groupby('hour').agg(
                Count=pd.NamedAgg(column=from_col, aggfunc='count'),
            ).reset_index()

            # NetFlow needs a second pass over each hourly group
            def calc_netflow(group):
                # Use the first row's addresses as the reference wallet for the group
                first_to = group[to_col].iloc[0] if len(group) > 0 else None
                first_from = group[from_col].iloc[0] if len(group) > 0 else None

                if first_to is not None and first_from is not None:
                    try:
                        # Coerce to numeric; errors become NaN
                        total_in = pd.to_numeric(group.loc[group[to_col] == first_to, amount_col], errors='coerce').sum()
                        total_out = pd.to_numeric(group.loc[group[from_col] == first_from, amount_col], errors='coerce').sum()
                        # Replace NaN with 0 to avoid propagation
                        if pd.isna(total_in):
                            total_in = 0.0
                        if pd.isna(total_out):
                            total_out = 0.0
                        return float(total_in) - float(total_out)
                    except Exception as e:
                        logging.debug(f"Error converting values to numeric: {e}")
                        return 0.0
                return 0.0

            # Calculate NetFlow per hour without an explicit loop
            netflows = df.groupby('hour').apply(calc_netflow)
            agg_df['NetFlow'] = netflows.values

            # Early return if there is not enough data for clustering
            if agg_df.empty or len(agg_df) < n_clusters:
                return []

            # Don't ask for more clusters than the data supports
            actual_n_clusters = min(n_clusters, max(2, len(agg_df) // 2))

            # Prepare clustering features with careful type handling
            try:
                if 'NetFlow' in agg_df.columns:
                    agg_df['NetFlow'] = pd.to_numeric(agg_df['NetFlow'], errors='coerce').fillna(0)
                    features = agg_df[['NetFlow', 'Count']].copy()
                    primary_metric = 'NetFlow'
                else:
                    # Compute Volume if it is missing
                    if 'Volume' not in agg_df.columns and amount_col in df.columns:
                        volume_by_hour = pd.to_numeric(df[amount_col], errors='coerce').fillna(0).groupby(df['hour']).sum()
                        agg_df['Volume'] = agg_df['hour'].map(volume_by_hour)

                    # Ensure Volume exists and is numeric
                    if 'Volume' not in agg_df.columns:
                        agg_df['Volume'] = 1.0  # Default if the calculation failed
                    else:
                        agg_df['Volume'] = pd.to_numeric(agg_df['Volume'], errors='coerce').fillna(1.0)

                    # Ensure Count is numeric
                    agg_df['Count'] = pd.to_numeric(agg_df['Count'], errors='coerce').fillna(1.0)

                    features = agg_df[['Volume', 'Count']].copy()
                    primary_metric = 'Volume'

                # Final pass to guarantee numeric features
                for col in features.columns:
                    features[col] = pd.to_numeric(features[col], errors='coerce').fillna(0)
            except Exception as e:
                logging.warning(f"Error preparing clustering features: {e}")
                # Safe dummy features if everything else fails
                agg_df['SafeFeature'] = 1.0
                agg_df['Count'] = 1.0
                features = agg_df[['SafeFeature', 'Count']].copy()
                primary_metric = 'SafeFeature'

            # Scale features; import lazily to keep module import cheap
            from sklearn.preprocessing import StandardScaler
            scaler = StandardScaler()
            scaled_features = scaler.fit_transform(features)

            # K-Means with a reduced iteration budget
            from sklearn.cluster import KMeans
            kmeans = KMeans(n_clusters=actual_n_clusters, random_state=42, n_init=10, max_iter=100)
            agg_df['Cluster'] = kmeans.fit_predict(scaled_features)

            # Time-of-day / day-of-week metrics from the hour column
            if 'hour' in agg_df.columns:
                try:
                    hour_series = pd.to_datetime(agg_df['hour'])
                    agg_df['Hour'] = hour_series.dt.hour
                    agg_df['Day'] = hour_series.dt.dayofweek
                except Exception:
                    # Fallback for non-convertible data
                    agg_df['Hour'] = 0
                    agg_df['Day'] = 0
            else:
                agg_df['Hour'] = 0
                agg_df['Day'] = 0

            # Build one pattern per cluster
            patterns = []
            for i in range(actual_n_clusters):
                # Boolean indexing is faster than repeated filtering
                cluster_mask = agg_df['Cluster'] == i
                cluster_df = agg_df[cluster_mask]

                if len(cluster_df) == 0:
                    continue

                if primary_metric == 'NetFlow':
                    avg_flow = cluster_df['NetFlow'].mean()
                    flow_std = cluster_df['NetFlow'].std()
                    behavior = "Accumulation" if avg_flow > 0 else "Distribution"
                    volume_metric = f"Net Flow: {avg_flow:.2f} ± {flow_std:.2f}"
                    pattern_metrics = {
                        "avg_flow": avg_flow,
                        "flow_std": flow_std,
                        "avg_count": cluster_df['Count'].mean(),
                        "max_flow": cluster_df['NetFlow'].max(),
                        "min_flow": cluster_df['NetFlow'].min(),
                        "common_hour": cluster_df['Hour'].mode()[0] if not cluster_df['Hour'].empty else None,
                        "common_day": cluster_df['Day'].mode()[0] if not cluster_df['Day'].empty else None
                    }
                else:
                    avg_volume = cluster_df['Volume'].mean() if 'Volume' in cluster_df else 0
                    volume_std = cluster_df['Volume'].std() if 'Volume' in cluster_df else 0
                    behavior = "High Volume" if 'Volume' in agg_df and avg_volume > agg_df['Volume'].mean() else "Low Volume"
                    volume_metric = f"Volume: {avg_volume:.2f} ± {volume_std:.2f}"
                    # Volume-based metrics (NetFlow stats are not meaningful here)
                    pattern_metrics = {
                        "avg_volume": avg_volume,
                        "volume_std": volume_std,
                        "avg_count": cluster_df['Count'].mean(),
                        "common_hour": cluster_df['Hour'].mode()[0] if not cluster_df['Hour'].empty else None,
                        "common_day": cluster_df['Day'].mode()[0] if not cluster_df['Day'].empty else None
                    }

                # Confidence: within-cluster variance as a share of total variance,
                # clamped to [0.4, 0.95]
                cluster_variance = cluster_df[primary_metric].var()
                total_variance = agg_df[primary_metric].var() or 1  # Avoid division by zero
                confidence = max(0.4, min(0.95, 1 - (cluster_variance / total_variance)))

                # Main pattern chart
                if primary_metric == 'NetFlow':
                    y_col, y_label = 'NetFlow', 'Net Token Flow'
                else:
                    y_col, y_label = 'Volume', 'Transaction Volume'

                main_fig = px.scatter(cluster_df, x=cluster_df.index, y=y_col,
                                      size='Count', color='Cluster',
                                      title=f"Pattern {i + 1}: {behavior}",
                                      labels={y_col: y_label, 'index': 'Time'},
                                      color_discrete_sequence=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'])

                # Rolling-mean trend line
                main_fig.add_trace(go.Scatter(
                    x=cluster_df.index,
                    y=cluster_df[y_col].rolling(window=3, min_periods=1).mean(),
                    mode='lines',
                    name='Trend',
                    line=dict(width=2, dash='dash', color='rgba(0,0,0,0.5)')
                ))

                if primary_metric == 'NetFlow':
                    # Zero reference line separates accumulation from distribution
                    main_fig.add_shape(
                        type="line",
                        x0=cluster_df.index.min(), y0=0,
                        x1=cluster_df.index.max(), y1=0,
                        line=dict(color="red", width=1, dash="dot"),
                    )

                main_fig.update_layout(
                    template="plotly_white",
                    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
                    margin=dict(l=20, r=20, t=50, b=20),
                    height=400
                )

                # Hourly distribution chart
                hour_counts = cluster_df.groupby('Hour')['Count'].sum().reindex(range(24), fill_value=0)
                hour_fig = px.bar(x=hour_counts.index, y=hour_counts.values,
                                  title="Hourly Distribution",
                                  labels={'x': 'Hour of Day', 'y': 'Transaction Count'},
                                  color_discrete_sequence=['#1f77b4'])
                hour_fig.update_layout(template="plotly_white", height=300)

                # Volume/flow distribution chart
                if primary_metric == 'NetFlow':
                    hist_data, hist_title, hist_label = cluster_df['NetFlow'], "Net Flow Distribution", "Net Flow"
                else:
                    hist_data, hist_title, hist_label = cluster_df['Volume'], "Volume Distribution", "Volume"

                dist_fig = px.histogram(hist_data,
                                        title=hist_title,
                                        labels={'value': hist_label, 'count': 'Frequency'},
                                        color_discrete_sequence=['#2ca02c'])
                dist_fig.update_layout(template="plotly_white", height=300)

                # Find example transactions near this cluster's time buckets
                if not transactions_df.empty:
                    # Use the hourly buckets themselves (the index is a plain
                    # RangeIndex after reset_index, so it cannot be converted)
                    cluster_times = pd.to_datetime(cluster_df['hour'])
                    # One-hour window around each cluster timestamp
                    time_windows = [(t - pd.Timedelta(hours=1), t + pd.Timedelta(hours=1)) for t in cluster_times]

                    # Find transactions within these time windows
                    pattern_txs = transactions_df[transactions_df[timestamp_col].apply(
                        lambda x: any((start <= x <= end) for start, end in time_windows)
                    )].copy()

                    # Cap at 10 examples
                    if len(pattern_txs) > 10:
                        pattern_txs = pattern_txs.sample(10)

                    # If too few matched, sample from all transactions instead
                    if len(pattern_txs) < 5 and len(transactions_df) >= 5:
                        pattern_txs = transactions_df.sample(min(5, len(transactions_df)))
                else:
                    pattern_txs = pd.DataFrame()

                # Assemble the pattern dictionary
                pattern = {
                    "name": behavior,
                    "description": f"This pattern shows {behavior.lower()} activity.",
                    "strategy": "Unknown",
                    "risk_profile": "Unknown",
                    "time_insight": "Unknown",
                    "cluster_id": i,
                    "metrics": pattern_metrics,
                    "occurrence_count": len(cluster_df),
                    "volume_metric": volume_metric,
                    "confidence": confidence,
                    "impact": 0.0,
                    "charts": {
                        "main": main_fig,
                        "hourly_distribution": hour_fig,
                        "value_distribution": dist_fig
                    },
                    "examples": pattern_txs
                }

                patterns.append(pattern)

            # Cache results for reuse
            if cache_key:
                self._pattern_cache[cache_key] = patterns

            return patterns

        except Exception as e:
            logging.warning(f"Error during pattern identification: {str(e)}")
            return []
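The per-cluster confidence score used in `identify_patterns` is one minus the ratio of within-cluster variance to total variance, clamped to [0.4, 0.95]. A standalone sketch of that formula (the sample values are invented; `ddof=1` matches pandas' `Series.var()` default):

```python
import numpy as np

def cluster_confidence(cluster_values, all_values):
    # Sample variance (ddof=1), matching pandas' Series.var() default.
    cluster_var = float(np.var(cluster_values, ddof=1))
    total_var = float(np.var(all_values, ddof=1)) or 1.0  # avoid division by zero
    # 1 - (within-cluster variance / total variance), clamped to [0.4, 0.95].
    return max(0.4, min(0.95, 1 - cluster_var / total_var))

# A tight cluster inside a widely spread population scores near the upper clamp.
population = [-50.0, -40.0, 1.0, 2.0, 3.0, 45.0, 55.0]
tight_cluster = [1.0, 2.0, 3.0]
print(cluster_confidence(tight_cluster, population))  # 0.95 (upper clamp)
```

A cluster that spans the extremes of the population has higher variance than the population itself, so the raw score goes negative and the lower clamp of 0.4 takes over.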
    def detect_anomalous_transactions(self,
                                      transactions_df: pd.DataFrame,
                                      sensitivity: str = "Medium") -> pd.DataFrame:
        """
        Detect anomalous transactions using statistical methods.

        Args:
            transactions_df: DataFrame of transactions
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            DataFrame of anomalous transactions
        """
        if transactions_df.empty:
            return pd.DataFrame()

        # Ensure an amount column exists
        if 'Amount' in transactions_df.columns:
            amount_col = 'Amount'
        elif 'tokenAmount' in transactions_df.columns:
            amount_col = 'tokenAmount'
        elif 'value' in transactions_df.columns:
            # Adjust raw values for token decimals if 'tokenDecimal' exists
            if 'tokenDecimal' in transactions_df.columns:
                transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
                amount_col = 'adjustedValue'
            else:
                amount_col = 'value'
        else:
            raise ValueError("Amount column not found in transactions DataFrame")

        # Map sensitivity to a z-score threshold: higher sensitivity flags more outliers
        if sensitivity == "Low":
            z_threshold = 3.0  # Outliers beyond 3 standard deviations
        elif sensitivity == "Medium":
            z_threshold = 2.5  # Outliers beyond 2.5 standard deviations
        else:  # High
            z_threshold = 2.0  # Outliers beyond 2 standard deviations

        # Calculate the z-score of each amount
        mean_amount = transactions_df[amount_col].mean()
        std_amount = transactions_df[amount_col].std()

        if std_amount == 0:
            return pd.DataFrame()

        transactions_df['z_score'] = abs((transactions_df[amount_col] - mean_amount) / std_amount)

        # Flag anomalous transactions
        anomalies = transactions_df[transactions_df['z_score'] > z_threshold].copy()

        # Assign a risk level based on how extreme the z-score is
        anomalies['risk_level'] = 'Medium'
        anomalies.loc[anomalies['z_score'] > z_threshold * 1.5, 'risk_level'] = 'High'
        anomalies.loc[anomalies['z_score'] <= z_threshold * 1.2, 'risk_level'] = 'Low'

        return anomalies
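The anomaly detector above is a plain z-score filter on transfer amounts. A self-contained sketch with an invented amount series (the 2.0 threshold corresponds to the "High" sensitivity setting):

```python
import pandas as pd

# Hypothetical transfer amounts with one obvious outlier.
amounts = pd.Series([10.0, 12.0, 11.0, 9.0, 10.5, 11.5, 500.0])

# Absolute z-score of each amount (pandas .std() uses sample std, ddof=1).
z_scores = ((amounts - amounts.mean()) / amounts.std()).abs()

# "High" sensitivity: flag anything beyond 2 standard deviations.
anomalies = amounts[z_scores > 2.0]
print(anomalies.tolist())  # only the 500.0 transfer is flagged
```

Note that a single huge outlier inflates the standard deviation, which pulls its own z-score down; that is why even extreme values here only score around 2.3, and why the method returns early when the standard deviation is zero.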
    def analyze_price_impact(self,
                             transactions_df: pd.DataFrame,
                             price_data: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
        """
        Analyze the price impact of transactions with enhanced visualizations.

        Args:
            transactions_df: DataFrame of transactions
            price_data: Dictionary of price impact data for each transaction

        Returns:
            Dictionary with comprehensive price impact analysis and visualizations
        """
        if transactions_df.empty or not price_data:
            # Create an empty chart for the default case
            empty_fig = go.Figure()
            empty_fig.update_layout(
                title="No Price Impact Data Available",
                xaxis_title="Time",
                yaxis_title="Price Impact (%)",
                height=400,
                template="plotly_white"
            )
            empty_fig.add_annotation(
                text="No transactions found with price impact data",
                showarrow=False,
                font=dict(size=14)
            )

            return {
                'avg_impact_pct': 0,
                'max_impact_pct': 0,
                'min_impact_pct': 0,
                'significant_moves_count': 0,
                'total_transactions': 0,
                'charts': {
                    'main_chart': empty_fig,
                    'impact_distribution': empty_fig,
                    'cumulative_impact': empty_fig,
                    'hourly_impact': empty_fig
                },
                'transactions_with_impact': pd.DataFrame(),
                'insights': [],
                'impact_summary': "No price impact data available"
            }

        # Ensure the timestamp column is datetime
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
            # Convert the timestamp to datetime if it is not already
            if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Join price impact data onto the transactions
        impact_data = []

        for idx, row in transactions_df.iterrows():
            tx_hash = row.get('Transaction Hash', row.get('hash', None))
            if not tx_hash or tx_hash not in price_data:
                continue

            tx_impact = price_data[tx_hash]

            if tx_impact['impact_pct'] is None:
                continue

            # Get the token symbol and decimal-adjusted amount if available
            token_symbol = row.get('tokenSymbol', 'Unknown')
            token_amount = row.get('value', 0)
            if 'tokenDecimal' in row:
                try:
                    token_amount = float(token_amount) / (10 ** int(row.get('tokenDecimal', 0)))
                except (ValueError, TypeError):
                    token_amount = 0

            impact_data.append({
                'transaction_hash': tx_hash,
                'timestamp': row[timestamp_col],
                'pre_price': tx_impact['pre_price'],
                'post_price': tx_impact['post_price'],
                'impact_pct': tx_impact['impact_pct'],
                'token_symbol': token_symbol,
                'token_amount': token_amount,
                'from': row.get('from', ''),
                'to': row.get('to', ''),
                'hour': row[timestamp_col].hour if isinstance(row[timestamp_col], pd.Timestamp) else 0
            })

        if not impact_data:
            # Same empty-chart fallback as above
            empty_fig = go.Figure()
            empty_fig.update_layout(
                title="No Price Impact Data Available",
                xaxis_title="Time",
                yaxis_title="Price Impact (%)",
                height=400,
                template="plotly_white"
            )
            empty_fig.add_annotation(
                text="No transactions found with price impact data",
                showarrow=False,
                font=dict(size=14)
            )

            return {
                'avg_impact_pct': 0,
                'max_impact_pct': 0,
                'min_impact_pct': 0,
                'significant_moves_count': 0,
                'total_transactions': len(transactions_df) if not transactions_df.empty else 0,
                'charts': {
                    'main_chart': empty_fig,
                    'impact_distribution': empty_fig,
                    'cumulative_impact': empty_fig,
                    'hourly_impact': empty_fig
                },
                'transactions_with_impact': pd.DataFrame(),
                'insights': [],
                'impact_summary': "No price impact data available"
            }

        impact_df = pd.DataFrame(impact_data)

        # Aggregate impact metrics
        avg_impact = impact_df['impact_pct'].mean()
        max_impact = impact_df['impact_pct'].max()
        min_impact = impact_df['impact_pct'].min()
        median_impact = impact_df['impact_pct'].median()
        std_impact = impact_df['impact_pct'].std()

        # Count significant (>1%) and high-impact (>3%) moves
        significant_threshold = 1.0
        high_impact_threshold = 3.0
        significant_moves = len(impact_df[abs(impact_df['impact_pct']) > significant_threshold])
        high_impact_moves = len(impact_df[abs(impact_df['impact_pct']) > high_impact_threshold])
        positive_impacts = len(impact_df[impact_df['impact_pct'] > 0])
        negative_impacts = len(impact_df[impact_df['impact_pct'] < 0])

        # Cumulative impact over time
        impact_df = impact_df.sort_values('timestamp')
        impact_df['cumulative_impact'] = impact_df['impact_pct'].cumsum()

        # Generate insights
        insights = []

        # Market direction bias
        if avg_impact > 0.5:
            insights.append({
                "title": "Positive Price Pressure",
                "description": f"Transactions show an overall positive price impact of {avg_impact:.2f}%, suggesting accumulation or market strength."
            })
        elif avg_impact < -0.5:
            insights.append({
                "title": "Negative Price Pressure",
                "description": f"Transactions show an overall negative price impact of {avg_impact:.2f}%, suggesting distribution or market weakness."
|
| 837 |
+
})
|
| 838 |
+
|
| 839 |
+
# Volatility analysis
|
| 840 |
+
if std_impact > 2.0:
|
| 841 |
+
insights.append({
|
| 842 |
+
"title": "High Market Volatility",
|
| 843 |
+
"description": f"Price impact shows high volatility (std: {std_impact:.2f}%), indicating potential market manipulation or whipsaw conditions."
|
| 844 |
+
})
|
| 845 |
+
|
| 846 |
+
# Significant impacts
|
| 847 |
+
if high_impact_moves > 0:
|
| 848 |
+
insights.append({
|
| 849 |
+
"title": "High Impact Transactions",
|
| 850 |
+
"description": f"Detected {high_impact_moves} high-impact transactions (>{high_impact_threshold}% price change), indicating potential market-moving activity."
|
| 851 |
+
})
|
| 852 |
+
|
| 853 |
+
# Temporal patterns
|
| 854 |
+
hourly_impact = impact_df.groupby('hour')['impact_pct'].mean()
|
| 855 |
+
if len(hourly_impact) > 0:
|
| 856 |
+
max_hour = hourly_impact.abs().idxmax()
|
| 857 |
+
max_hour_impact = hourly_impact[max_hour]
|
| 858 |
+
insights.append({
|
| 859 |
+
"title": "Time-Based Pattern",
|
| 860 |
+
"description": f"Highest price impact occurs around {max_hour}:00 with an average of {max_hour_impact:.2f}%."
|
| 861 |
+
})
|
| 862 |
+
|
| 863 |
+
# Create impact summary text
|
| 864 |
+
impact_summary = f"Analysis of {len(impact_df)} price-impacting transactions shows an average impact of {avg_impact:.2f}% "
|
| 865 |
+
impact_summary += f"(range: {min_impact:.2f}% to {max_impact:.2f}%). "
|
| 866 |
+
impact_summary += f"Found {significant_moves} significant price moves and {high_impact_moves} high-impact transactions. "
|
| 867 |
+
if positive_impacts > negative_impacts:
|
| 868 |
+
impact_summary += f"There is a bias towards positive price impact ({positive_impacts} positive vs {negative_impacts} negative)."
|
| 869 |
+
elif negative_impacts > positive_impacts:
|
| 870 |
+
impact_summary += f"There is a bias towards negative price impact ({negative_impacts} negative vs {positive_impacts} positive)."
|
| 871 |
+
else:
|
| 872 |
+
impact_summary += "The price impact is balanced between positive and negative moves."
|
| 873 |
+
|
| 874 |
+
# Create enhanced main visualization
|
| 875 |
+
main_fig = go.Figure()
|
| 876 |
+
|
| 877 |
+
# Add scatter plot for impact
|
| 878 |
+
main_fig.add_trace(go.Scatter(
|
| 879 |
+
x=impact_df['timestamp'],
|
| 880 |
+
y=impact_df['impact_pct'],
|
| 881 |
+
mode='markers+lines',
|
| 882 |
+
marker=dict(
|
| 883 |
+
size=impact_df['impact_pct'].abs() * 1.5 + 5,
|
| 884 |
+
color=impact_df['impact_pct'],
|
| 885 |
+
colorscale='RdBu_r',
|
| 886 |
+
line=dict(width=1),
|
| 887 |
+
symbol=['circle' if val >= 0 else 'diamond' for val in impact_df['impact_pct']]
|
| 888 |
+
),
|
| 889 |
+
text=[
|
| 890 |
+
f"TX: {tx[:8]}...{tx[-6:]}<br>" +
|
| 891 |
+
f"Impact: {impact:.2f}%<br>" +
|
| 892 |
+
f"Token: {token} ({amount:.4f})<br>" +
|
| 893 |
+
f"From: {src[:6]}...{src[-4:]}<br>" +
|
| 894 |
+
f"To: {dst[:6]}...{dst[-4:]}"
|
| 895 |
+
for tx, impact, token, amount, src, dst in zip(
|
| 896 |
+
impact_df['transaction_hash'],
|
| 897 |
+
impact_df['impact_pct'],
|
| 898 |
+
impact_df['token_symbol'],
|
| 899 |
+
impact_df['token_amount'],
|
| 900 |
+
impact_df['from'],
|
| 901 |
+
impact_df['to']
|
| 902 |
+
)
|
| 903 |
+
],
|
| 904 |
+
hovertemplate='%{text}<br>Time: %{x}<extra></extra>',
|
| 905 |
+
name='Price Impact'
|
| 906 |
+
))
|
| 907 |
+
|
| 908 |
+
# Add a moving average trendline
|
| 909 |
+
window_size = max(3, len(impact_df) // 10) # Dynamic window size
|
| 910 |
+
if len(impact_df) >= window_size:
|
| 911 |
+
impact_df['ma'] = impact_df['impact_pct'].rolling(window=window_size, min_periods=1).mean()
|
| 912 |
+
main_fig.add_trace(go.Scatter(
|
| 913 |
+
x=impact_df['timestamp'],
|
| 914 |
+
y=impact_df['ma'],
|
| 915 |
+
mode='lines',
|
| 916 |
+
line=dict(width=2, color='rgba(255,165,0,0.7)'),
|
| 917 |
+
name=f'Moving Avg ({window_size} period)'
|
| 918 |
+
))
|
| 919 |
+
|
| 920 |
+
# Add a zero line for reference
|
| 921 |
+
main_fig.add_shape(
|
| 922 |
+
type='line',
|
| 923 |
+
x0=impact_df['timestamp'].min(),
|
| 924 |
+
y0=0,
|
| 925 |
+
x1=impact_df['timestamp'].max(),
|
| 926 |
+
y1=0,
|
| 927 |
+
line=dict(color='gray', width=1, dash='dash')
|
| 928 |
+
)
|
| 929 |
+
|
| 930 |
+
# Add colored regions for significant impact
|
| 931 |
+
|
| 932 |
+
# Add green band for normal price movement
|
| 933 |
+
main_fig.add_shape(
|
| 934 |
+
type='rect',
|
| 935 |
+
x0=impact_df['timestamp'].min(),
|
| 936 |
+
y0=-significant_threshold,
|
| 937 |
+
x1=impact_df['timestamp'].max(),
|
| 938 |
+
y1=significant_threshold,
|
| 939 |
+
fillcolor='rgba(0,255,0,0.1)',
|
| 940 |
+
line=dict(width=0),
|
| 941 |
+
layer='below'
|
| 942 |
+
)
|
| 943 |
+
|
| 944 |
+
# Add warning bands for higher impact movements
|
| 945 |
+
main_fig.add_shape(
|
| 946 |
+
type='rect',
|
| 947 |
+
x0=impact_df['timestamp'].min(),
|
| 948 |
+
y0=significant_threshold,
|
| 949 |
+
x1=impact_df['timestamp'].max(),
|
| 950 |
+
y1=high_impact_threshold,
|
| 951 |
+
fillcolor='rgba(255,255,0,0.1)',
|
| 952 |
+
line=dict(width=0),
|
| 953 |
+
layer='below'
|
| 954 |
+
)
|
| 955 |
+
|
| 956 |
+
main_fig.add_shape(
|
| 957 |
+
type='rect',
|
| 958 |
+
x0=impact_df['timestamp'].min(),
|
| 959 |
+
y0=-high_impact_threshold,
|
| 960 |
+
x1=impact_df['timestamp'].max(),
|
| 961 |
+
y1=-significant_threshold,
|
| 962 |
+
fillcolor='rgba(255,255,0,0.1)',
|
| 963 |
+
line=dict(width=0),
|
| 964 |
+
layer='below'
|
| 965 |
+
)
|
| 966 |
+
|
| 967 |
+
# Add high impact regions
|
| 968 |
+
main_fig.add_shape(
|
| 969 |
+
type='rect',
|
| 970 |
+
x0=impact_df['timestamp'].min(),
|
| 971 |
+
y0=high_impact_threshold,
|
| 972 |
+
x1=impact_df['timestamp'].max(),
|
| 973 |
+
y1=max(high_impact_threshold * 2, max_impact * 1.1),
|
| 974 |
+
fillcolor='rgba(255,0,0,0.1)',
|
| 975 |
+
line=dict(width=0),
|
| 976 |
+
layer='below'
|
| 977 |
+
)
|
| 978 |
+
|
| 979 |
+
main_fig.add_shape(
|
| 980 |
+
type='rect',
|
| 981 |
+
x0=impact_df['timestamp'].min(),
|
| 982 |
+
y0=min(high_impact_threshold * -2, min_impact * 1.1),
|
| 983 |
+
x1=impact_df['timestamp'].max(),
|
| 984 |
+
y1=-high_impact_threshold,
|
| 985 |
+
fillcolor='rgba(255,0,0,0.1)',
|
| 986 |
+
line=dict(width=0),
|
| 987 |
+
layer='below'
|
| 988 |
+
)
|
| 989 |
+
|
| 990 |
+
main_fig.update_layout(
|
| 991 |
+
title='Price Impact of Whale Transactions',
|
| 992 |
+
xaxis_title='Timestamp',
|
| 993 |
+
yaxis_title='Price Impact (%)',
|
| 994 |
+
hovermode='closest',
|
| 995 |
+
template="plotly_white",
|
| 996 |
+
legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
|
| 997 |
+
margin=dict(l=20, r=20, t=50, b=20)
|
| 998 |
+
)
|
| 999 |
+
|
| 1000 |
+
# Create impact distribution histogram
|
| 1001 |
+
dist_fig = px.histogram(
|
| 1002 |
+
impact_df['impact_pct'],
|
| 1003 |
+
nbins=20,
|
| 1004 |
+
labels={'value': 'Price Impact (%)', 'count': 'Frequency'},
|
| 1005 |
+
title='Distribution of Price Impact',
|
| 1006 |
+
color_discrete_sequence=['#3366CC']
|
| 1007 |
+
)
|
| 1008 |
+
|
| 1009 |
+
# Add a vertical line at the mean
|
| 1010 |
+
dist_fig.add_vline(x=avg_impact, line_dash="dash", line_color="red")
|
| 1011 |
+
dist_fig.add_annotation(x=avg_impact, y=0.85, yref="paper", text=f"Mean: {avg_impact:.2f}%",
|
| 1012 |
+
showarrow=True, arrowhead=2, arrowcolor="red", ax=40)
|
| 1013 |
+
|
| 1014 |
+
# Add a vertical line at zero
|
| 1015 |
+
dist_fig.add_vline(x=0, line_dash="solid", line_color="black")
|
| 1016 |
+
|
| 1017 |
+
dist_fig.update_layout(
|
| 1018 |
+
template="plotly_white",
|
| 1019 |
+
bargap=0.1,
|
| 1020 |
+
height=350
|
| 1021 |
+
)
|
| 1022 |
+
|
| 1023 |
+
# Create cumulative impact chart
|
| 1024 |
+
cumul_fig = go.Figure()
|
| 1025 |
+
cumul_fig.add_trace(go.Scatter(
|
| 1026 |
+
x=impact_df['timestamp'],
|
| 1027 |
+
y=impact_df['cumulative_impact'],
|
| 1028 |
+
mode='lines',
|
| 1029 |
+
fill='tozeroy',
|
| 1030 |
+
line=dict(width=2, color='#2ca02c'),
|
| 1031 |
+
name='Cumulative Impact'
|
| 1032 |
+
))
|
| 1033 |
+
|
| 1034 |
+
cumul_fig.update_layout(
|
| 1035 |
+
title='Cumulative Price Impact Over Time',
|
| 1036 |
+
xaxis_title='Timestamp',
|
| 1037 |
+
yaxis_title='Cumulative Price Impact (%)',
|
| 1038 |
+
template="plotly_white",
|
| 1039 |
+
height=350
|
| 1040 |
+
)
|
| 1041 |
+
|
| 1042 |
+
# Create hourly impact analysis
|
| 1043 |
+
hourly_impact = impact_df.groupby('hour')['impact_pct'].agg(['mean', 'count', 'std']).reset_index()
|
| 1044 |
+
hourly_impact = hourly_impact.sort_values('hour')
|
| 1045 |
+
|
| 1046 |
+
hour_fig = go.Figure()
|
| 1047 |
+
hour_fig.add_trace(go.Bar(
|
| 1048 |
+
x=hourly_impact['hour'],
|
| 1049 |
+
y=hourly_impact['mean'],
|
| 1050 |
+
error_y=dict(type='data', array=hourly_impact['std'], visible=True),
|
| 1051 |
+
marker_color=hourly_impact['mean'].apply(lambda x: 'green' if x > 0 else 'red'),
|
| 1052 |
+
name='Average Impact'
|
| 1053 |
+
))
|
| 1054 |
+
|
| 1055 |
+
hour_fig.update_layout(
|
| 1056 |
+
title='Price Impact by Hour of Day',
|
| 1057 |
+
xaxis_title='Hour of Day',
|
| 1058 |
+
yaxis_title='Average Price Impact (%)',
|
| 1059 |
+
template="plotly_white",
|
| 1060 |
+
height=350,
|
| 1061 |
+
xaxis=dict(tickmode='linear', tick0=0, dtick=2)
|
| 1062 |
+
)
|
| 1063 |
+
|
| 1064 |
+
# Join with original transactions
|
| 1065 |
+
transactions_df = transactions_df.copy()
|
| 1066 |
+
transactions_df['Timestamp_key'] = transactions_df[timestamp_col]
|
| 1067 |
+
impact_df['Timestamp_key'] = impact_df['timestamp']
|
| 1068 |
+
|
| 1069 |
+
merged_df = pd.merge(
|
| 1070 |
+
transactions_df,
|
| 1071 |
+
impact_df[['Timestamp_key', 'impact_pct', 'pre_price', 'post_price', 'cumulative_impact']],
|
| 1072 |
+
on='Timestamp_key',
|
| 1073 |
+
how='left'
|
| 1074 |
+
)
|
| 1075 |
+
|
| 1076 |
+
# Final result with enhanced output
|
| 1077 |
+
return {
|
| 1078 |
+
'avg_impact_pct': avg_impact,
|
| 1079 |
+
'max_impact_pct': max_impact,
|
| 1080 |
+
'min_impact_pct': min_impact,
|
| 1081 |
+
'median_impact_pct': median_impact,
|
| 1082 |
+
'std_impact_pct': std_impact,
|
| 1083 |
+
'significant_moves_count': significant_moves,
|
| 1084 |
+
'high_impact_moves_count': high_impact_moves,
|
| 1085 |
+
'positive_impacts_count': positive_impacts,
|
| 1086 |
+
'negative_impacts_count': negative_impacts,
|
| 1087 |
+
'total_transactions': len(transactions_df),
|
| 1088 |
+
'charts': {
|
| 1089 |
+
'main_chart': main_fig,
|
| 1090 |
+
'impact_distribution': dist_fig,
|
| 1091 |
+
'cumulative_impact': cumul_fig,
|
| 1092 |
+
'hourly_impact': hour_fig
|
| 1093 |
+
},
|
| 1094 |
+
'transactions_with_impact': merged_df,
|
| 1095 |
+
'insights': insights,
|
| 1096 |
+
'impact_summary': impact_summary
|
| 1097 |
+
}
|
| 1098 |
+
|
| 1099 |
+
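The aggregation step above (significant-move counts plus a running cumulative impact) reduces to a few pandas one-liners. A minimal, self-contained sketch with made-up impact values — the column name mirrors `impact_pct` from the method, but the data is illustrative only:

```python
import pandas as pd

# Hypothetical mini-dataset standing in for impact_df; values are made up.
impact_df = pd.DataFrame({"impact_pct": [0.4, -2.5, 1.2, 3.5, -0.1]})

significant_threshold = 1.0   # same thresholds as the method above
high_impact_threshold = 3.0

# Count moves whose absolute impact exceeds each threshold.
significant_moves = int((impact_df["impact_pct"].abs() > significant_threshold).sum())
high_impact_moves = int((impact_df["impact_pct"].abs() > high_impact_threshold).sum())

# Running sum of impact; the last element is the net cumulative impact.
cumulative = impact_df["impact_pct"].cumsum().iloc[-1]

print(significant_moves, high_impact_moves, cumulative)  # 3 1 2.5
```

With these toy values, three moves exceed 1% in magnitude, one exceeds 3%, and the net cumulative impact is +2.5%.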
    def detect_wash_trading(self,
                            transactions_df: pd.DataFrame,
                            addresses: List[str],
                            time_window_minutes: int = 60,
                            sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential wash trading between addresses

        Args:
            transactions_df: DataFrame of transactions
            addresses: List of addresses to analyze
            time_window_minutes: Time window for detecting wash trades
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential wash trading incidents
        """
        if transactions_df.empty or not addresses:
            return []

        # Ensure from/to columns exist
        if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
            from_col, to_col = 'From', 'To'
        elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
            from_col, to_col = 'from', 'to'
        else:
            raise ValueError("From/To columns not found in transactions DataFrame")

        # Ensure timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure timestamp is datetime
        if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
            transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])

        # Define sensitivity thresholds
        if sensitivity == "Low":
            min_cycles = 3       # Minimum number of back-and-forth transactions
            max_time_diff = 120  # Maximum minutes between transactions
        elif sensitivity == "Medium":
            min_cycles = 2
            max_time_diff = 60
        else:  # High
            min_cycles = 1
            max_time_diff = 30

        # Filter transactions involving the addresses
        address_txs = transactions_df[
            (transactions_df[from_col].isin(addresses)) |
            (transactions_df[to_col].isin(addresses))
        ].copy()

        if address_txs.empty:
            return []

        # Sort by timestamp
        address_txs = address_txs.sort_values(by=timestamp_col)

        # Detect cycles of transactions between same addresses
        wash_trades = []

        for addr1 in addresses:
            for addr2 in addresses:
                if addr1 == addr2:
                    continue

                # Find transactions from addr1 to addr2
                a1_to_a2 = address_txs[
                    (address_txs[from_col] == addr1) &
                    (address_txs[to_col] == addr2)
                ]

                # Find transactions from addr2 to addr1
                a2_to_a1 = address_txs[
                    (address_txs[from_col] == addr2) &
                    (address_txs[to_col] == addr1)
                ]

                if a1_to_a2.empty or a2_to_a1.empty:
                    continue

                # Check for back-and-forth patterns
                cycles = 0
                evidence = []

                for _, tx1 in a1_to_a2.iterrows():
                    tx1_time = tx1[timestamp_col]

                    # Find return transactions within the time window
                    return_txs = a2_to_a1[
                        (a2_to_a1[timestamp_col] > tx1_time) &
                        (a2_to_a1[timestamp_col] <= tx1_time + pd.Timedelta(minutes=max_time_diff))
                    ]

                    if not return_txs.empty:
                        cycles += 1
                        evidence.append(tx1)
                        evidence.append(return_txs.iloc[0])

                if cycles >= min_cycles:
                    # Create visualization
                    if evidence:
                        evidence_df = pd.DataFrame(evidence)
                        fig = px.scatter(
                            evidence_df,
                            x=timestamp_col,
                            y=evidence_df.get('Amount', evidence_df.get('tokenAmount', evidence_df.get('value', 0))),
                            color=from_col,
                            title=f"Potential Wash Trading Between {addr1[:8]}... and {addr2[:8]}..."
                        )
                    else:
                        fig = None

                    wash_trades.append({
                        "type": "Wash Trading",
                        "addresses": [addr1, addr2],
                        "risk_level": "High" if cycles >= min_cycles * 2 else "Medium",
                        "description": f"Detected {cycles} cycles of back-and-forth transactions between addresses",
                        "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                        "title": f"Wash Trading Pattern ({cycles} cycles)",
                        "evidence": pd.DataFrame(evidence) if evidence else None,
                        "chart": fig
                    })

        return wash_trades

    def detect_pump_and_dump(self,
                             transactions_df: pd.DataFrame,
                             price_data: Dict[str, Dict[str, Any]],
                             sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential pump and dump schemes

        Args:
            transactions_df: DataFrame of transactions
            price_data: Dictionary of price impact data for each transaction
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential pump and dump incidents
        """
        if transactions_df.empty or not price_data:
            return []

        # Ensure timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure from/to columns exist
        if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
            from_col, to_col = 'From', 'To'
        elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
            from_col, to_col = 'from', 'to'
        else:
            raise ValueError("From/To columns not found in transactions DataFrame")

        # Ensure timestamp is datetime
        if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
            transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])

        # Define sensitivity thresholds
        if sensitivity == "Low":
            accumulation_threshold = 5  # Number of buys to consider accumulation
            pump_threshold = 10.0       # % price increase to trigger pump
            dump_threshold = -8.0       # % price decrease to trigger dump
        elif sensitivity == "Medium":
            accumulation_threshold = 3
            pump_threshold = 7.0
            dump_threshold = -5.0
        else:  # High
            accumulation_threshold = 2
            pump_threshold = 5.0
            dump_threshold = -3.0

        # Combine price impact data with transactions
        txs_with_impact = []

        for idx, row in transactions_df.iterrows():
            tx_hash = row.get('Transaction Hash', row.get('hash', None))
            if not tx_hash or tx_hash not in price_data:
                continue

            tx_impact = price_data[tx_hash]

            if tx_impact['impact_pct'] is None:
                continue

            txs_with_impact.append({
                'transaction_hash': tx_hash,
                'timestamp': row[timestamp_col],
                'from': row[from_col],
                'to': row[to_col],
                'pre_price': tx_impact['pre_price'],
                'post_price': tx_impact['post_price'],
                'impact_pct': tx_impact['impact_pct']
            })

        if not txs_with_impact:
            return []

        impact_df = pd.DataFrame(txs_with_impact)
        impact_df = impact_df.sort_values(by='timestamp')

        # Look for accumulation phases followed by price pumps and then dumps
        pump_and_dumps = []

        # Group by address to analyze per wallet
        address_groups = {}

        for from_addr in impact_df['from'].unique():
            address_groups[from_addr] = impact_df[impact_df['from'] == from_addr]

        for to_addr in impact_df['to'].unique():
            if to_addr in address_groups:
                address_groups[to_addr] = pd.concat([
                    address_groups[to_addr],
                    impact_df[impact_df['to'] == to_addr]
                ])
            else:
                address_groups[to_addr] = impact_df[impact_df['to'] == to_addr]

        for address, addr_df in address_groups.items():
            # Skip if not enough transactions
            if len(addr_df) < accumulation_threshold + 2:
                continue

            # Look for continuous price increase followed by sharp drop
            window_size = min(len(addr_df), 10)
            for i in range(len(addr_df) - window_size + 1):
                window = addr_df.iloc[i:i+window_size]

                # Get cumulative price change in window
                if len(window) >= 2:
                    first_price = window.iloc[0]['pre_price']
                    last_price = window.iloc[-1]['post_price']

                    if first_price is None or last_price is None:
                        continue

                    cumulative_change = ((last_price - first_price) / first_price) * 100

                    # Check for pump phase
                    max_price = window['post_price'].max()
                    max_idx = window['post_price'].idxmax()

                    if max_idx < len(window) - 1:
                        max_to_end = ((window.iloc[-1]['post_price'] - max_price) / max_price) * 100

                        # If we have a pump followed by a dump
                        if (cumulative_change > pump_threshold or
                                any(window['impact_pct'] > pump_threshold)) and max_to_end < dump_threshold:

                            # Create chart
                            fig = go.Figure()

                            # Plot price line
                            times = [t.timestamp() for t in window['timestamp']]
                            prices = []
                            for _, row in window.iterrows():
                                prices.append(row['pre_price'])
                                prices.append(row['post_price'])

                            times_expanded = []
                            for t in times:
                                times_expanded.append(t - 60)  # 1 min before
                                times_expanded.append(t + 60)  # 1 min after

                            fig.add_trace(go.Scatter(
                                x=times_expanded,
                                y=prices,
                                mode='lines+markers',
                                name='Price',
                                line=dict(color='blue')
                            ))

                            # Highlight pump and dump phases
                            max_time_idx = window.index.get_loc(max_idx)
                            pump_x = times_expanded[:max_time_idx*2+2]
                            pump_y = prices[:max_time_idx*2+2]

                            dump_x = times_expanded[max_time_idx*2:]
                            dump_y = prices[max_time_idx*2:]

                            fig.add_trace(go.Scatter(
                                x=pump_x,
                                y=pump_y,
                                mode='lines',
                                line=dict(color='green', width=3),
                                name='Pump Phase'
                            ))

                            fig.add_trace(go.Scatter(
                                x=dump_x,
                                y=dump_y,
                                mode='lines',
                                line=dict(color='red', width=3),
                                name='Dump Phase'
                            ))

                            fig.update_layout(
                                title='Potential Pump and Dump Pattern',
                                xaxis_title='Time',
                                yaxis_title='Price',
                                hovermode='closest'
                            )

                            pump_and_dumps.append({
                                "type": "Pump and Dump",
                                "addresses": [address],
                                "risk_level": "High" if max_to_end < dump_threshold * 1.5 else "Medium",
                                "description": f"Price pumped {cumulative_change:.2f}% before dropping {max_to_end:.2f}%",
                                "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                                "title": f"Pump ({cumulative_change:.1f}%) and Dump ({max_to_end:.1f}%)",
                                "evidence": window,
                                "chart": fig
                            })

        return pump_and_dumps
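The back-and-forth cycle counting at the heart of `detect_wash_trading` can be sketched in isolation. This is a minimal, self-contained example with a made-up two-address toy ledger (addresses "A"/"B" and timestamps are illustrative only); it applies the same pairing rule as the method: an A→B transfer counts as a cycle if a B→A transfer follows within the time window.

```python
import pandas as pd

# Toy ledger: A sends to B twice; B returns funds once within the window.
txs = pd.DataFrame({
    "from": ["A", "B", "A", "B"],
    "to":   ["B", "A", "B", "A"],
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 10:20",
         "2024-01-01 12:00", "2024-01-01 14:30"]
    ),
})

max_time_diff = 60  # minutes, matching the "Medium" sensitivity above

a1_to_a2 = txs[(txs["from"] == "A") & (txs["to"] == "B")]
a2_to_a1 = txs[(txs["from"] == "B") & (txs["to"] == "A")]

cycles = 0
for _, tx in a1_to_a2.iterrows():
    # A return transfer strictly after tx, but within the window, closes a cycle.
    returns = a2_to_a1[
        (a2_to_a1["timestamp"] > tx["timestamp"]) &
        (a2_to_a1["timestamp"] <= tx["timestamp"] + pd.Timedelta(minutes=max_time_diff))
    ]
    if not returns.empty:
        cycles += 1

print(cycles)  # 1: only the 10:00 -> 10:20 round trip lands inside 60 minutes
```

The 12:00 A→B transfer is not matched because the next B→A transfer arrives 2.5 hours later, outside the window.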
modules/detection.py
ADDED
@@ -0,0 +1,684 @@
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Union, Any, Tuple
import plotly.graph_objects as go
import plotly.express as px


class ManipulationDetector:
    """
    Detect potential market manipulation patterns in whale transactions
    """

    def __init__(self):
        # Define known manipulation patterns
        self.patterns = {
            "pump_and_dump": {
                "description": "Rapid buys followed by coordinated sell-offs, causing price to first rise then crash",
                "risk_factor": 0.8
            },
            "wash_trading": {
                "description": "Self-trading across multiple addresses to create false impression of market activity",
                "risk_factor": 0.9
            },
            "spoofing": {
                "description": "Large orders placed then canceled before execution to manipulate price",
                "risk_factor": 0.7
            },
            "layering": {
                "description": "Multiple orders at different price levels to create false impression of market depth",
                "risk_factor": 0.6
            },
            "momentum_ignition": {
                "description": "Creating sharp price moves to trigger other participants' momentum-based trading",
                "risk_factor": 0.5
            }
        }

    def detect_wash_trading(self,
                            transactions_df: pd.DataFrame,
                            addresses: List[str],
                            sensitivity: str = "Medium",
                            lookback_hours: int = 24) -> List[Dict[str, Any]]:
        """
        Detect potential wash trading between addresses

        Args:
            transactions_df: DataFrame of transactions
            addresses: List of addresses to analyze
            sensitivity: Detection sensitivity ("Low", "Medium", "High")
            lookback_hours: Hours to look back for wash trading patterns

        Returns:
            List of potential wash trading alerts
        """
        if transactions_df.empty or not addresses:
            return []

        # Ensure from/to columns exist
        if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
            from_col, to_col = 'From', 'To'
        elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
            from_col, to_col = 'from', 'to'
        else:
            raise ValueError("From/To columns not found in transactions DataFrame")

        # Ensure timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure timestamp is datetime
        if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
            if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
            else:
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])

        # Define sensitivity thresholds
        if sensitivity == "Low":
            min_cycles = 3       # Minimum number of back-and-forth transactions
            max_time_diff = 120  # Maximum minutes between transactions
        elif sensitivity == "Medium":
            min_cycles = 2
            max_time_diff = 60
        else:  # High
            min_cycles = 1
            max_time_diff = 30

        # Filter transactions by lookback period
        lookback_time = datetime.now() - timedelta(hours=lookback_hours)
        recent_txs = transactions_df[transactions_df[timestamp_col] >= lookback_time]
|
| 96 |
+
|
| 97 |
+
if recent_txs.empty:
|
| 98 |
+
return []
|
| 99 |
+
|
| 100 |
+
# Filter transactions involving the addresses
|
| 101 |
+
address_txs = recent_txs[
|
| 102 |
+
(recent_txs[from_col].isin(addresses)) |
|
| 103 |
+
(recent_txs[to_col].isin(addresses))
|
| 104 |
+
].copy()
|
| 105 |
+
|
| 106 |
+
if address_txs.empty:
|
| 107 |
+
return []
|
| 108 |
+
|
| 109 |
+
# Sort by timestamp
|
| 110 |
+
address_txs = address_txs.sort_values(by=timestamp_col)
|
| 111 |
+
|
| 112 |
+
# Detect cycles of transactions between same addresses
|
| 113 |
+
wash_trades = []
|
| 114 |
+
|
| 115 |
+
for addr1 in addresses:
|
| 116 |
+
for addr2 in addresses:
|
| 117 |
+
if addr1 == addr2:
|
| 118 |
+
continue
|
| 119 |
+
|
| 120 |
+
# Find transactions from addr1 to addr2
|
| 121 |
+
a1_to_a2 = address_txs[
|
| 122 |
+
(address_txs[from_col] == addr1) &
|
| 123 |
+
(address_txs[to_col] == addr2)
|
| 124 |
+
]
|
| 125 |
+
|
| 126 |
+
# Find transactions from addr2 to addr1
|
| 127 |
+
a2_to_a1 = address_txs[
|
| 128 |
+
(address_txs[from_col] == addr2) &
|
| 129 |
+
(address_txs[to_col] == addr1)
|
| 130 |
+
]
|
| 131 |
+
|
| 132 |
+
if a1_to_a2.empty or a2_to_a1.empty:
|
| 133 |
+
continue
|
| 134 |
+
|
| 135 |
+
# Check for back-and-forth patterns
|
| 136 |
+
cycles = 0
|
| 137 |
+
evidence = []
|
| 138 |
+
|
| 139 |
+
for _, tx1 in a1_to_a2.iterrows():
|
| 140 |
+
tx1_time = tx1[timestamp_col]
|
| 141 |
+
|
| 142 |
+
# Find return transactions within the time window
|
| 143 |
+
return_txs = a2_to_a1[
|
| 144 |
+
(a2_to_a1[timestamp_col] > tx1_time) &
|
| 145 |
+
(a2_to_a1[timestamp_col] <= tx1_time + pd.Timedelta(minutes=max_time_diff))
|
| 146 |
+
]
|
| 147 |
+
|
| 148 |
+
if not return_txs.empty:
|
| 149 |
+
cycles += 1
|
| 150 |
+
evidence.append(tx1)
|
| 151 |
+
evidence.append(return_txs.iloc[0])
|
| 152 |
+
|
| 153 |
+
if cycles >= min_cycles:
|
| 154 |
+
# Create visualization
|
| 155 |
+
if evidence:
|
| 156 |
+
evidence_df = pd.DataFrame(evidence)
|
| 157 |
+
|
| 158 |
+
# Get amount column
|
| 159 |
+
if 'Amount' in evidence_df.columns:
|
| 160 |
+
amount_col = 'Amount'
|
| 161 |
+
elif 'tokenAmount' in evidence_df.columns:
|
| 162 |
+
amount_col = 'tokenAmount'
|
| 163 |
+
elif 'value' in evidence_df.columns:
|
| 164 |
+
# Try to adjust for decimals if 'tokenDecimal' exists
|
| 165 |
+
if 'tokenDecimal' in evidence_df.columns:
|
| 166 |
+
evidence_df['adjustedValue'] = evidence_df['value'].astype(float) / (10 ** evidence_df['tokenDecimal'].astype(int))
|
| 167 |
+
amount_col = 'adjustedValue'
|
| 168 |
+
else:
|
| 169 |
+
amount_col = 'value'
|
| 170 |
+
else:
|
| 171 |
+
amount_col = None
|
| 172 |
+
|
| 173 |
+
# Create figure if amount column exists
|
| 174 |
+
if amount_col:
|
| 175 |
+
fig = px.scatter(
|
| 176 |
+
evidence_df,
|
| 177 |
+
x=timestamp_col,
|
| 178 |
+
y=amount_col,
|
| 179 |
+
color=from_col,
|
| 180 |
+
title=f"Potential Wash Trading Between {addr1[:8]}... and {addr2[:8]}..."
|
| 181 |
+
)
|
| 182 |
+
else:
|
| 183 |
+
fig = None
|
| 184 |
+
else:
|
| 185 |
+
fig = None
|
| 186 |
+
|
| 187 |
+
wash_trades.append({
|
| 188 |
+
"type": "Wash Trading",
|
| 189 |
+
"addresses": [addr1, addr2],
|
| 190 |
+
"risk_level": "High" if cycles >= min_cycles * 2 else "Medium",
|
| 191 |
+
"description": f"Detected {cycles} cycles of back-and-forth transactions between addresses",
|
| 192 |
+
"detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
| 193 |
+
"title": f"Wash Trading Pattern ({cycles} cycles)",
|
| 194 |
+
"evidence": pd.DataFrame(evidence) if evidence else None,
|
| 195 |
+
"chart": fig
|
| 196 |
+
})
|
| 197 |
+
|
| 198 |
+
return wash_trades
|
| 199 |
+
|
    def detect_pump_and_dump(self,
                             transactions_df: pd.DataFrame,
                             price_data: Dict[str, Dict[str, Any]],
                             sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential pump and dump schemes

        Args:
            transactions_df: DataFrame of transactions
            price_data: Dictionary of price impact data for each transaction
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential pump and dump alerts
        """
        if transactions_df.empty or not price_data:
            return []

        # Ensure timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure from/to columns exist
        if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
            from_col, to_col = 'From', 'To'
        elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
            from_col, to_col = 'from', 'to'
        else:
            raise ValueError("From/To columns not found in transactions DataFrame")

        # Ensure timestamp is datetime
        if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
            if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
            else:
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])

        # Define sensitivity thresholds
        if sensitivity == "Low":
            accumulation_threshold = 5   # Number of buys to consider accumulation
            pump_threshold = 10.0        # % price increase to trigger pump
            dump_threshold = -8.0        # % price decrease to trigger dump
        elif sensitivity == "Medium":
            accumulation_threshold = 3
            pump_threshold = 7.0
            dump_threshold = -5.0
        else:  # High
            accumulation_threshold = 2
            pump_threshold = 5.0
            dump_threshold = -3.0

        # Combine price impact data with transactions
        txs_with_impact = []

        for idx, row in transactions_df.iterrows():
            tx_hash = row.get('Transaction Hash', row.get('hash', None))
            if not tx_hash or tx_hash not in price_data:
                continue

            tx_impact = price_data[tx_hash]

            if tx_impact['impact_pct'] is None:
                continue

            txs_with_impact.append({
                'transaction_hash': tx_hash,
                'timestamp': row[timestamp_col],
                'from': row[from_col],
                'to': row[to_col],
                'pre_price': tx_impact['pre_price'],
                'post_price': tx_impact['post_price'],
                'impact_pct': tx_impact['impact_pct']
            })

        if not txs_with_impact:
            return []

        impact_df = pd.DataFrame(txs_with_impact)
        impact_df = impact_df.sort_values(by='timestamp')

        # Look for accumulation phases followed by price pumps and then dumps
        pump_and_dumps = []

        # Group by address to analyze per wallet
        address_groups = {}

        for from_addr in impact_df['from'].unique():
            address_groups[from_addr] = impact_df[impact_df['from'] == from_addr]

        for to_addr in impact_df['to'].unique():
            if to_addr in address_groups:
                address_groups[to_addr] = pd.concat([
                    address_groups[to_addr],
                    impact_df[impact_df['to'] == to_addr]
                ])
            else:
                address_groups[to_addr] = impact_df[impact_df['to'] == to_addr]

        for address, addr_df in address_groups.items():
            # Skip if not enough transactions
            if len(addr_df) < accumulation_threshold + 2:
                continue

            # Look for continuous price increase followed by sharp drop
            window_size = min(len(addr_df), 10)
            for i in range(len(addr_df) - window_size + 1):
                window = addr_df.iloc[i:i+window_size]

                # Get cumulative price change in window
                if len(window) >= 2:
                    first_price = window.iloc[0]['pre_price']
                    last_price = window.iloc[-1]['post_price']

                    if first_price is None or last_price is None:
                        continue

                    cumulative_change = ((last_price - first_price) / first_price) * 100

                    # Check for pump phase
                    max_price = window['post_price'].max()
                    max_idx = window['post_price'].idxmax()
                    # idxmax() returns an index label, not a position; convert it
                    # before comparing against positional bounds
                    max_pos = window.index.get_loc(max_idx)

                    if max_pos < len(window) - 1:
                        max_to_end = ((window.iloc[-1]['post_price'] - max_price) / max_price) * 100

                        # If we have a pump followed by a dump
                        if (cumulative_change > pump_threshold or
                                any(window['impact_pct'] > pump_threshold)) and max_to_end < dump_threshold:

                            # Create chart
                            fig = go.Figure()

                            # Plot price line
                            times = [t.timestamp() for t in window['timestamp']]
                            prices = []
                            for _, row in window.iterrows():
                                prices.append(row['pre_price'])
                                prices.append(row['post_price'])

                            times_expanded = []
                            for t in times:
                                times_expanded.append(t - 60)  # 1 min before
                                times_expanded.append(t + 60)  # 1 min after

                            fig.add_trace(go.Scatter(
                                x=times_expanded,
                                y=prices,
                                mode='lines+markers',
                                name='Price',
                                line=dict(color='blue')
                            ))

                            # Highlight pump and dump phases
                            pump_x = times_expanded[:max_pos*2+2]
                            pump_y = prices[:max_pos*2+2]

                            dump_x = times_expanded[max_pos*2:]
                            dump_y = prices[max_pos*2:]

                            fig.add_trace(go.Scatter(
                                x=pump_x,
                                y=pump_y,
                                mode='lines',
                                line=dict(color='green', width=3),
                                name='Pump Phase'
                            ))

                            fig.add_trace(go.Scatter(
                                x=dump_x,
                                y=dump_y,
                                mode='lines',
                                line=dict(color='red', width=3),
                                name='Dump Phase'
                            ))

                            fig.update_layout(
                                title='Potential Pump and Dump Pattern',
                                xaxis_title='Time',
                                yaxis_title='Price',
                                hovermode='closest'
                            )

                            pump_and_dumps.append({
                                "type": "Pump and Dump",
                                "addresses": [address],
                                "risk_level": "High" if max_to_end < dump_threshold * 1.5 else "Medium",
                                "description": f"Price pumped {cumulative_change:.2f}% before dropping {max_to_end:.2f}%",
                                "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                                "title": f"Pump ({cumulative_change:.1f}%) and Dump ({max_to_end:.1f}%)",
                                "evidence": window,
                                "chart": fig
                            })

        return pump_and_dumps
    def detect_spoofing(self,
                        transactions_df: pd.DataFrame,
                        order_book_data: Optional[pd.DataFrame] = None,
                        sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential spoofing (placing and quickly canceling large orders)

        Args:
            transactions_df: DataFrame of transactions
            order_book_data: Optional DataFrame of order book data
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential spoofing alerts
        """
        # Note: This is a placeholder since we don't have direct order book data.
        # A real implementation would analyze order placement and cancellations.
        # For now, return an empty list as we can't detect spoofing without order book data.
        return []
    def detect_layering(self,
                        transactions_df: pd.DataFrame,
                        order_book_data: Optional[pd.DataFrame] = None,
                        sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential layering (placing multiple orders at different price levels)

        Args:
            transactions_df: DataFrame of transactions
            order_book_data: Optional DataFrame of order book data
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential layering alerts
        """
        # Note: This is a placeholder since we don't have direct order book data.
        # A real implementation would analyze order book depth and patterns.
        # For now, return an empty list as we can't detect layering without order book data.
        return []
    def detect_momentum_ignition(self,
                                 transactions_df: pd.DataFrame,
                                 price_data: Dict[str, Dict[str, Any]],
                                 sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Detect potential momentum ignition (creating sharp price moves)

        Args:
            transactions_df: DataFrame of transactions
            price_data: Dictionary of price impact data for each transaction
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential momentum ignition alerts
        """
        if transactions_df.empty or not price_data:
            return []

        # Ensure timestamp column exists
        if 'Timestamp' in transactions_df.columns:
            timestamp_col = 'Timestamp'
        elif 'timeStamp' in transactions_df.columns:
            timestamp_col = 'timeStamp'
        else:
            raise ValueError("Timestamp column not found in transactions DataFrame")

        # Ensure timestamp is datetime
        if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
            if isinstance(transactions_df[timestamp_col].iloc[0], (int, float)):
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col], unit='s')
            else:
                transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col])

        # Define sensitivity thresholds
        if sensitivity == "Low":
            impact_threshold = 15.0  # % price impact to trigger alert
            time_window_minutes = 5  # Time window to look for follow-up transactions
        elif sensitivity == "Medium":
            impact_threshold = 10.0
            time_window_minutes = 10
        else:  # High
            impact_threshold = 5.0
            time_window_minutes = 15

        # Combine price impact data with transactions
        txs_with_impact = []

        for idx, row in transactions_df.iterrows():
            tx_hash = row.get('Transaction Hash', row.get('hash', None))
            if not tx_hash or tx_hash not in price_data:
                continue

            tx_impact = price_data[tx_hash]

            if tx_impact['impact_pct'] is None:
                continue

            txs_with_impact.append({
                'transaction_hash': tx_hash,
                'timestamp': row[timestamp_col],
                'from': row.get('From', row.get('from', 'Unknown')),
                'to': row.get('To', row.get('to', 'Unknown')),
                'pre_price': tx_impact['pre_price'],
                'post_price': tx_impact['post_price'],
                'impact_pct': tx_impact['impact_pct']
            })

        if not txs_with_impact:
            return []

        impact_df = pd.DataFrame(txs_with_impact)
        impact_df = impact_df.sort_values(by='timestamp')

        # Look for large price impacts followed by increased trading activity
        momentum_alerts = []

        # Find high-impact transactions
        high_impact_txs = impact_df[abs(impact_df['impact_pct']) > impact_threshold]

        for idx, high_impact_tx in high_impact_txs.iterrows():
            tx_time = high_impact_tx['timestamp']

            # Look for increased trading activity after the high-impact transaction
            follow_up_window = impact_df[
                (impact_df['timestamp'] > tx_time) &
                (impact_df['timestamp'] <= tx_time + pd.Timedelta(minutes=time_window_minutes))
            ]

            # Compare activity to baseline (same time window before the transaction)
            baseline_window = impact_df[
                (impact_df['timestamp'] < tx_time) &
                (impact_df['timestamp'] >= tx_time - pd.Timedelta(minutes=time_window_minutes))
            ]

            if len(follow_up_window) > len(baseline_window) * 1.5 and len(follow_up_window) >= 3:
                # Create chart
                fig = go.Figure()

                # Plot price timeline
                all_relevant_txs = pd.concat([
                    pd.DataFrame([high_impact_tx]),
                    follow_up_window,
                    baseline_window
                ]).sort_values(by='timestamp')

                # Create time series for price
                timestamps = all_relevant_txs['timestamp']
                prices = []
                for _, row in all_relevant_txs.iterrows():
                    prices.append(row['pre_price'])
                    prices.append(row['post_price'])

                times_expanded = []
                for t in timestamps:
                    times_expanded.append(t - pd.Timedelta(seconds=30))
                    times_expanded.append(t + pd.Timedelta(seconds=30))

                # Plot price line
                fig.add_trace(go.Scatter(
                    x=times_expanded[:len(prices)],  # In case of any length mismatch
                    y=prices[:len(times_expanded)],
                    mode='lines',
                    name='Price'
                ))

                # Highlight the high-impact transaction
                fig.add_trace(go.Scatter(
                    x=[high_impact_tx['timestamp']],
                    y=[high_impact_tx['post_price']],
                    mode='markers',
                    marker=dict(
                        size=15,
                        color='red',
                        symbol='circle'
                    ),
                    name='Momentum Ignition'
                ))

                # Highlight the follow-up transactions
                if not follow_up_window.empty:
                    fig.add_trace(go.Scatter(
                        x=follow_up_window['timestamp'],
                        y=follow_up_window['post_price'],
                        mode='markers',
                        marker=dict(
                            size=10,
                            color='orange',
                            symbol='circle'
                        ),
                        name='Follow-up Activity'
                    ))

                fig.update_layout(
                    title='Potential Momentum Ignition Pattern',
                    xaxis_title='Time',
                    yaxis_title='Price',
                    hovermode='closest'
                )

                momentum_alerts.append({
                    "type": "Momentum Ignition",
                    "addresses": [high_impact_tx['from']],
                    "risk_level": "High" if abs(high_impact_tx['impact_pct']) > impact_threshold * 1.5 else "Medium",
                    "description": f"Large {high_impact_tx['impact_pct']:.2f}% price move followed by {len(follow_up_window)} transactions in {time_window_minutes} minutes (vs {len(baseline_window)} in baseline)",
                    "detection_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                    "title": f"Momentum Ignition ({high_impact_tx['impact_pct']:.1f}% price move)",
                    "evidence": pd.concat([pd.DataFrame([high_impact_tx]), follow_up_window]),
                    "chart": fig
                })

        return momentum_alerts
    def run_all_detections(self,
                           transactions_df: pd.DataFrame,
                           addresses: List[str],
                           price_data: Dict[str, Dict[str, Any]] = None,
                           order_book_data: Optional[pd.DataFrame] = None,
                           sensitivity: str = "Medium") -> List[Dict[str, Any]]:
        """
        Run all manipulation detection algorithms

        Args:
            transactions_df: DataFrame of transactions
            addresses: List of addresses to analyze
            price_data: Optional dictionary of price impact data for each transaction
            order_book_data: Optional DataFrame of order book data
            sensitivity: Detection sensitivity ("Low", "Medium", "High")

        Returns:
            List of potential manipulation alerts
        """
        if transactions_df.empty:
            return []

        all_alerts = []

        # Detect wash trading
        wash_trading_alerts = self.detect_wash_trading(
            transactions_df=transactions_df,
            addresses=addresses,
            sensitivity=sensitivity
        )
        all_alerts.extend(wash_trading_alerts)

        # Detect pump and dump and momentum ignition (if price data available)
        if price_data:
            pump_and_dump_alerts = self.detect_pump_and_dump(
                transactions_df=transactions_df,
                price_data=price_data,
                sensitivity=sensitivity
            )
            all_alerts.extend(pump_and_dump_alerts)

            momentum_alerts = self.detect_momentum_ignition(
                transactions_df=transactions_df,
                price_data=price_data,
                sensitivity=sensitivity
            )
            all_alerts.extend(momentum_alerts)

        # Detect spoofing and layering (if order book data available)
        if order_book_data is not None:
            spoofing_alerts = self.detect_spoofing(
                transactions_df=transactions_df,
                order_book_data=order_book_data,
                sensitivity=sensitivity
            )
            all_alerts.extend(spoofing_alerts)

            layering_alerts = self.detect_layering(
                transactions_df=transactions_df,
                order_book_data=order_book_data,
                sensitivity=sensitivity
            )
            all_alerts.extend(layering_alerts)

        # Sort alerts by risk level
        risk_order = {"High": 0, "Medium": 1, "Low": 2}
        all_alerts.sort(key=lambda x: risk_order.get(x.get("risk_level", "Low"), 3))

        return all_alerts
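The final sort in `run_all_detections` orders alerts High, then Medium, then Low, with unknown labels last. The same idiom in isolation, on hand-built alert dicts:

```python
# Map risk labels to sort keys; anything unmapped falls to the end (key 3)
risk_order = {"High": 0, "Medium": 1, "Low": 2}

alerts = [
    {"type": "Layering", "risk_level": "Low"},
    {"type": "Wash Trading", "risk_level": "High"},
    {"type": "Pump and Dump", "risk_level": "Medium"},
    {"type": "Unnamed"},  # no risk_level: get() falls back to "Low"
]

# Stable sort: equal keys keep their original relative order
alerts.sort(key=lambda x: risk_order.get(x.get("risk_level", "Low"), 3))
print([a["type"] for a in alerts])
# → ['Wash Trading', 'Pump and Dump', 'Layering', 'Unnamed']
```

Because `list.sort` is stable, alerts with the same risk level stay in detection order, so wash-trading alerts (run first) surface ahead of later detectors at the same severity.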
modules/tools.py
ADDED
@@ -0,0 +1,373 @@
import json
import pandas as pd
from datetime import datetime
from typing import Dict, List, Optional, Union, Any, Tuple

from langchain.tools import tool
from modules.api_client import ArbiscanClient, GeminiClient
from modules.data_processor import DataProcessor


# Tools for Arbiscan API
class ArbiscanTools:
    def __init__(self, arbiscan_client: ArbiscanClient):
        self.client = arbiscan_client

    @tool("get_token_transfers")
    def get_token_transfers(self, address: str, contract_address: Optional[str] = None) -> str:
        """
        Get ERC-20 token transfers for a specific address

        Args:
            address: Wallet address
            contract_address: Optional token contract address to filter by

        Returns:
            List of token transfers as JSON string
        """
        transfers = self.client.get_token_transfers(
            address=address,
            contract_address=contract_address
        )
        return json.dumps(transfers)

    @tool("get_token_balance")
    def get_token_balance(self, address: str, contract_address: str) -> str:
        """
        Get the current balance of a specific token for an address

        Args:
            address: Wallet address
            contract_address: Token contract address

        Returns:
            Token balance
        """
        balance = self.client.get_token_balance(
            address=address,
            contract_address=contract_address
        )
        return balance

    @tool("get_normal_transactions")
    def get_normal_transactions(self, address: str) -> str:
        """
        Get normal transactions (ETH/ARB transfers) for a specific address

        Args:
            address: Wallet address

        Returns:
            List of normal transactions as JSON string
        """
        transactions = self.client.get_normal_transactions(address=address)
        return json.dumps(transactions)

    @tool("get_internal_transactions")
    def get_internal_transactions(self, address: str) -> str:
        """
        Get internal transactions for a specific address

        Args:
            address: Wallet address

        Returns:
            List of internal transactions as JSON string
        """
        transactions = self.client.get_internal_transactions(address=address)
        return json.dumps(transactions)

    @tool("fetch_whale_transactions")
    def fetch_whale_transactions(self,
                                 addresses: List[str],
                                 token_address: Optional[str] = None,
                                 min_token_amount: Optional[float] = None,
                                 min_usd_value: Optional[float] = None) -> str:
        """
        Fetch whale transactions for a list of addresses

        Args:
            addresses: List of wallet addresses
            token_address: Optional token contract address to filter by
            min_token_amount: Minimum token amount
            min_usd_value: Minimum USD value

        Returns:
            DataFrame of whale transactions as JSON string
        """
        transactions_df = self.client.fetch_whale_transactions(
            addresses=addresses,
            token_address=token_address,
            min_token_amount=min_token_amount,
            min_usd_value=min_usd_value
        )
        return transactions_df.to_json(orient="records")
# Tools for Gemini API
class GeminiTools:
    def __init__(self, gemini_client: GeminiClient):
        self.client = gemini_client

    @tool("get_current_price")
    def get_current_price(self, symbol: str) -> str:
        """
        Get the current price of a token

        Args:
            symbol: Token symbol (e.g., "ETHUSD")

        Returns:
            Current price
        """
        price = self.client.get_current_price(symbol=symbol)
        return str(price) if price is not None else "Price not found"

    @tool("get_historical_prices")
    def get_historical_prices(self,
                              symbol: str,
                              start_time: str,
                              end_time: str) -> str:
        """
        Get historical prices for a token within a time range

        Args:
            symbol: Token symbol (e.g., "ETHUSD")
            start_time: Start datetime in ISO format
            end_time: End datetime in ISO format

        Returns:
            DataFrame of historical prices as JSON string
        """
        # Parse datetime strings
        start_time_dt = datetime.fromisoformat(start_time.replace('Z', '+00:00'))
        end_time_dt = datetime.fromisoformat(end_time.replace('Z', '+00:00'))

        prices_df = self.client.get_historical_prices(
            symbol=symbol,
            start_time=start_time_dt,
            end_time=end_time_dt
        )

        if prices_df is not None:
            return prices_df.to_json(orient="records")
        else:
            return "[]"

    @tool("get_price_impact")
    def get_price_impact(self,
                         symbol: str,
                         transaction_time: str,
                         lookback_minutes: int = 5,
                         lookahead_minutes: int = 5) -> str:
        """
        Analyze the price impact before and after a transaction

        Args:
            symbol: Token symbol (e.g., "ETHUSD")
            transaction_time: Transaction datetime in ISO format
            lookback_minutes: Minutes to look back before the transaction
            lookahead_minutes: Minutes to look ahead after the transaction

        Returns:
            Price impact data as JSON string
        """
        # Parse datetime string
        transaction_time_dt = datetime.fromisoformat(transaction_time.replace('Z', '+00:00'))

        impact_data = self.client.get_price_impact(
            symbol=symbol,
|
| 179 |
+
transaction_time=transaction_time_dt,
|
| 180 |
+
lookback_minutes=lookback_minutes,
|
| 181 |
+
lookahead_minutes=lookahead_minutes
|
| 182 |
+
)
|
| 183 |
+
|
| 184 |
+
# Convert to JSON string
|
| 185 |
+
result = {
|
| 186 |
+
"pre_price": impact_data["pre_price"],
|
| 187 |
+
"post_price": impact_data["post_price"],
|
| 188 |
+
"impact_pct": impact_data["impact_pct"]
|
| 189 |
+
}
|
| 190 |
+
return json.dumps(result)
|
| 191 |
+
|
| 192 |
+
|
| 193 |
+
# Tools for Data Processor
|
| 194 |
+
class DataProcessorTools:
|
| 195 |
+
def __init__(self, data_processor: DataProcessor):
|
| 196 |
+
self.processor = data_processor
|
| 197 |
+
|
| 198 |
+
@tool("aggregate_transactions")
|
| 199 |
+
def aggregate_transactions(self,
|
| 200 |
+
transactions_json: str,
|
| 201 |
+
time_window: str = 'D') -> str:
|
| 202 |
+
"""
|
| 203 |
+
Aggregate transactions by time window
|
| 204 |
+
|
| 205 |
+
Args:
|
| 206 |
+
transactions_json: JSON string of transactions
|
| 207 |
+
time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)
|
| 208 |
+
|
| 209 |
+
Returns:
|
| 210 |
+
Aggregated DataFrame as JSON string
|
| 211 |
+
"""
|
| 212 |
+
# Convert JSON to DataFrame
|
| 213 |
+
transactions_df = pd.read_json(transactions_json)
|
| 214 |
+
|
| 215 |
+
# Process data
|
| 216 |
+
agg_df = self.processor.aggregate_transactions(
|
| 217 |
+
transactions_df=transactions_df,
|
| 218 |
+
time_window=time_window
|
| 219 |
+
)
|
| 220 |
+
|
| 221 |
+
# Convert result to JSON
|
| 222 |
+
return agg_df.to_json(orient="records")
|
| 223 |
+
|
| 224 |
+
@tool("identify_patterns")
|
| 225 |
+
def identify_patterns(self,
|
| 226 |
+
transactions_json: str,
|
| 227 |
+
n_clusters: int = 3) -> str:
|
| 228 |
+
"""
|
| 229 |
+
Identify trading patterns using clustering
|
| 230 |
+
|
| 231 |
+
Args:
|
| 232 |
+
transactions_json: JSON string of transactions
|
| 233 |
+
n_clusters: Number of clusters for K-Means
|
| 234 |
+
|
| 235 |
+
Returns:
|
| 236 |
+
List of pattern dictionaries as JSON string
|
| 237 |
+
"""
|
| 238 |
+
# Convert JSON to DataFrame
|
| 239 |
+
transactions_df = pd.read_json(transactions_json)
|
| 240 |
+
|
| 241 |
+
# Process data
|
| 242 |
+
patterns = self.processor.identify_patterns(
|
| 243 |
+
transactions_df=transactions_df,
|
| 244 |
+
n_clusters=n_clusters
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
+
# Convert result to JSON
|
| 248 |
+
result = []
|
| 249 |
+
for pattern in patterns:
|
| 250 |
+
# Convert non-serializable objects to serializable format
|
| 251 |
+
pattern_json = {
|
| 252 |
+
"name": pattern["name"],
|
| 253 |
+
"description": pattern["description"],
|
| 254 |
+
"cluster_id": pattern["cluster_id"],
|
| 255 |
+
"occurrence_count": pattern["occurrence_count"],
|
| 256 |
+
"confidence": pattern["confidence"],
|
| 257 |
+
# Skip chart_data as it's not JSON serializable
|
| 258 |
+
"examples": pattern["examples"].to_json(orient="records") if isinstance(pattern["examples"], pd.DataFrame) else []
|
| 259 |
+
}
|
| 260 |
+
result.append(pattern_json)
|
| 261 |
+
|
| 262 |
+
return json.dumps(result)
|
| 263 |
+
|
| 264 |
+
@tool("detect_anomalous_transactions")
|
| 265 |
+
def detect_anomalous_transactions(self,
|
| 266 |
+
transactions_json: str,
|
| 267 |
+
sensitivity: str = "Medium") -> str:
|
| 268 |
+
"""
|
| 269 |
+
Detect anomalous transactions using statistical methods
|
| 270 |
+
|
| 271 |
+
Args:
|
| 272 |
+
transactions_json: JSON string of transactions
|
| 273 |
+
sensitivity: Detection sensitivity ("Low", "Medium", "High")
|
| 274 |
+
|
| 275 |
+
Returns:
|
| 276 |
+
DataFrame of anomalous transactions as JSON string
|
| 277 |
+
"""
|
| 278 |
+
# Convert JSON to DataFrame
|
| 279 |
+
transactions_df = pd.read_json(transactions_json)
|
| 280 |
+
|
| 281 |
+
# Process data
|
| 282 |
+
anomalies_df = self.processor.detect_anomalous_transactions(
|
| 283 |
+
transactions_df=transactions_df,
|
| 284 |
+
sensitivity=sensitivity
|
| 285 |
+
)
|
| 286 |
+
|
| 287 |
+
# Convert result to JSON
|
| 288 |
+
return anomalies_df.to_json(orient="records")
|
| 289 |
+
|
| 290 |
+
@tool("analyze_price_impact")
|
| 291 |
+
def analyze_price_impact(self,
|
| 292 |
+
transactions_json: str,
|
| 293 |
+
price_data_json: str) -> str:
|
| 294 |
+
"""
|
| 295 |
+
Analyze the price impact of transactions
|
| 296 |
+
|
| 297 |
+
Args:
|
| 298 |
+
transactions_json: JSON string of transactions
|
| 299 |
+
price_data_json: JSON string of price impact data
|
| 300 |
+
|
| 301 |
+
Returns:
|
| 302 |
+
Price impact analysis as JSON string
|
| 303 |
+
"""
|
| 304 |
+
# Convert JSON to DataFrame
|
| 305 |
+
transactions_df = pd.read_json(transactions_json)
|
| 306 |
+
|
| 307 |
+
# Convert price_data_json to dictionary
|
| 308 |
+
price_data = json.loads(price_data_json)
|
| 309 |
+
|
| 310 |
+
# Process data
|
| 311 |
+
impact_analysis = self.processor.analyze_price_impact(
|
| 312 |
+
transactions_df=transactions_df,
|
| 313 |
+
price_data=price_data
|
| 314 |
+
)
|
| 315 |
+
|
| 316 |
+
# Convert result to JSON (excluding non-serializable objects)
|
| 317 |
+
result = {
|
| 318 |
+
"avg_impact_pct": impact_analysis.get("avg_impact_pct"),
|
| 319 |
+
"max_impact_pct": impact_analysis.get("max_impact_pct"),
|
| 320 |
+
"min_impact_pct": impact_analysis.get("min_impact_pct"),
|
| 321 |
+
"significant_moves_count": impact_analysis.get("significant_moves_count"),
|
| 322 |
+
"total_transactions": impact_analysis.get("total_transactions"),
|
| 323 |
+
# Skip impact_chart as it's not JSON serializable
|
| 324 |
+
"transactions_with_impact": impact_analysis.get("transactions_with_impact").to_json(orient="records") if "transactions_with_impact" in impact_analysis else []
|
| 325 |
+
}
|
| 326 |
+
|
| 327 |
+
return json.dumps(result)
|
| 328 |
+
|
| 329 |
+
@tool("detect_wash_trading")
|
| 330 |
+
def detect_wash_trading(self,
|
| 331 |
+
transactions_json: str,
|
| 332 |
+
addresses_json: str,
|
| 333 |
+
sensitivity: str = "Medium") -> str:
|
| 334 |
+
"""
|
| 335 |
+
Detect potential wash trading between addresses
|
| 336 |
+
|
| 337 |
+
Args:
|
| 338 |
+
transactions_json: JSON string of transactions
|
| 339 |
+
addresses_json: JSON string of addresses to analyze
|
| 340 |
+
sensitivity: Detection sensitivity ("Low", "Medium", "High")
|
| 341 |
+
|
| 342 |
+
Returns:
|
| 343 |
+
List of potential wash trading incidents as JSON string
|
| 344 |
+
"""
|
| 345 |
+
# Convert JSON to DataFrame
|
| 346 |
+
transactions_df = pd.read_json(transactions_json)
|
| 347 |
+
|
| 348 |
+
# Convert addresses_json to list
|
| 349 |
+
addresses = json.loads(addresses_json)
|
| 350 |
+
|
| 351 |
+
# Process data
|
| 352 |
+
wash_trades = self.processor.detect_wash_trading(
|
| 353 |
+
transactions_df=transactions_df,
|
| 354 |
+
addresses=addresses,
|
| 355 |
+
sensitivity=sensitivity
|
| 356 |
+
)
|
| 357 |
+
|
| 358 |
+
# Convert result to JSON (excluding non-serializable objects)
|
| 359 |
+
result = []
|
| 360 |
+
for trade in wash_trades:
|
| 361 |
+
trade_json = {
|
| 362 |
+
"type": trade["type"],
|
| 363 |
+
"addresses": trade["addresses"],
|
| 364 |
+
"risk_level": trade["risk_level"],
|
| 365 |
+
"description": trade["description"],
|
| 366 |
+
"detection_time": trade["detection_time"],
|
| 367 |
+
"title": trade["title"],
|
| 368 |
+
"evidence": trade["evidence"].to_json(orient="records") if isinstance(trade["evidence"], pd.DataFrame) else []
|
| 369 |
+
# Skip chart as it's not JSON serializable
|
| 370 |
+
}
|
| 371 |
+
result.append(trade_json)
|
| 372 |
+
|
| 373 |
+
return json.dumps(result)
|
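The tool wrappers above shuttle DataFrames between agents as JSON strings: `to_json(orient="records")` on the way out, `pd.read_json` on the way back. A minimal standalone sketch of that round-trip (independent of the client and processor classes, which are not reproduced here):

```python
import io
import json
import pandas as pd


def df_to_json(df: pd.DataFrame) -> str:
    """Serialize a DataFrame to a records-oriented JSON string, as the tools do."""
    return df.to_json(orient="records")


def json_to_df(payload: str) -> pd.DataFrame:
    """Deserialize back to a DataFrame; StringIO avoids pandas' warning on literal JSON."""
    return pd.read_json(io.StringIO(payload))


if __name__ == "__main__":
    df = pd.DataFrame({"hash": ["0xabc", "0xdef"], "value": [1.5, 2.0]})
    payload = df_to_json(df)
    # The payload is plain JSON, so json.loads can inspect it between agents
    print(json.loads(payload))
    assert json_to_df(payload).equals(df)
```

Note that this round-trip preserves values but not all dtypes (e.g., large integers stored as strings), which is why the tools above re-adjust columns like `tokenDecimal` after deserializing.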
modules/visualizer.py
ADDED
|
@@ -0,0 +1,638 @@
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Union, Any, Tuple
import io
import base64
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet


class Visualizer:
    """
    Generate visualizations and reports for whale transaction data
    """

    def __init__(self):
        self.color_map = {
            "buy": "green",
            "sell": "red",
            "transfer": "blue",
            "other": "gray"
        }

    def create_transaction_timeline(self, transactions_df: pd.DataFrame) -> go.Figure:
        """
        Create a timeline visualization of transactions

        Args:
            transactions_df: DataFrame of transactions

        Returns:
            Plotly figure object
        """
        if transactions_df.empty:
            fig = go.Figure()
            fig.update_layout(
                title="No Transaction Data Available",
                xaxis_title="Date",
                yaxis_title="Action",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text="No transaction data available for timeline",
                showarrow=False,
                font=dict(size=14)
            )
            return fig

        try:
            # Ensure timestamp column exists
            if 'Timestamp' in transactions_df.columns:
                timestamp_col = 'Timestamp'
            elif 'timeStamp' in transactions_df.columns:
                timestamp_col = 'timeStamp'
                # Convert timestamp to datetime if it's not already
                if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
                    try:
                        transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col].astype(float), unit='s')
                    except Exception as e:
                        print(f"Error converting timestamp: {str(e)}")
                        transactions_df[timestamp_col] = pd.date_range(start='2025-01-01', periods=len(transactions_df), freq='H')
            else:
                # Create a dummy timestamp if none exists
                transactions_df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(transactions_df), freq='H')
                timestamp_col = 'dummy_timestamp'

            # Create figure
            fig = go.Figure()

            # Add transactions to timeline
            for idx, row in transactions_df.iterrows():
                # Determine transaction type
                if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
                    from_col, to_col = 'From', 'To'
                else:
                    from_col, to_col = 'from', 'to'

                tx_type = "other"
                hover_text = ""

                if pd.isna(row[from_col]) or row[from_col] == '0x0000000000000000000000000000000000000000':
                    tx_type = "buy"
                    hover_text = f"Buy: {row[to_col]}"
                elif pd.isna(row[to_col]) or row[to_col] == '0x0000000000000000000000000000000000000000':
                    tx_type = "sell"
                    hover_text = f"Sell: {row[from_col]}"
                else:
                    tx_type = "transfer"
                    hover_text = f"Transfer: {row[from_col]} → {row[to_col]}"

                # Add amount to hover text if available
                if 'Amount' in row:
                    hover_text += f"<br>Amount: {row['Amount']}"
                elif 'value' in row:
                    hover_text += f"<br>Value: {row['value']}"

                # Add token info if available
                if 'tokenSymbol' in row:
                    hover_text += f"<br>Token: {row['tokenSymbol']}"

                # Add transaction to timeline
                fig.add_trace(go.Scatter(
                    x=[row[timestamp_col]],
                    y=[tx_type],
                    mode='markers',
                    marker=dict(
                        size=12,
                        color=self.color_map.get(tx_type, "gray"),
                        line=dict(width=1, color='black')
                    ),
                    name=tx_type,
                    text=hover_text,
                    hoverinfo='text'
                ))

            # Update layout
            fig.update_layout(
                title='Whale Transaction Timeline',
                xaxis_title='Time',
                yaxis_title='Transaction Type',
                height=400,
                template='plotly_white',
                showlegend=True,
                hovermode='closest'
            )

            return fig

        except Exception as e:
            # If any error occurs, return a figure with error information
            print(f"Error creating transaction timeline: {str(e)}")
            fig = go.Figure()
            fig.update_layout(
                title="Error in Transaction Timeline",
                xaxis_title="",
                yaxis_title="",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text=f"Error generating timeline: {str(e)}",
                showarrow=False,
                font=dict(size=14, color="red")
            )
            return fig

    def create_volume_chart(self, transactions_df: pd.DataFrame, time_window: str = 'D') -> go.Figure:
        """
        Create a volume chart aggregated by time window

        Args:
            transactions_df: DataFrame of transactions
            time_window: Time window for aggregation (e.g., 'D' for day, 'H' for hour)

        Returns:
            Plotly figure object
        """
        # Create an empty figure with appropriate message if no data
        if transactions_df.empty:
            fig = go.Figure()
            fig.update_layout(
                title="No Transaction Data Available",
                xaxis_title="Date",
                yaxis_title="Volume",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text="No transactions found for volume analysis",
                showarrow=False,
                font=dict(size=14)
            )
            return fig

        try:
            # Create a deep copy to avoid modifying the original
            df = transactions_df.copy()

            # Ensure timestamp column exists and convert to datetime
            if 'Timestamp' in df.columns:
                timestamp_col = 'Timestamp'
            elif 'timeStamp' in df.columns:
                timestamp_col = 'timeStamp'
            else:
                # Create a dummy timestamp if none exists
                df['dummy_timestamp'] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')
                timestamp_col = 'dummy_timestamp'

            # Convert timestamp to datetime safely
            if not pd.api.types.is_datetime64_any_dtype(df[timestamp_col]):
                try:
                    df[timestamp_col] = pd.to_datetime(df[timestamp_col].astype(float), unit='s')
                except Exception as e:
                    print(f"Error converting timestamp: {str(e)}")
                    df[timestamp_col] = pd.date_range(start='2025-01-01', periods=len(df), freq='H')

            # Ensure amount column exists
            if 'Amount' in df.columns:
                amount_col = 'Amount'
            elif 'tokenAmount' in df.columns:
                amount_col = 'tokenAmount'
            elif 'value' in df.columns:
                # Try to adjust for decimals if 'tokenDecimal' exists
                if 'tokenDecimal' in df.columns:
                    df['adjustedValue'] = df['value'].astype(float) / (10 ** df['tokenDecimal'].astype(int))
                    amount_col = 'adjustedValue'
                else:
                    amount_col = 'value'
            else:
                # Create a dummy amount column if none exists
                df['dummy_amount'] = 1.0
                amount_col = 'dummy_amount'

            # Alternative approach: manually aggregate by date to avoid index issues
            df['date'] = df[timestamp_col].dt.date

            # Group by date
            volume_data = df.groupby('date').agg({
                amount_col: 'sum',
                timestamp_col: 'count'
            }).reset_index()

            volume_data.columns = ['Date', 'Volume', 'Count']

            # Create figure
            fig = go.Figure()

            # Add volume bars
            fig.add_trace(go.Bar(
                x=volume_data['Date'],
                y=volume_data['Volume'],
                name='Volume',
                marker_color='blue',
                opacity=0.7
            ))

            # Add transaction count line
            fig.add_trace(go.Scatter(
                x=volume_data['Date'],
                y=volume_data['Count'],
                name='Transaction Count',
                mode='lines+markers',
                marker=dict(color='red'),
                yaxis='y2'
            ))

            # Update layout
            fig.update_layout(
                title="Transaction Volume Over Time",
                xaxis_title="Date",
                yaxis_title="Volume",
                yaxis2=dict(
                    title="Transaction Count",
                    overlaying="y",
                    side="right"
                ),
                height=500,
                template="plotly_white",
                hovermode="x unified",
                legend=dict(
                    orientation="h",
                    yanchor="bottom",
                    y=1.02,
                    xanchor="right",
                    x=1
                )
            )

            return fig

        except Exception as e:
            # If any error occurs, return a figure with error information
            print(f"Error in create_volume_chart: {str(e)}")
            fig = go.Figure()
            fig.update_layout(
                title="Error in Volume Chart",
                xaxis_title="",
                yaxis_title="",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text=f"Error generating volume chart: {str(e)}",
                showarrow=False,
                font=dict(size=14, color="red")
            )
            return fig

    def plot_volume_by_day(self, transactions_df: pd.DataFrame) -> go.Figure:
        """
        Create a volume chart aggregated by day with improved visualization

        Args:
            transactions_df: DataFrame of transactions

        Returns:
            Plotly figure object
        """
        # This is a wrapper around create_volume_chart that specifically uses day as the time window
        return self.create_volume_chart(transactions_df, time_window='D')

    def plot_transaction_flow(self, transactions_df: pd.DataFrame) -> go.Figure:
        """
        Create a network flow visualization of transactions between wallets

        Args:
            transactions_df: DataFrame of transactions

        Returns:
            Plotly figure object
        """
        if transactions_df.empty:
            # Return empty figure if no data
            fig = go.Figure()
            fig.update_layout(
                title="No Transaction Flow Data Available",
                xaxis_title="",
                yaxis_title="",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text="No transactions found for flow analysis",
                showarrow=False,
                font=dict(size=14)
            )
            return fig

        try:
            # Ensure from/to columns exist
            if 'From' in transactions_df.columns and 'To' in transactions_df.columns:
                from_col, to_col = 'From', 'To'
            elif 'from' in transactions_df.columns and 'to' in transactions_df.columns:
                from_col, to_col = 'from', 'to'
            else:
                # Create an error visualization
                fig = go.Figure()
                fig.update_layout(
                    title="Transaction Flow Error",
                    xaxis_title="",
                    yaxis_title="",
                    height=400,
                    template="plotly_white"
                )
                fig.add_annotation(
                    text="From/To columns not found in transactions data",
                    showarrow=False,
                    font=dict(size=14, color="red")
                )
                return fig

            # Ensure amount column exists
            if 'Amount' in transactions_df.columns:
                amount_col = 'Amount'
            elif 'tokenAmount' in transactions_df.columns:
                amount_col = 'tokenAmount'
            elif 'value' in transactions_df.columns:
                # Try to adjust for decimals if 'tokenDecimal' exists
                if 'tokenDecimal' in transactions_df.columns:
                    transactions_df['adjustedValue'] = transactions_df['value'].astype(float) / (10 ** transactions_df['tokenDecimal'].astype(int))
                    amount_col = 'adjustedValue'
                else:
                    amount_col = 'value'
            else:
                # Create an error visualization
                fig = go.Figure()
                fig.update_layout(
                    title="Transaction Flow Error",
                    xaxis_title="",
                    yaxis_title="",
                    height=400,
                    template="plotly_white"
                )
                fig.add_annotation(
                    text="Amount column not found in transactions data",
                    showarrow=False,
                    font=dict(size=14, color="red")
                )
                return fig

            # Aggregate flows between wallets
            flow_df = transactions_df.groupby([from_col, to_col]).agg({
                amount_col: ['sum', 'count']
            }).reset_index()

            flow_df.columns = [from_col, to_col, 'Value', 'Count']

            # Limit to top 20 flows to keep visualization readable
            top_flows = flow_df.sort_values('Value', ascending=False).head(20)

            # Create Sankey diagram
            # First, create a mapping of unique addresses to indices
            all_addresses = pd.unique(top_flows[[from_col, to_col]].values.ravel('K'))
            address_to_idx = {addr: i for i, addr in enumerate(all_addresses)}

            # Create source, target, and value arrays for the Sankey diagram
            sources = [address_to_idx[addr] for addr in top_flows[from_col]]
            targets = [address_to_idx[addr] for addr in top_flows[to_col]]
            values = top_flows['Value'].tolist()

            # Create hover text
            hover_text = [f"From: {src}<br>To: {tgt}<br>Value: {val:.2f}<br>Count: {cnt}"
                          for src, tgt, val, cnt in zip(top_flows[from_col], top_flows[to_col],
                                                        top_flows['Value'], top_flows['Count'])]

            # Shorten addresses for node labels
            node_labels = [f"{addr[:6]}...{addr[-4:]}" if len(addr) > 12 else addr
                           for addr in all_addresses]

            # Create Sankey diagram figure
            fig = go.Figure(data=[go.Sankey(
                node=dict(
                    pad=15,
                    thickness=20,
                    line=dict(color="black", width=0.5),
                    label=node_labels,
                    color="blue"
                ),
                link=dict(
                    source=sources,
                    target=targets,
                    value=values,
                    label=hover_text,
                    hovertemplate='%{label}<extra></extra>'
                )
            )])

            fig.update_layout(
                title="Whale Transaction Flow",
                font_size=12,
                height=600,
                template="plotly_white"
            )

            return fig

        except Exception as e:
            # If any error occurs, return a figure with error information
            print(f"Error in plot_transaction_flow: {str(e)}")
            fig = go.Figure()
            fig.update_layout(
                title="Error in Transaction Flow",
                xaxis_title="",
                yaxis_title="",
                height=400,
                template="plotly_white"
            )
            fig.add_annotation(
                text=f"Error generating transaction flow: {str(e)}",
                showarrow=False,
                font=dict(size=14, color="red")
            )
            return fig

    def generate_pdf_report(self,
                            transactions_df: pd.DataFrame,
                            patterns: List[Dict[str, Any]] = None,
                            price_impact: Dict[str, Any] = None,
                            alerts: List[Dict[str, Any]] = None,
                            title: str = "Whale Analysis Report",
                            start_date: datetime = None,
                            end_date: datetime = None) -> bytes:
        """
        Generate a PDF report of whale activity

        Args:
            transactions_df: DataFrame of transactions
            patterns: List of pattern dictionaries
            price_impact: Dictionary of price impact analysis
            alerts: List of alert dictionaries
            title: Report title
            start_date: Start date for report period
            end_date: End date for report period

        Returns:
            PDF report as bytes
        """
        buffer = io.BytesIO()
        doc = SimpleDocTemplate(buffer, pagesize=letter)
        elements = []

        # Add title
        styles = getSampleStyleSheet()
        elements.append(Paragraph(title, styles['Title']))

        # Add date range
        if start_date and end_date:
            date_range = f"Period: {start_date.strftime('%Y-%m-%d')} to {end_date.strftime('%Y-%m-%d')}"
            elements.append(Paragraph(date_range, styles['Heading2']))

        elements.append(Spacer(1, 12))

        # Add transaction summary
        if not transactions_df.empty:
            elements.append(Paragraph("Transaction Summary", styles['Heading2']))
            summary_data = [
                ["Total Transactions", str(len(transactions_df))],
                ["Unique Addresses", str(len(pd.unique(transactions_df['from'].tolist() + transactions_df['to'].tolist())))]
            ]

            # Add token breakdown if available
            if 'tokenSymbol' in transactions_df.columns:
                token_counts = transactions_df['tokenSymbol'].value_counts()
                summary_data.append(["Most Common Token", f"{token_counts.index[0]} ({token_counts.iloc[0]} txns)"])

            summary_table = Table(summary_data)
            summary_table.setStyle(TableStyle([
                ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
                ('GRID', (0, 0), (-1, -1), 1, colors.black),
                ('PADDING', (0, 0), (-1, -1), 6),
            ]))
            elements.append(summary_table)
            elements.append(Spacer(1, 12))

        # Add pattern analysis
        if patterns:
            elements.append(Paragraph("Trading Patterns Detected", styles['Heading2']))
            for i, pattern in enumerate(patterns):
                pattern_text = f"Pattern {i+1}: {pattern.get('name', 'Unnamed')}\n"
                pattern_text += f"Description: {pattern.get('description', 'No description')}\n"
                if 'risk_profile' in pattern:
                    pattern_text += f"Risk Profile: {pattern['risk_profile']}\n"
                if 'confidence' in pattern:
                    pattern_text += f"Confidence: {pattern['confidence']:.2f}\n"

                elements.append(Paragraph(pattern_text, styles['Normal']))
                elements.append(Spacer(1, 6))

            elements.append(Spacer(1, 12))

        # Add price impact analysis
        if price_impact:
            elements.append(Paragraph("Price Impact Analysis", styles['Heading2']))
            impact_text = ""
            if 'avg_impact' in price_impact:
                impact_text += f"Average Impact: {price_impact['avg_impact']:.2f}%\n"
|
| 545 |
+
if 'max_impact' in price_impact:
|
| 546 |
+
impact_text += f"Maximum Impact: {price_impact['max_impact']:.2f}%\n"
|
| 547 |
+
if 'insights' in price_impact:
|
| 548 |
+
impact_text += f"Insights: {price_impact['insights']}\n"
|
| 549 |
+
|
| 550 |
+
elements.append(Paragraph(impact_text, styles['Normal']))
|
| 551 |
+
elements.append(Spacer(1, 12))
|
| 552 |
+
|
| 553 |
+
# Add alerts
|
| 554 |
+
if alerts:
|
| 555 |
+
elements.append(Paragraph("Alerts", styles['Heading2']))
|
| 556 |
+
for alert in alerts:
|
| 557 |
+
alert_text = f"{alert.get('level', 'Info')}: {alert.get('message', 'No details')}"
|
| 558 |
+
elements.append(Paragraph(alert_text, styles['Normal']))
|
| 559 |
+
elements.append(Spacer(1, 6))
|
| 560 |
+
|
| 561 |
+
# Build the PDF
|
| 562 |
+
doc.build(elements)
|
| 563 |
+
buffer.seek(0)
|
| 564 |
+
return buffer.getvalue()
|
| 565 |
+
|
| 566 |
+
def generate_csv_report(self,
|
| 567 |
+
transactions_df: pd.DataFrame,
|
| 568 |
+
report_type: str = "Transaction Summary") -> str:
|
| 569 |
+
"""
|
| 570 |
+
Generate a CSV report of transaction data
|
| 571 |
+
|
| 572 |
+
Args:
|
| 573 |
+
transactions_df: DataFrame of transactions
|
| 574 |
+
report_type: Type of report to generate
|
| 575 |
+
|
| 576 |
+
Returns:
|
| 577 |
+
CSV data as string
|
| 578 |
+
"""
|
| 579 |
+
if transactions_df.empty:
|
| 580 |
+
return "No data available for report"
|
| 581 |
+
|
| 582 |
+
if report_type == "Transaction Summary":
|
| 583 |
+
# Return basic transaction summary
|
| 584 |
+
return transactions_df.to_csv(index=False)
|
| 585 |
+
elif report_type == "Daily Volume":
|
| 586 |
+
# Get timestamp column
|
| 587 |
+
if 'Timestamp' in transactions_df.columns:
|
| 588 |
+
timestamp_col = 'Timestamp'
|
| 589 |
+
elif 'timeStamp' in transactions_df.columns:
|
| 590 |
+
timestamp_col = 'timeStamp'
|
| 591 |
+
# Convert timestamp to datetime if needed
|
| 592 |
+
if not pd.api.types.is_datetime64_any_dtype(transactions_df[timestamp_col]):
|
| 593 |
+
try:
|
| 594 |
+
transactions_df[timestamp_col] = pd.to_datetime(transactions_df[timestamp_col].astype(float), unit='s')
|
| 595 |
+
except:
|
| 596 |
+
return "Error processing timestamp data"
|
| 597 |
+
else:
|
| 598 |
+
return "Timestamp column not found"
|
| 599 |
+
|
| 600 |
+
# Get amount column
|
| 601 |
+
if 'Amount' in transactions_df.columns:
|
| 602 |
+
amount_col = 'Amount'
|
| 603 |
+
elif 'tokenAmount' in transactions_df.columns:
|
| 604 |
+
amount_col = 'tokenAmount'
|
| 605 |
+
elif 'value' in transactions_df.columns:
|
| 606 |
+
amount_col = 'value'
|
| 607 |
+
else:
|
| 608 |
+
return "Amount column not found"
|
| 609 |
+
|
| 610 |
+
# Aggregate by day
|
| 611 |
+
transactions_df['date'] = transactions_df[timestamp_col].dt.date
|
| 612 |
+
daily_volume = transactions_df.groupby('date').agg({
|
| 613 |
+
amount_col: 'sum',
|
| 614 |
+
'hash': 'count' # Assuming 'hash' exists for all transactions
|
| 615 |
+
}).reset_index()
|
| 616 |
+
|
| 617 |
+
daily_volume.columns = ['Date', 'Volume', 'Transactions']
|
| 618 |
+
return daily_volume.to_csv(index=False)
|
| 619 |
+
else:
|
| 620 |
+
return "Unknown report type"
|
| 621 |
+
|
| 622 |
+
def generate_png_chart(self,
|
| 623 |
+
fig: go.Figure,
|
| 624 |
+
width: int = 1200,
|
| 625 |
+
height: int = 800) -> bytes:
|
| 626 |
+
"""
|
| 627 |
+
Convert a Plotly figure to PNG image data
|
| 628 |
+
|
| 629 |
+
Args:
|
| 630 |
+
fig: Plotly figure object
|
| 631 |
+
width: Image width in pixels
|
| 632 |
+
height: Image height in pixels
|
| 633 |
+
|
| 634 |
+
Returns:
|
| 635 |
+
PNG image as bytes
|
| 636 |
+
"""
|
| 637 |
+
img_bytes = fig.to_image(format="png", width=width, height=height)
|
| 638 |
+
return img_bytes
|
requirements.txt
ADDED
@@ -0,0 +1,12 @@
streamlit==1.30.0
pandas==2.1.1
numpy==1.26.0
matplotlib==3.8.0
plotly==5.18.0
python-dotenv==1.0.0
requests==2.31.0
scikit-learn==1.3.1
crewai>=0.28.0
langchain>=0.1.0,<0.2.0
reportlab==4.0.5
weasyprint==60.1
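Most of these pins are exact (`==`), while crewai and langchain use ranges. A rough helper for splitting a requirement line into package name and version specifier (illustrative only, not a full PEP 508 parser):

```python
import re

def parse_requirement(line):
    """Split a line like 'langchain>=0.1.0,<0.2.0' into (package, specifier).

    A rough split on the first non-name character; does not handle extras,
    environment markers, or other PEP 508 syntax.
    """
    m = re.match(r"^\s*([A-Za-z0-9_.-]+)\s*(.*)$", line)
    return (m.group(1), m.group(2).strip()) if m else (line.strip(), "")

print(parse_requirement("streamlit==1.30.0"))
print(parse_requirement("langchain>=0.1.0,<0.2.0"))
```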
test_api.py
ADDED
@@ -0,0 +1,205 @@
import os
import sys
import json
import urllib.request
import urllib.parse
import urllib.error
from urllib.error import URLError, HTTPError

# Simple dotenv implementation since the module may not be available
def load_dotenv():
    try:
        with open('.env', 'r') as file:
            for line in file:
                line = line.strip()
                if not line or line.startswith('#') or '=' not in line:
                    continue
                key, value = line.split('=', 1)
                os.environ[key] = value
    except Exception as e:
        print(f"Error loading .env file: {e}")
        return False
    return True

# Load environment variables
load_dotenv()

# Get API key from .env
ARBISCAN_API_KEY = os.getenv("ARBISCAN_API_KEY")
if not ARBISCAN_API_KEY:
    print("ERROR: ARBISCAN_API_KEY not found in .env file")
    sys.exit(1)

print(f"Using Arbiscan API Key: {ARBISCAN_API_KEY[:5]}...")

# Test addresses (known active ones)
TEST_ADDRESSES = [
    "0x5d8908afee1df9f7f0830105f8be828f97ce9e68",  # Arbitrum Treasury
    "0x2b1ad6184a6b0fac06bd225ed37c2abc04415ff4",  # Large holder
    "0xc47ff7f9efb3ef39c33a2c492a1372418d399ec2",  # Active trader
]

# User-provided addresses (from command line arguments)
if len(sys.argv) > 1:
    USER_ADDRESSES = sys.argv[1:]
    TEST_ADDRESSES.extend(USER_ADDRESSES)
    print(f"Added user-provided addresses: {USER_ADDRESSES}")

def test_api_key():
    """Test if the API key is valid"""
    base_url = "https://api.arbiscan.io/api"
    params = {
        "module": "stats",
        "action": "ethsupply",
        "apikey": ARBISCAN_API_KEY
    }

    try:
        print("\n===== TESTING API KEY =====")
        # Construct URL with parameters
        query_string = urllib.parse.urlencode(params)
        url = f"{base_url}?{query_string}"
        print(f"Making request to: {url}")

        # Make the request
        with urllib.request.urlopen(url) as response:
            response_data = response.read().decode('utf-8')
            data = json.loads(response_data)

            print(f"Response status code: {response.status}")
            print(f"Response JSON status: {data.get('status')}")
            print(f"Response message: {data.get('message', 'No message')}")

        if data.get("status") == "1":
            print("✅ API KEY IS VALID")
            return True
        else:
            print("❌ API KEY IS INVALID OR HAS ISSUES")
            if "API Key" in data.get("message", ""):
                print(f"Error message: {data.get('message')}")
                print("❌ You need to register for an API key at https://arbiscan.io/myapikey")
            return False

    except HTTPError as e:
        print(f"❌ HTTP Error: {e.code} - {e.reason}")
        return False
    except URLError as e:
        print(f"❌ URL Error: {e.reason}")
        return False
    except Exception as e:
        print(f"❌ Error testing API key: {str(e)}")
        return False

def test_address(address):
    """Test if an address has transactions on Arbitrum"""
    base_url = "https://api.arbiscan.io/api"

    # Test for token transfers
    params_token = {
        "module": "account",
        "action": "tokentx",
        "address": address,
        "startblock": "0",
        "endblock": "99999999",
        "page": "1",
        "offset": "10",  # Just get 10 for testing
        "sort": "desc",
        "apikey": ARBISCAN_API_KEY
    }

    # Test for normal transactions
    params_normal = {
        "module": "account",
        "action": "txlist",
        "address": address,
        "startblock": "0",
        "endblock": "99999999",
        "page": "1",
        "offset": "10",  # Just get 10 for testing
        "sort": "desc",
        "apikey": ARBISCAN_API_KEY
    }

    print(f"\n===== TESTING ADDRESS: {address} =====")

    # Check token transfers
    try:
        print("Testing token transfers...")
        # Construct URL with parameters
        query_string = urllib.parse.urlencode(params_token)
        url = f"{base_url}?{query_string}"

        # Make the request
        with urllib.request.urlopen(url) as response:
            response_data = response.read().decode('utf-8')
            data = json.loads(response_data)

        if data.get("status") == "1":
            transfers = data.get("result", [])
            print(f"✅ Found {len(transfers)} token transfers")
            if transfers:
                print(f"First transfer: {json.dumps(transfers[0], indent=2)[:200]}...")
        else:
            print(f"❌ No token transfers found: {data.get('message', 'Unknown error')}")

    except HTTPError as e:
        print(f"❌ HTTP Error: {e.code} - {e.reason}")
    except URLError as e:
        print(f"❌ URL Error: {e.reason}")
    except Exception as e:
        print(f"❌ Error testing token transfers: {str(e)}")

    # Check normal transactions
    try:
        print("\nTesting normal transactions...")
        # Construct URL with parameters
        query_string = urllib.parse.urlencode(params_normal)
        url = f"{base_url}?{query_string}"

        # Make the request
        with urllib.request.urlopen(url) as response:
            response_data = response.read().decode('utf-8')
            data = json.loads(response_data)

        if data.get("status") == "1":
            transactions = data.get("result", [])
            print(f"✅ Found {len(transactions)} normal transactions")
            if transactions:
                print(f"First transaction: {json.dumps(transactions[0], indent=2)[:200]}...")
        else:
            print(f"❌ No normal transactions found: {data.get('message', 'Unknown error')}")

    except HTTPError as e:
        print(f"❌ HTTP Error: {e.code} - {e.reason}")
    except URLError as e:
        print(f"❌ URL Error: {e.reason}")
    except Exception as e:
        print(f"❌ Error testing normal transactions: {str(e)}")

def main():
    """Main function to run tests"""
    print("=================================================")
    print("Arbitrum API Diagnostic Tool")
    print("=================================================")

    # Test the API key first
    api_valid = test_api_key()

    if not api_valid:
        print("\n⚠️ Please update your API key in the .env file")
        print("Register for an API key at https://arbiscan.io/myapikey")
        return

    # Test each address
    for address in TEST_ADDRESSES:
        test_address(address)

    print("\n=================================================")
    print("RECOMMENDATIONS:")
    print("1. If your API key is invalid, update it in the .env file")
    print("2. If test addresses work but yours don't, your addresses might not have activity on Arbitrum")
    print("3. Use one of the working test addresses in your app for testing")
    print("=================================================")

if __name__ == "__main__":
    main()
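Both probes in test_api.py assemble their request URLs the same way: a params dict run through urllib.parse.urlencode and appended to the base URL. That pattern can be factored into a small helper; a sketch (build_arbiscan_url is a hypothetical name, not part of the script, and DEMO_KEY is a placeholder, not a real key):

```python
import urllib.parse

ARBISCAN_BASE_URL = "https://api.arbiscan.io/api"

def build_arbiscan_url(module, action, api_key, **extra):
    """Assemble an Arbiscan API URL from module/action plus extra query params."""
    params = {"module": module, "action": action, **extra, "apikey": api_key}
    return f"{ARBISCAN_BASE_URL}?{urllib.parse.urlencode(params)}"

url = build_arbiscan_url("account", "tokentx", "DEMO_KEY",
                         address="0x5d8908afee1df9f7f0830105f8be828f97ce9e68",
                         offset="10", sort="desc")
print(url)
```

urlencode also percent-escapes any unsafe characters in the values, which manual string concatenation would miss.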