---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

# 🏀 NBA Data Analysis with CrewAI

An intelligent NBA data analysis application powered by the CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.

## ✨ Features

- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural-language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models

## 🏗️ Architecture

The application uses a multi-agent system with the following components:

- **Data Engineer Agent**: Processes and validates data
- **Data Analyst Agent**: Performs statistical analysis and extracts insights
- **Storyteller Agent**: Creates engaging narratives from analysis results

### Tech Stack

- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers

## 📋 Prerequisites

- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing

## 🚀 Installation

### 1. Clone the Repository

```bash
git clone <repository-url>  # replace with this repository's URL
cd NBA_Analysis
```

### 2. Install Dependencies

**Using uv (recommended):**

```bash
uv sync
```

**Using pip:**

```bash
pip install -r requirements.txt
```

### 3. Prepare Your Data

Place your NBA CSV file in the project directory, or upload it through the web interface.

## ⚙️ Configuration

### LLM Provider Setup

The application supports multiple LLM providers. Configure one via environment variables:

#### Option 1: Hugging Face (Recommended for Deployment)

1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
2. Set environment variables:

   ```bash
   export LLM_PROVIDER=huggingface
   export HF_API_KEY=your-hf-token
   export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct  # or any HF model
   ```

**Available Models:**

- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)

#### Option 2: Ollama (For Local Testing)

1. Install Ollama: https://ollama.ai
2. Start the Ollama service:

   ```bash
   ollama serve
   ```

3. Download a model:

   ```bash
   ollama pull mistral  # or llama3.2, qwen2.5:7b, etc.
   ```

4. Set environment variables:

   ```bash
   export LLM_PROVIDER=ollama
   export OLLAMA_MODEL=mistral
   export OLLAMA_BASE_URL=http://localhost:11434/v1
   ```

#### Option 3: OpenRouter (Alternative Free Option)

1. Get a free API key from [OpenRouter](https://openrouter.ai)
2. Set environment variables:

   ```bash
   export LLM_PROVIDER=openrouter
   export OPENROUTER_API_KEY=your-key
   export OPENROUTER_MODEL=google/gemma-2-2b-it:free
   ```

### Default Configuration

The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model, so no further configuration is needed once `HF_API_KEY` is set.
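To make the provider switch concrete, the sketch below shows one way `config.py` could resolve these variables into connection settings. It is a minimal illustration: the function name `resolve_llm_settings` and the shape of the returned dictionary are hypothetical, not the project's actual API.

```python
# Illustrative sketch of env-based provider selection; the real config.py
# may be organized differently. Maps the documented environment variables
# to the settings an LLM client would need.
import os

def resolve_llm_settings() -> dict:
    """Resolve LLM_PROVIDER and related env vars into connection settings."""
    provider = os.getenv("LLM_PROVIDER", "huggingface")
    if provider == "huggingface":
        return {
            "model": os.getenv("HF_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
            "api_key": os.environ["HF_API_KEY"],  # raises KeyError if unset
        }
    if provider == "ollama":
        return {
            "model": os.getenv("OLLAMA_MODEL", "mistral"),
            "base_url": os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        }
    if provider == "openrouter":
        return {
            "model": os.getenv("OPENROUTER_MODEL", "google/gemma-2-2b-it:free"),
            "api_key": os.environ["OPENROUTER_API_KEY"],
        }
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```

The defaults in this sketch mirror those listed in the Environment Variables table further below.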
## 🎮 Usage

### Web Interface (Recommended)

```bash
python app.py
```

Then open your browser to the URL shown (usually `http://localhost:7860`).

**Features:**

- Upload a CSV file
- Enter an analysis query (or leave blank for a comprehensive analysis)
- Click "Analyze Dataset" for a full analysis
- Click "Analyze with Question" for quick queries

### Command Line

```bash
python main.py
```

## 📖 Example Queries

- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"

## 🛠️ Project Structure

```
NBA_Analysis/
├── app.py              # Gradio web interface
├── main.py             # Command-line entry point
├── config.py           # LLM and configuration settings
├── agents.py           # AI agent definitions
├── crew.py             # CrewAI crew orchestration
├── tasks.py            # Task definitions
├── tools.py            # Data access tools for agents
├── vector_db.py        # Vector database for semantic search
├── requirements.txt    # Python dependencies
├── pyproject.toml      # Project configuration
├── test_local.sh       # Script for local testing with Ollama
├── EXECUTION_FLOW.md   # Detailed execution flow documentation
└── README.md           # This file
```

## 🔧 Available Tools

The agents have access to five data tools:

1. **read_nba_data**: Read sample rows to understand the data's structure
2. **search_nba_data**: Filter and search the CSV data
3. **get_nba_data_summary**: Get a comprehensive dataset overview
4. **semantic_search_nba_data**: Natural-language semantic search
5. **analyze_nba_data**: Execute pandas operations for advanced analysis

## 🚀 Deployment

### Hugging Face Spaces (Free)

1. **Get API keys:**
   - Hugging Face token: https://huggingface.co/settings/tokens
   - (Optional) OpenRouter key: https://openrouter.ai
2. **Create a Space:**
   - Go to https://huggingface.co/spaces
   - Create a new Space with the Gradio SDK
   - Push your code
3. **Set secrets:**
   - Space Settings → Repository secrets
   - Add `HF_API_KEY` = your Hugging Face token
   - (Optional) Add `LLM_PROVIDER` = `huggingface`
   - (Optional) Add `HF_MODEL` = your preferred model
4. **Deploy:**

   ```bash
   git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
   git push hf main
   ```

See `EXECUTION_FLOW.md` for detailed deployment instructions.

## 🧪 Local Testing

### Quick Test with Ollama

```bash
# Make sure Ollama is running
ollama serve

# Run the test script
./test_local.sh
```

Or manually:

```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```

## 📊 How It Works

1. **User Input**: Upload a CSV and enter a query
2. **Crew Creation**: Three agents are initialized with their roles
3. **Parallel Execution**:
   - The Engineer validates the data
   - The Analyst performs analysis (runs in parallel)
   - The Storyteller creates the narrative (waits for the Analyst)
4. **Tool Execution**: Agents use tools to access and analyze the data
5. **LLM Processing**: The AI generates insights and responses
6. **Result Aggregation**: All outputs are combined and formatted
7. **Display**: Results are shown to the user

See `EXECUTION_FLOW.md` for detailed flow documentation.

## 🎯 Key Features Explained

### Semantic Search

Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs use the cached embeddings.
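As a rough illustration of this flow, the sketch below indexes CSV rows and answers a natural-language query using ChromaDB and Sentence Transformers. The collection name, embedding model, and helper names are assumptions made for the example; the actual `vector_db.py` may differ.

```python
# Minimal sketch of CSV indexing + semantic query with ChromaDB and
# Sentence Transformers. Collection name "nba_rows" and the helper
# functions are illustrative, not the project's actual implementation.
import chromadb
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # small, fast embedder
client = chromadb.PersistentClient(path="./chroma_db")  # persisted across runs
collection = client.get_or_create_collection(name="nba_rows")

def index_csv(csv_path: str) -> None:
    """Embed each CSV row as one document (only needed on the first run)."""
    df = pd.read_csv(csv_path)
    docs = [", ".join(f"{col}: {val}" for col, val in row.items())
            for _, row in df.iterrows()]
    collection.add(
        ids=[str(i) for i in range(len(docs))],
        documents=docs,
        embeddings=model.encode(docs).tolist(),
    )

def semantic_search(query: str, k: int = 5) -> list[str]:
    """Return the k rows most semantically similar to the query."""
    result = collection.query(
        query_embeddings=model.encode([query]).tolist(),
        n_results=k,
    )
    return result["documents"][0]

# Example: semantic_search("dominant three-point shooting performances")
```

Persisting the client is what makes later runs cheap: the embeddings computed during the first indexing pass are stored on disk and reused instead of being recomputed.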
### Parallel Processing

The Engineer and Analyst tasks run simultaneously for faster results; the Storyteller waits for the Analyst to complete.

### Multi-Agent Collaboration

Each agent has a specialized role:

- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation

## 🔒 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |

## 🐛 Troubleshooting

### "ModuleNotFoundError: No module named 'crewai'"

- Install the dependencies: `pip install -r requirements.txt` or `uv sync`

### "HF_API_KEY not set"

- Set your Hugging Face token as an environment variable or in the Space secrets

### "Connection refused" (Ollama)

- Make sure `ollama serve` is running
- Check that port 11434 is available

### "Model not found" (Ollama)

- Download the model: `ollama pull mistral`
- List installed models: `ollama list`

### Slow responses

- Use a smaller model (Llama 3.2 3B instead of 8B)
- Check your internet connection for API calls
- For local runs, use a faster model such as `llama3.2`

## 📝 License

This project is open source. Check individual dependencies for their licenses.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request.

## 📚 Documentation

- **Execution Flow**: See `EXECUTION_FLOW.md` for the detailed flow
- **CrewAI Docs**: https://docs.crewai.com
- **Gradio Docs**: https://gradio.app/docs

## 🎓 What Was Built

This project demonstrates:

- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- A web interface with Gradio
- Free-tier deployment on Hugging Face Spaces

## 💡 Tips

- **First Run**: Vector DB indexing takes time on first use
- **Large Files**: Use semantic search for large datasets
- **Complex Queries**: Use "Analyze with Question" for specific queries
- **Model Selection**: Larger models give better quality but slower responses
- **Local Testing**: Use Ollama for faster iteration

## 🔗 Links

- **Hugging Face**: https://huggingface.co
- **Ollama**: https://ollama.ai
- **OpenRouter**: https://openrouter.ai
- **CrewAI**: https://docs.crewai.com

---

**Built with ❤️ using CrewAI and open-source LLMs**