---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
# 🏀 NBA Data Analysis with CrewAI
An intelligent NBA data analysis application powered by the CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.
## ✨ Features
- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models
## 🏗️ Architecture
The application uses a multi-agent system with the following components:
- **Data Engineer Agent**: Processes and validates data
- **Data Analyst Agent**: Performs statistical analysis and extracts insights
- **Storyteller Agent**: Creates engaging narratives from analysis results
### Tech Stack
- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers
## 📋 Prerequisites
- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing
## 🚀 Installation
### 1. Clone the Repository
```bash
git clone <your-repo-url>
cd NBA_Analysis
```
### 2. Install Dependencies
**Using uv (recommended):**
```bash
uv sync
```
**Using pip:**
```bash
pip install -r requirements.txt
```
### 3. Prepare Your Data
Place your NBA CSV file in the project directory, or upload it through the web interface.
## ⚙️ Configuration
### LLM Provider Setup
The application supports multiple LLM providers. Configure via environment variables:
#### Option 1: Hugging Face (Recommended for Deployment)
1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
2. Set environment variables:
```bash
export LLM_PROVIDER=huggingface
export HF_API_KEY=your-hf-token
export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct # or any HF model
```
**Available Models:**
- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)
#### Option 2: Ollama (For Local Testing)
1. Install Ollama: https://ollama.ai
2. Start Ollama service:
```bash
ollama serve
```
3. Download a model:
```bash
ollama pull mistral # or llama3.2, qwen2.5:7b, etc.
```
4. Set environment variables:
```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
```
#### Option 3: OpenRouter (Alternative Free Option)
1. Get a free API key from [OpenRouter](https://openrouter.ai)
2. Set environment variables:
```bash
export LLM_PROVIDER=openrouter
export OPENROUTER_API_KEY=your-key
export OPENROUTER_MODEL=google/gemma-2-2b-it:free
```
### Default Configuration
The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model. Beyond setting `HF_API_KEY`, no additional configuration is needed.
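The provider-selection logic described above can be sketched with plain environment-variable lookups. This is an illustrative sketch, not the repo's actual `config.py`; the function name and return shape are assumptions, but the variable names and defaults mirror the table in the Environment Variables section below.

```python
import os

# Defaults mirror the README's environment-variable table.
DEFAULTS = {
    "huggingface": {"model": "meta-llama/Llama-3.1-8B-Instruct", "key_var": "HF_API_KEY"},
    "ollama": {"model": "mistral", "key_var": None},
    "openrouter": {"model": "google/gemma-2-2b-it:free", "key_var": "OPENROUTER_API_KEY"},
}

MODEL_VARS = {
    "huggingface": "HF_MODEL",
    "ollama": "OLLAMA_MODEL",
    "openrouter": "OPENROUTER_MODEL",
}

def resolve_llm_config(env=os.environ):
    """Pick provider and model from environment variables, falling back to defaults."""
    provider = env.get("LLM_PROVIDER", "huggingface")
    if provider not in DEFAULTS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    spec = DEFAULTS[provider]
    # Hugging Face and OpenRouter require an API key; Ollama is local and does not.
    if spec["key_var"] and not env.get(spec["key_var"]):
        raise RuntimeError(f"{spec['key_var']} not set for provider {provider!r}")
    return {"provider": provider, "model": env.get(MODEL_VARS[provider], spec["model"])}
```

For example, with only `LLM_PROVIDER=ollama` set, this resolves to the `mistral` default; with `HF_API_KEY` set and nothing else, it resolves to Llama 3.1 8B Instruct.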
## 🎮 Usage
### Web Interface (Recommended)
```bash
python app.py
```
Then open your browser to the URL shown (usually `http://localhost:7860`).
**Features:**
- Upload CSV file
- Enter analysis query (or leave blank for comprehensive analysis)
- Click "Analyze Dataset" for full analysis
- Click "Analyze with Question" for quick queries
### Command Line
```bash
python main.py
```
## 📖 Example Queries
- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"
## 🛠️ Project Structure
```
NBA_Analysis/
├── app.py # Gradio web interface
├── main.py # Command-line entry point
├── config.py # LLM and configuration settings
├── agents.py # AI agent definitions
├── crew.py # CrewAI crew orchestration
├── tasks.py # Task definitions
├── tools.py # Data access tools for agents
├── vector_db.py # Vector database for semantic search
├── requirements.txt # Python dependencies
├── pyproject.toml # Project configuration
├── test_local.sh # Script for local testing with Ollama
├── EXECUTION_FLOW.md # Detailed execution flow documentation
└── README.md # This file
```
## 🔧 Available Tools
The agents have access to 5 data tools:
1. **read_nba_data**: Read sample rows to understand structure
2. **search_nba_data**: Filter and search CSV data
3. **get_nba_data_summary**: Get comprehensive dataset overview
4. **semantic_search_nba_data**: Natural language semantic search
5. **analyze_nba_data**: Execute pandas operations for advanced analysis
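As a rough illustration of what a tool like `get_nba_data_summary` might compute with pandas, here is a minimal sketch; the function name, column handling, and return schema are assumptions, not the repo's actual `tools.py`.

```python
import pandas as pd

def get_data_summary(df: pd.DataFrame) -> dict:
    """Return a compact dataset overview an agent can reason over (illustrative schema)."""
    numeric = df.select_dtypes("number")
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
        # describe() gives count/mean/std/min/quartiles/max per numeric column
        "numeric_stats": numeric.describe().round(2).to_dict(),
    }

# Example with a toy NBA-like frame:
games = pd.DataFrame({"player": ["Curry", "Jokic"], "pts": [30, 26], "reb": [5, 12]})
summary = get_data_summary(games)
```

A compact summary like this keeps the LLM's context small while still giving the agents enough structure to decide which tool to call next.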
## 🚀 Deployment
### Hugging Face Spaces (Free)
1. **Get API Keys:**
- Hugging Face token: https://huggingface.co/settings/tokens
- (Optional) OpenRouter key: https://openrouter.ai
2. **Create Space:**
- Go to https://huggingface.co/spaces
- Create new Space with Gradio SDK
- Push your code
3. **Set Secrets:**
- Space Settings → Repository secrets
- Add `HF_API_KEY` = your Hugging Face token
- (Optional) Add `LLM_PROVIDER` = `huggingface`
- (Optional) Add `HF_MODEL` = your preferred model
4. **Deploy:**
```bash
git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
git push hf main
```
See `EXECUTION_FLOW.md` for detailed deployment instructions.
## 🧪 Local Testing
### Quick Test with Ollama
```bash
# Make sure Ollama is running
ollama serve
# Run test script
./test_local.sh
```
Or manually:
```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```
## 📊 How It Works
1. **User Input**: Upload CSV + enter query
2. **Crew Creation**: Three agents are initialized with their roles
3. **Parallel Execution**:
- Engineer validates data
- Analyst performs analysis (runs in parallel)
- Storyteller creates narrative (waits for Analyst)
4. **Tool Execution**: Agents use tools to access and analyze data
5. **LLM Processing**: AI generates insights and responses
6. **Result Aggregation**: All outputs are combined and formatted
7. **Display**: Results shown to user
See `EXECUTION_FLOW.md` for detailed flow documentation.
## 🎯 Key Features Explained
### Semantic Search
Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs reuse the cached embeddings.
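The real app uses Sentence Transformers embeddings stored in ChromaDB; the index-then-query idea can be shown with a dependency-free toy that substitutes bag-of-words counts for real embeddings (an intentional simplification, not the project's actual method).

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (the real app uses Sentence Transformers)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, records: list[str], top_k: int = 2) -> list[str]:
    """Index once, then rank records by similarity to the query."""
    index = [(rec, embed(rec)) for rec in records]  # in the real app, cached after first run
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [rec for rec, _ in ranked[:top_k]]
```

With real embeddings, "best long-range shooters" would also match records phrased as "three-point percentage", which word overlap alone cannot do.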
### Parallel Processing
Engineer and Analyst tasks run simultaneously for faster results. Storyteller waits for Analyst to complete.
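CrewAI manages this scheduling internally; the dependency pattern itself can be sketched with `concurrent.futures`, using hypothetical stand-ins for the three agents' tasks (these function names are illustrative, not the repo's task definitions).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three agents' work.
def engineer_task():
    return "data validated"

def analyst_task():
    return "insights extracted"

def storyteller_task(analysis: str):
    return f"headline based on: {analysis}"

def run_crew():
    with ThreadPoolExecutor() as pool:
        eng = pool.submit(engineer_task)   # runs concurrently with the analyst
        ana = pool.submit(analyst_task)
        analysis = ana.result()            # storyteller blocks on the analyst only
        story = pool.submit(storyteller_task, analysis)
        return eng.result(), analysis, story.result()
```

The key point is that the Storyteller depends only on the Analyst's output, so the Engineer's validation never sits on the critical path.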
### Multi-Agent Collaboration
Each agent has a specialized role:
- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation
## 🔒 Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |
## 🐛 Troubleshooting
### "ModuleNotFoundError: No module named 'crewai'"
- Install dependencies: `pip install -r requirements.txt` or `uv sync`
### "HF_API_KEY not set"
- Set your Hugging Face token as environment variable or in Space secrets
### "Connection refused" (Ollama)
- Make sure `ollama serve` is running
- Check port 11434 is available
### "Model not found" (Ollama)
- Download the model: `ollama pull mistral`
- List models: `ollama list`
### Slow responses
- Use smaller models (Llama 3.2 3B instead of 8B)
- Check your internet connection for API calls
- For local: Use faster models like `llama3.2`
## 📝 License
This project is open source. Check individual dependencies for their licenses.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📚 Documentation
- **Execution Flow**: See `EXECUTION_FLOW.md` for detailed flow
- **CrewAI Docs**: https://docs.crewai.com
- **Gradio Docs**: https://gradio.app/docs
## 🎓 What Was Built
This project demonstrates:
- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- Web interface with Gradio
- Free-tier deployment on Hugging Face Spaces
## 💡 Tips
- **First Run**: Vector DB indexing takes time on first use
- **Large Files**: Use semantic search for large datasets
- **Complex Queries**: Use "Analyze with Question" for specific queries
- **Model Selection**: Larger models = better quality, slower speed
- **Local Testing**: Use Ollama for faster iteration
## 🔗 Links
- **Hugging Face**: https://huggingface.co
- **Ollama**: https://ollama.ai
- **OpenRouter**: https://openrouter.ai
- **CrewAI**: https://docs.crewai.com
---
**Built with ❤️ using CrewAI and open-source LLMs**