---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

# 🏀 NBA Data Analysis with CrewAI

An intelligent NBA data analysis application powered by the CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.

## ✨ Features

- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural-language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models

## 🏗️ Architecture

The application uses a multi-agent system with the following components:

- **Data Engineer Agent**: Processes and validates data
- **Data Analyst Agent**: Performs statistical analysis and extracts insights
- **Storyteller Agent**: Creates engaging narratives from analysis results

### Tech Stack

- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers

## 📋 Prerequisites

- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing

## 🚀 Installation

### 1. Clone the Repository

```bash
git clone <repository-url>  # replace with this repository's URL
cd NBA_Analysis
```

### 2. Install Dependencies

**Using uv (recommended):**

```bash
uv sync
```

**Using pip:**

```bash
pip install -r requirements.txt
```

### 3. Prepare Your Data

Place your NBA CSV file in the project directory, or upload it through the web interface.

## ⚙️ Configuration

### LLM Provider Setup

The application supports multiple LLM providers. Configure one via environment variables:

#### Option 1: Hugging Face (Recommended for Deployment)

1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
2. Set environment variables:

   ```bash
   export LLM_PROVIDER=huggingface
   export HF_API_KEY=your-hf-token
   export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct  # or any HF model
   ```

**Available Models:**

- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)

#### Option 2: Ollama (For Local Testing)

1. Install Ollama: https://ollama.ai
2. Start the Ollama service:

   ```bash
   ollama serve
   ```

3. Download a model:

   ```bash
   ollama pull mistral  # or llama3.2, qwen2.5:7b, etc.
   ```

4. Set environment variables:

   ```bash
   export LLM_PROVIDER=ollama
   export OLLAMA_MODEL=mistral
   export OLLAMA_BASE_URL=http://localhost:11434/v1
   ```

#### Option 3: OpenRouter (Alternative Free Option)

1. Get a free API key from [OpenRouter](https://openrouter.ai)
2. Set environment variables:

   ```bash
   export LLM_PROVIDER=openrouter
   export OPENROUTER_API_KEY=your-key
   export OPENROUTER_MODEL=google/gemma-2-2b-it:free
   ```

### Default Configuration

The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model, so no further configuration is needed once `HF_API_KEY` is set.
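To make the provider switch concrete, the sketch below shows one way `config.py` could resolve these variables into connection settings. It is a minimal illustration: the function name `resolve_llm_settings` and the shape of the returned dictionary are hypothetical, not the project's actual API.

```python
# Illustrative sketch of env-based provider selection; the real config.py
# may be organized differently. Maps the documented environment variables
# to the settings an LLM client would need.
import os

def resolve_llm_settings() -> dict:
    """Resolve LLM_PROVIDER and related env vars into connection settings."""
    provider = os.getenv("LLM_PROVIDER", "huggingface")
    if provider == "huggingface":
        return {
            "model": os.getenv("HF_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
            "api_key": os.environ["HF_API_KEY"],  # raises KeyError if unset
        }
    if provider == "ollama":
        return {
            "model": os.getenv("OLLAMA_MODEL", "mistral"),
            "base_url": os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        }
    if provider == "openrouter":
        return {
            "model": os.getenv("OPENROUTER_MODEL", "google/gemma-2-2b-it:free"),
            "api_key": os.environ["OPENROUTER_API_KEY"],
        }
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```

The defaults in this sketch mirror those listed in the Environment Variables table further below.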
## 🎮 Usage

### Web Interface (Recommended)

```bash
python app.py
```

Then open your browser to the URL shown (usually `http://localhost:7860`).

**Features:**

- Upload a CSV file
- Enter an analysis query (or leave blank for a comprehensive analysis)
- Click "Analyze Dataset" for a full analysis
- Click "Analyze with Question" for quick queries

### Command Line

```bash
python main.py
```

## 📖 Example Queries

- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"

## 🛠️ Project Structure

```
NBA_Analysis/
├── app.py              # Gradio web interface
├── main.py             # Command-line entry point
├── config.py           # LLM and configuration settings
├── agents.py           # AI agent definitions
├── crew.py             # CrewAI crew orchestration
├── tasks.py            # Task definitions
├── tools.py            # Data access tools for agents
├── vector_db.py        # Vector database for semantic search
├── requirements.txt    # Python dependencies
├── pyproject.toml      # Project configuration
├── test_local.sh       # Script for local testing with Ollama
├── EXECUTION_FLOW.md   # Detailed execution flow documentation
└── README.md           # This file
```

## 🔧 Available Tools

The agents have access to five data tools:

1. **read_nba_data**: Read sample rows to understand the data's structure
2. **search_nba_data**: Filter and search the CSV data
3. **get_nba_data_summary**: Get a comprehensive dataset overview
4. **semantic_search_nba_data**: Natural-language semantic search
5. **analyze_nba_data**: Execute pandas operations for advanced analysis

## 🚀 Deployment

### Hugging Face Spaces (Free)

1. **Get API keys:**
   - Hugging Face token: https://huggingface.co/settings/tokens
   - (Optional) OpenRouter key: https://openrouter.ai
2. **Create a Space:**
   - Go to https://huggingface.co/spaces
   - Create a new Space with the Gradio SDK
   - Push your code
3. **Set secrets:**
   - Space Settings → Repository secrets
   - Add `HF_API_KEY` = your Hugging Face token
   - (Optional) Add `LLM_PROVIDER` = `huggingface`
   - (Optional) Add `HF_MODEL` = your preferred model
4. **Deploy:**

   ```bash
   git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
   git push hf main
   ```

See `EXECUTION_FLOW.md` for detailed deployment instructions.

## 🧪 Local Testing

### Quick Test with Ollama

```bash
# Make sure Ollama is running
ollama serve

# Run the test script
./test_local.sh
```

Or manually:

```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```

## 📊 How It Works

1. **User Input**: Upload a CSV and enter a query
2. **Crew Creation**: Three agents are initialized with their roles
3. **Parallel Execution**:
   - The Engineer validates the data
   - The Analyst performs analysis (runs in parallel)
   - The Storyteller creates the narrative (waits for the Analyst)
4. **Tool Execution**: Agents use tools to access and analyze the data
5. **LLM Processing**: The AI generates insights and responses
6. **Result Aggregation**: All outputs are combined and formatted
7. **Display**: Results are shown to the user

See `EXECUTION_FLOW.md` for detailed flow documentation.

## 🎯 Key Features Explained

### Semantic Search

Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs use the cached embeddings.
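As a rough illustration of this flow, the sketch below indexes CSV rows and answers a natural-language query using ChromaDB and Sentence Transformers. The collection name, embedding model, and helper names are assumptions made for the example; the actual `vector_db.py` may differ.

```python
# Minimal sketch of CSV indexing + semantic query with ChromaDB and
# Sentence Transformers. Collection name "nba_rows" and the helper
# functions are illustrative, not the project's actual implementation.
import chromadb
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")      # small, fast embedder
client = chromadb.PersistentClient(path="./chroma_db")  # persisted across runs
collection = client.get_or_create_collection(name="nba_rows")

def index_csv(csv_path: str) -> None:
    """Embed each CSV row as one document (only needed on the first run)."""
    df = pd.read_csv(csv_path)
    docs = [", ".join(f"{col}: {val}" for col, val in row.items())
            for _, row in df.iterrows()]
    collection.add(
        ids=[str(i) for i in range(len(docs))],
        documents=docs,
        embeddings=model.encode(docs).tolist(),
    )

def semantic_search(query: str, k: int = 5) -> list[str]:
    """Return the k rows most semantically similar to the query."""
    result = collection.query(
        query_embeddings=model.encode([query]).tolist(),
        n_results=k,
    )
    return result["documents"][0]

# Example: semantic_search("dominant three-point shooting performances")
```

Persisting the client is what makes later runs cheap: the embeddings computed during the first indexing pass are stored on disk and reused instead of being recomputed.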
### Parallel Processing

The Engineer and Analyst tasks run simultaneously for faster results; the Storyteller waits for the Analyst to complete.

### Multi-Agent Collaboration

Each agent has a specialized role:

- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation

## 🔒 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |

## 🐛 Troubleshooting

### "ModuleNotFoundError: No module named 'crewai'"

- Install the dependencies: `pip install -r requirements.txt` or `uv sync`

### "HF_API_KEY not set"

- Set your Hugging Face token as an environment variable or in the Space secrets

### "Connection refused" (Ollama)

- Make sure `ollama serve` is running
- Check that port 11434 is available

### "Model not found" (Ollama)

- Download the model: `ollama pull mistral`
- List installed models: `ollama list`

### Slow responses

- Use a smaller model (Llama 3.2 3B instead of 8B)
- Check your internet connection for API calls
- For local runs, use a faster model such as `llama3.2`

## 📝 License

This project is open source. Check individual dependencies for their licenses.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a pull request.

## 📚 Documentation

- **Execution Flow**: See `EXECUTION_FLOW.md` for the detailed flow
- **CrewAI Docs**: https://docs.crewai.com
- **Gradio Docs**: https://gradio.app/docs

## 🎓 What Was Built

This project demonstrates:

- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- A web interface with Gradio
- Free-tier deployment on Hugging Face Spaces

## 💡 Tips

- **First Run**: Vector DB indexing takes time on first use
- **Large Files**: Use semantic search for large datasets
- **Complex Queries**: Use "Analyze with Question" for specific queries
- **Model Selection**: Larger models give better quality but slower responses
- **Local Testing**: Use Ollama for faster iteration

## 🔗 Links

- **Hugging Face**: https://huggingface.co
- **Ollama**: https://ollama.ai
- **OpenRouter**: https://openrouter.ai
- **CrewAI**: https://docs.crewai.com

---

**Built with ❤️ using CrewAI and open-source LLMs**