---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

# ๐Ÿ€ NBA Data Analysis with CrewAI

An intelligent NBA data analysis application powered by CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.

## ✨ Features

- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run concurrently for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models

## ๐Ÿ—๏ธ Architecture

The application uses a multi-agent system with the following components:

- **Data Engineer Agent**: Processes and validates data
- **Data Analyst Agent**: Performs statistical analysis and extracts insights
- **Storyteller Agent**: Creates engaging narratives from analysis results

### Tech Stack

- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers

## 📋 Prerequisites

- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing

## 🚀 Installation

### 1. Clone the Repository

```bash
git clone <your-repo-url>
cd NBA_Analysis
```

### 2. Install Dependencies

**Using uv (recommended):**
```bash
uv sync
```

**Using pip:**
```bash
pip install -r requirements.txt
```

### 3. Prepare Your Data

Place your NBA CSV file in the project directory, or upload it through the web interface.

## โš™๏ธ Configuration

### LLM Provider Setup

The application supports multiple LLM providers. Configure via environment variables:

#### Option 1: Hugging Face (Recommended for Deployment)

1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
2. Set environment variables:
   ```bash
   export LLM_PROVIDER=huggingface
   export HF_API_KEY=your-hf-token
   export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct  # or any HF model
   ```

**Available Models:**
- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)

#### Option 2: Ollama (For Local Testing)

1. Install Ollama: https://ollama.ai
2. Start Ollama service:
   ```bash
   ollama serve
   ```
3. Download a model:
   ```bash
   ollama pull mistral  # or llama3.2, qwen2.5:7b, etc.
   ```
4. Set environment variables:
   ```bash
   export LLM_PROVIDER=ollama
   export OLLAMA_MODEL=mistral
   export OLLAMA_BASE_URL=http://localhost:11434/v1
   ```

#### Option 3: OpenRouter (Alternative Free Option)

1. Get a free API key from [OpenRouter](https://openrouter.ai)
2. Set environment variables:
   ```bash
   export LLM_PROVIDER=openrouter
   export OPENROUTER_API_KEY=your-key
   export OPENROUTER_MODEL=google/gemma-2-2b-it:free
   ```

### Default Configuration

The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model, so setting `HF_API_KEY` is the only configuration required.
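The provider selection described above can be sketched as follows. This is a hypothetical illustration of how `config.py` might resolve the provider and model from environment variables; the variable names match the table in this README, but the project's actual implementation may differ.

```python
import os

# Per-provider model variable and its default, mirroring the
# environment-variable table in this README (assumed, not verified
# against the real config.py).
DEFAULTS = {
    "huggingface": ("HF_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
    "ollama": ("OLLAMA_MODEL", "mistral"),
    "openrouter": ("OPENROUTER_MODEL", "google/gemma-2-2b-it:free"),
}

def resolve_llm() -> tuple[str, str]:
    """Return (provider, model) based on LLM_PROVIDER and per-provider vars."""
    provider = os.getenv("LLM_PROVIDER", "huggingface").lower()
    if provider not in DEFAULTS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    var, default = DEFAULTS[provider]
    return provider, os.getenv(var, default)
```

With no variables set, `resolve_llm()` falls back to the Hugging Face default model, matching the behavior described above.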

## 🎮 Usage

### Web Interface (Recommended)

```bash
python app.py
```

Then open your browser to the URL shown (usually `http://localhost:7860`).

**Features:**
- Upload CSV file
- Enter analysis query (or leave blank for comprehensive analysis)
- Click "Analyze Dataset" for full analysis
- Click "Analyze with Question" for quick queries

### Command Line

```bash
python main.py
```

## 📖 Example Queries

- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"

## ๐Ÿ› ๏ธ Project Structure

```
NBA_Analysis/
├── app.py                 # Gradio web interface
├── main.py                # Command-line entry point
├── config.py              # LLM and configuration settings
├── agents.py              # AI agent definitions
├── crew.py                # CrewAI crew orchestration
├── tasks.py               # Task definitions
├── tools.py               # Data access tools for agents
├── vector_db.py           # Vector database for semantic search
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project configuration
├── test_local.sh          # Script for local testing with Ollama
├── EXECUTION_FLOW.md      # Detailed execution flow documentation
└── README.md              # This file
```

## 🔧 Available Tools

The agents have access to 5 data tools:

1. **read_nba_data**: Read sample rows to understand structure
2. **search_nba_data**: Filter and search CSV data
3. **get_nba_data_summary**: Get comprehensive dataset overview
4. **semantic_search_nba_data**: Natural language semantic search
5. **analyze_nba_data**: Execute pandas operations for advanced analysis
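To show the shape of one of these tools, here is a standalone sketch of the data-access logic behind `read_nba_data`, using only the standard library. In the project itself the function would be registered as a CrewAI tool (e.g. via a tool decorator) and may read the data differently; this is an illustration, not the actual implementation.

```python
import csv
from itertools import islice

def read_nba_data(csv_path: str, n_rows: int = 5) -> list[dict]:
    """Return the first n_rows records so an agent can inspect the schema.

    Standalone sketch: the real tool is exposed to the agents through
    CrewAI's tool mechanism, which is omitted here.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(islice(csv.DictReader(f), n_rows))
```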

## 🚀 Deployment

### Hugging Face Spaces (Free)

1. **Get API Keys:**
   - Hugging Face token: https://huggingface.co/settings/tokens
   - (Optional) OpenRouter key: https://openrouter.ai

2. **Create Space:**
   - Go to https://huggingface.co/spaces
   - Create new Space with Gradio SDK
   - Push your code

3. **Set Secrets:**
   - Space Settings → Repository secrets
   - Add `HF_API_KEY` = your Hugging Face token
   - (Optional) Add `LLM_PROVIDER` = `huggingface`
   - (Optional) Add `HF_MODEL` = your preferred model

4. **Deploy:**
   ```bash
   git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
   git push hf main
   ```

See `EXECUTION_FLOW.md` for detailed deployment instructions.

## 🧪 Local Testing

### Quick Test with Ollama

```bash
# Make sure Ollama is running
ollama serve

# Run test script
./test_local.sh
```

Or manually:
```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```

## 📊 How It Works

1. **User Input**: Upload CSV + enter query
2. **Crew Creation**: Three agents are initialized with their roles
3. **Parallel Execution**: 
   - Engineer validates data
   - Analyst performs analysis (runs in parallel)
   - Storyteller creates narrative (waits for Analyst)
4. **Tool Execution**: Agents use tools to access and analyze data
5. **LLM Processing**: AI generates insights and responses
6. **Result Aggregation**: All outputs are combined and formatted
7. **Display**: Results shown to user

See `EXECUTION_FLOW.md` for detailed flow documentation.
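The task ordering in step 3 can be illustrated with plain threads standing in for CrewAI's asynchronous task execution. The agent work is stubbed out with placeholder strings; the point is only the dependency structure (Engineer and Analyst run concurrently, Storyteller consumes the Analyst's result).

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline() -> str:
    """Sketch of the execution order; real agents replaced by stubs."""
    with ThreadPoolExecutor() as pool:
        # Engineer and Analyst are submitted together and run in parallel.
        engineer = pool.submit(lambda: "data validated")
        analyst = pool.submit(lambda: "insights extracted")
        # The Storyteller blocks only on the Analyst's output.
        story = f"Headline based on: {analyst.result()}"
    # All outputs are aggregated into the final result (step 6).
    return f"{engineer.result()} | {story}"
```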

## 🎯 Key Features Explained

### Semantic Search
Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs reuse the cached embeddings.
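One plausible way to decide whether the cached embeddings are still valid is to compare a content hash of the CSV against the hash recorded at indexing time. This is a hedged sketch of the caching idea only; the project's `vector_db.py` may key its cache differently, and the actual indexing (ChromaDB + Sentence Transformers) is omitted.

```python
import hashlib
import os

def needs_reindex(csv_path: str, cache_path: str) -> bool:
    """Return True (and record the new hash) when the CSV content changed."""
    with open(csv_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            if f.read() == digest:
                return False  # cached embeddings are still valid
    with open(cache_path, "w") as f:
        f.write(digest)
    return True
```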

### Parallel Processing
Engineer and Analyst tasks run simultaneously for faster results. Storyteller waits for Analyst to complete.

### Multi-Agent Collaboration
Each agent has a specialized role:
- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation

## 🔒 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |

## ๐Ÿ› Troubleshooting

### "ModuleNotFoundError: No module named 'crewai'"
- Install dependencies: `pip install -r requirements.txt` or `uv sync`

### "HF_API_KEY not set"
- Set your Hugging Face token as environment variable or in Space secrets

### "Connection refused" (Ollama)
- Make sure `ollama serve` is running
- Check port 11434 is available

### "Model not found" (Ollama)
- Download the model: `ollama pull mistral`
- List models: `ollama list`

### Slow responses
- Use smaller models (Llama 3.2 3B instead of 8B)
- Check your internet connection for API calls
- For local: Use faster models like `llama3.2`

## ๐Ÿ“ License

This project is open source. Check individual dependencies for their licenses.

## ๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📚 Documentation

- **Execution Flow**: See `EXECUTION_FLOW.md` for detailed flow
- **CrewAI Docs**: https://docs.crewai.com
- **Gradio Docs**: https://gradio.app/docs

## 🎓 What Was Built

This project demonstrates:
- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- Web interface with Gradio
- Free-tier deployment on Hugging Face Spaces

## 💡 Tips

- **First Run**: Vector DB indexing takes time on first use
- **Large Files**: Use semantic search for large datasets
- **Complex Queries**: Use "Analyze with Question" for specific queries
- **Model Selection**: Larger models = better quality, slower speed
- **Local Testing**: Use Ollama for faster iteration

## 🔗 Links

- **Hugging Face**: https://huggingface.co
- **Ollama**: https://ollama.ai
- **OpenRouter**: https://openrouter.ai
- **CrewAI**: https://docs.crewai.com

---

**Built with โค๏ธ using CrewAI and open-source LLMs**