---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
# 🏀 NBA Data Analysis with CrewAI
An intelligent NBA data analysis application powered by the CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.
## ✨ Features
- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models
## 🏗️ Architecture
The application uses a multi-agent system with the following components:
- **Data Engineer Agent**: Processes and validates data
- **Data Analyst Agent**: Performs statistical analysis and extracts insights
- **Storyteller Agent**: Creates engaging narratives from analysis results
### Tech Stack
- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers
## 📋 Prerequisites
- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing
## 🚀 Installation
### 1. Clone the Repository
```bash
git clone <your-repo-url>
cd NBA_Analysis
```
### 2. Install Dependencies
**Using uv (recommended):**
```bash
uv sync
```
**Using pip:**
```bash
pip install -r requirements.txt
```
### 3. Prepare Your Data
Place your NBA CSV file in the project directory, or upload it through the web interface.
## ⚙️ Configuration
### LLM Provider Setup
The application supports multiple LLM providers. Configure via environment variables:
#### Option 1: Hugging Face (Recommended for Deployment)
1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
2. Set environment variables:
```bash
export LLM_PROVIDER=huggingface
export HF_API_KEY=your-hf-token
export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct # or any HF model
```
**Available Models:**
- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)
#### Option 2: Ollama (For Local Testing)
1. Install Ollama: https://ollama.ai
2. Start Ollama service:
```bash
ollama serve
```
3. Download a model:
```bash
ollama pull mistral # or llama3.2, qwen2.5:7b, etc.
```
4. Set environment variables:
```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
```
#### Option 3: OpenRouter (Alternative Free Option)
1. Get a free API key from [OpenRouter](https://openrouter.ai)
2. Set environment variables:
```bash
export LLM_PROVIDER=openrouter
export OPENROUTER_API_KEY=your-key
export OPENROUTER_MODEL=google/gemma-2-2b-it:free
```
### Default Configuration
The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model. Beyond setting `HF_API_KEY`, no additional configuration is needed.
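The provider-selection logic described above can be sketched with plain environment-variable lookups. This is an illustrative sketch, not the repo's actual `config.py`; the function name and return shape are assumptions, but the variable names and defaults mirror the table in the Environment Variables section below.

```python
import os

# Defaults mirror the README's environment-variable table.
DEFAULTS = {
    "huggingface": {"model": "meta-llama/Llama-3.1-8B-Instruct", "key_var": "HF_API_KEY"},
    "ollama": {"model": "mistral", "key_var": None},
    "openrouter": {"model": "google/gemma-2-2b-it:free", "key_var": "OPENROUTER_API_KEY"},
}

MODEL_VARS = {
    "huggingface": "HF_MODEL",
    "ollama": "OLLAMA_MODEL",
    "openrouter": "OPENROUTER_MODEL",
}

def resolve_llm_config(env=os.environ):
    """Pick provider and model from environment variables, falling back to defaults."""
    provider = env.get("LLM_PROVIDER", "huggingface")
    if provider not in DEFAULTS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    spec = DEFAULTS[provider]
    # Hugging Face and OpenRouter require an API key; Ollama is local and does not.
    if spec["key_var"] and not env.get(spec["key_var"]):
        raise RuntimeError(f"{spec['key_var']} not set for provider {provider!r}")
    return {"provider": provider, "model": env.get(MODEL_VARS[provider], spec["model"])}
```

For example, with only `LLM_PROVIDER=ollama` set, this resolves to the `mistral` default; with `HF_API_KEY` set and nothing else, it resolves to Llama 3.1 8B Instruct.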
## 🎮 Usage
### Web Interface (Recommended)
```bash
python app.py
```
Then open your browser to the URL shown (usually `http://localhost:7860`).
**Features:**
- Upload CSV file
- Enter analysis query (or leave blank for comprehensive analysis)
- Click "Analyze Dataset" for full analysis
- Click "Analyze with Question" for quick queries
### Command Line
```bash
python main.py
```
## 📖 Example Queries
- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"
## 🛠️ Project Structure
```
NBA_Analysis/
├── app.py # Gradio web interface
├── main.py # Command-line entry point
├── config.py # LLM and configuration settings
├── agents.py # AI agent definitions
├── crew.py # CrewAI crew orchestration
├── tasks.py # Task definitions
├── tools.py # Data access tools for agents
├── vector_db.py # Vector database for semantic search
├── requirements.txt # Python dependencies
├── pyproject.toml # Project configuration
├── test_local.sh # Script for local testing with Ollama
├── EXECUTION_FLOW.md # Detailed execution flow documentation
└── README.md # This file
```
## 🔧 Available Tools
The agents have access to 5 data tools:
1. **read_nba_data**: Read sample rows to understand structure
2. **search_nba_data**: Filter and search CSV data
3. **get_nba_data_summary**: Get comprehensive dataset overview
4. **semantic_search_nba_data**: Natural language semantic search
5. **analyze_nba_data**: Execute pandas operations for advanced analysis
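As a rough illustration of what a tool like `get_nba_data_summary` might compute with pandas, here is a minimal sketch; the function name, column handling, and return schema are assumptions, not the repo's actual `tools.py`.

```python
import pandas as pd

def get_data_summary(df: pd.DataFrame) -> dict:
    """Return a compact dataset overview an agent can reason over (illustrative schema)."""
    numeric = df.select_dtypes("number")
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
        # describe() gives count/mean/std/min/quartiles/max per numeric column
        "numeric_stats": numeric.describe().round(2).to_dict(),
    }

# Example with a toy NBA-like frame:
games = pd.DataFrame({"player": ["Curry", "Jokic"], "pts": [30, 26], "reb": [5, 12]})
summary = get_data_summary(games)
```

A compact summary like this keeps the LLM's context small while still giving the agents enough structure to decide which tool to call next.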
## 🚀 Deployment
### Hugging Face Spaces (Free)
1. **Get API Keys:**
- Hugging Face token: https://huggingface.co/settings/tokens
- (Optional) OpenRouter key: https://openrouter.ai
2. **Create Space:**
- Go to https://huggingface.co/spaces
- Create new Space with Gradio SDK
- Push your code
3. **Set Secrets:**
- Space Settings → Repository secrets
- Add `HF_API_KEY` = your Hugging Face token
- (Optional) Add `LLM_PROVIDER` = `huggingface`
- (Optional) Add `HF_MODEL` = your preferred model
4. **Deploy:**
```bash
git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
git push hf main
```
See `EXECUTION_FLOW.md` for detailed deployment instructions.
## 🧪 Local Testing
### Quick Test with Ollama
```bash
# Make sure Ollama is running
ollama serve
# Run test script
./test_local.sh
```
Or manually:
```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```
## 📊 How It Works
1. **User Input**: Upload CSV + enter query
2. **Crew Creation**: Three agents are initialized with their roles
3. **Parallel Execution**:
- Engineer validates data
- Analyst performs analysis (runs in parallel)
- Storyteller creates narrative (waits for Analyst)
4. **Tool Execution**: Agents use tools to access and analyze data
5. **LLM Processing**: AI generates insights and responses
6. **Result Aggregation**: All outputs are combined and formatted
7. **Display**: Results shown to user
See `EXECUTION_FLOW.md` for detailed flow documentation.
## 🎯 Key Features Explained
### Semantic Search
Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs reuse the cached embeddings.
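The real app uses Sentence Transformers embeddings stored in ChromaDB; the index-then-query idea can be shown with a dependency-free toy that substitutes bag-of-words counts for real embeddings (an intentional simplification, not the project's actual method).

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (the real app uses Sentence Transformers)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, records: list[str], top_k: int = 2) -> list[str]:
    """Index once, then rank records by similarity to the query."""
    index = [(rec, embed(rec)) for rec in records]  # in the real app, cached after first run
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [rec for rec, _ in ranked[:top_k]]
```

With real embeddings, "best long-range shooters" would also match records phrased as "three-point percentage", which word overlap alone cannot do.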
### Parallel Processing
Engineer and Analyst tasks run simultaneously for faster results. Storyteller waits for Analyst to complete.
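CrewAI manages this scheduling internally; the dependency pattern itself can be sketched with `concurrent.futures`, using hypothetical stand-ins for the three agents' tasks (these function names are illustrative, not the repo's task definitions).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three agents' work.
def engineer_task():
    return "data validated"

def analyst_task():
    return "insights extracted"

def storyteller_task(analysis: str):
    return f"headline based on: {analysis}"

def run_crew():
    with ThreadPoolExecutor() as pool:
        eng = pool.submit(engineer_task)   # runs concurrently with the analyst
        ana = pool.submit(analyst_task)
        analysis = ana.result()            # storyteller blocks on the analyst only
        story = pool.submit(storyteller_task, analysis)
        return eng.result(), analysis, story.result()
```

The key point is that the Storyteller depends only on the Analyst's output, so the Engineer's validation never sits on the critical path.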
### Multi-Agent Collaboration
Each agent has a specialized role:
- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation
## 🔒 Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |
## 🐛 Troubleshooting
### "ModuleNotFoundError: No module named 'crewai'"
- Install dependencies: `pip install -r requirements.txt` or `uv sync`
### "HF_API_KEY not set"
- Set your Hugging Face token as environment variable or in Space secrets
### "Connection refused" (Ollama)
- Make sure `ollama serve` is running
- Check port 11434 is available
### "Model not found" (Ollama)
- Download the model: `ollama pull mistral`
- List models: `ollama list`
### Slow responses
- Use smaller models (Llama 3.2 3B instead of 8B)
- Check your internet connection for API calls
- For local: Use faster models like `llama3.2`
## 📝 License
This project is open source. Check individual dependencies for their licenses.
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📚 Documentation
- **Execution Flow**: See `EXECUTION_FLOW.md` for detailed flow
- **CrewAI Docs**: https://docs.crewai.com
- **Gradio Docs**: https://gradio.app/docs
## 🎓 What Was Built
This project demonstrates:
- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- Web interface with Gradio
- Free-tier deployment on Hugging Face Spaces
## 💡 Tips
- **First Run**: Vector DB indexing takes time on first use
- **Large Files**: Use semantic search for large datasets
- **Complex Queries**: Use "Analyze with Question" for specific queries
- **Model Selection**: Larger models = better quality, slower speed
- **Local Testing**: Use Ollama for faster iteration
## 🔗 Links
- **Hugging Face**: https://huggingface.co
- **Ollama**: https://ollama.ai
- **OpenRouter**: https://openrouter.ai
- **CrewAI**: https://docs.crewai.com
---
**Built with ❤️ using CrewAI and open-source LLMs**