---
title: NBA Analysis
emoji: 🔥
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---

๐Ÿ€ NBA Data Analysis with CrewAI

An intelligent NBA data analysis application powered by CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.

## ✨ Features

- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
- 📊 **Data Engineering**: Automatic data cleaning and preparation
- 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
- 📈 **Statistical Analysis**: Top performers, trends, and key metrics
- 🔎 **Semantic Search**: Natural language queries on your data using vector embeddings
- 📝 **Storytelling**: Engaging headlines and narratives from data
- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
- 🌐 **Web Interface**: Easy-to-use Gradio web app
- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models

๐Ÿ—๏ธ Architecture

The application uses a multi-agent system with the following components:

  • Data Engineer Agent: Processes and validates data
  • Data Analyst Agent: Performs statistical analysis and extracts insights
  • Storyteller Agent: Creates engaging narratives from analysis results

### Tech Stack

- **CrewAI**: Multi-agent AI framework
- **Gradio**: Web interface
- **Pandas**: Data analysis
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Embeddings for semantic search
- **Hugging Face / Ollama**: Open-source LLM providers

## 📋 Prerequisites

- Python 3.11 or 3.12
- pip or uv package manager
- (Optional) Ollama for local testing

## 🚀 Installation

### 1. Clone the Repository

```bash
git clone <your-repo-url>
cd NBA_Analysis
```

### 2. Install Dependencies

Using uv (recommended):

```bash
uv sync
```

Using pip:

```bash
pip install -r requirements.txt
```

### 3. Prepare Your Data

Place your NBA CSV file in the project directory, or upload it through the web interface.

โš™๏ธ Configuration

LLM Provider Setup

The application supports multiple LLM providers. Configure via environment variables:

Option 1: Hugging Face (Recommended for Deployment)

  1. Get a free API token from Hugging Face
  2. Set environment variables:
    export LLM_PROVIDER=huggingface
    export HF_API_KEY=your-hf-token
    export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct  # or any HF model
    

Available Models:

  • meta-llama/Llama-3.1-8B-Instruct (default, best quality)
  • mistralai/Mistral-7B-Instruct-v0.2 (excellent quality)
  • Qwen/Qwen2.5-7B-Instruct (multilingual, great quality)
  • meta-llama/Llama-3.2-3B-Instruct (faster, smaller)

#### Option 2: Ollama (For Local Testing)

1. Install Ollama: https://ollama.ai
2. Start the Ollama service:

   ```bash
   ollama serve
   ```

3. Download a model:

   ```bash
   ollama pull mistral  # or llama3.2, qwen2.5:7b, etc.
   ```

4. Set environment variables:

   ```bash
   export LLM_PROVIDER=ollama
   export OLLAMA_MODEL=mistral
   export OLLAMA_BASE_URL=http://localhost:11434/v1
   ```

#### Option 3: OpenRouter (Alternative Free Option)

1. Get a free API key from OpenRouter
2. Set environment variables:

   ```bash
   export LLM_PROVIDER=openrouter
   export OPENROUTER_API_KEY=your-key
   export OPENROUTER_MODEL=google/gemma-2-2b-it:free
   ```

### Default Configuration

The application defaults to Hugging Face with the Llama 3.1 8B Instruct model, so no further configuration is needed once `HF_API_KEY` is set.
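
The provider selection above boils down to reading a few environment variables with sensible fallbacks. A sketch of how `config.py` might resolve them (the function name and returned dict shape are illustrative; the variable names and defaults match the table below):

```python
# Sketch of env-var-driven provider resolution. Variable names and defaults
# match this README; the function name and dict layout are illustrative.
import os

def resolve_llm_config() -> dict:
    provider = os.getenv("LLM_PROVIDER", "huggingface")
    if provider == "huggingface":
        return {
            "provider": provider,
            "model": os.getenv("HF_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
            "api_key": os.getenv("HF_API_KEY"),  # required for this provider
        }
    if provider == "ollama":
        return {
            "provider": provider,
            "model": os.getenv("OLLAMA_MODEL", "mistral"),
            "base_url": os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
        }
    if provider == "openrouter":
        return {
            "provider": provider,
            "model": os.getenv("OPENROUTER_MODEL", "google/gemma-2-2b-it:free"),
            "api_key": os.getenv("OPENROUTER_API_KEY"),  # required for this provider
        }
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```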

## 🎮 Usage

### Web Interface (Recommended)

```bash
python app.py
```

Then open your browser to the URL shown (usually http://localhost:7860).

Features:

- Upload a CSV file
- Enter an analysis query (or leave it blank for a comprehensive analysis)
- Click "Analyze Dataset" for a full analysis
- Click "Analyze with Question" for quick queries

### Command Line

```bash
python main.py
```

## 📖 Example Queries

- "Who are the top 5 three-point shooters?"
- "Show me the best scoring games this season"
- "Which players have the highest field goal percentage?"
- "Analyze team performance trends"
- "Find games with triple doubles"
- "What are the most efficient shooters?"

๐Ÿ› ๏ธ Project Structure

NBA_Analysis/
โ”œโ”€โ”€ app.py                 # Gradio web interface
โ”œโ”€โ”€ main.py                # Command-line entry point
โ”œโ”€โ”€ config.py              # LLM and configuration settings
โ”œโ”€โ”€ agents.py              # AI agent definitions
โ”œโ”€โ”€ crew.py                # CrewAI crew orchestration
โ”œโ”€โ”€ tasks.py               # Task definitions
โ”œโ”€โ”€ tools.py               # Data access tools for agents
โ”œโ”€โ”€ vector_db.py           # Vector database for semantic search
โ”œโ”€โ”€ requirements.txt       # Python dependencies
โ”œโ”€โ”€ pyproject.toml        # Project configuration
โ”œโ”€โ”€ test_local.sh          # Script for local testing with Ollama
โ”œโ”€โ”€ EXECUTION_FLOW.md      # Detailed execution flow documentation
โ””โ”€โ”€ README.md              # This file

## 🔧 Available Tools

The agents have access to five data tools:

1. `read_nba_data`: Read sample rows to understand the structure
2. `search_nba_data`: Filter and search the CSV data
3. `get_nba_data_summary`: Get a comprehensive dataset overview
4. `semantic_search_nba_data`: Natural language semantic search
5. `analyze_nba_data`: Execute pandas operations for advanced analysis
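
At their core, tools like `get_nba_data_summary` are thin pandas wrappers that return compact, agent-readable overviews. A hedged sketch of the idea (the function body, DataFrame, and column names here are invented for illustration, not the app's actual `tools.py`):

```python
# Toy sketch of a summary tool built on pandas. The DataFrame and column
# names are invented for illustration; the real tool reads the uploaded CSV.
import pandas as pd

def get_data_summary(df: pd.DataFrame) -> dict:
    """Return a compact overview an agent can reason over."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "numeric_means": df.select_dtypes("number").mean().round(2).to_dict(),
        "missing_values": int(df.isna().sum().sum()),
    }

games = pd.DataFrame({
    "player": ["A", "B", "C"],
    "points": [30, 22, 18],
    "assists": [5, 11, 7],
})
summary = get_data_summary(games)
```

Returning a small dict rather than the whole DataFrame keeps the tool output short enough to fit comfortably in the LLM's context.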

## 🚀 Deployment

### Hugging Face Spaces (Free)

1. **Get API keys**: Create a free Hugging Face account and generate an access token.
2. **Create a Space**: Create a new Gradio Space on Hugging Face.
3. **Set secrets**: In Space Settings → Repository secrets, add:
   - `HF_API_KEY` = your Hugging Face token
   - (Optional) `LLM_PROVIDER` = `huggingface`
   - (Optional) `HF_MODEL` = your preferred model
4. **Deploy**:

   ```bash
   git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
   git push hf main
   ```

See EXECUTION_FLOW.md for detailed deployment instructions.

## 🧪 Local Testing

### Quick Test with Ollama

```bash
# Make sure Ollama is running
ollama serve

# Run the test script
./test_local.sh
```

Or manually:

```bash
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434/v1
python app.py
```

## 📊 How It Works

1. **User input**: Upload a CSV and enter a query
2. **Crew creation**: Three agents are initialized with their roles
3. **Parallel execution**:
   - The Engineer validates the data
   - The Analyst performs the analysis (runs in parallel)
   - The Storyteller creates the narrative (waits for the Analyst)
4. **Tool execution**: Agents use tools to access and analyze the data
5. **LLM processing**: The AI generates insights and responses
6. **Result aggregation**: All outputs are combined and formatted
7. **Display**: Results are shown to the user

See EXECUTION_FLOW.md for detailed flow documentation.

## 🎯 Key Features Explained

### Semantic Search

Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs use the cached embeddings.
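
The app does this with Sentence Transformer embeddings stored in ChromaDB. As a dependency-free illustration of the underlying shape — embed everything, then rank by cosine similarity — here is a toy stand-in that uses bag-of-words vectors instead of learned embeddings (real embeddings capture meaning far beyond literal word overlap):

```python
# Toy stand-in for embedding-based search: bag-of-words vectors ranked by
# cosine similarity. The app itself uses Sentence Transformers + ChromaDB;
# this stdlib-only version just shows the "embed, then rank" structure.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, records: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]

records = [
    "Curry made 9 three pointers against the Kings",
    "Jokic posted a triple double with 12 assists",
    "Thompson shot 7 of 11 from three point range",
]
top = semantic_search("best three point shooters", records)
```

Swapping `embed` for a real model and the sorted list for a ChromaDB collection query gives the production version, with the bonus that ChromaDB persists the index so only the first run pays the embedding cost.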

### Parallel Processing

The Engineer and Analyst tasks run simultaneously for faster results, while the Storyteller waits for the Analyst to complete.
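
The scheduling shape described here can be sketched with the standard library: two tasks submitted concurrently, with a third blocking only on the one it depends on. (The app expresses this with CrewAI task dependencies rather than futures; the task bodies below are placeholders.)

```python
# Stdlib sketch of the dependency structure: Engineer and Analyst run
# concurrently, Storyteller starts only once the Analyst finishes.
# Task bodies are placeholders; the app uses CrewAI tasks, not futures.
from concurrent.futures import ThreadPoolExecutor

def engineer_task() -> str:
    return "data validated"

def analyst_task() -> str:
    return "insights extracted"

def storyteller_task(analysis: str) -> str:
    return f"headline written from: {analysis}"

with ThreadPoolExecutor(max_workers=2) as pool:
    eng_future = pool.submit(engineer_task)   # runs in parallel...
    ana_future = pool.submit(analyst_task)    # ...with this one
    story = storyteller_task(ana_future.result())  # blocks on Analyst only
    validation = eng_future.result()
```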

### Multi-Agent Collaboration

Each agent has a specialized role:

- **Engineer**: Data quality and structure
- **Analyst**: Statistical analysis and insights
- **Storyteller**: Narrative and presentation

## 🔒 Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
| `HF_API_KEY` | Hugging Face API token | Required if using HF |
| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
| `OLLAMA_MODEL` | Ollama model name | `mistral` |
| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |

๐Ÿ› Troubleshooting

"ModuleNotFoundError: No module named 'crewai'"

  • Install dependencies: pip install -r requirements.txt or uv sync

"HF_API_KEY not set"

  • Set your Hugging Face token as environment variable or in Space secrets

"Connection refused" (Ollama)

  • Make sure ollama serve is running
  • Check port 11434 is available

"Model not found" (Ollama)

  • Download the model: ollama pull mistral
  • List models: ollama list

Slow responses

  • Use smaller models (Llama 3.2 3B instead of 8B)
  • Check your internet connection for API calls
  • For local: Use faster models like llama3.2

๐Ÿ“ License

This project is open source. Check individual dependencies for their licenses.

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📚 Documentation

See EXECUTION_FLOW.md for a detailed walkthrough of the execution flow and deployment steps.

## 🎓 What Was Built

This project demonstrates:

- Multi-agent AI systems with CrewAI
- Parallel task execution
- Semantic search with vector databases
- Integration with multiple LLM providers
- A web interface with Gradio
- Free-tier deployment on Hugging Face Spaces

## 💡 Tips

- **First run**: Vector DB indexing takes time on first use
- **Large files**: Use semantic search for large datasets
- **Complex queries**: Use "Analyze with Question" for specific queries
- **Model selection**: Larger models give better quality but slower responses
- **Local testing**: Use Ollama for faster iteration

Built with โค๏ธ using CrewAI and open-source LLMs