shekkari21 committed on Nov 22, 2025

Commit

b34fde9

1 Parent(s): ac32153

added readme

Files changed (2)

EXECUTION_FLOW.md +527 -0
README.md +307 -21

EXECUTION_FLOW.md ADDED

	@@ -0,0 +1,527 @@

+# Detailed Execution Flow - NBA Analysis Application
+This document explains, step by step, how user input flows through the application and is executed.
+---
+## 🎯 High-Level Flow Overview
+```
+User Input (CSV + Query)
+    ↓
+app.py (Gradio Interface)
+    ↓
+crew.py (CrewAI Orchestration)
+    ↓
+agents.py (AI Agents)
+    ↓
+tasks.py (Task Definitions)
+    ↓
+tools.py (Data Access Tools)
+    ↓
+vector_db.py / pandas (Data Processing)
+    ↓
+config.py (LLM Configuration)
+    ↓
+LLM API (Hugging Face / Ollama / etc.)
+    ↓
+Results → User
+```
+---
+## 📋 Detailed Step-by-Step Execution
+### **Phase 1: User Input & Initialization**
+#### Step 1.1: User Interaction (`app.py`)
+- **File**: `app.py`
+- **Function**: `process_file_and_analyze()` or `process_question_only()`
+- **Input**:
+  - CSV file (uploaded via Gradio)
+  - User query (optional text)
+- **What happens**:
+  ```python
+  # Line 23-24: Validate file exists
+  if file is None:
+      return "Please upload a CSV file."
+  # Line 27-28: Set default query if empty
+  if not user_query:
+      user_query = "Provide comprehensive analysis..."
+  # Line 32-33: Extract file path
+  file_path = file.name
+  csv_path = file_path
+  ```
+#### Step 1.2: Crew Creation (`crew.py`)
+- **File**: `crew.py`
+- **Function**: `create_flow_crew(user_query, csv_path)`
+- **What happens**:
+  ```python
+  # Line 82-84: Create all agents
+  engineer_agent = create_engineer_agent(csv_path)
+  analyst_agent = create_analyst_agent(csv_path)
+  storyteller_agent = create_storyteller_agent()
+  # Line 88-94: Create tasks
+  data_engineering_task = create_data_engineering_task(...)
+  custom_analysis_task = create_custom_analysis_task(...)
+  storyteller_task = create_storyteller_task(...)
+  # Line 99-104: Create Crew with agents and tasks
+  return Crew(agents=[...], tasks=[...], process=Process.sequential)
+  ```
+---
+### **Phase 2: Agent Initialization**
+#### Step 2.1: LLM Configuration (`config.py`)
+- **File**: `config.py`
+- **Function**: `get_llm()`
+- **What happens**:
+  ```python
+  # Line 13: Check provider (default: "huggingface")
+  LLM_PROVIDER = os.getenv("LLM_PROVIDER", "huggingface")
+  # Line 54-64: Create LLM instance based on provider
+  if LLM_PROVIDER == "huggingface":
+      return LLM(
+          model=f"huggingface/{HF_MODEL}",
+          api_key=HF_API_KEY
+      )
+  # Similar for ollama, openrouter, etc.
+  ```
+- **Output**: Configured LLM instance (used by all agents)
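The provider selection in this step can be sketched without CrewAI installed. This is a minimal illustration, not the contents of `config.py`: the function name `resolve_llm_settings` and the `DEFAULTS` table are hypothetical, the default model names are the ones this document lists, and prefixing the model with the provider name for Ollama and OpenRouter is an assumption extrapolated from the `huggingface/{HF_MODEL}` pattern shown above.

```python
import os

# Hypothetical lookup table; defaults mirror the values listed in this document.
DEFAULTS = {
    "huggingface": {"model_env": "HF_MODEL",
                    "model": "meta-llama/Llama-3.1-8B-Instruct",
                    "key_env": "HF_API_KEY"},
    "ollama": {"model_env": "OLLAMA_MODEL",
               "model": "mistral",
               "key_env": None},  # a local server needs no API key
    "openrouter": {"model_env": "OPENROUTER_MODEL",
                   "model": "google/gemma-2-2b-it:free",
                   "key_env": "OPENROUTER_API_KEY"},
}


def resolve_llm_settings(env=None):
    """Return provider, prefixed model name, and API key from env-style settings."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "huggingface")
    if provider not in DEFAULTS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    spec = DEFAULTS[provider]
    api_key = env.get(spec["key_env"]) if spec["key_env"] else None
    if spec["key_env"] and not api_key:
        # Fail early, before any agent is built with an unusable LLM.
        raise ValueError(f"{spec['key_env']} is not set")
    model = env.get(spec["model_env"], spec["model"])
    return {"provider": provider, "model": f"{provider}/{model}", "api_key": api_key}
```

Passing a plain dict instead of `os.environ` keeps the function easy to test; the returned values could then be handed to whatever LLM constructor the project uses.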
+#### Step 2.2: Agent Creation (`agents.py`)
+- **File**: `agents.py`
+- **Functions**: `create_engineer_agent()`, `create_analyst_agent()`, `create_storyteller_agent()`
+- **What happens**:
+**Engineer Agent** (Lines 12-36):
+  ```python
+  # Line 22-23: Get data path and tools
+  data_path = csv_path or NBA_DATA_PATH
+  agent_tools = get_agent_tools(data_path)
+  # Line 25-36: Create agent with:
+  #   role: "Data Engineer"
+  #   goal: Process and clean data
+  #   backstory: Expert data engineer description
+  #   llm: Shared LLM instance
+  #   tools: Data access tools (read, search, analyze)
+  ```
+**Analyst Agent** (Lines 39-69):
+  ```python
+  # Similar structure but with:
+  #   role: "Data Analyst"
+  #   goal: Extract insights and patterns
+  #   backstory: Includes instructions to use analyze_nba_data for aggregations
+  #   tools: Same data tools
+  ```
+**Storyteller Agent** (Lines 72-93):
+  ```python
+  # role: "Sports Storyteller"
+  # goal: Create engaging headlines from analysis
+  # tools: [] (no data tools, only uses LLM)
+  ```
+#### Step 2.3: Tools Initialization (`tools.py`)
+- **File**: `tools.py`
+- **Function**: `get_agent_tools(data_path)`
+- **What happens**:
+  ```python
+  # Returns list of 5 tools:
+  #   1. read_nba_data(limit) - Read sample rows
+  #   2. search_nba_data(query, column, value) - Filter/search CSV
+  #   3. get_nba_data_summary() - Get dataset overview
+  #   4. semantic_search_nba_data(query) - Vector search
+  #   5. analyze_nba_data(pandas_code) - Execute pandas operations
+  ```
+- **Note**: Each tool is wrapped with `@tool` decorator for CrewAI
+---
+### **Phase 3: Task Execution**
+#### Step 3.1: Crew Kickoff (`app.py` → `crew.py`)
+- **File**: `app.py` Line 36-37
+- **What happens**:
+  ```python
+  crew = create_flow_crew(user_query.strip(), csv_path)
+  result = crew.kickoff()  # This triggers execution
+  ```
+#### Step 3.2: Task 1 - Data Engineering (`tasks.py`)
+- **File**: `tasks.py` Lines 8-40
+- **Task**: `create_data_engineering_task()`
+- **Agent**: Engineer Agent
+- **Execution Flow**:
+  ```
+  1. Engineer Agent receives task description
+  2. LLM processes task: "Examine dataset, get summary..."
+  3. Agent decides to use: get_nba_data_summary()
+  4. Tool execution (tools.py):
+     - Reads CSV with pandas
+     - Calculates stats (rows, columns, unique values)
+     - Returns formatted summary
+  5. LLM receives tool output
+  6. LLM generates confirmation: "Dataset loaded, X rows, Y columns..."
+  7. Task complete → Output stored
+  ```
+#### Step 3.3: Task 2 - Data Analysis (`tasks.py`)
+- **File**: `tasks.py` Lines 55-95 (create_custom_analysis_task)
+- **Agent**: Analyst Agent
+- **Execution Flow**:
+  ```
+  1. Analyst Agent receives user query + task description
+  2. LLM analyzes query: "What does user want?"
+  3. Agent decides which tools to use:
+     - For aggregations → analyze_nba_data()
+     - For searches → search_nba_data() or semantic_search_nba_data()
+     - For overview → get_nba_data_summary()
+  4. Tool Execution Examples:
+  Example A: "Top 5 three-point shooters"
+    - Agent generates pandas code:
+      df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)
+    - analyze_nba_data() executes code
+    - Returns DataFrame with results
+    - LLM formats output: "Top 5: Player1 (X), Player2 (Y)..."
+  Example B: "Find LeBron James games"
+    - Agent uses search_nba_data(query="LeBron James")
+    - Tool filters CSV, returns matching rows
+    - LLM analyzes results, provides insights
+  Example C: "High scoring games"
+    - Agent uses semantic_search_nba_data("high scoring games")
+    - Vector DB finds semantically similar records
+    - Returns top matches with similarity scores
+    - LLM provides analysis
+  5. LLM generates final analysis report
+  6. Task complete → Output stored
+  ```
+#### Step 3.4: Task 3 - Storytelling (`tasks.py`)
+- **File**: `tasks.py` Lines 98-130 (create_storyteller_task)
+- **Agent**: Storyteller Agent
+- **Dependency**: Waits for Analyst task to complete
+- **Execution Flow**:
+  ```
+  1. Storyteller Agent receives Analyst's output as context
+  2. LLM processes: "Create engaging headline and story"
+  3. No tools used (only LLM)
+  4. LLM generates:
+     - Catchy headline
+     - Engaging narrative
+     - Context and insights
+  5. Task complete → Output stored
+  ```
+---
+### **Phase 4: Tool Execution Details**
+#### Tool 1: `read_nba_data(limit)` (`tools.py` Lines 22-30)
+```
+Input: limit (number of rows)
+Execution:
+  1. pd.read_csv(data_path)
+  2. df.head(limit)
+  3. Format as string
+Output: Sample rows with column names
+```
+#### Tool 2: `search_nba_data(query, column, value)` (`tools.py` Lines 32-71)
+```
+Input: query (text), column (name), value (filter)
+Execution:
+  1. pd.read_csv(data_path)
+  2. Apply filters if provided
+  3. Text search across columns
+  4. Limit to 50 rows max
+Output: Filtered DataFrame as string
+```
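The filter, text search, and 50-row cap described for Tool 2 can be sketched with the standard library alone. This is an illustration under assumptions: the real tool uses pandas, `search_rows` is a hypothetical name, the exact-match and case-insensitive behaviors are guesses at reasonable semantics, and the sample rows are made up.

```python
import csv
import io

MAX_ROWS = 50  # cap taken from the tool description above


def search_rows(rows, query=None, column=None, value=None, max_rows=MAX_ROWS):
    """Filter dict rows by an exact column match and/or a case-insensitive text query."""
    matches = []
    for row in rows:
        # Column filter: applied only when both column and value are given.
        if column is not None and value is not None and row.get(column) != value:
            continue
        # Text search: the query may appear in any column of the row.
        if query and not any(query.lower() in str(v).lower() for v in row.values()):
            continue
        matches.append(row)
        if len(matches) >= max_rows:
            break
    return matches


# Made-up sample data, only to exercise the function.
sample_csv = (
    "Player,Team,PTS\n"
    "LeBron James,LAL,30\n"
    "Stephen Curry,GSW,28\n"
    "LeBron James,LAL,25\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
lebron_games = search_rows(rows, query="lebron")
```

Capping inside the loop keeps the tool output short enough to fit in an LLM prompt even when the query matches most of the file.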
+#### Tool 3: `get_nba_data_summary()` (`tools.py` Lines 73-94)
+```
+Input: None
+Execution:
+  1. pd.read_csv(data_path)
+  2. Calculate: total rows, columns, unique players/teams
+  3. Get date range
+  4. Identify numeric columns
+  5. Show sample rows
+Output: Comprehensive dataset summary
+```
+#### Tool 4: `semantic_search_nba_data(query)` (`tools.py` Lines 135-175)
+```
+Input: query (natural language)
+Execution:
+  1. Get vector_db instance (vector_db.py)
+  2. Check if indexed (if not, index CSV)
+  3. Generate embedding for query
+  4. Search in ChromaDB
+  5. Retrieve the top N similar records from the index
+  6. Load the corresponding original CSV rows
+Output: Similar records with metadata
+```
+**Vector DB Indexing** (`vector_db.py` Lines 94-156):
+```
+First time only:
+  1. Load SentenceTransformer model
+  2. Read CSV
+  3. For each row:
+     - Convert to text: "Player: X, Team: Y, Points: Z..."
+     - Generate embedding
+     - Store in ChromaDB with metadata
+  4. Persist to disk (chroma_db/)
+```
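The index-then-search idea can be illustrated without SentenceTransformer or ChromaDB. In the sketch below a bag-of-words count vector stands in for a real embedding and a sorted list stands in for the vector store; both substitutions are assumptions made so the example runs with the standard library, and all function names are hypothetical.

```python
import math
from collections import Counter


def row_to_text(row):
    """Serialize a row as described above: 'Player: X, Team: Y, Points: Z'."""
    return ", ".join(f"{key}: {value}" for key, value in row.items())


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace(",", " ").replace(":", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_matches(query, rows, n=2):
    """Score every row against the query and return the n best (score, row) pairs."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(row_to_text(row))), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

A real embedding model captures meaning rather than shared tokens, so "high scoring games" could match rows that never contain those words; the ranking step is the same either way.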
+#### Tool 5: `analyze_nba_data(pandas_code)` (`tools.py` Lines 203-253)
+```
+Input: pandas_code (string of pandas operations)
+Execution:
+  1. Load CSV into DataFrame 'df'
+  2. Create a restricted namespace: {'pd': pandas, 'df': df} (not a sandbox: exec() still exposes builtins)
+  3. Execute: exec(f"result = {pandas_code}", namespace)
+  4. Get result from namespace
+  5. Format output:
+     - DataFrame → to_string()
+     - Series → to_string()
+     - Limit to 50 rows if large
+Output: Analysis results as string
+```
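Because the code string comes from an LLM, it helps to narrow what it can do before running it. The sketch below is one possible hardening and is not taken from `tools.py`: it accepts a single expression only, removes builtins, and exposes a small allow-list. The name `evaluate_expression` is hypothetical, and a plain list called `data` stands in for the DataFrame `df` so the example runs without pandas.

```python
import ast

# Hypothetical allow-list: the only callables the expression may use.
ALLOWED_NAMES = {"len": len, "max": max, "min": min, "sorted": sorted, "sum": sum}


def evaluate_expression(code, data):
    """Evaluate one expression against `data` with builtins removed."""
    # mode="eval" rejects statements such as `import os` with a SyntaxError.
    ast.parse(code, mode="eval")
    # An empty __builtins__ dict stops Python from injecting the real builtins.
    namespace = {"__builtins__": {}, "data": data, **ALLOWED_NAMES}
    return eval(code, namespace)
```

This narrows the attack surface but is still not a sandbox: attribute access on ordinary objects can reach dangerous functionality, so untrusted code is better run in a separate process or container.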
+---
+### **Phase 5: LLM Interaction**
+#### LLM Call Flow (`config.py` → LLM API)
+```
+1. Agent needs to process task
+2. Calls llm.call(prompt, ...)
+3. config.py routes to provider:
+   Hugging Face:
+   - Format: huggingface/{model_name}
+   - API: https://api-inference.huggingface.co
+   - Request: POST with prompt
+   - Response: Generated text
+   Ollama:
+   - Base URL: http://localhost:11434/v1
+   - OpenAI-compatible API
+   - Request: POST /chat/completions
+   - Response: Generated text
+   OpenRouter:
+   - Base URL: https://openrouter.ai/api/v1
+   - Request: POST with model name
+   - Response: Generated text
+4. LLM generates response
+5. Response returned to agent
+6. Agent processes response
+7. Agent decides next action (use tool? finish? ask for clarification?)
+```
+---
+### **Phase 6: Result Aggregation**
+#### Result Collection (`app.py` Lines 39-80)
+```
+After crew.kickoff() completes:
+1. Extract task outputs:
+   - result.tasks_output[0] → Engineer result
+   - result.tasks_output[1] → Analyst result
+   - result.tasks_output[2] → Storyteller result
+2. Format output:
+   - Add headers: "## Engineer Agent Results"
+   - Add separators: "---"
+   - Combine all outputs
+3. Store engineer result for reuse
+4. Return formatted string to Gradio
+```
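The formatting step can be sketched as a small pure function. Only the "## Engineer Agent Results" header and the "---" separator come from the description above; the other two header strings and the name `format_results` are assumptions.

```python
# The first title is quoted in this document; the other two are assumed to follow the same pattern.
SECTION_TITLES = [
    "Engineer Agent Results",
    "Analyst Agent Results",
    "Storyteller Agent Results",
]


def format_results(task_outputs):
    """Join per-task outputs into one Markdown string with headers and separators."""
    sections = [
        f"## {title}\n\n{str(output).strip()}"
        for title, output in zip(SECTION_TITLES, task_outputs)
    ]
    return "\n\n---\n\n".join(sections)
```

Calling `str()` on each output means the function works whether the crew returns plain strings or richer task-output objects with a string representation.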
+#### Gradio Display (`app.py` Lines 200-340)
+```
+1. User sees results in output textbox
+2. Engineer result stored in hidden state
+3. Can be reused for follow-up questions
+```
+---
+## 🔄 Parallel Execution Flow
+### How Tasks Run in Parallel (`crew.py` Lines 69-104)
+```
+Time →
+│
+├─ Task 1: Engineer (independent)
+│  └─ Uses: get_nba_data_summary()
+│
+├─ Task 2: Analyst (independent, runs in parallel)
+│  └─ Uses: analyze_nba_data() or search_nba_data()
+│
+└─ Task 3: Storyteller (waits for Analyst)
+   └─ Uses: LLM only (no tools)
+```
+**Key Points**:
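+- Engineer and Analyst have no dependency on each other, so they can run **simultaneously**
+- Storyteller runs **after** Analyst completes (has dependency)
+- Under `Process.sequential`, CrewAI overlaps tasks only when they are created with `async_execution=True`; otherwise tasks run in list order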
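The dependency shape in the diagram (two independent tasks, one task that waits on the Analyst) can be illustrated with the standard library. This is a sketch of the scheduling idea only, not of how CrewAI schedules work internally; the three `run_*` functions are hypothetical placeholders for the agents.

```python
from concurrent.futures import ThreadPoolExecutor


def run_engineer():
    return "dataset summary"


def run_analyst():
    return "analysis report"


def run_storyteller(analysis):
    return f"headline based on: {analysis}"


def run_pipeline():
    """Run Engineer and Analyst concurrently; Storyteller waits for the Analyst only."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        engineer_future = pool.submit(run_engineer)  # independent
        analyst_future = pool.submit(run_analyst)    # independent
        analysis = analyst_future.result()           # blocks until the Analyst finishes
        story = run_storyteller(analysis)
        return engineer_future.result(), analysis, story
```

The Storyteller never reads the Engineer's future, which mirrors the diagram: only the Analyst's output is a prerequisite for the narrative.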
+---
+## 📊 Data Flow Diagram
+```
+CSV File
+    ↓
+[pandas.read_csv()]
+    ↓
+DataFrame
+    ↓
+    ├─→ Tools (read, search, analyze)
+    │       ↓
+    │   Results → Agent → LLM → Response
+    │
+    └─→ Vector DB (semantic search)
+            ↓
+        [SentenceTransformer]
+            ↓
+        Embeddings
+            ↓
+        [ChromaDB]
+            ↓
+        Similar Records → Agent → LLM → Response
+```
+---
+## 🎯 Example: Complete Execution Trace
+### Input:
+- CSV: `nba24-25.csv`
+- Query: "Who are the top 5 three-point shooters?"
+### Execution:
+1. **app.py**: `process_file_and_analyze(file, "top 5 three-point shooters")`
+2. **crew.py**: `create_flow_crew("top 5...", "nba24-25.csv")`
+3. **agents.py**: Create Engineer, Analyst, Storyteller agents
+4. **config.py**: `get_llm()` → Returns Hugging Face LLM
+5. **crew.kickoff()** starts
+6. **Task 1 (Engineer)**:
+   - Agent: "I need to check the dataset"
+   - Tool: `get_nba_data_summary()`
+   - Result: "Dataset has 5000 rows, columns: Player, Team, 3P, ..."
+   - LLM: "Dataset loaded. 5000 rows, ready for analysis."
+7. **Task 2 (Analyst)** - Runs in parallel:
+   - Agent: "User wants top 5 three-point shooters"
+   - Tool: `analyze_nba_data("df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)")`
+   - Execution:
+     ```python
+     df = pd.read_csv("nba24-25.csv")
+     result = df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)
+     # Returns: Player1: 250, Player2: 245, ...
+     ```
+   - LLM: "Top 5 three-point shooters: 1. Player1 (250), 2. Player2 (245)..."
+8. **Task 3 (Storyteller)** - After Analyst:
+   - Agent receives Analyst output
+   - LLM: "🏀 **Splash Brothers Dominate: Top 5 Three-Point Sharpshooters Revealed** ..."
+9. **app.py**: Combine all outputs
+10. **Gradio**: Display to user
+---
+## 🔧 Key Configuration Points
+### LLM Provider Selection (`config.py`)
+- Environment variable: `LLM_PROVIDER`
+- Options: `huggingface`, `ollama`, `openrouter`, `openai`
+- Default: `huggingface`
+### Model Selection
+- Hugging Face: `HF_MODEL` (default: `meta-llama/Llama-3.1-8B-Instruct`)
+- Ollama: `OLLAMA_MODEL` (default: `mistral`)
+- OpenRouter: `OPENROUTER_MODEL` (default: `google/gemma-2-2b-it:free`)
+### Data Path
+- Default: `NBA_DATA_PATH = "nba24-25.csv"` (config.py)
+- Can be overridden by uploaded file
+---
+## 🐛 Error Handling
+### At Each Level:
+1. **app.py** (Lines 82-86):
+   - Try/except around `crew.kickoff()`
+   - Returns error message with traceback
+2. **Tools** (tools.py):
+   - Each tool has try/except
+   - Returns error message if fails
+3. **Vector DB** (vector_db.py):
+   - Handles missing files
+   - Creates directory if needed
+   - Handles indexing errors
+4. **LLM** (config.py):
+   - Validates API keys
+   - Raises ValueError if missing
+   - Handles API errors
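The top-level pattern described for `app.py` (catch the failure, return a message with a traceback instead of crashing the UI) might look like the following. The wrapper name `run_with_error_report` and the message layout are assumptions, not the project's actual code.

```python
import traceback


def run_with_error_report(action):
    """Call `action`; on failure return a readable message instead of raising."""
    try:
        return action()
    except Exception as exc:
        # format_exc() captures the active traceback as a string for display.
        return f"Error: {exc}\n\n{traceback.format_exc()}"
```

In the app this could wrap the kickoff call, for example `run_with_error_report(lambda: crew.kickoff())`, so the Gradio textbox shows the error rather than a blank result.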
+---
+## 📝 Summary
+**Input Flow**:
+```
+User → Gradio → app.py → crew.py → agents.py → tasks.py → tools.py → data/LLM
+```
+**Output Flow**:
+```
+LLM/data → tools.py → agents.py → tasks.py → crew.py → app.py → Gradio → User
+```
+**Key Points**:
+- All agents share the same LLM instance
+- Tools are stateless (read CSV each time)
+- Vector DB is persistent (indexed once, reused)
+- Tasks without dependencies can run in parallel
+- Results are aggregated and formatted in app.py
+---
+**Last Updated**: Based on current codebase structure
+**Files Involved**: app.py, crew.py, agents.py, tasks.py, tools.py, vector_db.py, config.py

README.md CHANGED

@@ -9,46 +9,332 @@ app_file: app.py
 pinned: false
 ---
-# NBA Data Analysis with CrewAI
-An intelligent NBA data analysis application powered by CrewAI agents. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines.
-## Features
 - 📊 **Data Engineering**: Automatic data cleaning and preparation
 - 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
 - 📈 **Statistical Analysis**: Top performers, trends, and key metrics
 - 📝 **Storytelling**: Engaging headlines and narratives from data
-- 🎯 **Semantic Search**: Natural language queries on your data
-## How to Use
-1. **Upload a CSV file** with NBA data
-2. **Enter your analysis query** (or leave blank for comprehensive analysis)
-3. **Click "Analyze Dataset"** and wait for results
-4. **View insights** from Engineer, Analyst, and Storyteller agents
-## Example Queries
 - "Who are the top 5 three-point shooters?"
 - "Show me the best scoring games this season"
 - "Which players have the highest field goal percentage?"
 - "Analyze team performance trends"
-## Technology Stack
-- **CrewAI**: Multi-agent AI framework
-- **Gradio**: Web interface
-- **Pandas**: Data analysis
-- **ChromaDB**: Vector database for semantic search
-- **OpenRouter**: Free open-source LLM access
-## Free to Use
-This application uses free-tier services:
-- OpenRouter for LLM access (free tier)
-- Hugging Face Spaces for hosting (free tier)
 ---
-Built with ❤️ using CrewAI

 pinned: false
 ---
+# 🏀 NBA Data Analysis with CrewAI
+An intelligent NBA data analysis application powered by the CrewAI multi-agent framework. Upload your NBA CSV data and get comprehensive analysis with insights, statistics, and engaging storylines generated by AI agents.
+## ✨ Features
+- 🤖 **Multi-Agent AI System**: Three specialized agents (Engineer, Analyst, Storyteller) work together
 - 📊 **Data Engineering**: Automatic data cleaning and preparation
 - 🔍 **Intelligent Analysis**: AI-powered insights and pattern detection
 - 📈 **Statistical Analysis**: Top performers, trends, and key metrics
+- 🔎 **Semantic Search**: Natural language queries on your data using vector embeddings
 - 📝 **Storytelling**: Engaging headlines and narratives from data
+- 🎯 **Parallel Processing**: Tasks run in parallel for faster results
+- 🌐 **Web Interface**: Easy-to-use Gradio web app
+- 🆓 **Free & Open Source**: Uses free-tier open-source LLM models
+## 🏗️ Architecture
+The application uses a multi-agent system with the following components:
+- **Data Engineer Agent**: Processes and validates data
+- **Data Analyst Agent**: Performs statistical analysis and extracts insights
+- **Storyteller Agent**: Creates engaging narratives from analysis results
+### Tech Stack
+- **CrewAI**: Multi-agent AI framework
+- **Gradio**: Web interface
+- **Pandas**: Data analysis
+- **ChromaDB**: Vector database for semantic search
+- **Sentence Transformers**: Embeddings for semantic search
+- **Hugging Face / Ollama**: Open-source LLM providers
+## 📋 Prerequisites
+- Python 3.11 or 3.12
+- pip or uv package manager
+- (Optional) Ollama for local testing
+## 🚀 Installation
+### 1. Clone the Repository
+```bash
+git clone <your-repo-url>
+cd NBA_Analysis
+```
+### 2. Install Dependencies
+**Using uv (recommended):**
+```bash
+uv sync
+```
+**Using pip:**
+```bash
+pip install -r requirements.txt
+```
+### 3. Prepare Your Data
+Place your NBA CSV file in the project directory, or upload it through the web interface.
+## ⚙️ Configuration
+### LLM Provider Setup
+The application supports multiple LLM providers. Configure via environment variables:
+#### Option 1: Hugging Face (Recommended for Deployment)
+1. Get a free API token from [Hugging Face](https://huggingface.co/settings/tokens)
+2. Set environment variables:
+   ```bash
+   export LLM_PROVIDER=huggingface
+   export HF_API_KEY=your-hf-token
+   export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct  # or any HF model
+   ```
+**Available Models:**
+- `meta-llama/Llama-3.1-8B-Instruct` (default, best quality)
+- `mistralai/Mistral-7B-Instruct-v0.2` (excellent quality)
+- `Qwen/Qwen2.5-7B-Instruct` (multilingual, great quality)
+- `meta-llama/Llama-3.2-3B-Instruct` (faster, smaller)
+#### Option 2: Ollama (For Local Testing)
+1. Install Ollama: https://ollama.ai
+2. Start Ollama service:
+   ```bash
+   ollama serve
+   ```
+3. Download a model:
+   ```bash
+   ollama pull mistral  # or llama3.2, qwen2.5:7b, etc.
+   ```
+4. Set environment variables:
+   ```bash
+   export LLM_PROVIDER=ollama
+   export OLLAMA_MODEL=mistral
+   export OLLAMA_BASE_URL=http://localhost:11434/v1
+   ```
+#### Option 3: OpenRouter (Alternative Free Option)
+1. Get a free API key from [OpenRouter](https://openrouter.ai)
+2. Set environment variables:
+   ```bash
+   export LLM_PROVIDER=openrouter
+   export OPENROUTER_API_KEY=your-key
+   export OPENROUTER_MODEL=google/gemma-2-2b-it:free
+   ```
+### Default Configuration
+The application defaults to **Hugging Face** with the **Llama 3.1 8B Instruct** model, so `HF_API_KEY` is the only setting you need to provide.
+## 🎮 Usage
+### Web Interface (Recommended)
+```bash
+python app.py
+```
+Then open your browser to the URL shown (usually `http://localhost:7860`).
+**Features:**
+- Upload CSV file
+- Enter analysis query (or leave blank for comprehensive analysis)
+- Click "Analyze Dataset" for full analysis
+- Click "Analyze with Question" for quick queries
+### Command Line
+```bash
+python main.py
+```
+## 📖 Example Queries
 - "Who are the top 5 three-point shooters?"
 - "Show me the best scoring games this season"
 - "Which players have the highest field goal percentage?"
 - "Analyze team performance trends"
+- "Find games with triple doubles"
+- "What are the most efficient shooters?"
+## 🛠️ Project Structure
+```
+NBA_Analysis/
+├── app.py                 # Gradio web interface
+├── main.py                # Command-line entry point
+├── config.py              # LLM and configuration settings
+├── agents.py              # AI agent definitions
+├── crew.py                # CrewAI crew orchestration
+├── tasks.py               # Task definitions
+├── tools.py               # Data access tools for agents
+├── vector_db.py           # Vector database for semantic search
+├── requirements.txt       # Python dependencies
+├── pyproject.toml         # Project configuration
+├── test_local.sh          # Script for local testing with Ollama
+├── EXECUTION_FLOW.md      # Detailed execution flow documentation
+└── README.md              # This file
+```
+## 🔧 Available Tools
+The agents have access to 5 data tools:
+1. **read_nba_data**: Read sample rows to understand structure
+2. **search_nba_data**: Filter and search CSV data
+3. **get_nba_data_summary**: Get comprehensive dataset overview
+4. **semantic_search_nba_data**: Natural language semantic search
+5. **analyze_nba_data**: Execute pandas operations for advanced analysis
+## 🚀 Deployment
+### Hugging Face Spaces (Free)
+1. **Get API Keys:**
+   - Hugging Face token: https://huggingface.co/settings/tokens
+   - (Optional) OpenRouter key: https://openrouter.ai
+2. **Create Space:**
+   - Go to https://huggingface.co/spaces
+   - Create new Space with Gradio SDK
+   - Push your code
+3. **Set Secrets:**
+   - Space Settings → Repository secrets
+   - Add `HF_API_KEY` = your Hugging Face token
+   - (Optional) Add `LLM_PROVIDER` = `huggingface`
+   - (Optional) Add `HF_MODEL` = your preferred model
+4. **Deploy:**
+   ```bash
+   git remote add hf https://huggingface.co/spaces/yourusername/nba-analysis
+   git push hf main
+   ```
+See `EXECUTION_FLOW.md` for the detailed execution flow of the deployed app.
+## 🧪 Local Testing
+### Quick Test with Ollama
+```bash
+# Make sure Ollama is running
+ollama serve
+# Run test script
+./test_local.sh
+```
+Or manually:
+```bash
+export LLM_PROVIDER=ollama
+export OLLAMA_MODEL=mistral
+export OLLAMA_BASE_URL=http://localhost:11434/v1
+python app.py
+```
+## 📊 How It Works
+1. **User Input**: Upload CSV + enter query
+2. **Crew Creation**: Three agents are initialized with their roles
+3. **Parallel Execution**:
+   - Engineer validates data
+   - Analyst performs analysis (runs in parallel)
+   - Storyteller creates narrative (waits for Analyst)
+4. **Tool Execution**: Agents use tools to access and analyze data
+5. **LLM Processing**: AI generates insights and responses
+6. **Result Aggregation**: All outputs are combined and formatted
+7. **Display**: Results shown to user
+See `EXECUTION_FLOW.md` for detailed flow documentation.
+## 🎯 Key Features Explained
+### Semantic Search
+Uses vector embeddings to find semantically similar records. The first run indexes the CSV; subsequent runs reuse the cached embeddings.
+### Parallel Processing
+Engineer and Analyst tasks run simultaneously for faster results. Storyteller waits for Analyst to complete.
+### Multi-Agent Collaboration
+Each agent has a specialized role:
+- **Engineer**: Data quality and structure
+- **Analyst**: Statistical analysis and insights
+- **Storyteller**: Narrative and presentation
+## 🔒 Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `LLM_PROVIDER` | LLM provider (`huggingface`, `ollama`, `openrouter`) | `huggingface` |
+| `HF_API_KEY` | Hugging Face API token | Required if using HF |
+| `HF_MODEL` | Hugging Face model name | `meta-llama/Llama-3.1-8B-Instruct` |
+| `OLLAMA_MODEL` | Ollama model name | `mistral` |
+| `OLLAMA_BASE_URL` | Ollama server URL | `http://localhost:11434/v1` |
+| `OPENROUTER_API_KEY` | OpenRouter API key | Required if using OpenRouter |
+| `OPENROUTER_MODEL` | OpenRouter model name | `google/gemma-2-2b-it:free` |
+## 🐛 Troubleshooting
+### "ModuleNotFoundError: No module named 'crewai'"
+- Install dependencies: `pip install -r requirements.txt` or `uv sync`
+### "HF_API_KEY not set"
+- Set your Hugging Face token as environment variable or in Space secrets
+### "Connection refused" (Ollama)
+- Make sure `ollama serve` is running
+- Check port 11434 is available
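A quick way to check whether anything is listening on the Ollama port is a plain TCP connection attempt. This is a generic sketch with a hypothetical helper name; it only confirms that the port accepts connections, not that the process behind it is Ollama.

```python
import socket


def is_port_open(host="localhost", port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

If `is_port_open()` returns `False`, start the server with `ollama serve` and try again.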
+### "Model not found" (Ollama)
+- Download the model: `ollama pull mistral`
+- List models: `ollama list`
+### Slow responses
+- Use smaller models (Llama 3.2 3B instead of 8B)
+- Check your internet connection for API calls
+- For local: Use faster models like `llama3.2`
+## 📝 License
+This project is open source. Check individual dependencies for their licenses.
+## 🤝 Contributing
+Contributions are welcome! Please feel free to submit a Pull Request.
+## 📚 Documentation
+- **Execution Flow**: See `EXECUTION_FLOW.md` for detailed flow
+- **CrewAI Docs**: https://docs.crewai.com
+- **Gradio Docs**: https://gradio.app/docs
+## 🎓 What Was Built
+This project demonstrates:
+- Multi-agent AI systems with CrewAI
+- Parallel task execution
+- Semantic search with vector databases
+- Integration with multiple LLM providers
+- Web interface with Gradio
+- Free-tier deployment on Hugging Face Spaces
+## 💡 Tips
+- **First Run**: Vector DB indexing takes time on first use
+- **Large Files**: Use semantic search for large datasets
+- **Complex Queries**: Use "Analyze with Question" for specific queries
+- **Model Selection**: Larger models = better quality, slower speed
+- **Local Testing**: Use Ollama for faster iteration
+## 🔗 Links
+- **Hugging Face**: https://huggingface.co
+- **Ollama**: https://ollama.ai
+- **OpenRouter**: https://openrouter.ai
+- **CrewAI**: https://docs.crewai.com
 ---
+**Built with ❤️ using CrewAI and open-source LLMs**