# Detailed Execution Flow - NBA Analysis Application
This document explains step-by-step how user input flows through the application and gets executed.
## High-Level Flow Overview

```
User Input (CSV + Query)
        ↓
app.py (Gradio Interface)
        ↓
crew.py (CrewAI Orchestration)
        ↓
agents.py (AI Agents)
        ↓
tasks.py (Task Definitions)
        ↓
tools.py (Data Access Tools)
        ↓
vector_db.py / pandas (Data Processing)
        ↓
config.py (LLM Configuration)
        ↓
LLM API (Hugging Face / Ollama / etc.)
        ↓
Results → User
```
## Detailed Step-by-Step Execution
### Phase 1: User Input & Initialization

#### Step 1.1: User Interaction (app.py)

- **File:** `app.py`
- **Functions:** `process_file_and_analyze()` or `process_question_only()`
- **Input:**
  - CSV file (uploaded via Gradio)
  - User query (optional text)
- **What happens:**

```python
# Line 23-24: Validate file exists
if file is None:
    return "Please upload a CSV file."

# Line 27-28: Set default query if empty
if not user_query:
    user_query = "Provide comprehensive analysis..."

# Line 32-33: Extract file path
file_path = file.name
csv_path = file_path
```
#### Step 1.2: Crew Creation (crew.py)

- **File:** `crew.py`
- **Function:** `create_flow_crew(user_query, csv_path)`
- **What happens:**

```python
# Line 82-84: Create all agents
engineer_agent = create_engineer_agent(csv_path)
analyst_agent = create_analyst_agent(csv_path)
storyteller_agent = create_storyteller_agent()

# Line 88-94: Create tasks
data_engineering_task = create_data_engineering_task(...)
custom_analysis_task = create_custom_analysis_task(...)
storyteller_task = create_storyteller_task(...)

# Line 99-104: Create Crew with agents and tasks
return Crew(agents=[...], tasks=[...], process=Process.sequential)
```
### Phase 2: Agent Initialization

#### Step 2.1: LLM Configuration (config.py)

- **File:** `config.py`
- **Function:** `get_llm()`
- **What happens:**

```python
# Line 13: Check provider (default: "huggingface")
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "huggingface")

# Line 54-64: Create LLM instance based on provider
if LLM_PROVIDER == "huggingface":
    return LLM(
        model=f"huggingface/{HF_MODEL}",
        api_key=HF_API_KEY
    )
# Similar for ollama, openrouter, etc.
```

- **Output:** Configured LLM instance (shared by all agents)
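The provider switch above can be sketched without CrewAI installed. This stand-in returns a plain dict instead of an `LLM` instance; the environment-variable names and defaults come from this document, everything else is illustrative:

```python
import os

def get_llm_config() -> dict:
    """Simplified stand-in for config.py's get_llm(): pick provider settings
    from environment variables. Env-var names and defaults are taken from
    this document; the real function returns a CrewAI LLM object."""
    provider = os.getenv("LLM_PROVIDER", "huggingface")
    if provider == "huggingface":
        return {
            "model": f"huggingface/{os.getenv('HF_MODEL', 'meta-llama/Llama-3.1-8B-Instruct')}",
            "api_key": os.getenv("HF_API_KEY", ""),
        }
    if provider == "ollama":
        return {
            "model": os.getenv("OLLAMA_MODEL", "mistral"),
            "base_url": "http://localhost:11434/v1",  # OpenAI-compatible endpoint
        }
    if provider == "openrouter":
        return {
            "model": os.getenv("OPENROUTER_MODEL", "google/gemma-2-2b-it:free"),
            "base_url": "https://openrouter.ai/api/v1",
        }
    raise ValueError(f"Unknown LLM_PROVIDER: {provider}")
```

Because every agent calls the same factory, they all end up sharing one configuration.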
#### Step 2.2: Agent Creation (agents.py)

- **File:** `agents.py`
- **Functions:** `create_engineer_agent()`, `create_analyst_agent()`, `create_storyteller_agent()`
- **What happens:**

**Engineer Agent (Lines 12-36):**

```python
# Line 22-23: Get data path and tools
data_path = csv_path or NBA_DATA_PATH
agent_tools = get_agent_tools(data_path)

# Line 25-36: Create agent with:
# - role: "Data Engineer"
# - goal: Process and clean data
# - backstory: Expert data engineer description
# - llm: Shared LLM instance
# - tools: Data access tools (read, search, analyze)
```

**Analyst Agent (Lines 39-69):** similar structure, but with:

- role: "Data Analyst"
- goal: Extract insights and patterns
- backstory: Includes instructions to use `analyze_nba_data` for aggregations
- tools: Same data tools

**Storyteller Agent (Lines 72-93):**

- role: "Sports Storyteller"
- goal: Create engaging headlines from analysis
- tools: `[]` (no data tools; uses the LLM only)
#### Step 2.3: Tools Initialization (tools.py)

- **File:** `tools.py`
- **Function:** `get_agent_tools(data_path)`
- **What happens:** returns a list of 5 tools:
  1. `read_nba_data(limit)` - Read sample rows
  2. `search_nba_data(query, column, value)` - Filter/search CSV
  3. `get_nba_data_summary()` - Get dataset overview
  4. `semantic_search_nba_data(query)` - Vector search
  5. `analyze_nba_data(pandas_code)` - Execute pandas operations
- **Note:** Each tool is wrapped with the `@tool` decorator for CrewAI
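One plausible shape for `get_agent_tools(data_path)` is a set of closures over the data path, so agents never pass file paths themselves. This is a sketch under that assumption (the real tools are additionally wrapped with CrewAI's `@tool` decorator, omitted here):

```python
import pandas as pd

def get_agent_tools(data_path: str) -> list:
    """Sketch: each tool closes over data_path so the agent only supplies
    query-style arguments. Actual signatures may differ in tools.py."""

    def read_nba_data(limit: int = 5) -> str:
        # Tool 1: return a sample of rows as plain text for the LLM
        return pd.read_csv(data_path).head(limit).to_string()

    # The other four tools (search, summary, semantic search, analyze)
    # would follow the same pattern and be appended to this list.
    return [read_nba_data]
```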
### Phase 3: Task Execution

#### Step 3.1: Crew Kickoff (app.py → crew.py)

- **File:** `app.py`, Lines 36-37
- **What happens:**

```python
crew = create_flow_crew(user_query.strip(), csv_path)
result = crew.kickoff()  # This triggers execution
```
#### Step 3.2: Task 1 - Data Engineering (tasks.py)

- **File:** `tasks.py`, Lines 8-40
- **Task:** `create_data_engineering_task()`
- **Agent:** Engineer Agent
- **Execution flow:**
  1. Engineer Agent receives the task description
  2. LLM processes the task: "Examine dataset, get summary..."
  3. Agent decides to use `get_nba_data_summary()`
  4. Tool execution (tools.py):
     - Reads CSV with pandas
     - Calculates stats (rows, columns, unique values)
     - Returns formatted summary
  5. LLM receives the tool output
  6. LLM generates confirmation: "Dataset loaded, X rows, Y columns..."
  7. Task complete → output stored
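The summary step can be sketched with pandas alone. The real tool takes no arguments (the path is bound at tool-creation time); a path parameter is added here to keep the sketch self-contained, and the exact stats are assumptions based on this document:

```python
import pandas as pd

def get_nba_data_summary(data_path: str) -> str:
    """Sketch of the summary tool the Engineer Agent calls: row/column
    counts, unique-value counts for likely categorical columns, and a
    few sample rows, formatted as plain text for the LLM."""
    df = pd.read_csv(data_path)
    lines = [
        f"Total rows: {len(df)}",
        f"Columns: {', '.join(df.columns)}",
    ]
    # Unique-value counts for columns the doc mentions, if present
    for col in ("Player", "Team"):
        if col in df.columns:
            lines.append(f"Unique {col}s: {df[col].nunique()}")
    lines.append("Sample rows:\n" + df.head(3).to_string())
    return "\n".join(lines)
```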
#### Step 3.3: Task 2 - Data Analysis (tasks.py)

- **File:** `tasks.py`, Lines 55-95 (`create_custom_analysis_task`)
- **Agent:** Analyst Agent
- **Execution flow:**
  1. Analyst Agent receives the user query + task description
  2. LLM analyzes the query: "What does the user want?"
  3. Agent decides which tools to use:
     - For aggregations → `analyze_nba_data()`
     - For searches → `search_nba_data()` or `semantic_search_nba_data()`
     - For an overview → `get_nba_data_summary()`
  4. Tool execution examples:
     - **Example A: "Top 5 three-point shooters"**
       - Agent generates pandas code: `df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)`
       - `analyze_nba_data()` executes the code and returns a DataFrame with results
       - LLM formats the output: "Top 5: Player1 (X), Player2 (Y)..."
     - **Example B: "Find LeBron James games"**
       - Agent uses `search_nba_data(query="LeBron James")`
       - Tool filters the CSV and returns matching rows
       - LLM analyzes the results and provides insights
     - **Example C: "High scoring games"**
       - Agent uses `semantic_search_nba_data("high scoring games")`
       - Vector DB finds semantically similar records and returns top matches with similarity scores
       - LLM provides analysis
  5. LLM generates the final analysis report
  6. Task complete → output stored
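Example A boils down to a single pandas expression, reproduced here standalone with toy data (player names and numbers invented for illustration):

```python
import pandas as pd

# Toy stand-in for the uploaded CSV: three-pointers made ('3P') per game
df = pd.DataFrame({
    "Player": ["Curry", "Thompson", "Curry", "Lillard", "Thompson"],
    "3P": [7, 5, 6, 4, 3],
})

# The pandas expression the Analyst Agent would hand to analyze_nba_data()
top = df.groupby("Player")["3P"].sum().sort_values(ascending=False).head(5)
print(top)  # Curry 13 (7 + 6), Thompson 8, Lillard 4
```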
#### Step 3.4: Task 3 - Storytelling (tasks.py)

- **File:** `tasks.py`, Lines 98-130 (`create_storyteller_task`)
- **Agent:** Storyteller Agent
- **Dependency:** Waits for the Analyst task to complete
- **Execution flow:**
  1. Storyteller Agent receives the Analyst's output as context
  2. LLM processes: "Create engaging headline and story"
  3. No tools are used (LLM only)
  4. LLM generates a catchy headline, an engaging narrative, and supporting context and insights
  5. Task complete → output stored
### Phase 4: Tool Execution Details

#### Tool 1: `read_nba_data(limit)` (tools.py Lines 22-30)

- **Input:** `limit` (number of rows)
- **Execution:**
  1. `pd.read_csv(data_path)`
  2. `df.head(limit)`
  3. Format as string
- **Output:** Sample rows with column names
#### Tool 2: `search_nba_data(query, column, value)` (tools.py Lines 32-71)

- **Input:** `query` (text), `column` (name), `value` (filter)
- **Execution:**
  1. `pd.read_csv(data_path)`
  2. Apply filters if provided
  3. Text search across columns
  4. Limit to 50 rows max
- **Output:** Filtered DataFrame as string
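The four steps above can be sketched as follows. The real tool has no path parameter (the path is bound when the tool is created), and the exact filter semantics are assumptions:

```python
import pandas as pd

def search_nba_data(data_path, query=None, column=None, value=None, max_rows=50):
    """Sketch of the search/filter tool: an optional exact column filter,
    then a case-insensitive free-text search across all columns,
    capped at max_rows and returned as text."""
    df = pd.read_csv(data_path)
    if column and value is not None:
        df = df[df[column] == value]  # exact column filter
    if query:
        # Join each row's values into one string and search it
        text = df.astype(str).apply(" ".join, axis=1)
        df = df[text.str.contains(query, case=False, na=False)]
    return df.head(max_rows).to_string()  # hard cap on returned rows
```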
#### Tool 3: `get_nba_data_summary()` (tools.py Lines 73-94)

- **Input:** None
- **Execution:**
  1. `pd.read_csv(data_path)`
  2. Calculate total rows, columns, unique players/teams
  3. Get the date range
  4. Identify numeric columns
  5. Show sample rows
- **Output:** Comprehensive dataset summary
#### Tool 4: `semantic_search_nba_data(query)` (tools.py Lines 135-175)

- **Input:** `query` (natural language)
- **Execution:**
  1. Get the vector_db instance (vector_db.py)
  2. Check if indexed (if not, index the CSV)
  3. Generate an embedding for the query
  4. Search in ChromaDB
  5. Return the top N similar records
  6. Load the original CSV rows
- **Output:** Similar records with metadata
**Vector DB Indexing (vector_db.py Lines 94-156)** - first time only:

1. Load the SentenceTransformer model
2. Read the CSV
3. For each row:
   - Convert to text: "Player: X, Team: Y, Points: Z..."
   - Generate an embedding
   - Store in ChromaDB with metadata
4. Persist to disk (`chroma_db/`)
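The embed-then-rank mechanism can be illustrated without SentenceTransformer or ChromaDB. This toy sketch swaps in a bag-of-words "embedding" over a fixed vocabulary and a brute-force cosine search; the real pipeline uses learned sentence embeddings and a persistent vector store, so treat this only as a picture of the flow:

```python
import numpy as np

def build_vocab(texts):
    """Assign each distinct lowercase word an index (toy embedding space)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def embed(text, vocab):
    """L2-normalized word-count vector, standing in for a sentence embedding."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def semantic_search(query, rows, top_n=3):
    """Rank rows by cosine similarity to the query, as steps 3-5 above do."""
    vocab = build_vocab(rows + [query])
    index = np.stack([embed(r, vocab) for r in rows])  # (num_rows, dim)
    scores = index @ embed(query, vocab)               # cosine similarities
    best = np.argsort(scores)[::-1][:top_n]
    return [(rows[i], float(scores[i])) for i in best]
```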
#### Tool 5: `analyze_nba_data(pandas_code)` (tools.py Lines 203-253)

- **Input:** `pandas_code` (string of pandas operations)
- **Execution:**
  1. Load the CSV into DataFrame `df`
  2. Create a restricted namespace: `{'pd': pandas, 'df': df}`
  3. Execute: `exec(f"result = {pandas_code}", namespace)`
  4. Get `result` from the namespace
  5. Format the output:
     - DataFrame → `to_string()`
     - Series → `to_string()`
     - Limit to 50 rows if large
- **Output:** Analysis results as string
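The five steps can be sketched as below. A path parameter is added to keep it self-contained (the real tool binds the path elsewhere), and note that a namespace dict limits which names the snippet sees but `exec` is not a security sandbox:

```python
import pandas as pd

def analyze_nba_data(pandas_code: str, data_path: str) -> str:
    """Sketch of the pandas-execution tool: run an agent-generated
    expression in a namespace exposing only `pd` and `df`."""
    df = pd.read_csv(data_path)
    namespace = {"pd": pd, "df": df}
    # The agent's snippet is an expression; bind its value to `result`
    exec(f"result = {pandas_code}", namespace)
    result = namespace["result"]
    if isinstance(result, (pd.DataFrame, pd.Series)):
        result = result.head(50).to_string()  # cap large outputs
    return str(result)
```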
### Phase 5: LLM Interaction

#### LLM Call Flow (config.py → LLM API)

1. An agent needs to process a task
2. It calls `llm.call(prompt, ...)`
3. config.py routes the call to the provider:
   - **Hugging Face:**
     - Format: `huggingface/{model_name}`
     - API: `https://api-inference.huggingface.co`
     - Request: POST with prompt; response: generated text
   - **Ollama:**
     - Base URL: `http://localhost:11434/v1` (OpenAI-compatible API)
     - Request: POST `/chat/completions`; response: generated text
   - **OpenRouter:**
     - Base URL: `https://openrouter.ai/api/v1`
     - Request: POST with model name; response: generated text
4. The LLM generates a response
5. The response is returned to the agent
6. The agent processes the response
7. The agent decides its next action (use a tool? finish? ask for clarification?)
### Phase 6: Result Aggregation

#### Result Collection (app.py Lines 39-80)

After `crew.kickoff()` completes:

1. Extract task outputs:
   - `result.tasks_output[0]` → Engineer result
   - `result.tasks_output[1]` → Analyst result
   - `result.tasks_output[2]` → Storyteller result
2. Format the output:
   - Add headers: "## Engineer Agent Results"
   - Add separators: "---"
   - Combine all outputs
3. Store the Engineer result for reuse
4. Return the formatted string to Gradio
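The formatting step can be sketched as a small helper. The header texts are taken from this document; the function name and exact separator layout are assumptions:

```python
def format_crew_results(task_outputs) -> str:
    """Sketch of app.py's aggregation: label each task's output with a
    markdown header and join the sections with '---' separators."""
    headers = [
        "## Engineer Agent Results",
        "## Analyst Agent Results",
        "## Storyteller Agent Results",
    ]
    sections = [f"{h}\n\n{out}" for h, out in zip(headers, task_outputs)]
    return "\n\n---\n\n".join(sections)
```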
#### Gradio Display (app.py Lines 200-340)

1. The user sees the results in the output textbox
2. The Engineer result is stored in hidden state
3. It can be reused for follow-up questions
## Task Ordering

### How Tasks Run (crew.py Lines 69-104)

The crew is created with `process=Process.sequential`, so the tasks execute one after another:

```
Time →

Task 1: Engineer
  └─ Uses: get_nba_data_summary()
        ↓
Task 2: Analyst
  └─ Uses: analyze_nba_data() or search_nba_data()
        ↓
Task 3: Storyteller (receives the Analyst's output as context)
  └─ Uses: LLM only (no tools)
```

**Key points:**

- With `Process.sequential`, tasks run in listed order, not in parallel
- The Storyteller explicitly depends on the Analyst's output
- CrewAI handles the ordering and context passing automatically
## Data Flow Diagram

```
CSV File
   ↓
[pandas.read_csv()]
   ↓
DataFrame
   ├── Tools (read, search, analyze)
   │        ↓
   │     Results → Agent → LLM → Response
   │
   └── Vector DB (semantic search)
            ↓
     [SentenceTransformer]
            ↓
        Embeddings
            ↓
        [ChromaDB]
            ↓
     Similar Records → Agent → LLM → Response
```
## Example: Complete Execution Trace

**Input:**

- CSV: `nba24-25.csv`
- Query: "Who are the top 5 three-point shooters?"

**Execution:**

1. **app.py:** `process_file_and_analyze(file, "top 5 three-point shooters")`
2. **crew.py:** `create_flow_crew("top 5...", "nba24-25.csv")`
3. **agents.py:** Create Engineer, Analyst, Storyteller agents
4. **config.py:** `get_llm()` → returns the Hugging Face LLM
5. `crew.kickoff()` starts
**Task 1 (Engineer):**

- Agent: "I need to check the dataset"
- Tool: `get_nba_data_summary()`
- Result: "Dataset has 5000 rows, columns: Player, Team, 3P, ..."
- LLM: "Dataset loaded. 5000 rows, ready for analysis."
**Task 2 (Analyst):**

- Agent: "User wants top 5 three-point shooters"
- Tool: `analyze_nba_data("df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)")`
- Execution:

```python
df = pd.read_csv("nba24-25.csv")
result = df.groupby('Player')['3P'].sum().sort_values(ascending=False).head(5)
# Returns: Player1: 250, Player2: 245, ...
```

- LLM: "Top 5 three-point shooters: 1. Player1 (250), 2. Player2 (245)..."
**Task 3 (Storyteller) - after the Analyst:**

- Agent receives the Analyst's output
- LLM: "Splash Brothers Dominate: Top 5 Three-Point Sharpshooters Revealed ..."

**app.py** combines all outputs, and **Gradio** displays them to the user.
## Key Configuration Points

### LLM Provider Selection (config.py)

- Environment variable: `LLM_PROVIDER`
- Options: `huggingface`, `ollama`, `openrouter`, `openai`
- Default: `huggingface`

### Model Selection

- Hugging Face: `HF_MODEL` (default: `meta-llama/Llama-3.1-8B-Instruct`)
- Ollama: `OLLAMA_MODEL` (default: `mistral`)
- OpenRouter: `OPENROUTER_MODEL` (default: `google/gemma-2-2b-it:free`)

### Data Path

- Default: `NBA_DATA_PATH = "nba24-25.csv"` (config.py)
- Can be overridden by an uploaded file
## Error Handling

Each level handles its own failures:

**app.py (Lines 82-86):**

- Try/except around `crew.kickoff()`
- Returns an error message with traceback

**Tools (tools.py):**

- Each tool has its own try/except
- Returns an error message if it fails
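The per-tool pattern described above can be sketched as one decorator (the name `safe_tool` is invented for this sketch; the real tools likely inline their try/except blocks):

```python
import functools

def safe_tool(fn):
    """Sketch of per-tool error handling: any exception becomes an error
    string, so the agent sees a message instead of a crash."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return f"Error in {fn.__name__}: {exc}"
    return wrapper
```

Returning a string matters here: CrewAI feeds tool output back to the LLM, so an error string lets the agent recover (e.g. by trying a different tool) rather than aborting the task.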
**Vector DB (vector_db.py):**

- Handles missing files
- Creates the directory if needed
- Handles indexing errors

**LLM (config.py):**

- Validates API keys
- Raises `ValueError` if missing
- Handles API errors
## Summary

**Input flow:**

User → Gradio → app.py → crew.py → agents.py → tasks.py → tools.py → data/LLM

**Output flow:**

LLM/data → tools.py → agents.py → tasks.py → crew.py → app.py → Gradio → User

**Key points:**

- All agents share the same LLM instance
- Tools are stateless (they re-read the CSV on each call)
- The Vector DB is persistent (indexed once, reused)
- Tasks run sequentially; dependent tasks receive earlier outputs as context
- Results are aggregated and formatted in app.py
**Last Updated:** Based on current codebase structure
**Files involved:** app.py, crew.py, agents.py, tasks.py, tools.py, vector_db.py, config.py