	# Contributing to why-agent

	Thank you for your interest in contributing! This guide covers the development setup, testing workflow, and code quality standards.

	---

	## Prerequisites

	- Python 3.12+ (check `.python-version`)
	- uv — modern Python package manager ([install](https://docs.astral.sh/uv/))
	- Node.js 20+ — for the Next.js frontend (optional, only if modifying frontend)
	- Git — for version control

	---

	## Development Setup

	### 1. Clone and install dependencies

	```bash
	git clone https://github.com/Isa-Mapo-Hackathon/why-agent.git
	cd why-agent
	uv sync
	```

	This installs both runtime and dev dependencies (pytest, ruff, pyright).

	### 2. Set up environment

	```bash
	cp .env.example .env
	```

	Then edit `.env` with your secrets:
	- `MODEL_BACKEND` — use `minimax` or `replay` for local development
	- `MINIMAX_API_KEY` — get from [MiniMax dashboard](https://platform.minimaxi.chat/)
	- `PARQUET_DIR` — defaults to `data/parquet`
	- `SEMANTIC_LAYER_PATH` — defaults to `data/semantic_layer.yml`
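
As an illustration of how these four variables fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual code, and the `replay` fallback for a missing `MODEL_BACKEND` is an assumption; only the two path defaults come from the list above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values read from `.env`."""

    model_backend: str
    minimax_api_key: Optional[str]
    parquet_dir: str
    semantic_layer_path: str


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping such as os.environ."""
    # Assumption: fall back to `replay` when MODEL_BACKEND is unset.
    backend = env.get("MODEL_BACKEND", "replay")
    # Treat an empty string the same as a missing key.
    api_key = env.get("MINIMAX_API_KEY") or None
    if backend == "minimax" and api_key is None:
        raise ValueError("MINIMAX_API_KEY is required when MODEL_BACKEND=minimax")
    return Settings(
        model_backend=backend,
        minimax_api_key=api_key,
        parquet_dir=env.get("PARQUET_DIR", "data/parquet"),
        semantic_layer_path=env.get("SEMANTIC_LAYER_PATH", "data/semantic_layer.yml"),
    )
```

Calling `load_settings(os.environ)` after the `.env` file has been loaded would fail early with a clear message when the MiniMax key is missing, rather than at the first LLM call.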

	### 3. Verify setup

	```bash
	uv run pytest -v
	```

	This should run roughly 15 or more tests, all without errors.

	---

	## Running the Application

	### Option A: Streamlit (Python-only, simplest)

	```bash
	uv run streamlit run streamlit_app.py
	```

	The app opens at `http://localhost:8501`, where you can ask the agent questions directly through the Streamlit UI.

	### Option B: FastAPI + Next.js (full stack)

	Terminal 1 — FastAPI backend (with hot reload):
	```bash
	uv run uvicorn client.backend.main:app --reload --port 8000
	```

	Backend runs at `http://localhost:8000`. Check health at `http://localhost:8000/api/health`.

	Terminal 2 — Next.js frontend:
	```bash
	cd client/frontend
	npm install # first time only
	npm run dev
	```

	Frontend runs at `http://localhost:3000`. The Next.js dev server proxies `/api/*` to the FastAPI backend on port 8000 automatically.

	---

	## Common Development Commands

	| Task | Command |
	|------|---------|
	| Install deps | `uv sync` |
	| Add a dependency | `uv add <package>` (runtime) or `uv add --dev <package>` (dev) |
	| Run tests | `uv run pytest -v` |
	| Run one test file | `uv run pytest tests/test_agent_smoke.py -v` |
	| Lint code | `uv run ruff check --fix` |
	| Format code | `uv run ruff format` |
	| Type check (optional) | `uv run pyright` |
	| Run Streamlit | `uv run streamlit run streamlit_app.py` |
	| Run FastAPI backend | `uv run uvicorn client.backend.main:app --reload --port 8000` |
	| Run Next.js frontend | `cd client/frontend && npm run dev` |

	---

	## Testing

	### Philosophy

	Tests are smoke tests, not unit tests. We verify:
	- Tools run without crashing
	- Output has the expected shape (JSON, dict keys, etc.)
	- Error handling is recoverable

	We do not mock heavily or test implementation details.

	### Running tests

	```bash
	# All tests
	uv run pytest

	# Single file
	uv run pytest tests/test_tools.py -v

	# Single test
	uv run pytest tests/test_tools.py::test_inspect_schema -v

	# With print output
	uv run pytest -s
	```

	### Adding a test

	1. Add a `.py` file in `tests/` or `client/backend/tests/`
	2. Write a function named `test_*`
	3. Use `assert` statements
	4. Run `uv run pytest` to verify

	Example:
	```python
	def test_my_feature():
	    from agent.tools import run_sql
	    result = run_sql(...)
	    assert "rows" in result
	    assert isinstance(result["rows"], list)
	```
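
The three checks listed under Philosophy might translate into tests like the following sketch. `fake_run_sql` is a hypothetical stand-in so the example runs on its own; a real test would import the tool from `agent.tools`, and the `error`/`hint` keys mirror the pattern shown in the example tool under Coding Conventions.

```python
from typing import Any, Dict


def fake_run_sql(query: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a tool: returns rows, or an error dict."""
    if not query.strip().lower().startswith("select"):
        # Errors are returned as data, not raised, so the caller can recover.
        return {
            "error": "only SELECT statements are allowed",
            "hint": "Start the query with SELECT",
        }
    return {"rows": [{"n": 1}]}


def test_tool_returns_expected_shape() -> None:
    # Shape check only: the keys and types, not the exact values.
    result = fake_run_sql("SELECT 1 AS n")
    assert "rows" in result
    assert isinstance(result["rows"], list)


def test_tool_error_is_recoverable() -> None:
    # A bad input should produce an error dict rather than an exception.
    result = fake_run_sql("DROP TABLE orders")
    assert "error" in result
    assert "hint" in result
```

Neither test inspects how the tool produces its result, which keeps them stable when the implementation changes.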

	---

	## Code Quality

	Before any commit, code must pass:

	```bash
	uv run ruff check --fix # Fix lint errors automatically
	uv run ruff format # Format to standard style
	```

	These two commands are required — CI will reject commits that don't pass.

	Optional (not in CI, but recommended):
	```bash
	uv run pyright # Type checking (editor runs this too)
	```

	---

	## Repository Structure

	```
	why-agent/
	├── agent/                      # Core agent logic
	│   ├── graph.py                # LangGraph state machine
	│   ├── state.py                # Pydantic state models
	│   ├── client.py               # Multi-backend LLM client
	│   ├── constants.py            # Named constants (backends, tool names, demo questions)
	│   ├── tools/                  # The four tools
	│   │   ├── inspect_schema.py
	│   │   ├── run_sql.py
	│   │   ├── compare_periods.py
	│   │   └── decompose_metric.py
	│   └── prompts/                # System + critique prompts
	│
	├── client/
	│   ├── backend/                # FastAPI server
	│   │   ├── main.py             # GET /api/health, POST /api/investigate
	│   │   ├── deps.py             # Dependency injection (graph instance)
	│   │   ├── sse.py              # Server-Sent Events formatting
	│   │   └── tests/
	│   └── frontend/               # Next.js app
	│       ├── src/app/page.tsx    # Main page
	│       └── package.json
	│
	├── data/
	│   ├── parquet/                # Dataset files (gitignored)
	│   └── semantic_layer.yml      # Metadata + business context
	│
	├── tests/                      # Python smoke tests
	│   ├── test_tools.py
	│   ├── test_client_backends.py
	│   └── test_agent_smoke.py
	│
	├── docs/                       # Documentation
	│   ├── CONTRIBUTING.md         # This file
	│   ├── RUNBOOK.md              # Deployment guide
	│   └── why-agent-architecture.png
	│
	├── streamlit_app.py            # Standalone Streamlit UI
	├── pyproject.toml              # Python deps + commands
	├── docker/                     # Containers
	│   ├── Dockerfile              # Multi-stage build
	│   ├── entrypoint.sh           # HF Spaces boot script
	│   ├── nginx.conf              # Reverse proxy config
	│   └── supervisord.conf        # Process management
	│
	└── README.md                   # Project overview + business context
	```

	---

	## Architecture Overview

	```
	    ┌─────────────────────────────────┐
	    │          Streamlit UI           │
	    │       (streamlit_app.py)        │
	    └────────────┬────────────────────┘
	                 │
	       ┌─────────┴─────────┐
	       │                   │
	       ▼                   ▼
	┌──────────────┐  ┌──────────────────┐
	│ Next.js      │  │ FastAPI Backend  │
	│ (client/     │  │ (client/backend/ │
	│  frontend/)  │  │  main.py)        │
	└──────────────┘  └────────┬─────────┘
	                           │
	                     ┌─────▼─────┐
	                     │ LangGraph │
	                     │   Agent   │
	                     └─────┬─────┘
	                           │
	           ┌───────────────┼───────────────┐
	           │               │               │
	           ▼               ▼               ▼
	     ┌──────────┐  ┌──────────────┐  ┌────────────┐
	     │ DuckDB   │  │ Pydantic     │  │ LLM Client │
	     │(Parquet) │  │ Tools Schemas│  │(3 backends)│
	     └──────────┘  └──────────────┘  └────────────┘
	```

	---

	## Common Issues & Solutions

	### ModuleNotFoundError: No module named 'agent'

	Solution: Make sure you're in the repo root and have run `uv sync`.

	```bash
	cd /path/to/why-agent
	uv sync
	```

	### Tests fail with "No MINIMAX_API_KEY"

	Solution: Use `MODEL_BACKEND=replay` for local testing. Replay mode doesn't call any LLM.

	```bash
	export MODEL_BACKEND=replay
	uv run pytest
	```

	### Ruff formatting conflicts with editor

	Solution: Treat the commands below as the source of truth, and configure your editor to match them.

	```bash
	uv run ruff format
	uv run ruff check --fix
	```

	### Next.js frontend doesn't build

	Solution: Make sure Node 20+ is installed and `npm install` ran successfully.

	```bash
	node --version # should be v20+
	cd client/frontend
	npm install
	npm run build
	```

	---

	## Coding Conventions

	Per `CLAUDE.md`, follow these conventions:

	1. Sync by default — DuckDB has no async API. Use `async def` only at the LLM boundary.
	2. Pydantic v2 — All structured data (tool inputs/outputs, state, semantic layer).
	3. Type annotations — Required on public functions (args and return type).
	4. No print() — Use `logger = logging.getLogger(__name__)` in agent code.
	5. No magic strings — Backend names, tool names, scenario IDs go in `agent/constants.py`.
	6. Tool docstrings for the LLM — Write them as if the model will read them.

	Example tool:

	```python
	from pydantic import BaseModel, Field
	import logging

	logger = logging.getLogger(__name__)

	class MyToolInput(BaseModel):
	    query: str = Field(description="A human-readable query.")

	def my_tool(args: MyToolInput) -> dict:
	    """Use this tool to do X. Returns a dict with 'result' and optional 'error'."""
	    try:
	        result = ...
	        return {"result": result}
	    except Exception as exc:
	        logger.exception("Failed")
	        return {"error": str(exc), "hint": "Try Y instead"}
	```
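
For convention 5, one possible shape for `agent/constants.py` is sketched below. This is an assumption about layout, not the file's actual contents: the tool names are inferred from the filenames under `agent/tools/`, only the two backends named in this guide are listed, and `parse_backend` is a hypothetical helper.

```python
from enum import Enum


class Backend(str, Enum):
    """Backend names; only the two mentioned in this guide are listed."""

    MINIMAX = "minimax"
    REPLAY = "replay"


class ToolName(str, Enum):
    """Tool names, assumed to match the module names in agent/tools/."""

    INSPECT_SCHEMA = "inspect_schema"
    RUN_SQL = "run_sql"
    COMPARE_PERIODS = "compare_periods"
    DECOMPOSE_METRIC = "decompose_metric"


def parse_backend(raw: str) -> Backend:
    """Convert a raw MODEL_BACKEND value into a Backend member."""
    try:
        return Backend(raw.strip().lower())
    except ValueError:
        valid = ", ".join(b.value for b in Backend)
        raise ValueError(f"Unknown backend {raw!r}; expected one of: {valid}") from None
```

Because the members subclass `str`, comparisons such as `Backend.REPLAY == "replay"` still hold, so existing string-based call sites keep working while new code refers to the named constants.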

	---

	## Deployment

	### Local Docker build

	To test the full stack locally (frontend + backend + agent) in a container:

	```bash
	docker build -t why-agent:latest .
	docker run -p 7860:7860 -e MODEL_BACKEND=replay why-agent:latest
	```

	Then open `http://localhost:7860`.

	### Remote push rules

	The repo has two git remotes with different push policies:

	| Remote | Purpose | When to push |
	|--------|---------|--------------|
	| `origin` (GitHub) | Source of truth, PRs, CI | Every commit; always push here |
	| `space` (HF Spaces) | Deployment target | Only when opening a PR |

	```bash
	# Normal dev — push to GitHub only
	git push origin feat/my-feature

	# Deploy to HF Spaces — only when PR is ready
	git push space feat/my-feature:main --force
	```

	HF Spaces triggers a full Docker rebuild on every push. Do not push to `space` during iteration — only when the branch is ready for demo/review and a PR is being opened.

	### HF Spaces environment variables

	When deploying to HF Spaces, set these secrets in the Space settings:

	| Variable | Value | Purpose |
	|----------|-------|---------|
	| `MODEL_BACKEND` | `replay` or `minimax` | LLM backend; use `replay` to avoid API costs |
	| `MINIMAX_API_KEY` | (API key) | Required only if `MODEL_BACKEND=minimax` |
	| `HF_DATASET_ID` | (optional) | Dataset repo ID to auto-download Parquet files on boot |
	| `PARQUET_DIR` | `/app/data/parquet` | Path inside container (do not change) |
	| `SEMANTIC_LAYER_PATH` | `/app/data/semantic_layer.yml` | Path inside container (do not change) |

	Note: Paths in the container must use `/app/` prefix, not relative paths.

	### HF Spaces deployment procedure

	#### Quick start

	1. Create a new Space on [huggingface.co/spaces](https://huggingface.co/spaces):
	- Owner: your username
	- Space name: `why-agent` (or any name)
	- License: MIT
	- Docker template (or blank)

	2. Link the repo:
	```bash
	cd /path/to/why-agent
	git remote add space https://huggingface.co/spaces/{username}/{space-name}
	```

	3. Push to deploy (only when ready):
	```bash
	git push space feat/my-feature:main --force
	```

	4. Set secrets in the Space UI → Settings → Repository secrets:
	- `MINIMAX_API_KEY` (if using MiniMax backend)
	- `HF_DATASET_ID` (optional; see below)

	#### How the build works

	1. HF Spaces detects the `Dockerfile` in the repo root
	2. Builds the image (takes ~5–10 minutes the first time)
	3. Runs the container on port 7860
	4. The `entrypoint.sh` script starts nginx, backend, and frontend via supervisord

	#### Auto-downloading Parquet data

	If you set `HF_DATASET_ID=ysh99226/why-agent-data`, the entrypoint will:
	1. Check if `/app/data/parquet` is empty
	2. Run `hf download` to fetch the dataset
	3. Timeout after 120 seconds and fall back to `MODEL_BACKEND=replay`

	The `hf` command (from `huggingface-hub` package) replaces the deprecated `huggingface-cli`.
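
The boot-time behavior described above can be sketched in Python for readers who want to reason about it. The real logic lives in `docker/entrypoint.sh`; `ensure_parquet` is a hypothetical function, and treating a failed or missing download command the same way as a timeout is an assumption.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def ensure_parquet(parquet_dir: Path, download_cmd: Sequence[str], timeout_s: float = 120.0) -> str:
    """Return "ready" if data is present or was fetched, otherwise "replay"."""
    parquet_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: only download when the directory is empty.
    if any(parquet_dir.iterdir()):
        return "ready"
    try:
        # Step 2: run the download; subprocess.run kills it on timeout.
        subprocess.run(list(download_cmd), check=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        # Step 3: fall back to replay mode instead of failing the boot.
        return "replay"
    # A download that produced no files is also treated as a fallback case.
    return "ready" if any(parquet_dir.iterdir()) else "replay"
```

A caller might pass something like `["hf", "download", "<dataset-id>", "--repo-type", "dataset", "--local-dir", "/app/data/parquet"]` as `download_cmd`; the exact arguments the entrypoint uses are not shown in this guide.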

	#### Git workflow for deployment

	Do NOT push to HF Spaces during development.

	1. Work on a feature branch:
	```bash
	git checkout -b feat/my-feature
	git push origin feat/my-feature
	```

	2. Open a PR on GitHub when ready.

	3. Deploy to HF Spaces only when the PR is ready to demo:
	```bash
	git push space main:main --force
	```

	Or, if the feature branch is the one being demoed (before merge):
	```bash
	git push space feat/my-feature:main --force
	```

	Why `--force`? The Space's `main` branch is a deployment target, and its history can diverge from `origin` (for example, after a feature branch has been pushed to it). Using `--force` overwrites it so the Space always reflects the exact commit you push.

	---

	## Docker build errors & fixes

	### "replays/ directory not found" or "missing JSON files"

	Cause: The Dockerfile expects `replays/` to exist and contain at least one `.json` file for `MODEL_BACKEND=replay` to work.

	Fix:
	```bash
	# Create dummy replay if needed
	mkdir -p replays
	echo '{"scenario": "demo"}' > replays/demo.json
	git add replays/demo.json
	git commit -m "chore: add demo replay"
	```

	Then rebuild the Docker image.

	### "SEMANTIC_LAYER_PATH not found" or "semantic_layer.yml missing"

	Cause: The Dockerfile copies `data/semantic_layer_6w.yml` but the file doesn't exist.

	Fix:
	```bash
	# Check the actual filename
	ls -la data/semantic_layer*

	# If the file has a different name, update the COPY line in the Dockerfile, e.g.:
	#   COPY data/semantic_layer_6w.yml /app/data/semantic_layer.yml
	```

	Or, if you're using a different semantic layer file:
	```dockerfile
	COPY data/YOUR_SEMANTIC_LAYER.yml /app/data/semantic_layer.yml
	```

	### "supervisord can't find environment variables" or "MODEL_BACKEND not set in child processes"

	Cause: Environment variables set in `ENV` commands are not automatically passed to supervisord child processes.

	Fix: The `docker/supervisord.conf` must explicitly read env vars via `environment=` lines:

	```ini
	[program:backend]
	command=/app/.venv/bin/uvicorn ...
	environment=PYTHONUNBUFFERED="1",MODEL_BACKEND="replay"
	```

	Or pass them in the command itself. Rebuild the image after fixing `supervisord.conf`.

	### "huggingface-cli: command not found"

	Cause: The old `huggingface-cli` tool is deprecated. The project uses the newer `hf` command from the `huggingface-hub` package.

	Fix: `huggingface-hub` is declared in `pyproject.toml`, so the image build installs it, and `entrypoint.sh` calls `hf download`. An error that mentions `huggingface-cli` suggests something is still invoking the old command; replace that call with `hf`.

	If the entrypoint still fails:
	```bash
	# Verify hf is installed
	docker run -it why-agent:latest /app/.venv/bin/hf --version

	# If missing, add to pyproject.toml
	uv add huggingface-hub
	```

	### "next: command not found" or "Node.js frontend doesn't start"

	Cause: The Next.js build failed, or the `server.js` file is missing.

	Fix:
	1. Check the build log for `npm run build` errors
	2. Ensure `client/frontend/package.json` exists and has a valid build script
	3. Rebuild the Docker image:
	```bash
	docker build --no-cache -t why-agent:latest .
	```

	### "nginx bind: address already in use"

	Cause: Port 7860 or 80 is already bound on your machine.

	Fix (local testing):
	```bash
	docker run -p 8080:7860 -e MODEL_BACKEND=replay why-agent:latest
	# Now visit http://localhost:8080
	```

	On HF Spaces, port 7860 is reserved and managed by the platform — no action needed.

	### "ModuleNotFoundError: No module named 'agent'"

	Cause: The Python path is not set correctly in the container.

	Fix: The Dockerfile sets `ENV PYTHONPATH=/app`, which should work. If it doesn't:
	1. Verify `COPY agent/ /app/agent/` in the Dockerfile
	2. Check that the `backend` program in supervisord uses the full venv path: `/app/.venv/bin/uvicorn`

	### "API route returns 404" or "Frontend can't reach backend"

	Cause: nginx is not configured to reverse-proxy to the backend on 127.0.0.1:8000.

	Fix: Check `docker/nginx.conf`:
	```nginx
	location /api/ {
	    proxy_pass http://127.0.0.1:8000;
	    proxy_set_header X-Real-IP $remote_addr;
	    ...
	}
	```

	Rebuild after fixing the config:
	```bash
	docker build --no-cache -t why-agent:latest .
	```

	---

	## Health check & monitoring

	### Verify all services are running

	```bash
	# Inside the container or from host
	curl http://localhost:7860/api/health
	# Expected: {"ok":true}

	curl http://localhost:7860/
	# Expected: HTML (Next.js frontend)

	curl -X POST http://localhost:7860/api/investigate \
	  -H "Content-Type: application/json" \
	  -d '{"question":"Why did revenue go up?"}'
	# Expected: Server-Sent Event stream
	```
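
To inspect the `/api/investigate` stream programmatically, a small parser along these lines may help. It follows the standard Server-Sent Events framing (`event:` and `data:` fields, with a blank line between events) and assumes each `data` payload is JSON; the event names in the usage note are made up, since this guide does not list the ones the backend emits.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event, data) pairs from the lines of a Server-Sent Events stream."""
    event: str = "message"  # default event type when no `event:` field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                # Assumption: the payload is JSON.
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Multiple data lines in one event are joined with newlines.
            data.append(line[len("data:"):].lstrip(" "))
```

For example, feeding it the lines `event: step`, `data: {"tool": "run_sql"}`, and a blank line yields the pair `("step", {"tool": "run_sql"})`.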

	### Check logs in HF Spaces

	Click "Logs" in the top right of the Space UI. The logs show:
	- nginx startup
	- backend startup (uvicorn)
	- frontend startup (Node.js)
	- Any errors from the agent or tools

	### Common troubleshooting flows

	The frontend loads but the backend is down:
	1. Check Space logs (UI → Logs)
	2. Verify `PYTHONPATH=/app` is set in the Dockerfile
	3. Verify `supervisord.conf` has the correct backend command
	4. Rebuild without cache and push:
	```bash
	git push space feat/my-feature:main --force
	```

	The API returns 500 errors but logs show nothing:
	1. The agent code may have an unhandled exception
	2. Check the agent's error handling in `agent/graph.py`
	3. Verify the semantic layer file exists at `/app/data/semantic_layer.yml`
	4. Test locally:
	```bash
	docker run -d -p 7860:7860 -e MODEL_BACKEND=replay why-agent:latest
	curl http://localhost:7860/api/health
	```

	Parquet data auto-download timed out, but I want to retry:
	The entrypoint waits 120 seconds for the HF dataset download, then falls back to `MODEL_BACKEND=replay`. If you want a fresh download:
	1. Manually clear the parquet directory in the Space (if you have SSH access)
	2. Or, restart the Space (UI → Settings → Restart)
	3. The entrypoint will retry on next boot

	I pushed to the Space but the changes didn't appear:
	1. Verify you pushed to the correct branch (should push `*:main`):
	```bash
	git push space feat/my-feature:main --force
	```

	2. HF Spaces can take 5–10 minutes to rebuild. Wait for the build to finish in the Space logs, then refresh.

	3. If the Space still doesn't update:
	- Click "Restart" in the Space UI
	- Or delete and recreate the Space

	---

	## Reporting Issues

	If you find a bug or have a feature request:
	1. Check existing issues in GitHub
	2. Provide a minimal reproduction (code snippet + data)
	3. Include your environment (Python version, OS, backend)

	---

	## Getting Help

	- CLAUDE.md — Implementation decisions and locked constraints
	- README.md — Business context and architecture
	- Agent code — Read `agent/graph.py` to understand the loop; read `agent/tools/` to see tool contracts
	- LangGraph docs — https://langchain-ai.github.io/langgraph/
	- Pydantic docs — https://docs.pydantic.dev/

	---

	Last updated: 2026-05-07