	# ReAct Agent Status For Hugging Face Spaces

	This document describes the current role of the lean ReAct router in the
	HF Spaces-oriented runtime.

	## Role

	ReAct is the app-side orchestrator, not the model.

	It records:

	```text
	Thought -> Action -> Observation
	```

	and chooses the smallest safe tool chain for a task. The tools may call
	deterministic Python, Hugging Face Inference, Modal endpoints, or local
	llama.cpp depending on configuration.

	ReAct must never write to inventory. It returns editable rows, pending actions,
	and trace lines. Owner approval remains the write boundary.
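As a rough illustration of the contract described above, the sketch below records one Thought -> Action -> Observation step around a read-only tool call and collects the three kinds of output the router returns. `AgentResult`, `run_step`, and `split_lines` are hypothetical names chosen for this example; they are not taken from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    """Hypothetical container for what the router returns: data to review, never a write."""
    rows: list = field(default_factory=list)             # editable rows for the UI
    pending_actions: list = field(default_factory=list)  # wait for owner approval
    trace: list = field(default_factory=list)            # Thought/Action/Observation lines


def run_step(result, thought, action, tool, *args):
    """Record one Thought -> Action -> Observation step around a tool call."""
    result.trace.append(f"Thought: {thought}")
    result.trace.append(f"Action: {action}")
    observation = tool(*args)
    result.trace.append(f"Observation: {observation!r}")
    return observation


def split_lines(text):
    # Stand-in for a real tool (assumption); it only reads its input.
    return [line.strip() for line in text.splitlines() if line.strip()]


result = AgentResult()
lines = run_step(
    result,
    "Receipt text needs splitting",
    "split_lines",
    split_lines,
    "rice 2\n\nsugar 1",
)
result.rows = [{"raw": line} for line in lines]
```

Nothing in this sketch touches inventory: the result only carries rows and trace lines for the owner to review.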

	## Current Live Paths

	### Receipt photo

	```text
	POST /api/photo
	-> ReceiptReActAgent.extract_receipt_image
	-> extract_text_from_receipt_image tool
	-> dukaan_saathi/integrations/modal_receipt.py
	-> MODAL_RECEIPT_ENDPOINT
	-> modal_apps/receipt_vlm_service.py
	-> parse_receipt_text_tool
	-> hf_inference / modal_llm / llamacpp / deterministic
	-> post-match rows to inventory catalog
	-> editable receipt rows in UI
	-> owner applies row
	-> inventory write
	```
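The "post-match rows to inventory catalog" step in the path above could look roughly like the following. This is only a sketch: it assumes fuzzy name matching with the standard-library `difflib`, which may not be what the project actually does, and `post_match`, the `name` key, and the `catalog_name` key are hypothetical.

```python
import difflib


def post_match(rows, catalog, cutoff=0.6):
    """Attach the closest catalog name to each parsed row, or None if nothing is close.

    Matching is case-insensitive; the similarity cutoff is an assumed value.
    """
    by_key = {name.lower(): name for name in catalog}
    matched = []
    for row in rows:
        hits = difflib.get_close_matches(
            row["name"].lower(), list(by_key), n=1, cutoff=cutoff
        )
        # Rows stay editable: the match is a suggestion, not a write.
        matched.append({**row, "catalog_name": by_key[hits[0]] if hits else None})
    return matched


parsed = [{"name": "Suger", "qty": 2}, {"name": "xyz", "qty": 1}]
matched_rows = post_match(parsed, ["Sugar", "Basmati Rice"])
```

Unmatched rows keep `catalog_name` as `None`, so the owner can correct them in the UI before applying anything.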

	For HF Spaces, the preferred receipt text parser backend is
	`RECEIPT_BACKEND=hf_inference` with `HF_RECEIPT_MODEL_REPO` set.
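A minimal sketch of how the backend choice might be resolved from the environment. The variable names `RECEIPT_BACKEND` and `HF_RECEIPT_MODEL_REPO` and the four backend names come from this document; the function `resolve_receipt_backend` and the rule of dropping to `deterministic` when the setting is missing, unknown, or incomplete are assumptions made for illustration.

```python
import os

# Backend names as listed in the receipt photo path.
KNOWN_BACKENDS = ("hf_inference", "modal_llm", "llamacpp", "deterministic")


def resolve_receipt_backend(env=None):
    """Return the backend to use, falling back to the deterministic parser (assumed rule)."""
    env = os.environ if env is None else env
    backend = env.get("RECEIPT_BACKEND", "").strip().lower()
    if backend not in KNOWN_BACKENDS:
        return "deterministic"
    # hf_inference is only usable when a model repo is configured.
    if backend == "hf_inference" and not env.get("HF_RECEIPT_MODEL_REPO"):
        return "deterministic"
    return backend
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `resolve_receipt_backend({"RECEIPT_BACKEND": "hf_inference", "HF_RECEIPT_MODEL_REPO": "org/model"})`, where `org/model` is a placeholder.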

	### Voice command

	```text
	POST /api/speech
	-> Modal ASR endpoint
	-> transcript
	-> _h_voice_command
	-> run_command_parse
	-> ReceiptReActAgent.parse_stock_command
	-> pending stock action
	-> owner clicks Approve stock change
	-> _h_voice_apply
	-> inventory write
	```

	The ASR step is separate from ReAct. ReAct starts after text exists.
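The last three steps of the voice path (pending action, owner approval, inventory write) can be sketched as a small approval gate. `PendingAction` and `ApprovalGate` are hypothetical names, and the in-memory dict stands in for whatever inventory store the app really uses.

```python
import itertools
from dataclasses import dataclass


@dataclass
class PendingAction:
    action_id: int
    item: str
    delta: int
    approved: bool = False


class ApprovalGate:
    """Holds proposed stock changes until the owner approves them (illustrative only)."""

    def __init__(self, inventory):
        self.inventory = inventory
        self._pending = {}
        self._ids = itertools.count(1)

    def propose(self, item, delta):
        # Called after parsing; records the proposal and does not touch inventory.
        action = PendingAction(next(self._ids), item, delta)
        self._pending[action.action_id] = action
        return action

    def approve(self, action_id):
        # Called only from the owner's approve click; this is the single write point.
        action = self._pending.pop(action_id)
        self.inventory[action.item] = self.inventory.get(action.item, 0) + action.delta
        action.approved = True
        return action


inventory = {"rice": 5}
gate = ApprovalGate(inventory)
proposal = gate.propose("rice", 3)
stock_before_approval = inventory["rice"]
gate.approve(proposal.action_id)
```

In this sketch the stock level changes only inside `approve`, which mirrors the rule that owner approval is the write boundary.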

	### Receipt text

	The Gradio path already routes text parsing through the ReAct agent with
	configured fallback behavior. The custom FastAPI path uses ReAct for photo OCR
	flows, then the parser tool for receipt text.

	## Tool Responsibilities

	- `extract_text_from_receipt_image`: calls the Modal OCR HTTP client.
	- `parse_receipt_text_tool`: respects `RECEIPT_BACKEND`.
	- `parse_stock_command_tool`: uses the deterministic stock parser today.
	- `draft_reorder_tool`: reads inventory and drafts reorder suggestions.
	- `propose_inventory_update`: stores a pending proposal only; no DB write.

	Modal/HF/llama.cpp inference belongs behind tools or integration clients, not
	inside UI handlers.
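One way to keep inference out of UI handlers is a small tool registry that handlers dispatch through. The sketch below is an assumption about structure, not the project's code: the `tool` decorator, `TOOLS`, `handle_command`, and the toy "verb quantity item" grammar are all invented for this example; only the tool name `parse_stock_command_tool` comes from the list above.

```python
TOOLS = {}


def tool(name):
    """Register a callable under a tool name (hypothetical helper)."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("parse_stock_command_tool")
def parse_stock_command_tool(text):
    # Toy deterministic parser for commands like "add 3 rice" (assumed grammar).
    verb, qty, item = text.split(maxsplit=2)
    sign = 1 if verb.lower() == "add" else -1
    return {"item": item, "delta": sign * int(qty)}


def handle_command(text):
    # The handler only dispatches to a tool; it never calls a model client itself.
    return TOOLS["parse_stock_command_tool"](text)
```

Swapping the deterministic parser for a Modal or HF-backed one would then change the registered tool, not the handler.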

	## Completed

	- Custom FastAPI photo path calls `ReceiptReActAgent` first.
	- Custom FastAPI voice command path calls `ReceiptReActAgent` first.
	- Voice trace is shown in the parsed voice result panel.
	- Receipt rows are returned as editable rows and post-matched against inventory.
	- Voice actions are pending until owner approval.
	- Direct fallback paths remain for robustness if ReAct or a configured backend
	fails.
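The direct fallback mentioned in the last item might be wired roughly as follows. `parse_with_fallback` and its return shape are hypothetical; the sketch only shows the general pattern of trying the agent first and falling back to a direct parser when it raises.

```python
def parse_with_fallback(text, agent_parse, direct_parse):
    """Try the agent path first; on any failure, use the direct parser instead."""
    try:
        return {"rows": agent_parse(text), "path": "react", "error": None}
    except Exception as exc:
        # Keep the error so the trace or logs can show why the fallback ran.
        return {"rows": direct_parse(text), "path": "direct_fallback", "error": str(exc)}


def failing_agent(text):
    # Simulates an unavailable backend for the example.
    raise RuntimeError("backend unavailable")


def direct_parser(text):
    return [{"raw": line} for line in text.splitlines() if line.strip()]


fallback_result = parse_with_fallback("rice 2\nsugar 1", failing_agent, direct_parser)
agent_result = parse_with_fallback("rice 2", direct_parser, direct_parser)
```

Recording which path ran makes it possible to surface degraded behavior in the UI rather than hiding it.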

	## Remaining Cleanup

	### 1. Heavier ToolCallingAgent

	`dukaan_saathi/agent/agent.py` still contains the heavier smolagents
	`ToolCallingAgent`. For HF Spaces, do not make this primary unless there is a
	clear demo need. If revived, it should use an HF-compatible model client and
	must preserve the same owner approval gate.

	### 2. Multi-step agent UX

	Do not add broad agent chat unless it has a concrete user workflow. The current
	demo benefits more from reliable receipt, voice, reorder, and approval flows
	than from open-ended agent chat.

	## HF Spaces Guidance

	- Prefer `hf_inference` for receipt text parsing in the public Space.
	- Use Modal for OCR and ASR only when endpoint env vars are configured.
	- Keep `/api/warm` best-effort and non-blocking.
	- Keep deterministic parser fallbacks for tests and graceful degradation.
	- Never require local llama.cpp in the Space runtime.
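For the `/api/warm` guidance, "best-effort and non-blocking" could be approximated with a background thread that swallows per-target failures. This is a sketch under assumptions: `warm_best_effort`, the `warmers` mapping, and the `on_done` callback are hypothetical, and the real endpoint may use a different mechanism, such as a framework's background tasks.

```python
import threading


def warm_best_effort(warmers, on_done=None):
    """Start warm-up calls in the background and return immediately.

    `warmers` maps a label to a zero-argument callable. Failures are recorded,
    never raised, so a cold or missing endpoint cannot break the request.
    """
    def _run():
        results = {}
        for name, fn in warmers.items():
            try:
                fn()
                results[name] = "ok"
            except Exception as exc:
                results[name] = f"skipped: {exc}"
        if on_done is not None:
            on_done(results)

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread


def _broken_warmer():
    raise RuntimeError("endpoint not configured")


warm_status = {}
worker = warm_best_effort(
    {"ocr": lambda: None, "asr": _broken_warmer},
    on_done=warm_status.update,
)
worker.join(timeout=5)  # joined here only so the example can inspect the result
```

A request handler would return right after `warm_best_effort(...)` without joining the thread.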