# ReAct Agent Status For Hugging Face Spaces

This document describes the current role of the lean ReAct router in the HF Spaces-oriented runtime.

## Role

ReAct is the app-side orchestrator, not the model. It records:

```text
Thought -> Action -> Observation
```

and chooses the smallest safe tool chain for a task. The tools may call deterministic Python, Hugging Face Inference, Modal endpoints, or local llama.cpp depending on configuration.

ReAct must never write inventory. It returns editable rows, pending actions, and trace lines. Owner approval remains the write boundary.

## Current Live Paths

### Receipt photo

```text
POST /api/photo
  -> ReceiptReActAgent.extract_receipt_image
  -> extract_text_from_receipt_image tool
  -> dukaan_saathi/integrations/modal_receipt.py
  -> MODAL_RECEIPT_ENDPOINT
  -> modal_apps/receipt_vlm_service.py
  -> parse_receipt_text_tool
  -> hf_inference / modal_llm / llamacpp / deterministic
  -> post-match rows to inventory catalog
  -> editable receipt rows in UI
  -> owner applies row
  -> inventory write
```

For HF Spaces, the preferred receipt text parser backend is `RECEIPT_BACKEND=hf_inference` with `HF_RECEIPT_MODEL_REPO` set.

### Voice command

```text
POST /api/speech
  -> Modal ASR endpoint
  -> transcript
  -> _h_voice_command
  -> run_command_parse
  -> ReceiptReActAgent.parse_stock_command
  -> pending stock action
  -> owner clicks Approve stock change
  -> _h_voice_apply
  -> inventory write
```

The ASR step is separate from ReAct. ReAct starts after text exists.

### Receipt text

The Gradio path already routes text parsing through the ReAct agent with configured fallback behavior. The custom FastAPI path uses ReAct for photo OCR flows, then the parser tool for receipt text.

## Tool Responsibilities

- `extract_text_from_receipt_image`: calls the Modal OCR HTTP client.
- `parse_receipt_text_tool`: respects `RECEIPT_BACKEND`.
- `parse_stock_command_tool`: uses the deterministic stock parser today.
- `draft_reorder_tool`: reads inventory and drafts reorder suggestions.
- `propose_inventory_update`: stores a pending proposal only; no DB write.

Modal/HF/llama.cpp inference belongs behind tools or integration clients, not inside UI handlers.

## Completed

- Custom FastAPI photo path calls `ReceiptReActAgent` first.
- Custom FastAPI voice command path calls `ReceiptReActAgent` first.
- Voice trace is shown in the parsed voice result panel.
- Receipt rows are returned as editable rows and post-matched against inventory.
- Voice actions are pending until owner approval.
- Direct fallback paths remain for robustness if ReAct or a configured backend fails.

## Remaining Cleanup

### 1. Heavier ToolCallingAgent

`dukaan_saathi/agent/agent.py` still contains the heavier smolagents `ToolCallingAgent`. For HF Spaces, do not make this primary unless there is a clear demo need. If revived, it should use an HF-compatible model client and must preserve the same owner approval gate.

### 2. Multi-step agent UX

Do not add broad agent chat unless it has a concrete user workflow. The current demo benefits more from reliable receipt, voice, reorder, and approval flows than from open-ended agent chat.

## HF Spaces Guidance

- Prefer `hf_inference` for receipt text parsing in the public Space.
- Use Modal for OCR and ASR only when endpoint env vars are configured.
- Keep `/api/warm` best-effort and non-blocking.
- Keep deterministic parser fallbacks for tests and graceful degradation.
- Never require local llama.cpp in the Space runtime.
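
The backend guidance above could be expressed as a small resolver. This is only a sketch under stated assumptions: `resolve_receipt_backend` and its `in_space` flag are hypothetical names, not functions from the repository, and the real `parse_receipt_text_tool` may decide differently. Only `RECEIPT_BACKEND`, `HF_RECEIPT_MODEL_REPO`, and the four backend names come from this document.

```python
from typing import Mapping

# Backend names as listed in the receipt photo path above.
KNOWN_BACKENDS = ("hf_inference", "modal_llm", "llamacpp", "deterministic")


def resolve_receipt_backend(env: Mapping[str, str], in_space: bool = True) -> str:
    """Pick a receipt text parser backend, degrading to the deterministic parser.

    Hypothetical helper: `in_space` is an assumed flag meaning "running inside
    the public Space"; how the app detects that is not specified here.
    """
    requested = env.get("RECEIPT_BACKEND", "").strip().lower()

    # Unset or unrecognised values degrade to the deterministic parser.
    if requested not in KNOWN_BACKENDS:
        return "deterministic"

    # hf_inference is only usable when a model repo is configured.
    if requested == "hf_inference" and not env.get("HF_RECEIPT_MODEL_REPO"):
        return "deterministic"

    # The Space runtime must never depend on local llama.cpp.
    if requested == "llamacpp" and in_space:
        return "deterministic"

    return requested
```

Passing the environment in as a mapping (for example `os.environ`) rather than reading it inside the function keeps the choice easy to test, which fits the goal of keeping deterministic fallbacks for tests.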