farm-layout-model / LLM_API_DOCS.md
spacedout-bits's picture
Add LLM inference API endpoints for farmer chat assistant
0445ac9
|
Raw
History Blame Contribute Delete
13 kB
# Farm GPT LLM Inference API\n\nREST API for the farmer chat assistant. Stateless endpoints for LLM inference powered by HuggingFace.\n\n## Overview\n\nThe LLM API is exposed at `/api/v1/llm/` and provides stateless inference endpoints. Your external service (web UI, mobile app, etc.) is responsible for maintaining conversation state and user sessions.\n\n**Key features:**\n- Stateless chat inference\n- Farm context grounding (optional)\n- Context validation before inference\n- Batch requests for bulk analysis\n- Health monitoring\n\n---\n\n## Endpoints\n\n### 1. Health Check\n\n```http\nGET /api/v1/llm/health\n```\n\nVerify LLM backend accessibility and get latency info.\n\n**Response (200 OK):**\n```json\n{\n \"status\": \"ok\",\n \"model\": \"mistralai/Mistral-7B-Instruct-v0.1\",\n \"latency_ms\": 12.5,\n \"message\": \"LLM backend is reachable\"\n}\n```\n\n**Response (503 Service Unavailable):**\n```json\n{\n \"status\": \"unavailable\",\n \"message\": \"LLM backend unavailable: HF token not found...\"\n}\n```\n\n**Use case:** Check backend health before sending requests to `/chat`.\n\n---\n\n### 2. Chat Inference\n\n```http\nPOST /api/v1/llm/chat\n```\n\nSend a message to the farmer assistant and get a response.\n\n**Request:**\n```json\n{\n \"content\": \"How much water should tomatoes get weekly?\",\n \"conversation_history\": [\n {\"role\": \"user\", \"content\": \"What crops grow here?\"},\n {\"role\": \"assistant\", \"content\": \"Tomatoes, peppers, lettuce...\"}\n ],\n \"farm_context\": {\n \"farm_name\": \"Johnson Farm\",\n \"crop\": \"tomato\",\n \"area_ha\": 2.5,\n \"design_summary\": {}\n },\n \"max_tokens\": 256,\n \"temperature\": 0.7\n}\n```\n\n**Request fields:**\n\n| Field | Type | Required | Default | Notes |\n|-------|------|----------|---------|-------|\n| `content` | string | βœ“ | β€” | User message (1–2000 chars) |\n| `conversation_history` | array | β€” | null | Prior messages in `{\"role\": \"user\"|\"assistant\", \"content\": \"...\"}` format |\n| `farm_context` | object | β€” | null | Farm metadata (see below) |\n| `max_tokens` | integer | β€” | 256 | Max response length (10–1024) |\n| `temperature` | float | β€” | 0.7 | Creativity (0=deterministic, 2=very creative) |\n\n**Farm context fields (all optional):**\n- `farm_name`: string β€” Name of the farm\n- `crop`: string β€” Crop type (tomato, pepper, lettuce, cucumber, orchard, generic)\n- `area_ha`: number β€” Farm area in hectares\n- `design_summary`: object β€” Design metadata from `/rest/v1/design`\n\n**Response (200 OK):**\n```json\n{\n \"content\": \"For tomatoes, apply 25-40mm of water per week...\",\n \"timestamp\": \"2026-06-18T12:30:00+00:00\",\n \"model\": \"mistralai/Mistral-7B-Instruct-v0.1\",\n \"tokens_used\": 145,\n \"latency_ms\": 2450,\n \"metadata\": {\n \"farm_context_provided\": true,\n \"conversation_history_length\": 2\n }\n}\n```\n\n**Response (422 Validation Error):**\n```json\n{\n \"detail\": {\n \"code\": \"invalid_farm_context\",\n \"message\": \"Farm context validation failed\",\n \"errors\": [\"'area_ha' must be a number\"]\n }\n}\n```\n\n**Response (500 Inference Error):**\n```json\n{\n \"detail\": \"LLM inference failed: Request timed out after 30s\"\n}\n```\n\n**Use case:** Core endpoint for multi-turn conversations. Client maintains history and passes it with each request.\n\n---\n\n### 3. Validate Context\n\n```http\nPOST /api/v1/llm/validate-context\n```\n\nValidate farm context before using it in chat. Catch issues early without spending LLM tokens.\n\n**Request:**\n```json\n{\n \"farm_context\": {\n \"farm_name\": \"Smith Farm\",\n \"crop\": \"lettuce\",\n \"area_ha\": 0.5\n }\n}\n```\n\n**Response (200 OK):**\n```json\n{\n \"valid\": true,\n \"warnings\": [],\n \"errors\": []\n}\n```\n\n**Response with warnings:**\n```json\n{\n \"valid\": true,\n \"warnings\": [\n \"Unknown crop 'sugarcanr'. Expected one of: tomato, pepper, lettuce, cucumber, orchard, generic\"\n ],\n \"errors\": []\n}\n```\n\n**Response with errors:**\n```json\n{\n \"valid\": false,\n \"warnings\": [],\n \"errors\": [\"'area_ha' must be a number\"]\n}\n```\n\n**Validation rules:**\n- Required (fail): `area_ha` is a number if present\n- Recommended: `farm_name`, `crop`, `area_ha`\n- Optional warnings: Unknown crop, missing recommended fields\n\n**Use case:** Pre-validate context before `/chat` to avoid wasting LLM tokens on invalid requests.\n\n---\n\n### 4. Batch Chat\n\n```http\nPOST /api/v1/llm/chat/batch\n```\n\nSend multiple messages in a single request (up to 10).\n\n**Request:**\n```json\n[\n {\n \"content\": \"What is drip irrigation?\",\n \"max_tokens\": 100\n },\n {\n \"content\": \"How do I install valves?\",\n \"max_tokens\": 100\n },\n {\n \"content\": \"What is emitter spacing?\",\n \"max_tokens\": 100,\n \"farm_context\": {\n \"crop\": \"tomato\",\n \"area_ha\": 1.0\n }\n }\n]\n```\n\n**Response (200 OK):**\n```json\n[\n {\n \"content\": \"Drip irrigation is a method of watering plants...\",\n \"timestamp\": \"2026-06-18T12:30:00+00:00\",\n \"model\": \"mistralai/Mistral-7B-Instruct-v0.1\",\n \"tokens_used\": 120,\n \"latency_ms\": 2100,\n \"metadata\": {\"farm_context_provided\": false}\n },\n {\n \"content\": \"To install valves: 1) Plan your zones...\",\n \"timestamp\": \"2026-06-18T12:30:02+00:00\",\n \"model\": \"mistralai/Mistral-7B-Instruct-v0.1\",\n \"tokens_used\": 135,\n \"latency_ms\": 1950,\n \"metadata\": {\"farm_context_provided\": false}\n },\n {\n \"content\": \"Emitter spacing depends on soil type...\",\n \"timestamp\": \"2026-06-18T12:30:04+00:00\",\n \"model\": \"mistralai/Mistral-7B-Instruct-v0.1\",\n \"tokens_used\": 110,\n \"latency_ms\": 2050,\n \"metadata\": {\"farm_context_provided\": true}\n }\n]\n```\n\n**Constraints:**\n- Max 10 requests per batch\n- Each request is independent (no conversation history carried between items)\n- Returns response array in same order as request\n\n**Use case:** Bulk analysis, FAQ generation, or multi-question surveys.\n\n---\n\n## Error Handling\n\n### HTTP Status Codes\n\n| Code | Meaning | Example |\n|------|---------|----------|\n| 200 | Success | Chat response generated |\n| 422 | Validation Error | Invalid farm context or oversized content |\n| 500 | Inference Error | LLM backend failure or timeout |\n| 503 | Service Unavailable | HF token not configured |\n\n### Common Error Scenarios\n\n**Missing HF token:**\n```bash\ncurl http://localhost:7860/api/v1/llm/health\n# β†’ 503 Service Unavailable\n```\n\n**Oversized content (>2000 chars):**\n```bash\ncurl -X POST http://localhost:7860/api/v1/llm/chat \\\n -H \"Content-Type: application/json\" \\\n -d '{\"content\": \"'\"'\"'x{3000}'\"'\"'\"}'\n# β†’ 422 Validation Error\n```\n\n**Invalid farm context:**\n```bash\ncurl -X POST http://localhost:7860/api/v1/llm/chat \\\n -H \"Content-Type: application/json\" \\\n -d '{\"content\": \"...\", \"farm_context\": {\"area_ha\": \"not a number\"}}'\n# β†’ 422 Validation Error with error details\n```\n\n---\n\n## Integration Examples\n\n### JavaScript/Web UI\n\n```javascript\n// Simple chat\nconst response = await fetch('http://localhost:7860/api/v1/llm/chat', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({\n content: 'How often should I water?',\n farm_context: {\n farm_name: 'My Farm',\n crop: 'tomato',\n area_ha: 1.5\n },\n max_tokens: 200,\n temperature: 0.7\n })\n});\n\nconst { content, latency_ms } = await response.json();\nconsole.log(`Response (${latency_ms}ms): ${content}`);\n\n// Multi-turn conversation\nconst messages = [];\n\nfunction addMessage(role, content) {\n messages.push({ role, content });\n}\n\nasync function chat(userMessage) {\n addMessage('user', userMessage);\n const response = await fetch('http://localhost:7860/api/v1/llm/chat', {\n method: 'POST',\n headers: { 'Content-Type': 'application/json' },\n body: JSON.stringify({\n content: userMessage,\n conversation_history: messages.slice(0, -1), // Exclude current user message\n farm_context: { crop: 'tomato', area_ha: 2.0 }\n })\n });\n const { content } = await response.json();\n addMessage('assistant', content);\n return content;\n}\n\nawait chat('What is drip irrigation?');\nawait chat('Can I use it for peppers?');\n```\n\n### Python Client\n\n```python\nimport requests\n\nclass FarmerChatClient:\n def __init__(self, base_url='http://localhost:7860'):\n self.base_url = base_url\n self.history = []\n self.farm_context = {}\n \n def set_farm_context(self, **kwargs):\n \"\"\"Set farm metadata (crop, area_ha, farm_name, etc.)\"\"\"\n self.farm_context.update(kwargs)\n \n def chat(self, message: str, max_tokens: int = 256) -> str:\n \"\"\"Send a message and get a response.\"\"\"\n response = requests.post(\n f'{self.base_url}/api/v1/llm/chat',\n json={\n 'content': message,\n 'conversation_history': self.history,\n 'farm_context': self.farm_context,\n 'max_tokens': max_tokens,\n 'temperature': 0.7,\n }\n )\n response.raise_for_status()\n \n data = response.json()\n assistant_message = data['content']\n \n # Add to conversation history\n self.history.append({'role': 'user', 'content': message})\n self.history.append({'role': 'assistant', 'content': assistant_message})\n \n return assistant_message\n\n# Usage\nclient = FarmerChatClient()\nclient.set_farm_context(farm_name='Johnson Farm', crop='tomato', area_ha=2.5)\n\nprint(client.chat('How often should I water tomatoes?'))\nprint(client.chat('What about in dry seasons?')) # Uses conversation history\n```\n\n### cURL Examples\n\n```bash\n# Health check\ncurl http://localhost:7860/api/v1/llm/health | jq\n\n# Simple chat\ncurl -X POST http://localhost:7860/api/v1/llm/chat \\\n -H \"Content-Type: application/json\" \\\n -d '{\n \"content\": \"What is drip irrigation?\",\n \"max_tokens\": 150\n }' | jq '.content'\n\n# Chat with farm context\ncurl -X POST http://localhost:7860/api/v1/llm/chat \\\n -H \"Content-Type: application/json\" \\\n -d '{\n \"content\": \"How much water should I apply?\",\n \"farm_context\": {\n \"farm_name\": \"Smith Farm\",\n \"crop\": \"tomato\",\n \"area_ha\": 1.5\n },\n \"max_tokens\": 200\n }' | jq '.content'\n\n# Validate context before chat\ncurl -X POST http://localhost:7860/api/v1/llm/validate-context \\\n -H \"Content-Type: application/json\" \\\n -d '{\n \"farm_context\": {\n \"crop\": \"invalid_crop\",\n \"area_ha\": 0.5\n }\n }' | jq\n\n# Batch requests\ncurl -X POST http://localhost:7860/api/v1/llm/chat/batch \\\n -H \"Content-Type: application/json\" \\\n -d '[\n {\"content\": \"What is drip irrigation?\", \"max_tokens\": 100},\n {\"content\": \"How do I install valves?\", \"max_tokens\": 100}\n ]' | jq\n```\n\n---\n\n## Performance Considerations\n\n### Latency\n- Typical: 1.5–3 seconds for 100–200 token responses\n- First request: May take 5–10s if model is loading\n- Use `max_tokens` to control response length and latency\n\n### Throughput\n- HuggingFace Inference API has request rate limits\n- Use batch endpoint for multiple questions (more efficient)\n- Implement client-side request queuing if needed\n\n### Token Estimation\n- Roughly 1 token β‰ˆ 4 characters\n- Response `tokens_used` includes both input and output\n\n---\n\n## Configuration\n\n### Environment Setup\n\nThe API requires a HuggingFace API token:\n\n```bash\n# Option 1: Environment variable\nexport HF_TOKEN=hf_your_token_here\npython app.py\n\n# Option 2: secret.txt file (same directory as app.py)\necho \"hf_your_token_here\" > secret.txt\npython app.py\n\n# Option 3: Passed to FarmerAssistant directly (in code)\n# See llm_chat.py for details\n```\n\n### Model Selection\n\nDefault model: `mistralai/Mistral-7B-Instruct-v0.1`\n\nTo use a different model, edit `llm_chat.py`:\n\n```python\nassistant = FarmerAssistant(\n model_id=\"HuggingFace/ModelName\",\n api_token=\"hf_your_token_here\"\n)\n```\n\n---\n\n## Testing\n\nRun the test suite to verify all endpoints:\n\n```bash\n# Terminal 1: Start the server\npython app.py\n\n# Terminal 2: Run tests\npython test_llm_api.py\n```\n\nThis tests:\n- Health check\n- Simple and contextual chat\n- Multi-turn conversations\n- Context validation\n- Batch requests\n- Error handling\n\n---\n\n## OpenAPI Documentation\n\nOnce the server is running, view interactive API docs:\n\n```\nhttp://localhost:7860/docs\n```\n\nThis page (auto-generated by FastAPI) shows all endpoints, request/response schemas, and try-it-out forms.\n"