	---
	title: Clinical Intake Agent
	emoji: 🏥
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	---

	# Clinical Intake Agent

	A LangGraph-based conversational agent for conducting pre-visit clinical intakes with simulated patients. The agent generates a structured ClinicalBrief (Chief Complaint, HPI, ROS) at the end of the conversation.

	## Features

	- Multi-turn conversation with stateful memory using LangGraph checkpointing
	- Structured clinical data collection: Chief Complaint, HPI (OPQRST), and ROS
	- Conditional ROS scoping: Adapts review of systems based on chief complaint
	- Vague answer handling: Gracefully re-prompts when patient responses are unclear
	- Dual mode: Runs as FastAPI web app OR CLI tool
	- Mock/Real LLM: Switch between mock responses and real local LLM via environment variable
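As a rough illustration of the ROS scoping and vague-answer features listed above, the sketch below uses a keyword table and a small set of "vague" replies. The mapping, the reply list, and the function names (`scope_ros`, `is_vague`, `next_prompt`) are all hypothetical; the project itself may do this differently, for example inside the LLM prompt.

```python
from __future__ import annotations

# Hypothetical mapping from complaint keywords to ROS systems.
ROS_SCOPE = {
    "chest pain": ["cardiovascular", "respiratory", "gastrointestinal"],
    "headache": ["neurological", "eyes", "constitutional"],
}
DEFAULT_SCOPE = ["constitutional"]

# Hypothetical replies treated as too vague to record.
VAGUE_REPLIES = {"", "idk", "i don't know", "not sure", "maybe", "kind of"}


def scope_ros(chief_complaint: str) -> list[str]:
    """Return the ROS systems to ask about for a given chief complaint."""
    text = chief_complaint.lower()
    for keyword, systems in ROS_SCOPE.items():
        if keyword in text:
            return systems
    return DEFAULT_SCOPE


def is_vague(answer: str) -> bool:
    """Return True when the reply carries no usable detail."""
    return answer.strip().lower().rstrip(".!?") in VAGUE_REPLIES


def next_prompt(question: str, answer: str) -> str | None:
    """Return a re-prompt for a vague answer, or None to move on."""
    if is_vague(answer):
        return f"I want to make sure I record this correctly. {question}"
    return None
```

A vague reply repeats the same question with a softer preface, while any other reply lets the conversation advance.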

	## Architecture

	```
	Patient → triage_node → agent_node → (done or loop back for next question)
	```

	### Inference Engine

	- Local dev (mock): `MOCK_LLM=true` — regex-based MockLLM, 0ms latency
	- Production: `MOCK_LLM=false` — Ollama local server (`qwen2.5:0.5b`, C++ optimized)
	- About 2 s per turn on CPU with Ollama, compared with about 25 s using raw PyTorch
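For orientation, production mode presumably talks to the local Ollama server over HTTP. The sketch below shows one way to call Ollama's `/api/generate` endpoint with only the standard library. The helper names (`build_generate_request`, `extract_reply`, `generate`) are illustrative and are not taken from `app/llm.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port
MODEL_NAME = "qwen2.5:0.5b"


def build_generate_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_NAME, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: bytes) -> str:
    """Pull the generated text out of a non-streaming response body."""
    return json.loads(body)["response"]


def generate(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the model's text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(resp.read())
```

Setting `"stream": False` makes Ollama return a single JSON object, which keeps the parsing to one `json.loads` call.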

	### State Graph Nodes

	1. triage_node: Detects acute emergency phrases → immediate 🚨 alert
	2. agent_node: Single LLM call — extracts all HPI/ROS fields AND generates next question
	   When all fields are complete, it builds the ClinicalBrief inline (no extra LLM call)
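The two-node flow can be sketched in plain Python without LangGraph. The emergency phrases, the required field names, and the `extract` callback (standing in for the single LLM call) are assumptions made for illustration only.

```python
from __future__ import annotations

# Hypothetical phrases and field names, chosen only for illustration.
EMERGENCY_PHRASES = ("can't breathe", "crushing chest pain", "passing out")
REQUIRED_FIELDS = ("chief_complaint", "onset", "severity")


def triage_node(state: dict) -> dict:
    """Flag the turn as an emergency when an acute phrase appears."""
    message = state["message"].lower()
    emergency = any(phrase in message for phrase in EMERGENCY_PHRASES)
    return {**state, "emergency": emergency}


def agent_node(state: dict, extract) -> dict:
    """Merge newly extracted fields, then either ask again or finish."""
    fields = {**state.get("fields", {}), **extract(state["message"])}
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        reply = f"Can you tell me about the {missing[0].replace('_', ' ')}?"
        return {**state, "fields": fields, "reply": reply, "state": "hpi"}
    # All fields present: build the brief inline, without another LLM call.
    return {**state, "fields": fields, "brief": dict(fields),
            "reply": "Your clinical intake is complete.", "state": "done"}


def run_turn(state: dict, extract) -> dict:
    """Run one turn: triage first, then the agent unless an alert fired."""
    state = triage_node(state)
    if state["emergency"]:
        return {**state, "reply": "🚨 Please call emergency services now."}
    return agent_node(state, extract)
```

In the real graph the routing would be expressed as LangGraph edges; here `run_turn` plays that role so the control flow is visible in one place.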

	## Deployment on Hugging Face Spaces

	This repo is configured as a Docker SDK Space. On every push:

	1. The Docker image builds and installs Ollama via the official install script
	2. `startup.sh` runs on container boot: it launches Ollama, pulls `qwen2.5:0.5b`, and starts FastAPI
	3. The app goes live on port 7860

	```bash
	# Test the Docker build locally before pushing
	docker build -t clinical-intake .
	docker run -p 7860:7860 clinical-intake
	```

	## Local Development

	```bash
	# Fast mock mode (no model needed, instant responses)
	MOCK_LLM=true uvicorn app.main:app --reload

	# Real Ollama mode — requires Ollama installed at localhost:11434
	ollama serve &
	ollama pull qwen2.5:0.5b
	MOCK_LLM=false uvicorn app.main:app --reload
	```

	## Usage

	### FastAPI Web App

	#### Health Check
	```bash
	curl http://localhost:7860/health
	# Response: {"status": "ok", "mock_mode": true}
	```

	#### Chat Endpoint
	```bash
	# Start conversation
	curl -X POST http://localhost:7860/chat \
	  -H "Content-Type: application/json" \
	  -d '{"session_id": "patient123", "message": "hello"}'

	# Continue conversation
	curl -X POST http://localhost:7860/chat \
	  -H "Content-Type: application/json" \
	  -d '{"session_id": "patient123", "message": "I have chest pain"}'

	# The final response includes the "brief" object (the ClinicalBrief) when "state" is "done"
	```
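The same conversation can be driven from Python. This is a minimal client sketch using only the standard library; it assumes the response carries the `state` and `brief` keys described in the API Reference, and the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_chat_request(session_id: str, message: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /chat request for one patient message."""
    body = json.dumps({"session_id": session_id, "message": message})
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(session_id: str, message: str) -> dict:
    """Send one message and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(session_id, message)) as resp:
        return json.loads(resp.read())


def run_session(session_id: str, messages, send=chat):
    """Send messages in order; return the brief once state is "done"."""
    for text in messages:
        result = send(session_id, text)
        if result.get("state") == "done":
            return result.get("brief")
    return None
```

Calling `run_session("patient123", ["hello", "I have chest pain"])` against a running server reuses one `session_id` for every turn, which is what keeps the conversation state attached to the same patient.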

	### CLI Mode

	```bash
	# Run interactive CLI
	python app/main.py --cli

	# Example session:
	# Agent: Hello! I'm here to help you with your pre-visit intake. What brings you in today?
	# You: I have chest pain since this morning
	# Agent: I understand you're experiencing chest pain. When did it first start?
	# ... (continues through HPI and ROS) ...
	# Agent: Your clinical intake is complete. Here is your summary:
	# {
	#   "chief_complaint": "chest pain",
	#   "hpi": {...},
	#   "ros": {...},
	#   "generated_at": "2024-01-15T10:30:00Z"
	# }
	```

	## API Reference

	### POST /chat

	Request:
	```json
	{
	  "session_id": "string",
	  "message": "string"
	}
	```

	Response:
	```json
	{
	  "reply": "string",
	  "state": "intake|hpi|ros|brief_generation|done",
	  "brief": {
	    "chief_complaint": "string",
	    "hpi": {
	      "onset": "string",
	      "location": "string",
	      "duration": "string",
	      "character": "string",
	      "severity": "string",
	      "aggravating": "string",
	      "relieving": "string"
	    },
	    "ros": {
	      "system_name": ["finding1", "finding2"]
	    },
	    "generated_at": "ISO8601 timestamp"
	  }
	}
	```
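A client may want to check a response against this shape before using it. The sketch below uses a stdlib dataclass as a stand-in for the project's Pydantic models; `ChatResponse` and `parse_chat_response` are hypothetical names, and treating `brief` as optional until `state` is `done` is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

VALID_STATES = {"intake", "hpi", "ros", "brief_generation", "done"}
BRIEF_KEYS = {"chief_complaint", "hpi", "ros", "generated_at"}


@dataclass
class ChatResponse:
    reply: str
    state: str
    brief: Optional[dict] = None


def parse_chat_response(payload: dict) -> ChatResponse:
    """Validate the documented fields and wrap them in a ChatResponse."""
    state = payload["state"]
    if state not in VALID_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    brief = payload.get("brief")
    if state == "done" and brief is not None:
        missing = BRIEF_KEYS - brief.keys()
        if missing:
            raise ValueError(f"brief is missing keys: {sorted(missing)}")
    return ChatResponse(reply=payload["reply"], state=state, brief=brief)
```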

	### GET /health

	Response:
	```json
	{
	  "status": "ok",
	  "mock_mode": true
	}
	```

	## Configuration

	| Environment Variable | Description | Default |
	|----------------------|-------------|---------|
	| `MOCK_LLM` | Use mock LLM responses (`true`) or real local LLM (`false`) | `true` |
	| `MODEL_PATH` | Path to GGUF model file (used when `MOCK_LLM=false`) | `/models/qwen2.5-0.5b-instruct-q4_k_m.gguf` |
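One plausible way to read these two variables with their documented defaults is shown below. The `Settings` class and `load_settings` function are illustrative names, and treating only the string `true` (case-insensitive) as enabling mock mode is an assumption about how the app parses the flag.

```python
import os
from dataclasses import dataclass

DEFAULT_MODEL_PATH = "/models/qwen2.5-0.5b-instruct-q4_k_m.gguf"


@dataclass(frozen=True)
class Settings:
    mock_llm: bool
    model_path: str


def load_settings(env=None) -> Settings:
    """Read MOCK_LLM and MODEL_PATH, falling back to the table's defaults."""
    env = os.environ if env is None else env
    raw = env.get("MOCK_LLM", "true")
    return Settings(
        mock_llm=raw.strip().lower() == "true",
        model_path=env.get("MODEL_PATH", DEFAULT_MODEL_PATH),
    )
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real environment.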

	## Testing

	```bash
	# Run all tests (uses MockLLM automatically)
	pytest tests/

	# Run specific test
	pytest tests/test_e2e.py::test_full_intake_flow -v

	# Run with coverage
	pytest --cov=app tests/
	```

	### Test Coverage

	- ✅ `test_health_endpoint`: Verifies health check returns mock_mode status
	- ✅ `test_full_intake_flow`: Complete conversation flow from greeting to ClinicalBrief
	- ✅ `test_hpi_reprompt`: Validates vague answer re-prompting behavior
	- ✅ `test_ros_scoping`: Confirms ROS systems are scoped based on chief complaint
	- ✅ `test_brief_structure`: Validates ClinicalBrief Pydantic schema compliance

	## Project Structure

	```
	clinical-intake-agent/
	├── app/
	│   ├── __init__.py
	│   ├── main.py          # FastAPI app + CLI entry point
	│   ├── graph.py         # LangGraph state graph and nodes
	│   ├── state.py         # TypedDict state definitions
	│   ├── schemas.py       # Pydantic models (HPI, ClinicalBrief)
	│   └── llm.py           # LLM provider (MockLLM, RealLLM)
	├── tests/
	│   ├── __init__.py
	│   └── test_e2e.py      # End-to-end tests
	├── requirements.txt
	├── Dockerfile
	└── README.md
	```

	## Dependencies

	Minimal dependencies (no heavy ML libraries unless `MOCK_LLM=false`):

	- `langgraph` - State graph orchestration
	- `fastapi` - Web framework
	- `uvicorn` - ASGI server
	- `pydantic` - Data validation
	- `pytest` + `pytest-asyncio` - Testing
	- `httpx` - Async HTTP client for tests
	- `llama-cpp-python` - Only in Docker prod layer for real LLM mode

	## License

	MIT

	## Contributing

	1. Fork the repository
	2. Create a feature branch (`git checkout -b feature/amazing-feature`)
	3. Commit changes (`git commit -m 'Add amazing feature'`)
	4. Push to branch (`git push origin feature/amazing-feature`)
	5. Open a Pull Request

	## Troubleshooting

	### Model Download Fails

	If running with `MOCK_LLM=false` and the model fails to download:

	```bash
	# Manually download the model
	python -c "from huggingface_hub import hf_hub_download; hf_hub_download('bartowski/Qwen2.5-0.5B-Instruct-GGUF', 'Qwen2.5-0.5B-Instruct-Q4_K_M.gguf', local_dir='/models')"
	```

	### Session State Not Persisting

	Ensure you're using the same `session_id` across multiple `/chat` calls. Sessions are stored in-memory per process.
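To see why the `session_id` matters, consider a toy in-memory store. A plain module-level dict stands in here for whatever in-memory checkpointer the project actually uses; `get_session` and the initial state shape are hypothetical.

```python
from __future__ import annotations

# Module-level dict: it lives only as long as this process does, so a
# restart or a second worker process starts with no sessions.
_SESSIONS: dict[str, dict] = {}


def get_session(session_id: str) -> dict:
    """Return the state for session_id, creating an empty one if needed."""
    return _SESSIONS.setdefault(session_id, {"state": "intake", "fields": {}})
```

A request that arrives with a different `session_id`, or that lands on a different process, gets a fresh empty state, which looks to the patient like the agent forgetting the conversation.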

	### Docker Build Fails

	The Dockerfile skips the model download when `MOCK_LLM=true`. To force the model download during a Docker build:

	```bash
	docker build --build-arg MOCK_LLM=false -t clinical-intake-agent .
	```