medintake-ai / README.md
priyansh-saxena1
feat: migrate inference engine to Ollama for 10x faster CPU inference
4e16e37
---
title: Clinical Intake Agent
emoji: πŸ₯
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# Clinical Intake Agent
A LangGraph-based conversational agent for conducting pre-visit clinical intakes with simulated patients. The agent generates a structured ClinicalBrief (Chief Complaint, HPI, ROS) at the end of the conversation.
## Features
- **Multi-turn conversation** with stateful memory using LangGraph checkpointing
- **Structured clinical data collection**: Chief Complaint, HPI (OPQRST), and ROS
- **Conditional ROS scoping**: Adapts review of systems based on chief complaint
- **Vague answer handling**: Gracefully re-prompts when patient responses are unclear
- **Dual mode**: Runs as FastAPI web app OR CLI tool
- **Mock/Real LLM**: Switch between mock responses and real local LLM via environment variable
## Architecture
```
Patient β†’ triage_node β†’ agent_node β†’ (done or loop back for next question)
```
### Inference Engine
- **Local dev (mock)**: `MOCK_LLM=true` β€” regex-based MockLLM, 0ms latency
- **Production**: `MOCK_LLM=false` β€” **Ollama** local server (`qwen2.5:0.5b`, C++ optimized)
- ~2s per turn on CPU vs 25s with raw PyTorch
### State Graph Nodes
1. **triage_node**: Detects acute emergency phrases β†’ immediate 🚨 alert
2. **agent_node**: Single LLM call β€” extracts all HPI/ROS fields AND generates next question
When all fields complete, builds ClinicalBrief inline (no extra LLM call)
## Deployment on Hugging Face Spaces
This repo is configured as a **Docker SDK Space**. On every push:
1. Docker image builds β€” Ollama gets installed via official install script
2. `startup.sh` starts on container boot: launches Ollama, pulls `qwen2.5:0.5b`, starts FastAPI
3. App is live on port 7860
```bash
# Test the Docker build locally before pushing
docker build -t clinical-intake .
docker run -p 7860:7860 clinical-intake
```
## Local Development
```bash
# Fast mock mode (no model needed, instant responses)
MOCK_LLM=true uvicorn app.main:app --reload
# Real Ollama mode β€” requires Ollama installed at localhost:11434
ollama serve &
ollama pull qwen2.5:0.5b
MOCK_LLM=false uvicorn app.main:app --reload
```
## Usage
### FastAPI Web App
#### Health Check
```bash
curl http://localhost:7860/health
# Response: {"status": "ok", "mock_mode": true}
```
#### Chat Endpoint
```bash
# Start conversation
curl -X POST http://localhost:7860/chat \
-H "Content-Type: application/json" \
-d '{"session_id": "patient123", "message": "hello"}'
# Continue conversation
curl -X POST http://localhost:7860/chat \
-H "Content-Type: application/json" \
-d '{"session_id": "patient123", "message": "I have chest pain"}'
# Final response includes clinical_brief when state == "done"
```
### CLI Mode
```bash
# Run interactive CLI
python app/main.py --cli
# Example session:
# Agent: Hello! I'm here to help you with your pre-visit intake. What brings you in today?
# You: I have chest pain since this morning
# Agent: I understand you're experiencing chest pain. When did it first start?
# ... (continues through HPI and ROS) ...
# Agent: Your clinical intake is complete. Here is your summary:
# {
# "chief_complaint": "chest pain",
# "hpi": {...},
# "ros": {...},
# "generated_at": "2024-01-15T10:30:00Z"
# }
```
## API Reference
### POST /chat
**Request:**
```json
{
"session_id": "string",
"message": "string"
}
```
**Response:**
```json
{
"reply": "string",
"state": "intake|hpi|ros|brief_generation|done",
"brief": {
"chief_complaint": "string",
"hpi": {
"onset": "string",
"location": "string",
"duration": "string",
"character": "string",
"severity": "string",
"aggravating": "string",
"relieving": "string"
},
"ros": {
"system_name": ["finding1", "finding2"]
},
"generated_at": "ISO8601 timestamp"
}
}
```
### GET /health
**Response:**
```json
{
"status": "ok",
"mock_mode": true
}
```
## Configuration
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MOCK_LLM` | Use mock LLM responses (`true`) or real local LLM (`false`) | `true` |
| `MODEL_PATH` | Path to GGUF model file (used when `MOCK_LLM=false`) | `/models/qwen2.5-0.5b-instruct-q4_k_m.gguf` |
## Testing
```bash
# Run all tests (uses MockLLM automatically)
pytest tests/
# Run specific test
pytest tests/test_e2e.py::test_full_intake_flow -v
# Run with coverage
pytest --cov=app tests/
```
### Test Coverage
- βœ… `test_health_endpoint`: Verifies health check returns mock_mode status
- βœ… `test_full_intake_flow`: Complete conversation flow from greeting to ClinicalBrief
- βœ… `test_hpi_reprompt`: Validates vague answer re-prompting behavior
- βœ… `test_ros_scoping`: Confirms ROS systems are scoped based on chief complaint
- βœ… `test_brief_structure`: Validates ClinicalBrief Pydantic schema compliance
## Project Structure
```
clinical-intake-agent/
β”œβ”€β”€ app/
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ main.py # FastAPI app + CLI entry point
β”‚ β”œβ”€β”€ graph.py # LangGraph state graph and nodes
β”‚ β”œβ”€β”€ state.py # TypedDict state definitions
β”‚ β”œβ”€β”€ schemas.py # Pydantic models (HPI, ClinicalBrief)
β”‚ └── llm.py # LLM provider (MockLLM, RealLLM)
β”œβ”€β”€ tests/
β”‚ β”œβ”€β”€ __init__.py
β”‚ └── test_e2e.py # End-to-end tests
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ README.md
```
## Dependencies
Minimal dependencies (no heavy ML libraries unless `MOCK_LLM=false`):
- `langgraph` - State graph orchestration
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `pydantic` - Data validation
- `pytest` + `pytest-asyncio` - Testing
- `httpx` - Async HTTP client for tests
- `llama-cpp-python` - Only in Docker prod layer for real LLM mode
## License
MIT
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Troubleshooting
### Model Download Fails
If running with `MOCK_LLM=false` and the model fails to download:
```bash
# Manually download the model
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('bartowski/Qwen2.5-0.5B-Instruct-GGUF', 'Qwen2.5-0.5B-Instruct-Q4_K_M.gguf', local_dir='/models')"
```
### Session State Not Persisting
Ensure you're using the same `session_id` across multiple `/chat` calls. Sessions are stored in-memory per process.
### Docker Build Fails
The Dockerfile skips model download if `MOCK_LLM=true`. To force model download in Docker:
```bash
docker build --build-arg MOCK_LLM=false -t clinical-intake-agent .
```