---
title: Clinical Intake Agent
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# Clinical Intake Agent
A LangGraph-based conversational agent for conducting pre-visit clinical intakes with simulated patients. At the end of the conversation, the agent generates a structured ClinicalBrief covering the Chief Complaint, History of Present Illness (HPI), and Review of Systems (ROS).
## Features
- **Multi-turn conversation**: stateful memory using LangGraph checkpointing
- **Structured clinical data collection**: Chief Complaint, HPI (OPQRST), and ROS
- **Conditional ROS scoping**: adapts the review of systems to the chief complaint
- **Vague answer handling**: gracefully re-prompts when patient responses are unclear
- **Dual mode**: runs as a FastAPI web app or as a CLI tool
- **Mock/real LLM**: switch between mock responses and a real local LLM via an environment variable
## Architecture
```
Patient → triage_node → agent_node → (done, or loop back for the next question)
```
### Inference Engine
- **Local dev (mock)**: `MOCK_LLM=true` → regex-based MockLLM, ~0 ms latency (sketched below)
- **Production**: `MOCK_LLM=false` → Ollama local server (`qwen2.5:0.5b`, C++ optimized), ~2 s per turn on CPU vs. ~25 s with raw PyTorch
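As a rough illustration of that switch, here is a minimal sketch of an LLM provider in the spirit of `app/llm.py` (the `generate`/`get_llm` names and the canned replies are assumptions, not the actual code):

```python
import os
import re

class MockLLM:
    """Regex-driven canned replies for instant local development."""

    def generate(self, prompt: str) -> str:
        # Crude pattern matching stands in for a real model.
        if re.search(r"chest pain", prompt, re.IGNORECASE):
            return "I understand you're experiencing chest pain. When did it first start?"
        return "Could you tell me more about what brings you in today?"

def get_llm():
    """Choose the provider from the MOCK_LLM environment variable."""
    if os.getenv("MOCK_LLM", "true").lower() == "true":
        return MockLLM()
    # Real mode would construct a client for the local Ollama server here.
    raise NotImplementedError("real-LLM provider (Ollama) goes here")
```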
### State Graph Nodes
- `triage_node`: detects acute emergency phrases → immediate 🚨 alert
- `agent_node`: a single LLM call that both extracts the HPI/ROS fields and generates the next question; once all fields are complete, it builds the ClinicalBrief inline (no extra LLM call). See the wiring sketch below.
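A minimal sketch of how this graph might be wired up in LangGraph, including the checkpointing mentioned under Features. The state fields, emergency phrases, and node bodies are illustrative assumptions, not the contents of `app/graph.py`:

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph

class IntakeState(TypedDict, total=False):
    messages: list          # conversation history (illustrative field)
    emergency: bool         # set by triage_node
    fields_complete: bool   # set by agent_node once every HPI/ROS field is filled

def triage_node(state: IntakeState) -> dict:
    # Scan the latest patient message for acute emergency phrases.
    last = state["messages"][-1].lower()
    return {"emergency": any(p in last for p in ("crushing chest pain", "can't breathe"))}

def agent_node(state: IntakeState) -> dict:
    # A single LLM call would extract HPI/ROS fields and draft the next
    # question; once every field is filled, the ClinicalBrief is built inline.
    return {"fields_complete": False}

graph = StateGraph(IntakeState)
graph.add_node("triage_node", triage_node)
graph.add_node("agent_node", agent_node)
graph.set_entry_point("triage_node")
graph.add_edge("triage_node", "agent_node")
# Each turn ends here; the checkpointer carries state into the next /chat
# call, which is the "loop back for the next question" in the diagram above.
graph.add_edge("agent_node", END)

app_graph = graph.compile(checkpointer=MemorySaver())

# One turn: the thread_id plays the role of the API's session_id.
config = {"configurable": {"thread_id": "patient123"}}
result = app_graph.invoke({"messages": ["I have chest pain"]}, config)
```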
## Deployment on Hugging Face Spaces
This repo is configured as a Docker SDK Space. On every push:
- The Docker image builds → Ollama is installed via the official install script
- `startup.sh` runs on container boot: it launches Ollama, pulls `qwen2.5:0.5b`, and starts FastAPI
- The app is live on port 7860
```bash
# Test the Docker build locally before pushing
docker build -t clinical-intake .
docker run -p 7860:7860 clinical-intake
```
## Local Development
```bash
# Fast mock mode (no model needed, instant responses)
MOCK_LLM=true uvicorn app.main:app --reload

# Real Ollama mode (requires Ollama installed at localhost:11434)
ollama serve &
ollama pull qwen2.5:0.5b
MOCK_LLM=false uvicorn app.main:app --reload
```
## Usage
### FastAPI Web App
#### Health Check
```bash
curl http://localhost:7860/health
# Response: {"status": "ok", "mock_mode": true}
```
#### Chat Endpoint
```bash
# Start a conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "hello"}'

# Continue the conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "I have chest pain"}'

# The final response includes "brief" once "state" == "done"
```
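The same flow from Python, using `httpx` (already a test dependency); a minimal sketch assuming the server is running locally:

```python
import httpx

with httpx.Client(base_url="http://localhost:7860") as client:
    for message in ("hello", "I have chest pain"):
        resp = client.post("/chat", json={"session_id": "patient123", "message": message})
        resp.raise_for_status()
        data = resp.json()
        print("Agent:", data["reply"])
        if data["state"] == "done":
            print("Brief:", data["brief"])  # the final ClinicalBrief
```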
### CLI Mode
```bash
# Run the interactive CLI
python app/main.py --cli
```

Example session:

```
Agent: Hello! I'm here to help you with your pre-visit intake. What brings you in today?
You: I have chest pain since this morning
Agent: I understand you're experiencing chest pain. When did it first start?
... (continues through HPI and ROS) ...
Agent: Your clinical intake is complete. Here is your summary:
{
  "chief_complaint": "chest pain",
  "hpi": {...},
  "ros": {...},
  "generated_at": "2024-01-15T10:30:00Z"
}
```
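A rough sketch of how the dual-mode entry point in `app/main.py` could be structured (the `run_cli` helper and the exact wiring are assumptions):

```python
import sys

import uvicorn
from fastapi import FastAPI

app = FastAPI()

def run_cli() -> None:
    # Hypothetical REPL: read patient input, run one graph turn, print the reply.
    ...

if __name__ == "__main__":
    if "--cli" in sys.argv:
        run_cli()
    else:
        # Equivalent to `uvicorn app.main:app` from the commands above.
        uvicorn.run(app, host="0.0.0.0", port=7860)
```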
## API Reference
### `POST /chat`
Request:
```json
{
  "session_id": "string",
  "message": "string"
}
```
Response:
```json
{
  "reply": "string",
  "state": "intake|hpi|ros|brief_generation|done",
  "brief": {
    "chief_complaint": "string",
    "hpi": {
      "onset": "string",
      "location": "string",
      "duration": "string",
      "character": "string",
      "severity": "string",
      "aggravating": "string",
      "relieving": "string"
    },
    "ros": {
      "system_name": ["finding1", "finding2"]
    },
    "generated_at": "ISO8601 timestamp"
  }
}
```
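These shapes correspond to the Pydantic models in `app/schemas.py`; a hedged sketch of what they plausibly look like (field names are taken from the schema above, everything else is assumed):

```python
from datetime import datetime
from typing import Dict, List

from pydantic import BaseModel

class HPI(BaseModel):
    onset: str
    location: str
    duration: str
    character: str
    severity: str
    aggravating: str
    relieving: str

class ClinicalBrief(BaseModel):
    chief_complaint: str
    hpi: HPI
    ros: Dict[str, List[str]]  # system name -> list of findings
    generated_at: datetime     # serialized as an ISO 8601 timestamp
```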
### `GET /health`
Response:
```json
{
  "status": "ok",
  "mock_mode": true
}
```
## Configuration
| Environment Variable | Description | Default |
|---|---|---|
| `MOCK_LLM` | Use mock LLM responses (`true`) or a real local LLM (`false`) | `true` |
| `MODEL_PATH` | Path to the GGUF model file (used when `MOCK_LLM=false`) | `/models/qwen2.5-0.5b-instruct-q4_k_m.gguf` |
## Testing
```bash
# Run all tests (uses MockLLM automatically)
pytest tests/

# Run a specific test
pytest tests/test_e2e.py::test_full_intake_flow -v

# Run with coverage
pytest --cov=app tests/
```
### Test Coverage
- ✅ `test_health_endpoint`: verifies the health check returns mock_mode status
- ✅ `test_full_intake_flow`: complete conversation flow from greeting to ClinicalBrief
- ✅ `test_hpi_reprompt`: validates vague-answer re-prompting behavior
- ✅ `test_ros_scoping`: confirms ROS systems are scoped based on the chief complaint
- ✅ `test_brief_structure`: validates ClinicalBrief Pydantic schema compliance
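For flavor, a hedged sketch of the health-check case (the real `tests/test_e2e.py` may differ):

```python
from fastapi.testclient import TestClient

from app.main import app

def test_health_endpoint():
    # TestClient drives the ASGI app in-process; no running server needed.
    client = TestClient(app)
    resp = client.get("/health")
    assert resp.status_code == 200
    body = resp.json()
    assert body["status"] == "ok"
    assert isinstance(body["mock_mode"], bool)
```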
## Project Structure
```
clinical-intake-agent/
├── app/
│   ├── __init__.py
│   ├── main.py       # FastAPI app + CLI entry point
│   ├── graph.py      # LangGraph state graph and nodes
│   ├── state.py      # TypedDict state definitions
│   ├── schemas.py    # Pydantic models (HPI, ClinicalBrief)
│   └── llm.py        # LLM provider (MockLLM, RealLLM)
├── tests/
│   ├── __init__.py
│   └── test_e2e.py   # End-to-end tests
├── requirements.txt
├── Dockerfile
└── README.md
```
## Dependencies
Minimal dependencies (no heavy ML libraries unless `MOCK_LLM=false`):

- `langgraph` - State graph orchestration
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `pydantic` - Data validation
- `pytest` + `pytest-asyncio` - Testing
- `httpx` - Async HTTP client for tests
- `llama-cpp-python` - Only in the Docker prod layer, for real LLM mode
## License
MIT
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Troubleshooting
### Model Download Fails
If you're running with `MOCK_LLM=false` and the model fails to download:

```bash
# Manually download the model
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('bartowski/Qwen2.5-0.5B-Instruct-GGUF', 'Qwen2.5-0.5B-Instruct-Q4_K_M.gguf', local_dir='/models')"
```

Then make sure `MODEL_PATH` points at the downloaded file.
### Session State Not Persisting
Ensure you're using the same `session_id` across multiple `/chat` calls. Sessions are stored in memory, per process, so they do not survive a restart.
### Docker Build Fails
The Dockerfile skips the model download if `MOCK_LLM=true`. To force the model download in Docker:

```bash
docker build --build-arg MOCK_LLM=false -t clinical-intake-agent .
```