---
title: Clinical Intake Agent
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

Clinical Intake Agent

A LangGraph-based conversational agent for conducting pre-visit clinical intakes with simulated patients. The agent generates a structured ClinicalBrief (Chief Complaint, HPI, ROS) at the end of the conversation.

Features

  • Multi-turn conversation with stateful memory using LangGraph checkpointing
  • Structured clinical data collection: Chief Complaint, HPI (OPQRST), and ROS
  • Conditional ROS scoping: Adapts review of systems based on chief complaint
  • Vague answer handling: Gracefully re-prompts when patient responses are unclear
  • Dual mode: Runs as FastAPI web app OR CLI tool
  • Mock/Real LLM: Switch between mock responses and real local LLM via environment variable

Architecture

Patient → triage_node → agent_node → (done, or loop back for the next question)
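
A minimal sketch of how this flow can be wired with LangGraph (node bodies are stubbed and the state fields are illustrative; the real implementation lives in app/graph.py):

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class IntakeState(TypedDict, total=False):
    messages: list         # running conversation history
    emergency: bool        # set by triage_node on acute phrases
    fields_complete: bool  # set by agent_node once HPI/ROS are filled

def triage_node(state: IntakeState) -> IntakeState:
    # scan the latest patient message for acute-emergency phrases
    return state

def agent_node(state: IntakeState) -> IntakeState:
    # single LLM call: extract HPI/ROS fields and draft the next question
    return state

builder = StateGraph(IntakeState)
builder.add_node("triage", triage_node)
builder.add_node("agent", agent_node)
builder.set_entry_point("triage")
builder.add_edge("triage", "agent")
# Each /chat turn is a single triage -> agent pass; the "loop back for next question"
# happens across turns, with per-session state kept by the checkpointer.
builder.add_edge("agent", END)
graph = builder.compile(checkpointer=MemorySaver())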

Inference Engine

  • Local dev (mock): MOCK_LLM=true uses a regex-based MockLLM with effectively zero latency
  • Production: MOCK_LLM=false uses a local Ollama server running qwen2.5:0.5b (llama.cpp backend); the switch is sketched below
    • ~2s per turn on CPU vs ~25s with raw PyTorch
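
Roughly how the switch works. This is a hedged sketch rather than the actual app/llm.py: the regex, fallback prompt, and timeout are assumptions, while the Ollama /api/generate payload shape is the standard one:

import os
import re
import httpx

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(prompt: str) -> str:
    if os.getenv("MOCK_LLM", "true").lower() == "true":
        # MockLLM path: canned, regex-driven replies with no inference cost
        if re.search(r"chest pain", prompt, re.IGNORECASE):
            return "I understand you're experiencing chest pain. When did it first start?"
        return "Could you tell me more about what brings you in today?"
    # Real path: one-shot completion against the local Ollama server
    resp = httpx.post(
        OLLAMA_URL,
        json={"model": "qwen2.5:0.5b", "prompt": prompt, "stream": False},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["response"]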

State Graph Nodes

  1. triage_node: Detects acute emergency phrases → immediate 🚨 alert (see the sketch below)
  2. agent_node: Single LLM call that extracts all HPI/ROS fields and generates the next question; when all fields are complete, it builds the ClinicalBrief inline (no extra LLM call)
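
The triage check can be as simple as substring matching against a short phrase list. A sketch only; the phrase list and state keys are assumptions, not the real app/graph.py:

ACUTE_PHRASES = ("crushing chest pain", "can't breathe", "severe bleeding", "passed out")

def triage_node(state: dict) -> dict:
    last_message = state["messages"][-1].lower()
    if any(phrase in last_message for phrase in ACUTE_PHRASES):
        state["reply"] = ("🚨 This may be a medical emergency. "
                          "Please call emergency services or go to the nearest ER now.")
        state["emergency"] = True  # lets the rest of the intake short-circuit
    return state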

Deployment on Hugging Face Spaces

This repo is configured as a Docker SDK Space. On every push:

  1. Docker image builds; Ollama gets installed via the official install script
  2. startup.sh starts on container boot: launches Ollama, pulls qwen2.5:0.5b, starts FastAPI
  3. App is live on port 7860

# Test the Docker build locally before pushing
docker build -t clinical-intake .
docker run -p 7860:7860 clinical-intake

Local Development

# Fast mock mode (no model needed, instant responses)
MOCK_LLM=true uvicorn app.main:app --reload

# Real Ollama mode: requires Ollama installed and serving at localhost:11434
ollama serve &
ollama pull qwen2.5:0.5b
MOCK_LLM=false uvicorn app.main:app --reload

Usage

FastAPI Web App

Health Check

curl http://localhost:7860/health
# Response: {"status": "ok", "mock_mode": true}

Chat Endpoint

# Start conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "hello"}'

# Continue conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "I have chest pain"}'

# The final response includes the completed brief when state == "done" (see API Reference)
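
The same conversation can be driven from Python with httpx. This is a hypothetical client script, not part of the repo; field names follow the API Reference below:

import httpx

with httpx.Client(base_url="http://localhost:7860", timeout=60.0) as client:
    for message in ["hello", "I have chest pain", "it started this morning"]:
        data = client.post("/chat", json={"session_id": "patient123", "message": message}).json()
        print("Agent:", data["reply"])
        if data["state"] == "done":
            print("Brief:", data["brief"])
            break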

CLI Mode

# Run interactive CLI
python app/main.py --cli

# Example session:
# Agent: Hello! I'm here to help you with your pre-visit intake. What brings you in today?
# You: I have chest pain since this morning
# Agent: I understand you're experiencing chest pain. When did it first start?
# ... (continues through HPI and ROS) ...
# Agent: Your clinical intake is complete. Here is your summary:
# {
#   "chief_complaint": "chest pain",
#   "hpi": {...},
#   "ros": {...},
#   "generated_at": "2024-01-15T10:30:00Z"
# }

API Reference

POST /chat

Request:

{
  "session_id": "string",
  "message": "string"
}

Response:

{
  "reply": "string",
  "state": "intake|hpi|ros|brief_generation|done",
  "brief": {
    "chief_complaint": "string",
    "hpi": {
      "onset": "string",
      "location": "string",
      "duration": "string",
      "character": "string",
      "severity": "string",
      "aggravating": "string",
      "relieving": "string"
    },
    "ros": {
      "system_name": ["finding1", "finding2"]
    },
    "generated_at": "ISO8601 timestamp"
  }
}
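
The brief maps naturally onto Pydantic models. A sketch of what app/schemas.py plausibly contains; field names follow the JSON shape above, but the exact definitions are assumptions:

from datetime import datetime
from pydantic import BaseModel

class HPI(BaseModel):
    onset: str
    location: str
    duration: str
    character: str
    severity: str
    aggravating: str
    relieving: str

class ClinicalBrief(BaseModel):
    chief_complaint: str
    hpi: HPI
    ros: dict[str, list[str]]  # e.g. {"cardiovascular": ["chest pain", "no palpitations"]}
    generated_at: datetime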

GET /health

Response:

{
  "status": "ok",
  "mock_mode": true
}

Configuration

| Environment Variable | Description | Default |
|---|---|---|
| MOCK_LLM | Use mock LLM responses (true) or real local LLM (false) | true |
| MODEL_PATH | Path to GGUF model file (used when MOCK_LLM=false) | /models/qwen2.5-0.5b-instruct-q4_k_m.gguf |
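
Both values are plain environment variables, so the settings layer stays tiny. An illustrative read at startup (defaults mirror the table above; the real code may differ):

import os

MOCK_LLM = os.getenv("MOCK_LLM", "true").lower() == "true"
MODEL_PATH = os.getenv("MODEL_PATH", "/models/qwen2.5-0.5b-instruct-q4_k_m.gguf")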

Testing

# Run all tests (uses MockLLM automatically)
pytest tests/

# Run specific test
pytest tests/test_e2e.py::test_full_intake_flow -v

# Run with coverage
pytest --cov=app tests/
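
Because tests default to MockLLM, they can exercise the full FastAPI app without a model. A sketch in the spirit of the tests listed under Test Coverage below; the actual fixtures and client in tests/test_e2e.py may differ:

from fastapi.testclient import TestClient
from app.main import app

def test_health_endpoint():
    client = TestClient(app)
    resp = client.get("/health")
    assert resp.status_code == 200
    body = resp.json()
    assert body["status"] == "ok"
    assert "mock_mode" in body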

Test Coverage

  • ✅ test_health_endpoint: Verifies health check returns mock_mode status
  • ✅ test_full_intake_flow: Complete conversation flow from greeting to ClinicalBrief
  • ✅ test_hpi_reprompt: Validates vague answer re-prompting behavior
  • ✅ test_ros_scoping: Confirms ROS systems are scoped based on chief complaint
  • ✅ test_brief_structure: Validates ClinicalBrief Pydantic schema compliance

Project Structure

clinical-intake-agent/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app + CLI entry point
│   ├── graph.py         # LangGraph state graph and nodes
│   ├── state.py         # TypedDict state definitions
│   ├── schemas.py       # Pydantic models (HPI, ClinicalBrief)
│   └── llm.py           # LLM provider (MockLLM, RealLLM)
├── tests/
│   ├── __init__.py
│   └── test_e2e.py      # End-to-end tests
├── requirements.txt
├── Dockerfile
└── README.md

Dependencies

Minimal dependencies (no heavy ML libraries unless MOCK_LLM=false):

  • langgraph - State graph orchestration
  • fastapi - Web framework
  • uvicorn - ASGI server
  • pydantic - Data validation
  • pytest + pytest-asyncio - Testing
  • httpx - Async HTTP client for tests
  • llama-cpp-python - Only in Docker prod layer for real LLM mode

License

MIT

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Troubleshooting

Model Download Fails

If running with MOCK_LLM=false and the model fails to download:

# Manually download the model
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('bartowski/Qwen2.5-0.5B-Instruct-GGUF', 'Qwen2.5-0.5B-Instruct-Q4_K_M.gguf', local_dir='/models')"

Session State Not Persisting

Ensure you're using the same session_id across multiple /chat calls. Sessions are stored in-memory per process.
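
Under the hood, the session_id is typically passed to LangGraph as the checkpointer's thread_id, which is why reusing it is what keeps a conversation's state attached. An illustrative call, building on the graph sketch under Architecture (MemorySaver is in-memory only, so state is lost on restart):

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()  # in-memory: per-process, not shared across workers or restarts
config = {"configurable": {"thread_id": "patient123"}}  # thread_id == session_id from the request
# graph = builder.compile(checkpointer=checkpointer)    # wiring as sketched under Architecture
# result = graph.invoke({"messages": [("user", "I have chest pain")]}, config=config)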

Docker Build Fails

The Dockerfile skips model download if MOCK_LLM=true. To force model download in Docker:

docker build --build-arg MOCK_LLM=false -t clinical-intake-agent .