---
title: Clinical Intake Agent
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# Clinical Intake Agent

A LangGraph-based conversational agent that conducts pre-visit clinical intakes with simulated patients. At the end of the conversation, the agent generates a structured ClinicalBrief containing the Chief Complaint, History of Present Illness (HPI), and Review of Systems (ROS).
## Features

- **Multi-turn conversation** with stateful memory using LangGraph checkpointing
- **Structured clinical data collection**: Chief Complaint, HPI (OPQRST), and ROS
- **Conditional ROS scoping**: Adapts the review of systems to the chief complaint
- **Vague answer handling**: Gracefully re-prompts when patient responses are unclear
- **Dual mode**: Runs as a FastAPI web app or as a CLI tool
- **Mock/Real LLM**: Switches between mock responses and a real local LLM via the `MOCK_LLM` environment variable
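
To make the ROS scoping and vague-answer features more concrete, here is a minimal sketch of how they could work. The keyword table, the list of vague phrases, and the function names `scope_ros` and `is_vague` are all assumptions made for illustration; the project's actual rules may differ.

```python
import re

# Hypothetical mapping from complaint keywords to the ROS systems worth asking about.
ROS_SCOPE = {
    "chest pain": ["cardiovascular", "respiratory", "gastrointestinal"],
    "headache": ["neurological", "eyes", "constitutional"],
}
DEFAULT_ROS = ["constitutional"]

# Hypothetical pattern for answers that carry no usable clinical detail.
VAGUE_PATTERN = re.compile(
    r"^\s*(i don'?t know|not sure|maybe|idk|um+|kind of)?\s*[.!?]*\s*$",
    re.IGNORECASE,
)


def scope_ros(chief_complaint: str) -> list[str]:
    """Return the ROS systems to review for a given chief complaint."""
    text = chief_complaint.lower()
    for keyword, systems in ROS_SCOPE.items():
        if keyword in text:
            return systems
    return DEFAULT_ROS


def is_vague(answer: str) -> bool:
    """True when the answer is empty or only a filler phrase, so a re-prompt is needed."""
    return bool(VAGUE_PATTERN.match(answer))
```

A caller would re-ask the current question whenever `is_vague` returns `True`, and would use `scope_ros` once the chief complaint is known to decide which systems to cover.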
## Architecture

```
Patient → triage_node → agent_node → (done or loop back for next question)
```
### Inference Engine

- **Local dev (mock)**: `MOCK_LLM=true` → regex-based MockLLM, effectively zero latency
- **Production**: `MOCK_LLM=false` → **Ollama** local server (`qwen2.5:0.5b`, C++ optimized)
  - ~2 s per turn on CPU, versus 25 s with raw PyTorch
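
The switch between the two engines can be sketched as a small factory keyed on `MOCK_LLM`. This is an illustration under stated assumptions, not the contents of `app/llm.py`: the class internals, the `complete` method, the canned rules, and `get_llm` are hypothetical. The only external API used is Ollama's `POST /api/generate` endpoint with `stream` disabled, which returns the generated text in the `response` field.

```python
import json
import os
import re
import urllib.request
from typing import Mapping, Optional


class MockLLM:
    """Regex-based stand-in: no model, no network (rules are illustrative)."""

    RULES = [(re.compile(r"chest pain", re.IGNORECASE), "When did the chest pain start?")]

    def complete(self, prompt: str) -> str:
        for pattern, reply in self.RULES:
            if pattern.search(prompt):
                return reply
        return "What brings you in today?"


class RealLLM:
    """Calls a local Ollama server over HTTP (host and model defaults are assumptions)."""

    def __init__(self, model: str = "qwen2.5:0.5b", host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def complete(self, prompt: str) -> str:
        payload = json.dumps({"model": self.model, "prompt": prompt, "stream": False})
        request = urllib.request.Request(
            f"{self.host}/api/generate",
            data=payload.encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            return json.loads(response.read())["response"]


def get_llm(env: Optional[Mapping[str, str]] = None):
    """Pick the provider from MOCK_LLM; defaults to mock mode when unset."""
    env = os.environ if env is None else env
    use_mock = env.get("MOCK_LLM", "true").strip().lower() in {"1", "true", "yes"}
    return MockLLM() if use_mock else RealLLM()
```

Passing the environment as a parameter keeps the factory easy to test without mutating `os.environ`.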
### State Graph Nodes

1. **triage_node**: Detects acute emergency phrases → immediate 🚨 alert
2. **agent_node**: Single LLM call → extracts all HPI/ROS fields AND generates the next question.
   When all fields are complete, it builds the ClinicalBrief inline (no extra LLM call).
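
The control flow of the two nodes can be approximated in plain Python, without LangGraph, to show the routing logic. Everything here is a simplified assumption: the emergency phrases, the `IntakeState` shape, and `run_turn` are hypothetical, and the single LLM call in `agent_node` is replaced by a check of which HPI fields are already filled.

```python
import re
from typing import TypedDict

# Hypothetical emergency phrases; the real triage list is not shown in this README.
EMERGENCY = re.compile(r"can'?t breathe|crushing chest pain|passed out", re.IGNORECASE)

# HPI field names as listed in the API reference.
HPI_FIELDS = ["onset", "location", "duration", "character",
              "severity", "aggravating", "relieving"]


class IntakeState(TypedDict):
    message: str   # latest patient message
    hpi: dict      # HPI fields collected so far
    reply: str     # agent reply for this turn
    done: bool     # True once the conversation should stop


def triage_node(state: IntakeState) -> IntakeState:
    """Short-circuit the intake when the message contains an emergency phrase."""
    if EMERGENCY.search(state["message"]):
        return {**state, "reply": "🚨 Please call emergency services now.", "done": True}
    return state


def agent_node(state: IntakeState) -> IntakeState:
    """Ask about the next missing field, or finish when everything is collected."""
    missing = [field for field in HPI_FIELDS if field not in state["hpi"]]
    if not missing:
        return {**state, "reply": "Your clinical intake is complete.", "done": True}
    return {**state, "reply": f"Can you tell me about the {missing[0]}?", "done": False}


def run_turn(state: IntakeState) -> IntakeState:
    """One pass through the graph: triage first, then the agent unless triage ended it."""
    state = triage_node(state)
    return state if state["done"] else agent_node(state)
```

In the real graph, the loop back for the next question happens on the following `/chat` call, with the checkpointed state supplying the fields collected so far.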
## Deployment on Hugging Face Spaces

This repo is configured as a **Docker SDK Space**. On every push:

1. The Docker image builds, and Ollama is installed via the official install script.
2. `startup.sh` runs on container boot: it launches Ollama, pulls `qwen2.5:0.5b`, and starts FastAPI.
3. The app is live on port 7860.

```bash
# Test the Docker build locally before pushing
docker build -t clinical-intake .
docker run -p 7860:7860 clinical-intake
```
## Local Development

```bash
# Fast mock mode (no model needed, instant responses)
# --port 7860 matches the curl examples in this README (uvicorn defaults to 8000)
MOCK_LLM=true uvicorn app.main:app --reload --port 7860

# Real Ollama mode: requires Ollama installed and serving at localhost:11434
ollama serve &
ollama pull qwen2.5:0.5b
MOCK_LLM=false uvicorn app.main:app --reload --port 7860
```
## Usage

### FastAPI Web App

#### Health Check

```bash
curl http://localhost:7860/health
# Response: {"status": "ok", "mock_mode": true}
```
#### Chat Endpoint

```bash
# Start conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "hello"}'

# Continue conversation
curl -X POST http://localhost:7860/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "patient123", "message": "I have chest pain"}'

# The final response includes the clinical brief (the "brief" field) when "state" is "done"
```
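
The same two calls can be made from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `chat` and the `BASE_URL` constant are illustrative and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local address, as in the curl examples


def build_chat_request(session_id: str, message: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /chat request without sending it."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(session_id: str, message: str, base_url: str = BASE_URL) -> dict:
    """Send one message and return the decoded JSON response."""
    request = build_chat_request(session_id, message, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the server to be running):
# first = chat("patient123", "hello")
# second = chat("patient123", "I have chest pain")
# print(second["reply"], second["state"])
```

Reusing the same `session_id` across calls is what ties the messages into one conversation.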
### CLI Mode

```bash
# Run interactive CLI
python app/main.py --cli

# Example session:
# Agent: Hello! I'm here to help you with your pre-visit intake. What brings you in today?
# You: I have chest pain since this morning
# Agent: I understand you're experiencing chest pain. When did it first start?
# ... (continues through HPI and ROS) ...
# Agent: Your clinical intake is complete. Here is your summary:
# {
#   "chief_complaint": "chest pain",
#   "hpi": {...},
#   "ros": {...},
#   "generated_at": "2024-01-15T10:30:00Z"
# }
```
## API Reference

### POST /chat

**Request:**

```json
{
  "session_id": "string",
  "message": "string"
}
```
**Response:**

```json
{
  "reply": "string",
  "state": "intake|hpi|ros|brief_generation|done",
  "brief": {
    "chief_complaint": "string",
    "hpi": {
      "onset": "string",
      "location": "string",
      "duration": "string",
      "character": "string",
      "severity": "string",
      "aggravating": "string",
      "relieving": "string"
    },
    "ros": {
      "system_name": ["finding1", "finding2"]
    },
    "generated_at": "ISO8601 timestamp"
  }
}
```
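
A client may want to validate this response before using it. The sketch below uses standard-library dataclasses as a stand-in for the project's Pydantic models; `ChatResponse` and `parse_chat_response` are hypothetical names, and treating `brief` as absent or null before the `done` state is an assumption the schema above does not confirm.

```python
from dataclasses import dataclass
from typing import Optional

# State values taken from the response schema above.
VALID_STATES = {"intake", "hpi", "ros", "brief_generation", "done"}


@dataclass
class ChatResponse:
    reply: str
    state: str
    brief: Optional[dict] = None  # assumed to be missing or null until state == "done"


def parse_chat_response(payload: dict) -> ChatResponse:
    """Validate the state value and wrap the payload in a typed object."""
    state = payload["state"]
    if state not in VALID_STATES:
        raise ValueError(f"unexpected state: {state!r}")
    return ChatResponse(reply=payload["reply"], state=state, brief=payload.get("brief"))


def is_complete(response: ChatResponse) -> bool:
    """True when the intake is finished and a brief is attached."""
    return response.state == "done" and response.brief is not None
```

For example, a polling client could keep sending patient messages until `is_complete` returns `True`, then read `response.brief["chief_complaint"]`.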
### GET /health

**Response:**

```json
{
  "status": "ok",
  "mock_mode": true
}
```
## Configuration

| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `MOCK_LLM` | Use mock LLM responses (`true`) or real local LLM (`false`) | `true` |
| `MODEL_PATH` | Path to GGUF model file (used when `MOCK_LLM=false`) | `/models/qwen2.5-0.5b-instruct-q4_k_m.gguf` |
## Testing

```bash
# Run all tests (uses MockLLM automatically)
pytest tests/

# Run specific test
pytest tests/test_e2e.py::test_full_intake_flow -v

# Run with coverage
pytest --cov=app tests/
```
### Test Coverage

- ✅ `test_health_endpoint`: Verifies health check returns mock_mode status
- ✅ `test_full_intake_flow`: Complete conversation flow from greeting to ClinicalBrief
- ✅ `test_hpi_reprompt`: Validates vague answer re-prompting behavior
- ✅ `test_ros_scoping`: Confirms ROS systems are scoped based on chief complaint
- ✅ `test_brief_structure`: Validates ClinicalBrief Pydantic schema compliance
## Project Structure

```
clinical-intake-agent/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI app + CLI entry point
│   ├── graph.py         # LangGraph state graph and nodes
│   ├── state.py         # TypedDict state definitions
│   ├── schemas.py       # Pydantic models (HPI, ClinicalBrief)
│   └── llm.py           # LLM provider (MockLLM, RealLLM)
├── tests/
│   ├── __init__.py
│   └── test_e2e.py      # End-to-end tests
├── requirements.txt
├── Dockerfile
└── README.md
```
## Dependencies

Minimal dependencies (no heavy ML libraries unless `MOCK_LLM=false`):

- `langgraph` - State graph orchestration
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `pydantic` - Data validation
- `pytest` + `pytest-asyncio` - Testing
- `httpx` - Async HTTP client for tests
- `llama-cpp-python` - Only in Docker prod layer for real LLM mode
## License

MIT

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Troubleshooting

### Model Download Fails

If running with `MOCK_LLM=false` and the model fails to download:

```bash
# Manually download the model
python -c "from huggingface_hub import hf_hub_download; hf_hub_download('bartowski/Qwen2.5-0.5B-Instruct-GGUF', 'Qwen2.5-0.5B-Instruct-Q4_K_M.gguf', local_dir='/models')"
```
### Session State Not Persisting

Ensure you're using the same `session_id` across multiple `/chat` calls. Sessions are stored in memory per process, so they are lost when the server restarts and are not shared between multiple worker processes.
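
The behavior can be illustrated with a toy in-memory store. `SessionStore` is a hypothetical class written for this explanation, not the project's checkpointer; two separate instances stand in for two server processes.

```python
from collections import defaultdict


class SessionStore:
    """Toy per-process store: conversation history keyed by session_id."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[str]] = defaultdict(list)

    def append(self, session_id: str, message: str) -> int:
        """Record a message and return the number of turns stored so far."""
        self._sessions[session_id].append(message)
        return len(self._sessions[session_id])

    def history(self, session_id: str) -> list[str]:
        """Return a copy of the history, or an empty list for unknown sessions."""
        return list(self._sessions.get(session_id, []))


# Same store and same session_id: the history accumulates.
process_a = SessionStore()
process_a.append("patient123", "hello")
process_a.append("patient123", "I have chest pain")

# A different process (or a restart) starts with an empty store.
process_b = SessionStore()
```

Here `process_a.history("patient123")` holds both messages while `process_b.history("patient123")` is empty, which mirrors what happens when requests for one session land on different workers.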
### Docker Build Fails

The Dockerfile skips model download if `MOCK_LLM=true`. To force model download in Docker:

```bash
docker build --build-arg MOCK_LLM=false -t clinical-intake-agent .
```