sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook
Abstract
Clinical question answering over electronic health records (EHRs) can help clinicians and patients access relevant medical information more efficiently. However, many recent approaches rely on large cloud-based models, which are difficult to deploy in clinical environments due to privacy constraints and computational requirements. In this work, we investigate how far grounded EHR question answering can be pushed when restricted to a single notebook. We participate in all four subtasks of the ArchEHR-QA 2026 shared task and evaluate several approaches designed to run on commodity hardware. All experiments are conducted locally without external APIs or cloud infrastructure. Our results show that such systems can achieve competitive performance on the shared task leaderboards. In particular, our submissions perform above average in two subtasks, and we observe that smaller models can approach the performance of much larger systems when properly configured. These findings suggest that privacy-preserving EHR QA systems running fully locally are feasible with current models and commodity hardware. The source code is available at https://github.com/ibrahimey/ArchEHR-QA-2026.
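The abstract does not describe the system internals, but the core idea of *grounded* EHR QA is that answers must cite supporting sentences from the clinical note. As a purely hypothetical, dependency-free sketch (not the authors' method), evidence selection can be illustrated with simple lexical-overlap scoring between the question and each note sentence, which runs trivially on any notebook:

```python
import re
from collections import Counter

def split_sentences(note):
    """Naive sentence splitter for a clinical note."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]

def score(question, sentence):
    """Bag-of-words lexical overlap between question and sentence."""
    q = Counter(re.findall(r"[a-z]+", question.lower()))
    s = Counter(re.findall(r"[a-z]+", sentence.lower()))
    return sum((q & s).values())

def ground_answer(question, note, top_k=2):
    """Return the top_k most question-relevant note sentences,
    with 1-based sentence ids usable as citation anchors."""
    sents = split_sentences(note)
    ranked = sorted(enumerate(sents, 1),
                    key=lambda t: score(question, t[1]),
                    reverse=True)
    return ranked[:top_k]

note = ("The patient was admitted with chest pain. "
        "An ECG showed no acute changes. "
        "Aspirin was started on day one.")
print(ground_answer("Why was aspirin started?", note, top_k=1))
# → [(3, 'Aspirin was started on day one.')]
```

In a real local pipeline, the scoring step would typically be replaced by a small open-weight model or a dense retriever, but the grounding contract is the same: every answer sentence points back to a numbered note sentence.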
Community
Paper for our (sebis) submission to the ArchEHR-QA 2026 Shared Task (CL4Health @ LREC 2026), in which we ask how much of EHR QA can be done locally on a single notebook.
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records (2026)
- MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine (2026)
- Who Judges the Judge? Evaluating LLM-as-a-Judge for French Medical open-ended QA (2026)
- Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning? (2026)
- Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment (2026)
- RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering (2026)
- Assessing Large Language Models for Medical QA: Zero-Shot and LLM-as-a-Judge Evaluation (2026)