hamedfrogh askyishan committed
Commit c3c4bcc · 0 Parent(s)

Duplicate from askyishan/StethoLM

Co-authored-by: Yishan Wang <askyishan@users.noreply.huggingface.co>

Files changed (3)
  1. .gitattributes +35 -0
  2. README.md +131 -0
  3. stetholm_adapter.pt +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
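These patterns route matching files through Git LFS, which is why `stetholm_adapter.pt` shows up in this commit as a three-line pointer rather than as binary content. As a rough illustration only, and assuming Python's `fnmatch` is a close enough stand-in for gitattributes matching on these simple basename globs (it does not implement Git's full pattern rules, e.g. `saved_model/**/*`), the check can be sketched as follows. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the LFS globs from the .gitattributes above (not the full list).
LFS_PATTERNS = ["*.pt", "*.pth", "*.safetensors", "*.bin", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any LFS glob?"""
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in patterns)


print(is_lfs_tracked("stetholm_adapter.pt"))  # matches "*.pt"
print(is_lfs_tracked("README.md"))            # matches none of the globs
```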
README.md ADDED
@@ -0,0 +1,131 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - audio
+ - medical
+ - cardiopulmonary
+ - auscultation
+ - instruction-tuning
+ - lora
+ - medgemma
+ base_model: google/medgemma-4b-it
+ datasets:
+ - askyishan/StethoBench
+ ---
+
+ # StethoLM
+
+ **StethoLM** is the first audio–language model specialized for cardiopulmonary auscultation, capable of performing instruction-driven clinical tasks across the full spectrum of auscultation analysis. It integrates a cardiopulmonary audio encoder with a medical language model backbone and is trained on [StethoBench](https://huggingface.co/datasets/askyishan/StethoBench), a comprehensive benchmark of 77,027 instruction–response pairs from 16,125 labeled recordings.
+
+ This work is published in Transactions on Machine Learning Research (TMLR).
+
+ ---
+
+ ## Model Description
+
+ StethoLM connects a **COLA audio encoder** (EfficientNet-based, pre-trained on cardiopulmonary sounds via [CaReAQA](https://arxiv.org/abs/2505.01199)) to **MedGemma-4B-IT** via a learned MLP prefix projector. The audio is encoded into a short sequence of prefix tokens that are prepended to the text input of the language model. All components (the audio encoder, the prefix projector, and the language model via LoRA) are jointly fine-tuned end-to-end.
+
+ **Architecture:**
+ - **Audio encoder:** COLA (EfficientNet backbone), pre-trained on cardiopulmonary audio, outputs 1280-dim embeddings; **fine-tuned** during StethoLM training
+ - **Prefix projector:** 3-layer MLP mapping audio features to 4 LM prefix tokens
+ - **Language model backbone:** [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it) fine-tuned with LoRA (r=8, α=32)
+
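To make the LoRA setting above concrete: under the standard LoRA formulation the adapted weight is W + (α/r)·B·A, so r=8 and α=32 imply a fixed scaling of 4 on the low-rank update. The sketch below works through that arithmetic. The 2560×2560 layer shape is a hypothetical example chosen for illustration, not a dimension stated in this README, and the helper names are made up.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer:
    A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 8, 32      # values from the README
D_IN = D_OUT = 2560      # hypothetical layer shape, for illustration only

scaling = lora_scaling(RANK, ALPHA)
added = lora_param_count(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"LoRA params = {added}, full layer = {full}, ratio = {added / full:.5f}")
```

For this hypothetical layer, the adapter trains well under 1% of the parameters that full fine-tuning of the same layer would touch.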
+ **Training:**
+ - **Stage 1:** Supervised fine-tuning (SFT) on StethoBench training split
+ - **Stage 2:** Multimodal Direct Preference Optimization (mDPO) with audio degradation-based conditional preference
+
+ ---
+
+ ## Intended Use
+
+ StethoLM is designed for **research** on AI-assisted cardiopulmonary auscultation. It supports seven clinical task categories:
+
+ | Task | Description |
+ |------|-------------|
+ | **Classification** | Binary normal/abnormal classification |
+ | **Identification** | Identifying specific sound types (e.g., wheezing, crackles) |
+ | **Report** | Generating a structured auscultation report |
+ | **Reasoning** | Explaining clinical findings |
+ | **Differential Diagnosis (DDx)** | Listing possible diagnoses |
+ | **Comparison** | Comparing findings across recordings |
+ | **Location** | Identifying anatomical auscultation site |
+
+ > ⚠️ **Not for clinical use.** This model is intended for research purposes only and has not been validated for clinical decision-making.
+
+ ---
+
+ ## How to Use
+
+ This repository contains the **adapter weights** (fine-tuned audio encoder + LoRA adapters + prefix projector, ~696 MB). The base MedGemma-4B model is downloaded automatically from Hugging Face on first run.
+
+ ### 1. Clone the code repository
+
+ ```bash
+ git clone https://github.com/askyishan/StethoLM
+ cd StethoLM
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Download the adapter checkpoint
+
+ ```bash
+ huggingface-cli download askyishan/StethoLM stetholm_adapter.pt --local-dir checkpoints/
+ ```
+
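The same download can also be done from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; it uses that library's `hf_hub_download` function, and the wrapper name `download_adapter` is hypothetical rather than part of the StethoLM code.

```python
REPO_ID = "askyishan/StethoLM"      # model repo named in the README
FILENAME = "stetholm_adapter.pt"    # adapter checkpoint file


def download_adapter(local_dir: str = "checkpoints") -> str:
    """Fetch the adapter checkpoint and return its local path."""
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_adapter()` should place the file at `checkpoints/stetholm_adapter.pt`, the path that the `--checkpoint` flag in the next step expects.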
+ ### 3. Run inference
+
+ ```bash
+ python predict.py \
+ --input_jsonl data/stethobench.jsonl \
+ --output_jsonl predictions.jsonl \
+ --audio_dir /path/to/audio_files \
+ --checkpoint checkpoints/stetholm_adapter.pt \
+ --model_name google/medgemma-4b-it \
+ --audio_encoder cola \
+ --split test
+ ```
+
+ ---
+
+ ## Training Data
+
+ StethoLM was trained on [StethoBench](https://huggingface.co/datasets/askyishan/StethoBench). The training split comprises recordings from 7 in-domain datasets; 4 additional datasets are held out as out-of-distribution (OOD) test sets.
+
+ **In-domain training datasets:**
+
+ | Dataset | Domain |
+ |---------|--------|
+ | CirCor DigiScope (heart-circor) | Heart |
+ | SPRSound (spr) | Lung |
+ | COVID-UK (coviduk) | Cough |
+ | CoughVid (coughvid) | Cough |
+ | ICBHI (icbhi) | Lung |
+ | ZCHSound (heart-zch) | Heart |
+ | KAUH (kauh) | Cardiopulmonary |
+
+ **Out-of-distribution (OOD) test datasets:**
+
+ | Dataset | Domain |
+ |---------|--------|
+ | BMD-HS | Heart |
+ | CINC | Cardiopulmonary |
+ | TR | Lung |
+ | FluSense | Cough |
+
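The two tables can be written as a small lookup, which makes the split easy to query. The structure and the names `IN_DOMAIN`, `OOD`, and `datasets_for` are hypothetical; only the dataset names and domains come from the tables above.

```python
# Dataset -> domain, copied from the in-domain table.
IN_DOMAIN = {
    "CirCor DigiScope": "Heart",
    "SPRSound": "Lung",
    "COVID-UK": "Cough",
    "CoughVid": "Cough",
    "ICBHI": "Lung",
    "ZCHSound": "Heart",
    "KAUH": "Cardiopulmonary",
}

# Dataset -> domain, copied from the OOD table.
OOD = {
    "BMD-HS": "Heart",
    "CINC": "Cardiopulmonary",
    "TR": "Lung",
    "FluSense": "Cough",
}


def datasets_for(domain: str, table: dict) -> list:
    """Return the dataset names in `table` that belong to `domain`, sorted."""
    return sorted(name for name, d in table.items() if d == domain)


# Every OOD domain also occurs among the in-domain training datasets.
ood_domains_seen_in_training = set(OOD.values()) <= set(IN_DOMAIN.values())

print(datasets_for("Cough", IN_DOMAIN))
print(ood_domains_seen_in_training)
```

Read this way, the held-out sets differ from the training sets by data source, not by sound domain.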
+ ---
+
+ ## Citation
+
+ If you use StethoLM or StethoBench in your research, please cite:
+
+ ```bibtex
+ @article{stetholm2025,
+ title = {StethoLM: Audio Language Model for Cardiopulmonary Analysis Across Clinical Tasks},
+ author = {Wang, Yishan and Wang, Tsai-Ning and Funk, Mathias and Saeed, Aaqib},
+ journal = {Transactions on Machine Learning Research},
+ year = {2026},
+ url = {https://huggingface.co/askyishan/StethoLM}
+ }
+ ```
stetholm_adapter.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d94e0cd7223ba5098ff949c0635114a5ece72bbbcb7d322915d352bb0dfc608
+ size 695787446
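The three lines above are a Git LFS pointer: they record the hash algorithm, the digest, and the byte size of the real checkpoint, which is stored outside the Git history. A downloaded file can be checked against them. The following is a sketch using only the Python standard library; the function names `parse_lfs_pointer` and `verify_download` are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer contents as shown in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1d94e0cd7223ba5098ff949c0635114a5ece72bbbcb7d322915d352bb0dfc608
size 695787446
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines into a dict with algo, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['size'] / 1e6:.1f} MB expected")
```

After downloading, `verify_download("checkpoints/stetholm_adapter.pt", pointer)` should return `True` for an intact copy of the file this commit refers to.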