askyishan committed on
Commit
8712650
·
verified ·
1 Parent(s): cae9c10

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -19,13 +19,13 @@ datasets:
 
  **StethoLM** is the first audio–language model specialized for cardiopulmonary auscultation, capable of performing instruction-driven clinical tasks across the full spectrum of auscultation analysis. It integrates a cardiopulmonary audio encoder with a medical language model backbone, trained on [StethoBench](https://huggingface.co/datasets/askyishan/StethoBench) — a comprehensive benchmark of 77,027 instruction–response pairs from 16,125 labeled recordings.
 
- > Published at **TMLR 2025**.
+ This work is published in the Transactions on Machine Learning Research (TMLR).
 
  ---
 
  ## Model Description
 
- StethoLM connects a **COLA audio encoder** (EfficientNet-based, pre-trained on cardiopulmonary sounds via [CaReAQA](https://arxiv.org/abs/2501.02225)) to **MedGemma-4B-IT** via a learned MLP prefix projector. The audio is encoded into a short sequence of prefix tokens that are prepended to the text input of the language model. All components — audio encoder, prefix projector, and language model (via LoRA) — are jointly fine-tuned end-to-end.
+ StethoLM connects a **COLA audio encoder** (EfficientNet-based, pre-trained on cardiopulmonary sounds via [CaReAQA](https://arxiv.org/abs/2505.01199)) to **MedGemma-4B-IT** via a learned MLP prefix projector. The audio is encoded into a short sequence of prefix tokens that are prepended to the text input of the language model. All components — audio encoder, prefix projector, and language model (via LoRA) — are jointly fine-tuned end-to-end.
 
  **Architecture:**
  - **Audio encoder:** COLA (EfficientNet backbone), pre-trained on cardiopulmonary audio, outputs 1280-dim embeddings; **fine-tuned** during StethoLM training
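
The prefix-projection step described in the Model Description (a 1280-dim audio embedding mapped by an MLP to a short sequence of prefix tokens, which are then prepended to the text input) can be illustrated with a minimal pure-Python sketch. This is not the StethoLM implementation: the two-layer shape of the MLP, the `tanh` activation, the toy language-model width `HIDDEN_DIM`, the prefix length `PREFIX_LEN`, and all function names are assumptions chosen for illustration. Only the 1280-dim audio embedding size comes from the README.

```python
import math
import random

AUDIO_DIM = 1280  # audio embedding size stated in the README
HIDDEN_DIM = 8    # hypothetical LM embedding width, kept tiny for the demo
PREFIX_LEN = 4    # hypothetical number of prefix tokens

random.seed(0)


def init_layer(in_dim, out_dim):
    """Return small random weights (in_dim rows x out_dim columns) and a zero bias."""
    weights = [[random.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]
    return weights, [0.0] * out_dim


def linear(x, weights, bias):
    """Compute y = xW + b for a single input vector x."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


# Assumed two-layer MLP projector: AUDIO_DIM -> HIDDEN_DIM -> PREFIX_LEN * HIDDEN_DIM
W1, B1 = init_layer(AUDIO_DIM, HIDDEN_DIM)
W2, B2 = init_layer(HIDDEN_DIM, PREFIX_LEN * HIDDEN_DIM)


def project_to_prefix(audio_emb):
    """Map one audio embedding to PREFIX_LEN vectors of width HIDDEN_DIM."""
    hidden = [math.tanh(v) for v in linear(audio_emb, W1, B1)]
    flat = linear(hidden, W2, B2)
    # Split the flat output into PREFIX_LEN consecutive token-sized chunks.
    return [flat[i * HIDDEN_DIM:(i + 1) * HIDDEN_DIM] for i in range(PREFIX_LEN)]


def prepend_prefix(prefix_tokens, text_embeddings):
    """Place the audio prefix tokens in front of the text token embeddings."""
    return prefix_tokens + text_embeddings


# Stand-ins for the encoder output and the embedded text prompt (random data).
audio_emb = [random.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)]
text_embeddings = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)] for _ in range(10)]

lm_inputs = prepend_prefix(project_to_prefix(audio_emb), text_embeddings)
print(len(lm_inputs), len(lm_inputs[0]))  # 14 8
```

The language model then sees a sequence of `PREFIX_LEN + 10` embeddings, with the text embeddings unchanged after the audio prefix. In the real model the projector weights would be learned jointly with the encoder and the LoRA adapters rather than sampled at random.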