**StethoLM** is the first audio–language model specialized for cardiopulmonary auscultation, capable of performing instruction-driven clinical tasks across the full spectrum of auscultation analysis. It integrates a cardiopulmonary audio encoder with a medical language model backbone, trained on [StethoBench](https://huggingface.co/datasets/askyishan/StethoBench) — a comprehensive benchmark of 77,027 instruction–response pairs from 16,125 labeled recordings.
This work is published in Transactions on Machine Learning Research (TMLR).
---
## Model Description
StethoLM connects a **COLA audio encoder** (EfficientNet-based, pre-trained on cardiopulmonary sounds via [CaReAQA](https://arxiv.org/abs/2505.01199)) to **MedGemma-4B-IT** via a learned MLP prefix projector. The audio is encoded into a short sequence of prefix tokens that are prepended to the text input of the language model. All components — audio encoder, prefix projector, and language model (via LoRA) — are jointly fine-tuned end-to-end.
**Architecture:**
- **Audio encoder:** COLA (EfficientNet backbone), pre-trained on cardiopulmonary audio, outputs 1280-dim embeddings; **fine-tuned** during StethoLM training
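The encoder-to-LM wiring above can be sketched in a few lines. This is a minimal NumPy illustration, not the released implementation: the 1280-dim audio embedding comes from the card, while the number of prefix tokens, the projector hidden width, and the LM hidden size are assumptions chosen for illustration.

```python
import numpy as np

AUDIO_DIM = 1280   # COLA encoder output size (stated in this card)
NUM_PREFIX = 8     # assumed number of audio prefix tokens
LM_HIDDEN = 2560   # assumed MedGemma hidden size (illustrative)

rng = np.random.default_rng(0)
# Two-layer MLP prefix projector (randomly initialized here; trained in practice)
W1 = rng.standard_normal((AUDIO_DIM, 2048)) * 0.02
W2 = rng.standard_normal((2048, NUM_PREFIX * LM_HIDDEN)) * 0.02

def project_to_prefix(audio_emb: np.ndarray) -> np.ndarray:
    """Map one (AUDIO_DIM,) audio embedding to (NUM_PREFIX, LM_HIDDEN) prefix tokens."""
    h = np.maximum(audio_emb @ W1, 0.0)            # ReLU hidden layer
    return (h @ W2).reshape(NUM_PREFIX, LM_HIDDEN)

audio_emb = rng.standard_normal(AUDIO_DIM)
prefix = project_to_prefix(audio_emb)

# Prefix tokens are prepended to the instruction's text-token embeddings,
# and the combined sequence is fed to the language model.
text_embs = rng.standard_normal((12, LM_HIDDEN))   # e.g. 12 instruction tokens
lm_input = np.concatenate([prefix, text_embs], axis=0)
print(lm_input.shape)  # (20, 2560): 8 prefix rows + 12 text rows
```

The LM then attends over the audio prefix exactly as it would over ordinary token embeddings, which is what lets LoRA fine-tuning of the language model adapt it to the audio modality.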