---
language:
- en
license: mit
pipeline_tag: audio-text-to-text
tags:
- respiratory-sound
- medical-ai
- audio-generation
- audio-classification
---

# Resp-Agent Models

Model weights for **Resp-Agent**, an autonomous multimodal system for respiratory sound generation and disease diagnosis, presented in the paper **[Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis](https://huggingface.co/papers/2602.15909)** (ICLR 2026).

📦 **GitHub Repository**: [zpforlove/Resp-Agent](https://github.com/zpforlove/Resp-Agent)

## πŸ“ Contents

| Model | Size | Description |
|-------|------|-------------|
| **Diagnoser/checkpoints/longformer** | 952 MB | Fine-tuned Longformer for EHR + audio analysis |
| **Diagnoser/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **Generator/checkpoints/llm** | 3.8 GB | Fine-tuned LLM for audio generation |
| **Generator/checkpoints/flow** | 2.0 GB | CFM flow matching model |
| **Generator/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **audio_descriptions.jsonl** | 87 MB×2 | Audio description data |

> **Note**: The DeepSeek-R1 model is **NOT** included here. Please download it separately from:  
> 🔗 [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)

## 🚀 Quick Download

```python
from huggingface_hub import snapshot_download

# Download all models
snapshot_download(
    repo_id="AustinZhang/resp-agent-models",
    local_dir="./",
    ignore_patterns=["*.md", ".gitattributes"]
)

# Download DeepSeek-R1 separately
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./Diagnoser/checkpoints/deepseek-r1"
)
```
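If only one half of the system is needed, the download can be restricted to a single top-level folder. The following is a minimal sketch, not part of the official instructions: `component_patterns` and `download_component` are hypothetical helper names, and it assumes the repository's top-level folders are `Diagnoser` and `Generator`, as the table above suggests. It relies on the `allow_patterns` argument of `snapshot_download`.

```python
def component_patterns(component: str) -> list[str]:
    """Return glob patterns selecting one top-level folder of the repo.

    Assumes the repo is organised into `Diagnoser/` and `Generator/`.
    """
    if component not in ("Diagnoser", "Generator"):
        raise ValueError(f"Unknown component: {component!r}")
    return [f"{component}/*"]


def download_component(component: str, local_dir: str = "./") -> str:
    """Download only the files under the chosen component folder."""
    # Imported lazily so the pattern helper works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AustinZhang/resp-agent-models",
        local_dir=local_dir,
        allow_patterns=component_patterns(component),
    )
```

For example, `download_component("Diagnoser")` would fetch only the Diagnoser weights; the DeepSeek-R1 model still has to be downloaded separately.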

## 📂 Expected Directory Structure

After downloading, your project should look like this:

```
Resp-Agent/
├── Diagnoser/
│   ├── checkpoints/
│   │   ├── deepseek-r1/          # From deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
│   │   └── longformer/           # From this repo
│   └── pretrained_models/        # From this repo
└── Generator/
    ├── checkpoints/
    │   ├── llm/                  # From this repo
    │   └── flow/                 # From this repo
    └── pretrained_models/        # From this repo
```
```
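A quick way to confirm that the layout matches is to check for each expected folder. This is a small sketch using only the standard library; `EXPECTED_DIRS` and `missing_dirs` are illustrative names, and the folder list is copied from the directory structure in this section rather than from the project's own code.

```python
from pathlib import Path

# Folders taken from the expected directory structure in this README.
EXPECTED_DIRS = [
    "Diagnoser/checkpoints/deepseek-r1",
    "Diagnoser/checkpoints/longformer",
    "Diagnoser/pretrained_models",
    "Generator/checkpoints/llm",
    "Generator/checkpoints/flow",
    "Generator/pretrained_models",
]


def missing_dirs(root="."):
    """Return the expected folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the Resp-Agent/ project root.
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:")
        for d in missing:
            print(f"  {d}")
    else:
        print("All expected folders are present.")
```

The check only confirms that the folders exist; it does not validate the files inside them.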

## πŸ“ Citation

If you find this work useful, please cite our paper:

```bibtex
@inproceedings{
zhang2026respagent,
title={Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis},
author={Pengfei ZHANG and Tianxin Xie and Minghao Yang and Li Liu},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ZkoojtEm3W}
}
```

## πŸ™ Acknowledgements

- [BEATs](https://github.com/microsoft/unilm/tree/master/beats) - Audio pre-training framework
- [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) - Reasoning model
- [Longformer](https://github.com/allenai/longformer) - Long document transformer

## 📄 License

These model weights are released under the MIT License and are intended for academic research purposes only.