---
license: mit
tags:
- respiratory-sound
- medical-ai
- audio-generation
- audio-classification
language:
- en
---

# Resp-Agent Models

Model weights for **Resp-Agent**, an intelligent respiratory sound analysis and generation system.

**GitHub Repository**: [AustinZhang/Resp-Agent](https://github.com/zpforlove/Resp-Agent)

## Contents

| Model | Size | Description |
|-------|------|-------------|
| **Diagnoser/checkpoints/longformer** | 952 MB | Fine-tuned Longformer for EHR + audio analysis |
| **Diagnoser/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **Generator/checkpoints/llm** | 3.8 GB | Fine-tuned LLM for audio generation |
| **Generator/checkpoints/flow** | 2.0 GB | CFM flow matching model |
| **Generator/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **audio_descriptions.jsonl** | 87 MB × 2 | Audio description data |

> **Note**: The DeepSeek-R1 model is **not** included in this repository. Download it separately from
> [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).

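Before downloading, it can help to check that the target disk has room for the files listed in the table. The sketch below sums the listed sizes and compares the total with the free space reported by the standard library. The helper names (`required_mb`, `has_enough_space`) are hypothetical, treating 1 GB as 1000 MB is an assumption, and the separately downloaded DeepSeek-R1 weights are not counted unless you pass their size through `extra_mb`.

```python
import shutil

# Sizes copied from the Contents table, in MB.
# Assumption: 1 GB is treated as 1000 MB.
COMPONENT_SIZES_MB = {
    "Diagnoser/checkpoints/longformer": 952,
    "Diagnoser/pretrained_models": 695,
    "Generator/checkpoints/llm": 3800,
    "Generator/checkpoints/flow": 2000,
    "Generator/pretrained_models": 695,
    "audio_descriptions.jsonl (x2)": 87 * 2,
}


def required_mb(extra_mb: float = 0.0) -> float:
    """Total size of the listed components plus an optional allowance, in MB."""
    return sum(COMPONENT_SIZES_MB.values()) + extra_mb


def has_enough_space(path: str = ".", extra_mb: float = 0.0) -> bool:
    """Return True if the filesystem holding `path` can fit the download."""
    free_mb = shutil.disk_usage(path).free / 1e6
    return free_mb >= required_mb(extra_mb)


if __name__ == "__main__":
    print(f"Listed components need about {required_mb() / 1000:.1f} GB")
    print("Enough free space:", has_enough_space("."))
```

Pass the size of any additional downloads (for example the DeepSeek-R1 checkpoint) as `extra_mb` to include them in the check.
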
## Quick Download

```python
from huggingface_hub import snapshot_download

# Download all models
snapshot_download(
    repo_id="AustinZhang/resp-agent-models",
    local_dir="./",
    ignore_patterns=["*.md", ".gitattributes"]
)

# Download DeepSeek-R1 separately
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./Diagnoser/checkpoints/deepseek-r1"
)
```
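
If only one half of the system is needed, a selective download may save time and disk space. The following is a minimal sketch that uses the `allow_patterns` argument of `snapshot_download` to fetch a single top-level folder. The helper names `patterns_for` and `download_component` are hypothetical, and the sketch assumes the repository's top-level folders match the names in the Contents table.

```python
from typing import List

REPO_ID = "AustinZhang/resp-agent-models"  # repo id from the snippet above


def patterns_for(component: str) -> List[str]:
    """Build a glob pattern matching every file under a top-level folder."""
    return [f"{component.strip('/')}/*"]


def download_component(component: str, local_dir: str = "./") -> str:
    """Download one top-level folder (assumed to be "Diagnoser" or "Generator")."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=patterns_for(component),
    )


# Example (requires network access):
# download_component("Diagnoser")
```

Files outside the chosen folder, such as `audio_descriptions.jsonl`, are skipped by this pattern and would need their own entry in `allow_patterns`.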

## Expected Directory Structure

After downloading, your project should look like this:

```
Resp-Agent/
├── Diagnoser/
│   ├── checkpoints/
│   │   ├── deepseek-r1/        # From deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
│   │   └── longformer/         # From this repo
│   └── pretrained_models/      # From this repo
└── Generator/
    ├── checkpoints/
    │   ├── llm/                # From this repo
    │   └── flow/               # From this repo
    └── pretrained_models/      # From this repo
```
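
A quick way to confirm the layout after downloading is to check that each expected folder exists and is not empty. The sketch below does this with the standard library only. The function name `missing_dirs` is hypothetical, and the check looks at folder presence only, not at the integrity of the weight files.

```python
from pathlib import Path
from typing import List

# Directories listed in the tree above, relative to the project root.
EXPECTED_DIRS = [
    "Diagnoser/checkpoints/deepseek-r1",
    "Diagnoser/checkpoints/longformer",
    "Diagnoser/pretrained_models",
    "Generator/checkpoints/llm",
    "Generator/checkpoints/flow",
    "Generator/pretrained_models",
]


def missing_dirs(root: str = ".") -> List[str]:
    """Return the expected directories that are absent or empty under `root`."""
    base = Path(root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = base / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_dirs(".")
    if gaps:
        print("Missing or empty:", ", ".join(gaps))
    else:
        print("All expected directories are present.")
```

Run it from the `Resp-Agent/` project root, or pass the root path to `missing_dirs` explicitly.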

## Paper

**[Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis](https://openreview.net/forum?id=ZkoojtEm3W)** (ICLR 2026)

If you find this work useful, please cite our paper:

```bibtex
@inproceedings{zhangresp,
  title={Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis},
  author={ZHANG, Pengfei and Xie, Tianxin and Yang, Minghao and Liu, Li},
  booktitle={The Fourteenth International Conference on Learning Representations}
}
```

## Acknowledgements

- [BEATs](https://github.com/microsoft/unilm/tree/master/beats) - Audio pre-training framework
- [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) - Reasoning model
- [Longformer](https://github.com/allenai/longformer) - Long document transformer

## License

These model weights are released for academic research purposes only.