---
language:
- en
license: mit
pipeline_tag: audio-text-to-text
tags:
- respiratory-sound
- medical-ai
- audio-generation
- audio-classification
---

# Resp-Agent Models

Model weights for **Resp-Agent**, an autonomous multimodal system for respiratory sound generation and disease diagnosis presented in the paper:

**[Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis](https://huggingface.co/papers/2602.15909)** (ICLR 2026).

📦 **GitHub Repository**: [zpforlove/Resp-Agent](https://github.com/zpforlove/Resp-Agent)

## 📁 Contents

| Model | Size | Description |
|-------|------|-------------|
| **Diagnoser/checkpoints/longformer** | 952 MB | Fine-tuned Longformer for EHR + audio analysis |
| **Diagnoser/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **Generator/checkpoints/llm** | 3.8 GB | Fine-tuned LLM for audio generation |
| **Generator/checkpoints/flow** | 2.0 GB | CFM flow matching model |
| **Generator/pretrained_models** | 695 MB | BEATs & Tokenizer pretrained weights |
| **audio_descriptions.jsonl** | 87 MB×2 | Audio description data |

> **Note**: DeepSeek-R1 model is **NOT** included here.
> Please download separately from:
> 🔗 [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)

## 🚀 Quick Download

```python
from huggingface_hub import snapshot_download

# Download all models
snapshot_download(
    repo_id="AustinZhang/resp-agent-models",
    local_dir="./",
    ignore_patterns=["*.md", ".gitattributes"]
)

# Download DeepSeek-R1 separately
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./Diagnoser/checkpoints/deepseek-r1"
)
```

## 📂 Expected Directory Structure

After downloading, your project should look like:

```
Resp-Agent/
├── Diagnoser/
│   ├── checkpoints/
│   │   ├── deepseek-r1/      # From deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
│   │   └── longformer/       # From this repo
│   └── pretrained_models/    # From this repo
└── Generator/
    ├── checkpoints/
    │   ├── llm/              # From this repo
    │   └── flow/             # From this repo
    └── pretrained_models/    # From this repo
```

## 📝 Citation

If you find this work useful, please cite our paper:

```bibtex
@inproceedings{zhang2026respagent,
  title={Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis},
  author={Pengfei ZHANG and Tianxin Xie and Minghao Yang and Li Liu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=ZkoojtEm3W}
}
```

## 🙏 Acknowledgements

- [BEATs](https://github.com/microsoft/unilm/tree/master/beats) - Audio pre-training framework
- [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1) - Reasoning model
- [Longformer](https://github.com/allenai/longformer) - Long document transformer

## 📄 License

These model weights are released for academic research purposes only under the MIT License.
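
## ✅ Verifying the Download

The expected directory structure listed earlier in this card can be checked with a short script. The following is a minimal sketch, not part of the Resp-Agent codebase: the names `EXPECTED_DIRS` and `find_missing_dirs` are hypothetical, and the script assumes it is run from the project root (`Resp-Agent/`). It only checks that each directory exists and is non-empty; it does not validate the weight files themselves.

```python
from pathlib import Path

# Sub-directories taken from the expected directory structure in this card.
EXPECTED_DIRS = [
    "Diagnoser/checkpoints/deepseek-r1",
    "Diagnoser/checkpoints/longformer",
    "Diagnoser/pretrained_models",
    "Generator/checkpoints/llm",
    "Generator/checkpoints/flow",
    "Generator/pretrained_models",
]


def find_missing_dirs(project_root="."):
    """Return the expected sub-directories that are absent or empty (hypothetical helper)."""
    root = Path(project_root)
    missing = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        # An empty directory is reported too, since the download may not have finished.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = find_missing_dirs(".")
    if missing:
        print("Missing or empty directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected directories are present.")
```

If `Diagnoser/checkpoints/deepseek-r1` is reported as missing, the likely cause is that the separate DeepSeek-R1 download has not been run yet.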