language:
  - en
license: mit
pipeline_tag: audio-text-to-text
tags:
  - respiratory-sound
  - medical-ai
  - audio-generation
  - audio-classification

Resp-Agent Models

This repository hosts the model weights for Resp-Agent, an autonomous multimodal system for respiratory sound generation and disease diagnosis. The system is presented in the paper "Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis" (ICLR 2026).

📦 GitHub Repository: zpforlove/Resp-Agent

📁 Contents

Path	Size	Description
Diagnoser/checkpoints/longformer	952 MB	Fine-tuned Longformer for EHR + audio analysis
Diagnoser/pretrained_models	695 MB	BEATs & Tokenizer pretrained weights
Generator/checkpoints/llm	3.8 GB	Fine-tuned LLM for audio generation
Generator/checkpoints/flow	2.0 GB	CFM flow matching model
Generator/pretrained_models	695 MB	BEATs & Tokenizer pretrained weights
audio_descriptions.jsonl	87 MB×2	Audio description data

Note: The DeepSeek-R1 model is NOT included in this repository. Download it separately from:
🔗 deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
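
The sizes listed in the table above can be added up to estimate the disk space needed before downloading. The sketch below is illustrative only: it treats 3.8 GB and 2.0 GB as 3800 MB and 2000 MB, reads "87 MB×2" as two files of 87 MB each (an assumption), and uses a hypothetical helper named `has_room`. The separately downloaded DeepSeek-R1 model is not counted.

```python
import shutil

# Sizes in MB, copied from the Contents table.
# Assumption: 3.8 GB and 2.0 GB are taken as 3800 MB and 2000 MB.
SIZES_MB = {
    "Diagnoser/checkpoints/longformer": 952,
    "Diagnoser/pretrained_models": 695,
    "Generator/checkpoints/llm": 3800,
    "Generator/checkpoints/flow": 2000,
    "Generator/pretrained_models": 695,
    # Assumption: "87 MB×2" means two files of 87 MB each.
    "audio_descriptions.jsonl (x2)": 2 * 87,
}

total_mb = sum(SIZES_MB.values())


def has_room(path=".", needed_mb=total_mb):
    """Return True if the filesystem holding `path` has at least `needed_mb` MB free."""
    free_mb = shutil.disk_usage(path).free / 1e6
    return free_mb >= needed_mb


print(f"Listed files total about {total_mb / 1000:.1f} GB")
print("Enough free space here:", has_room())
```

The DeepSeek-R1 weights need additional space on top of this total.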

🚀 Quick Download

from huggingface_hub import snapshot_download

# Download all models
snapshot_download(
    repo_id="AustinZhang/resp-agent-models",
    local_dir="./",
    ignore_patterns=["*.md", ".gitattributes"]
)

# Download DeepSeek-R1 separately
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    local_dir="./Diagnoser/checkpoints/deepseek-r1"
)
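
If only one half of the system is needed, `snapshot_download` also accepts an `allow_patterns` argument that restricts the download to matching files. The following is a minimal sketch, assuming the repository layout mirrors the paths in the Contents table. The component names, `COMPONENT_PATHS`, and both helper functions are hypothetical and not part of Resp-Agent.

```python
# Hypothetical mapping from a component name to the paths in the Contents table.
COMPONENT_PATHS = {
    "diagnoser": [
        "Diagnoser/checkpoints/longformer",
        "Diagnoser/pretrained_models",
    ],
    "generator": [
        "Generator/checkpoints/llm",
        "Generator/checkpoints/flow",
        "Generator/pretrained_models",
    ],
}


def allow_patterns_for(component):
    """Return glob patterns matching every file under the component's directories."""
    return [f"{path}/*" for path in COMPONENT_PATHS[component]]


def download_component(component, local_dir="./"):
    """Download only the files belonging to one component (assumed layout)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AustinZhang/resp-agent-models",
        local_dir=local_dir,
        allow_patterns=allow_patterns_for(component),
    )
```

For example, `download_component("diagnoser")` would fetch the Longformer checkpoint and the Diagnoser pretrained weights, leaving the Generator files out. DeepSeek-R1 still has to be downloaded separately.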

📂 Expected Directory Structure

After downloading, your project should look like:

Resp-Agent/
├── Diagnoser/
│   ├── checkpoints/
│   │   ├── deepseek-r1/          # From deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
│   │   └── longformer/           # From this repo
│   └── pretrained_models/        # From this repo
└── Generator/
    ├── checkpoints/
    │   ├── llm/                  # From this repo
    │   └── flow/                 # From this repo
    └── pretrained_models/        # From this repo
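
The layout above can be checked with a short script before running anything else. This is a minimal sketch, not part of Resp-Agent: `EXPECTED_DIRS` is copied from the tree above, and the `missing_dirs` helper is a hypothetical name. It only checks that each directory exists and is non-empty, not that the files inside are complete.

```python
from pathlib import Path

# Directories taken from the expected structure above.
EXPECTED_DIRS = [
    "Diagnoser/checkpoints/deepseek-r1",
    "Diagnoser/checkpoints/longformer",
    "Diagnoser/pretrained_models",
    "Generator/checkpoints/llm",
    "Generator/checkpoints/flow",
    "Generator/pretrained_models",
]


def missing_dirs(root="."):
    """Return the expected directories that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        # A directory counts as present only if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing or empty directories:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected directories are present.")
```

Run it from the Resp-Agent project root, or pass another root path to `missing_dirs`.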

📝 Citation

If you find this work useful, please cite our paper:

@inproceedings{
zhang2026respagent,
title={Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis},
author={Pengfei ZHANG and Tianxin Xie and Minghao Yang and Li Liu},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ZkoojtEm3W}
}

🙏 Acknowledgements

BEATs - Audio pre-training framework
DeepSeek-R1 - Reasoning model
Longformer - Long document transformer

📄 License

These model weights are released for academic research purposes only under the MIT License.