Add pipeline tag, library name, and project links

#1
by nielsr - opened
Files changed (1)
  1. README.md +10 -9
README.md CHANGED
@@ -1,9 +1,10 @@
 ---
-license: mit
+base_model: nvidia/audio-flamingo-3-hf
 language:
 - en
-base_model:
-- nvidia/audio-flamingo-3-hf
+license: mit
+pipeline_tag: audio-text-to-text
+library_name: peft
 tags:
 - audio
 - audio temporal grounding
@@ -14,14 +15,14 @@ tags:
 
 [![GitHub](https://img.shields.io/badge/GitHub-Repo-black?logo=github)](https://github.com/LoieSun/SpotSound)
 [![Paper](https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv)](https://arxiv.org/abs/2604.13023)
+[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://loiesun.github.io/spotsound/)
 [![Benchmark](https://img.shields.io/badge/🤗_HuggingFace-Benchmark-yellow)](https://huggingface.co/datasets/Loie/SpotSound-Bench)
 
 ## Model Summary
 
 **SpotSound** is a model designed to enhance Large Audio-Language Models (ALMs) with fine-grained temporal grounding capabilities. Built on top of [Audio Flamingo 3](https://huggingface.co/nvidia/audio-flamingo-3), SpotSound is capable of accurately pinpointing the exact start and end timestamps of specific acoustic events within long, untrimmed audio recordings based on natural language queries.
 
-This model is particularly effective for "needle-in-a-haystack" audio retrieval tasks, where short target sounds are embedded within complex background noise.
-
+This model is particularly effective for "needle-in-a-haystack" audio retrieval tasks, where short target sounds are embedded within complex background noise. For more details, see the paper: [SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding](https://huggingface.co/papers/2604.13023).
 
 ## Usage / Quick Start
 
@@ -29,7 +30,7 @@ To use SpotSound for inference, you need to download both the base **Audio Flami
 
 ### 1. Installation
 
-First, clone the official [SpotSound GitHub repository](#) and set up the environment:
+First, clone the official [SpotSound GitHub repository](https://github.com/LoieSun/SpotSound) and set up the environment:
 
 ```bash
 conda create -n SpotSound python=3.10
@@ -53,13 +54,13 @@ python inference.py \
 
 ## Citation
 
-If you use SpotSound or our benchmark in your research, please cite our paper:
+If you use SpotSound or the benchmark in your research, please cite the paper:
 
 ```bibtex
 @inproceedings{sun2026spotsound,
 title={SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding},
-author={Sun, Luoyi and Zhou, Xiao and Li, Zeqian and Zhang, Ya and Wang, Yanking and Xie, Weidi},
-journal={arXiv preprint arXiv:2604.13023},
+author={Sun, Luoyi and Zhou, Xiao and Li, Zeqian and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
+journal={arXiv preprint arXiv:2604.13023},
 year={2026}
 }
 ```
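Since the first hunk both reorders and extends the YAML frontmatter, the merged result is easier to read flat. Reconstructed from the diff, the frontmatter lines covered by that hunk read after this change (the hunk ends at the tags list, so any further tags or the closing `---` are outside its context):

```yaml
---
base_model: nvidia/audio-flamingo-3-hf
language:
- en
license: mit
pipeline_tag: audio-text-to-text
library_name: peft
tags:
- audio
- audio temporal grounding
```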