library_name: transformers
---

# 🔊 [ICLR 2025] SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
[](https://openreview.net/forum?id=odU59TxdiB)
🚀 **SSLAM** is a self-supervised learning framework designed to enhance audio representation quality for both **polyphonic (multiple overlapping sounds)** and monophonic soundscapes. Unlike traditional SSL models that focus on monophonic data, SSLAM introduces a novel **source retention loss** and **audio mixture training**, significantly improving performance on real-world polyphonic audio.
🔗 **[Paper](https://openreview.net/pdf?id=odU59TxdiB) | [ICLR 2025 Poster: Video & Slides](https://iclr.cc/virtual/2025/poster/28347) | [Open Review](https://openreview.net/forum?id=odU59TxdiB) | [🤗 Models](https://huggingface.co/ta012/SSLAM_pretrain)**
# SSLAM Pretrain (ViT Base, 15 epochs)
This repository provides an SSLAM checkpoint formatted for use with Hugging Face Transformers. It is intended for feature extraction in audio LLMs, sound event detection, and general-purpose audio representation learning. The implementation follows the [EAT](https://arxiv.org/abs/2401.03497) code path while swapping in SSLAM pretrained weights.
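
Since the checkpoint targets the Transformers ecosystem, it should load through the standard `from_pretrained` path. The sketch below is a minimal, hedged example: it assumes the repository ships custom EAT-style modeling code (hence `trust_remote_code=True`), accepts log-mel spectrogram input in the 1024-frame × 128-mel-bin layout used by EAT, and exposes a `last_hidden_state` output. None of these details are confirmed by this card, so verify them against the repository's modeling code before use.

```python
# Minimal feature-extraction sketch for this checkpoint.
# Assumptions (not confirmed by the model card): custom remote code,
# EAT-style 1024x128 log-mel input, and a last_hidden_state output.
import torch
from transformers import AutoModel


def extract_sslam_features(repo_id: str = "ta012/SSLAM_pretrain") -> torch.Tensor:
    # trust_remote_code=True is assumed to be required because the repo
    # follows the EAT code path rather than a built-in Transformers class.
    model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
    model.eval()

    # Dummy batch: (batch, channels, time_frames, mel_bins) log-mel input,
    # using the 1024x128 layout EAT uses; check the repo config to confirm.
    mel = torch.randn(1, 1, 1024, 128)

    with torch.no_grad():
        return model(mel).last_hidden_state  # assumed output attribute
```

The returned tensor would then serve as frame-level audio features for downstream tasks such as sound event detection or as input embeddings for an audio LLM.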