Kiuyha committed on
Commit
138adc1
·
verified ·
1 Parent(s): 3236d95

Update README.md

  1. README.md +149 -3
README.md CHANGED
@@ -1,3 +1,149 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ datasets:
4
+ - Kiuyha/dcase-5class-3source-mixtures-32k
5
+ ---
6
+ # Audio Source Separation with Time-Frequency Sequence Attention Res-U-Net (DCASE 2025)
7
+
8
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
9
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
10
+ [![PyTorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=flat&logo=PyTorch&logoColor=white)](https://pytorch.org/)
11
+
12
+ This repository contains a reimplementation of the architecture described in **"TFSWA-ResUNet: music source separation with time–frequency sequence and shifted window attention-based ResUNet"**.
13
+
14
+ Rather than music source separation, the model is adapted here to **Sound Event Separation** and trained on a subset of the DCASE 2025 Task 4 dataset. The entire training, validation, and testing pipeline is contained within a single Jupyter notebook.
15
+
16
+ ---
17
+
18
+ ## 🎯 Features
19
+
20
+ ### Architecture
21
+ - **Res-U-Net** with integrated Time-Frequency Sequence Attention (TF-SA) and Shifted Window Attention
22
+ - **Task**: Separating overlapping sound events in domestic environments
23
+ - **Input**: Magnitude spectrograms of mixed audio (32 kHz sampling rate)
24
+ - **Output**: Estimated spectrograms of specific sound classes
25
+
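The input representation listed above can be illustrated with a short-time Fourier transform. The following is a minimal sketch using NumPy only; the FFT size (1024), hop length (320), and Hann window are illustrative assumptions, not values taken from the notebook, and `magnitude_spectrogram` is a hypothetical helper name.

```python
import numpy as np

SAMPLE_RATE = 32_000  # from the README; the STFT settings below are assumed


def magnitude_spectrogram(audio, n_fft=1024, hop=320):
    """Return |STFT| of a mono signal with shape (frames, n_fft // 2 + 1)."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim != 1 or len(audio) < n_fft:
        raise ValueError("expected a mono signal of at least n_fft samples")
    window = np.hanning(n_fft)
    # Only full frames are kept; trailing samples that do not fill a frame are dropped.
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # rfft keeps the non-negative frequencies; abs() discards the phase.
    return np.abs(np.fft.rfft(frames * window, axis=1))


# One second of a 1 kHz tone as a stand-in for a real mixture.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)
spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
print(spec.shape, peak_bin * SAMPLE_RATE / 1024)
```

Because only the magnitude is kept, a separate choice (for example reusing the mixture phase) is needed to turn an estimated spectrogram back into a waveform; how the notebook handles this is not stated here.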
26
+ ---
27
+
28
+ ## 📁 Project Structure
29
+
30
+ ```
31
+ Audio-Separation-ResUNet-TF-Attention/
32
+ ├── TF_SA_ResUNet.ipynb # Main notebook containing model, training, and inference
33
+ └── README.md # Project documentation
34
+ ```
35
+
36
+ ---
37
+
38
+ ## 📊 Dataset
39
+
40
+ This project uses a custom subset of the **DCASE 2025 Task 4 dataset**, cut down to keep training time manageable while still requiring the separation of overlapping events.
41
+
42
+ ### Dataset Statistics
43
+ - **Total Samples**: 10,000
44
+ - **Configuration**: 3 overlapping events per mixture
45
+ - **Classes**: 5 target sound classes
46
+ - **Sampling Rate**: 32 kHz
47
+
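To make the configuration above concrete, the sketch below builds one 3-source mixture by summing three equal-length clips and records which of the five class slots are active. It is an illustration only: the placeholder class names, the one-second random-noise clips, the choice of three distinct classes per mixture, and the multi-hot label layout are all assumptions and do not describe how the dataset was actually generated.

```python
import numpy as np

SAMPLE_RATE = 32_000   # from the README
NUM_CLASSES = 5        # from the README
SOURCES_PER_MIX = 3    # from the README
CLASS_NAMES = [f"class_{i}" for i in range(NUM_CLASSES)]  # placeholder names


def make_mixture(rng, duration_s=1.0):
    """Return (mixture, sources, labels) for one synthetic example."""
    n_samples = int(SAMPLE_RATE * duration_s)
    # Assumption: the three events in a mixture belong to distinct classes.
    class_ids = rng.choice(NUM_CLASSES, size=SOURCES_PER_MIX, replace=False)
    # Random noise stands in for real event recordings.
    sources = 0.1 * rng.standard_normal((SOURCES_PER_MIX, n_samples))
    mixture = sources.sum(axis=0)  # overlapping events add in the time domain
    labels = np.zeros(NUM_CLASSES, dtype=np.int64)
    labels[class_ids] = 1
    return mixture, sources, labels


rng = np.random.default_rng(0)
mixture, sources, labels = make_mixture(rng)
print(mixture.shape, sources.shape, [CLASS_NAMES[i] for i in np.flatnonzero(labels)])
```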
48
+ ### Access the Dataset
49
+ - 🤗 [Hugging Face](https://huggingface.co/datasets/Kiuyha/dcase-5class-3source-mixtures-32k)
50
+ - 📦 [Kaggle](https://www.kaggle.com/datasets/kiuyha/dcase-5class-3source-mixtures-32k)
51
+
52
+ ---
53
+
54
+ ## 🚀 Installation & Usage
55
+
56
+ ### 1. Clone the Repository
57
+ ```bash
58
+ git clone https://github.com/kiuyha/Audio-Separation-ResUNet-TF-Attention.git
59
+ cd Audio-Separation-ResUNet-TF-Attention
60
+ ```
61
+
62
+ ### 2. Open the Notebook
63
+ This project is designed to run in **Google Colab** or a local **Jupyter** environment. All necessary dependencies are installed directly within the notebook cells.
64
+
65
+ - Open `TF_SA_ResUNet.ipynb`
66
+ - Ensure you have a **GPU runtime** enabled for training
67
+
68
+ ### 3. Dependencies
69
+ The code relies on standard deep learning and audio libraries:
70
+ - Python 3.8+
71
+ - PyTorch
72
+ - Librosa
73
+ - NumPy
74
+ - Matplotlib
75
+ - Soundfile
76
+
77
+ All dependencies are automatically installed when running the notebook cells.
78
+
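When running locally rather than in Colab, it can help to confirm that the listed libraries are importable before starting a long training run. The sketch below uses only the standard library for the lookup; `missing_packages` and `gpu_available` are hypothetical helper names, not functions from the notebook.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED = ["torch", "librosa", "numpy", "matplotlib", "soundfile"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gpu_available():
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without PyTorch
    return torch.cuda.is_available()


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
print("GPU available:", gpu_available())
```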
79
+ ---
80
+
81
+ ## 🤖 Model Weights
82
+
83
+ Pre-trained model weights are hosted on Hugging Face:
84
+
85
+ 🤗 **[Download Model Weights](https://huggingface.co/kiuyha/TF-SA-ResUNet-Model)**
86
+
87
+ ### How to Load Weights
88
+ 1. Download the `.pth` file from the link above
89
+ 2. Place it in the root directory of the project (or upload it to your Colab session)
90
+ 3. Run the inference cell in the notebook to load the state dictionary
91
+
92
+ ---
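The steps above can be sketched as follows. This assumes the downloaded `.pth` file holds a plain PyTorch state dictionary (not a full training checkpoint) and that the model object has already been constructed from the class defined in the notebook; `find_checkpoint`, `load_weights`, and the class name in the usage comment are hypothetical.

```python
from pathlib import Path


def find_checkpoint(root="."):
    """Return the first .pth file directly under root (sorted by name), or None."""
    matches = sorted(Path(root).glob("*.pth"))
    return matches[0] if matches else None


def load_weights(model, checkpoint_path, device="cpu"):
    """Load a saved state dictionary into an already constructed model."""
    import torch  # imported lazily so find_checkpoint works without PyTorch

    # map_location lets weights saved on a GPU load on a CPU-only machine.
    state_dict = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state_dict)
    model.eval()  # inference mode: disables dropout, uses stored batch-norm statistics
    return model


# Usage (TFSAResUNet is a placeholder for the model class defined in the notebook):
#   model = load_weights(TFSAResUNet(), find_checkpoint())
```

If `load_state_dict` reports missing or unexpected keys, the model was most likely built with different hyperparameters than the ones used for the released weights.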
93
+
94
+ ## 📈 Evaluation
95
+
96
+ The model is evaluated using the DCASE 2025 Task 4 metric:
97
+ CA-SDRi (Class-Aware Signal-to-Distortion Ratio improvement)
98
+
99
+ ### Results
100
+
101
+ | Model Variant | CA-SDRi (dB) |
102
+ |---------------|--------------|
103
+ | ResUNet (Baseline) | 3.15857 |
104
+ | ResUNet + SpecAugment | 2.95301 |
105
+ | TF-SA-ResUNet | 5.25322 |
106
+ | TF-SA-ResUNet + SpecAugment | 4.66175 |
107
+
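For readers unfamiliar with the metric, the sketch below shows one way to compute it. It follows the definition as commonly summarized for DCASE 2025 Task 4: for each class in the union of reference and predicted labels, a correctly predicted class contributes its SDR improvement over the unprocessed mixture, a missed or falsely predicted class contributes 0 dB, and the result is the average over the union. Treat this as an assumption to check against the official task description; the function names are hypothetical and this is not the notebook's evaluation code.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))


def ca_sdri(references, estimates, mixture):
    """Class-aware SDR improvement.

    references / estimates: dicts mapping a class label to a waveform.
    """
    labels = set(references) | set(estimates)
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        if label in references and label in estimates:
            ref = references[label]
            # Improvement of the estimate over using the raw mixture as the estimate.
            total += sdr_db(ref, estimates[label]) - sdr_db(ref, mixture)
        # Missed or falsely predicted classes add 0 dB but still count in the average.
    return total / len(labels)


rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 32_000))
mixture = a + b
refs = {"class_a": a, "class_b": b}
# Each estimate keeps 10% of the interfering source, a 20 dB improvement per class.
ests = {"class_a": a + 0.1 * b, "class_b": b + 0.1 * a}
score = ca_sdri(refs, ests, mixture)
# Predicting an extra, absent class lowers the average.
penalized = ca_sdri(refs, {**ests, "class_c": np.zeros_like(a)}, mixture)
print(round(score, 3), round(penalized, 3))
```

Under this definition a label error is penalized even when the separated audio is good, which is why the metric is described as class-aware.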
108
+ ---
109
+
110
+ ## Resources
111
+ - [Read the Report](https://drive.google.com/file/d/1tsKs-xcIF_9E1K_2pLuiPkUknKcop8ik/view)
112
+ - [Code on GitHub](https://github.com/kiuyha/Audio-Separation-ResUNet-TF-Attention)
113
+
114
+ ---
115
+
116
+ ## 📝 Citation
117
+
118
+ If you use this implementation in your research, please cite the original paper:
119
+
120
+ ```bibtex
121
+ @article{kong2024tfswa,
122
+ title={TFSWA-ResUNet: music source separation with time–frequency sequence and shifted window attention-based ResUNet},
123
+ author={Kong, Q. and Cao, Y. and Liu, H. and Doi, K. and Iqbal, T.},
124
+ journal={Complex \& Intelligent Systems},
125
+ volume={10},
126
+ pages={1--17},
127
+ year={2024},
128
+ publisher={Springer}
129
+ }
130
+ ```
131
+
132
+ **Paper Link**: [TFSWA-ResUNet on Springer](https://link.springer.com/article/10.1186/s13634-025-01249-0)
133
+
134
+
135
+ ## 📜 License
136
+
137
+ This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
138
+
139
+
140
+ ## 📧 Contact
141
+
142
+ For questions or issues, please open an issue on GitHub or contact the repository maintainer.
143
+
144
+
145
+ ## 🙏 Acknowledgments
146
+
147
+ - DCASE 2025 Task 4 organizers for providing the dataset framework
148
+ - Original authors of the TFSWA-ResUNet architecture
149
+ - The open-source audio processing community