Heinrich Dinkel committed
Commit ba200b5 · Parent(s): b0489bd
Added README

Files changed:
- README.md (+97 −3)
- figures/audio_generation_results.pdf (added)
- figures/audio_understanding_results.pdf (added)
---
library_name: transformers
pipeline_tag:
- audio-to-audio
- audio-classification
license: apache-2.0
---

# DashengTokenizer

*(Figures: audio understanding and audio generation benchmark results; see `figures/audio_understanding_results.pdf` and `figures/audio_generation_results.pdf`.)*

DashengTokenizer is a high-performance neural audio tokenizer designed for audio understanding and generation tasks.

## Usage

### Installation

```bash
uv pip install transformers torch torchaudio einops
```

### Basic Usage

```python
import torch
import torchaudio
from transformers import AutoModel

# Load the model
model = AutoModel.from_pretrained("mispeech/dashengtokenizer", trust_remote_code=True)
model.eval()

# Load an audio file (only 16 kHz audio is supported)
audio, sr = torchaudio.load("path/to/audio.wav")

# Optional: create an attention mask for variable-length inputs
# attention_mask = torch.ones(audio.shape[0], audio.shape[1])  # all ones for full audio
# attention_mask[0, 8000:] = 0  # example: mask the second half of the first sample

# Method 1: end-to-end processing (encode + decode)
with torch.no_grad():
    outputs = model(audio)  # optionally pass attention_mask=attention_mask
    reconstructed_audio = outputs["audio"]
    embeddings = outputs["embeddings"]

# Method 2: separate encoding and decoding
with torch.no_grad():
    # Encode audio to embeddings
    embeddings = model.encode(audio)  # optionally pass attention_mask=attention_mask

    # Decode embeddings back to audio
    reconstructed_audio = model.decode(embeddings)

# Save the reconstructed audio
torchaudio.save("reconstructed_audio.wav", reconstructed_audio, sr)
```
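The attention-mask comments above can be made concrete with a small padding helper. This is a minimal sketch under the assumption that the model accepts a `(batch, samples)` tensor plus a same-shaped mask with 1 for real samples and 0 for padding; `pad_with_mask` is a hypothetical helper, not part of the model's API.

```python
import torch

# Hypothetical helper: pad variable-length 16 kHz clips to a common length
# and build the matching attention mask (1 = real samples, 0 = padding).
def pad_with_mask(waveforms):
    lengths = [w.shape[-1] for w in waveforms]
    max_len = max(lengths)
    batch = torch.zeros(len(waveforms), max_len)
    mask = torch.zeros(len(waveforms), max_len)
    for i, (w, n) in enumerate(zip(waveforms, lengths)):
        batch[i, :n] = w
        mask[i, :n] = 1.0
    return batch, mask

clips = [torch.randn(16000), torch.randn(8000)]  # 1.0 s and 0.5 s at 16 kHz
batch, attention_mask = pad_with_mask(clips)
# The pair would then be passed together, e.g.:
# embeddings = model.encode(batch, attention_mask=attention_mask)
```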

## Use Cases

### 1. Audio Encoding

```python
embeddings = model.encode(audio)
reconstructed = model.decode(embeddings)
```
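One simple way to sanity-check such an encode/decode round trip is to measure the signal-to-noise ratio between the original and the reconstruction. `snr_db` is a hypothetical helper (not part of the model's API), and the tensors are assumed to be sample-aligned with the same shape; the small-noise tensor below stands in for actual model output.

```python
import torch

# Hypothetical helper: signal-to-noise ratio in dB between a reference
# waveform and its reconstruction (higher = closer reconstruction).
def snr_db(reference, estimate):
    noise = reference - estimate
    return 10 * torch.log10(reference.pow(2).sum() / noise.pow(2).sum())

audio = torch.randn(1, 16000)
reconstructed = audio + 0.01 * torch.randn(1, 16000)  # stand-in for model output
print(f"SNR: {snr_db(audio, reconstructed).item():.1f} dB")
```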

### 2. Feature Extraction

```python
# Extract rich audio features for downstream tasks
features = model.encode(audio)
# Use the features for classification, clustering, etc.
```

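As a sketch of the classification use mentioned above: mean-pool frame-level features into one vector per clip and feed a linear head. The `(batch, frames, dim)` layout and `dim=512` are assumptions for illustration — check the actual shape of `model.encode`'s output; the random tensor stands in for real features.

```python
import torch

# Assumed embedding size and label count (illustrative only)
dim, num_classes = 512, 10
head = torch.nn.Linear(dim, num_classes)

features = torch.randn(2, 50, dim)   # stand-in for model.encode(audio)
clip_vec = features.mean(dim=1)      # (2, dim) clip-level representation
logits = head(clip_vec)              # (2, num_classes) class scores
```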
## Limitations

- Optimized for 16 kHz mono audio

## Citation

If you use DashengTokenizer in your research, please cite:

```bibtex
@misc{dinkel_dashengtokenizer_2026,
  title={DashengTokenizer: One layer is enough for unified audio understanding and generation},
  author={MiLM Plus, Xiaomi},
  year={2026},
  url={https://huggingface.co/mispeech/dashengtokenizer}
}
```

## License

Apache 2.0 License

figures/audio_generation_results.pdf — ADDED (binary file, 16.6 kB)
figures/audio_understanding_results.pdf — ADDED (binary file, 20.9 kB)