mrohan committed
Commit 32caeee · verified · 1 Parent(s): 34e685b

Upload 14 files

README.md ADDED

# SPIRIT-LM Expressive Interleaved (Corrected Teacher, Libri-Light)

**SPIRIT-LM Expressive Interleaved (Corrected)** is a fine-tuned version of the 7B SPIRIT-LM teacher model adapted to the **Libri-Light** domain. It supports **interleaved speech and text inputs** and was used as the **teacher model for distilling TinyWave**.

This checkpoint was fine-tuned for 10k steps with **LoRA adapters** on synthetic interleaved data created from Libri-Light and Whisper transcriptions. The resulting model improves alignment with the target distribution and provides stronger supervision for expressive speech–text generation.

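For orientation, below is a minimal sketch of how such a LoRA fine-tuning run could be set up with the `peft` library. The base-model path, rank, and target modules are illustrative assumptions, not the exact recipe used for this checkpoint.

```python
# Hedged sketch only: hyperparameters, target modules, and the base-model path are assumptions.
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

base = LlamaForCausalLM.from_pretrained(
    "spirit-lm-expressive-7b",   # placeholder path for the SPIRIT-LM expressive base model
    torch_dtype=torch.bfloat16,
)

# Attach LoRA adapters to the attention projections; r/alpha values are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# The adapted model is then trained for ~10k steps on interleaved Libri-Light
# speech tokens and Whisper transcriptions (data pipeline not shown here).
```
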
> 📖 This checkpoint is part of the *TinyWave* distillation framework. See [arXiv:2506.23670](https://arxiv.org/abs/2506.23670) for details.

---

## 🧠 Model Purpose

| Field            | Value                                          |
|------------------|------------------------------------------------|
| Role             | Distillation teacher                           |
| Base Model       | `spirit-lm-expressive-7b` (SPIRIT-LM)          |
| Fine-tuned on    | Libri-Light (10k steps with LoRA)              |
| Input Modalities | Interleaved speech + text                      |
| Output           | Speech tokens                                  |
| Used for         | Training `tinywave/interleaved-expressive-2b`  |

---

## 🔧 Usage

### 1. Install SPIRIT-LM and Load Expressive Tokenizer

```bash
git clone https://github.com/facebookresearch/spiritlm
cd spiritlm
pip install -e '.[eval]'
```

```python
from spiritlm.speech_tokenizer import spiritlm_expressive

speech_tokenizer = spiritlm_expressive()
```

---

### 2. Inference (Speech or Interleaved)

```python
import torch
import torchaudio
from transformers import LlamaForCausalLM, AutoTokenizer
from spiritlm.speech_tokenizer import spiritlm_expressive

MODEL_PATH = "tinywave/expressive-spirit-lm-interleaved-librilight"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype=torch.bfloat16)

# Expressive speech tokenizer for turning audio into interleaved speech tokens
speech_tokenizer = spiritlm_expressive()

def get_inference(audio_path):
    # Encode the waveform into SPIRIT-LM expressive speech tokens, then let the LM continue them.
    audio, _ = torchaudio.load(audio_path)
    input_values = audio.view(1, 1, -1).to(speech_tokenizer.hubert_model.device).float()
    tokens = speech_tokenizer.encode_string(input_values)
    input_ids = tokenizer(tokens, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9, top_p=0.9)
    return tokenizer.decode(output[0])

def get_inference_text(prompt):
    # Append the [Speech] modality marker so the model continues the text prompt in speech tokens.
    input_ids = tokenizer(prompt + " [Speech]", return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9, top_p=0.9)
    return tokenizer.decode(output[0])
```
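
Both helpers can then be called directly; the audio path below is a placeholder for any local speech recording (SPIRIT-LM's tokenizers typically expect 16 kHz mono audio).

```python
# Example usage; "prompt.wav" is an illustrative file name.
speech_tokens = get_inference("prompt.wav")
print(speech_tokens)   # speech-token continuation of the spoken prompt

mixed_tokens = get_inference_text("The astronaut stepped outside the capsule—")
print(mixed_tokens)    # text prompt continued as speech tokens
```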
67
+
68
+ ---
69
+
70
+ ## 🎧 Inference Modes
71
+
72
+ ### 💬 Text + Speech Interleaving
73
+
74
+ Input:
75
+
76
+ ```text
77
+ "The astronaut stepped outside the capsule— [Speech]"
78
+ ```
79
+
80
+ Output:
81
+ Expressive speech continuation in WAV format.
82
+
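As a sketch of that final decoding step: the snippet below assumes the spiritlm expressive tokenizer exposes a `decode` method that maps a generated token string back to audio; check the spiritlm repository for the exact API, and note that prompt and special tokens may need to be stripped from the generation first.

```python
# Hedged sketch: the decode call and the 16 kHz sample rate are assumptions about the spiritlm API.
import soundfile as sf

generated = get_inference("prompt.wav")        # speech-token string produced by the model
waveform = speech_tokenizer.decode(generated)  # token string -> waveform (assumed signature)
sf.write("continuation.wav", waveform, 16000)
```
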
---

### 🔄 Speech Continuation

Input: `speech.wav`
Output: A semantically and stylistically aligned spoken continuation.

---

## 📂 Files

* `model-0000*-of-00006.safetensors`: LoRA-adapted SPIRIT-LM 7B weights (six sharded files)
* `config.json`, `generation_config.json`, `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json`, `special_tokens_map.json`: Compatible with Hugging Face Transformers
* Speech input must be tokenized with the `spiritlm_expressive` tokenizer only

---

## 📎 Citation

```bibtex
@article{nouriborji2025tinywave,
  title={Efficient Interleaved Speech Modeling through Knowledge Distillation},
  author={Nouriborji, Mohammadmahdi and Rohanian, Morteza},
  journal={arXiv preprint arXiv:2506.23670},
  year={2025}
}
```

---

## 🔗 Related

* 🔬 Paper: [arXiv:2506.23670](https://arxiv.org/abs/2506.23670)
* 🧠 Student model: [`tinywave/interleaved-expressive-2b`](https://huggingface.co/tinywave/interleaved-expressive-2b)
* 🌐 [Project Website](https://mohammadmahdinoori.github.io/tinywave-landing/)
config.json ADDED

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 16384,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 100000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "vocab_size": 32768
}

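This is a standard Llama-architecture config, so it can be inspected directly with Transformers; a quick sanity check (a minimal sketch, using this repo's model id):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tinywave/expressive-spirit-lm-interleaved-librilight")
print(config.model_type)   # "llama"
print(config.vocab_size)   # 32768: text vocabulary extended with speech tokens
```
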
generation_config.json ADDED

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.51.3"
}

gitattributes.txt ADDED

*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

model-00001-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:2ac270904565dd8f9aeaa83d24467eae348e72eb8cdce5e4dd380d2b8f71004a
size 4852979328

model-00002-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:64ec253cd3ef03c34ce7d3d0496923cfa01d84ef801028ad1022dbf7174195b6
size 4857206856

model-00003-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:dff7bdedeaafb4d9d3627da5876ba2014f7d5806c2cf89baa0e30961b3220e20
size 4857206904

model-00004-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:728d45e2a8492a349906c7f2107d3a2b4db8bfe62f1cde1c55d208318ec5c564
size 4857206904

model-00005-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:d90cd61a40c5d8f533380bc2fcbc1b85c144ce6fa738bda97236a6198346efed
size 4857206904

model-00006-of-00006.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:86bca5dfd3e4217fc1bdca507efec3f1ad83a68feb7096c9ecfbb6b8df6eff7e
size 2697055024

special_tokens_map.json ADDED

{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:1608b90876103c6c3c67ca079ab9d3c5ee4e7707acf869103e728a4d30626643
size 514364

tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff