mrohan committed on
Commit
bd8a84d
·
verified ·
1 Parent(s): 32caeee

Update README.md

Files changed (1)
  1. README.md +0 -117
README.md CHANGED
# SPIRIT-LM Expressive Interleaved (Corrected Teacher, Libri-Light)

**SPIRIT-LM Expressive Interleaved (Corrected)** is a fine-tuned version of the 7B SPIRIT-LM teacher model adapted to the **Libri-Light** domain. It supports **interleaved speech and text inputs**, and was used as the **teacher model for distilling TinyWave**.

This checkpoint was fine-tuned for 10k steps with **LoRA adapters** on synthetic interleaved data created from Libri-Light audio and Whisper transcriptions. The resulting model improves alignment with the target distribution and provides stronger supervision for expressive speech–text generation.

> 📖 This checkpoint is part of the *TinyWave* distillation framework. See [arXiv:2506.23670](https://arxiv.org/abs/2506.23670) for details.
---

## 🧠 Model Purpose

| Field            | Value                                         |
|------------------|-----------------------------------------------|
| Role             | Distillation teacher                          |
| Base Model       | `spirit-lm-expressive-7b` (SPIRIT-LM)         |
| Fine-tuned on    | Libri-Light (10k steps with LoRA)             |
| Input Modalities | Interleaved speech + text                     |
| Output           | Speech tokens                                 |
| Used for         | Training `tinywave/interleaved-expressive-2b` |
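The output above is a string of expressive speech tokens. As a rough point of reference, the sketch below parses a made-up token string; the `[Hu]`/`[Pi]`/`[St]` (HuBERT unit, pitch, style) marker convention follows SPIRIT-LM's expressive tokenizer, but the specific sequence here is invented for illustration:

```python
import re
from collections import Counter

# Illustrative example of an expressive token string; a real
# encode_string() output interleaves HuBERT ([Hu]), pitch ([Pi]) and
# style ([St]) tokens, but this particular sequence is made up.
sample = "[St4][Pi12][Hu34][Hu67][Pi9][Hu101][Hu7]"

# Count how many tokens of each type the string contains.
kinds = Counter(re.findall(r"\[(Hu|Pi|St)\d+\]", sample))
print(kinds)  # Counter({'Hu': 4, 'Pi': 2, 'St': 1})
```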
---
## 🔧 Usage

### 1. Install SPIRIT-LM and Load the Expressive Tokenizer

```bash
git clone https://github.com/facebookresearch/spiritlm
cd spiritlm
pip install -e '.[eval]'
```

```python
from spiritlm.speech_tokenizer import spiritlm_expressive

speech_tokenizer = spiritlm_expressive()
```
---

### 2. Inference (Speech or Interleaved)

```python
import torch
import torchaudio
from transformers import AutoTokenizer, LlamaForCausalLM

from spiritlm.speech_tokenizer import spiritlm_expressive

MODEL_PATH = "tinywave/expressive-spirit-lm-interleaved-librilight"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = LlamaForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype=torch.bfloat16)

# Expressive speech tokenizer from step 1
speech_tokenizer = spiritlm_expressive()

def get_inference(audio_path):
    """Speech continuation: encode a prompt waveform to speech tokens, then generate."""
    audio, _ = torchaudio.load(audio_path)
    input_values = audio.view(1, 1, -1).to(speech_tokenizer.hubert_model.device).float()
    tokens = speech_tokenizer.encode_string(input_values)
    input_ids = tokenizer(tokens, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9, top_p=0.9)
    return tokenizer.decode(output[0])

def get_inference_text(prompt):
    """Interleaved continuation: a text prompt plus the [Speech] marker yields speech tokens."""
    input_ids = tokenizer(prompt + " [Speech]", return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.9, top_p=0.9)
    return tokenizer.decode(output[0])
```
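The decoded output mixes text and speech spans. As a rough illustration of post-processing it, the helper and sample string below are hypothetical (not part of the spiritlm package); they assume the interleaved format uses literal `[Text]` and `[Speech]` markers, as in the prompts above:

```python
import re

def split_modalities(decoded: str):
    """Split a decoded string into (marker, content) segments.

    Illustrative helper, assuming literal [Text]/[Speech] markers.
    """
    parts = re.split(r"(\[Text\]|\[Speech\])", decoded)
    segments, marker = [], None
    for part in parts:
        if part in ("[Text]", "[Speech]"):
            marker = part           # remember which modality we are in
        elif part.strip() and marker:
            segments.append((marker, part.strip()))
    return segments

# Hypothetical decoded string for illustration:
demo = "[Text] the astronaut stepped outside [Speech] [Hu12][Pi3][St2][Hu99]"
print(split_modalities(demo))
# [('[Text]', 'the astronaut stepped outside'), ('[Speech]', '[Hu12][Pi3][St2][Hu99]')]
```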
---

## 🎧 Inference Modes

### 💬 Text + Speech Interleaving

Input:

```text
"The astronaut stepped outside the capsule— [Speech]"
```

Output: Expressive speech continuation in WAV format.
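The prompt above is ordinary text with a trailing `[Speech]` marker, matching what `get_inference_text` builds internally. A minimal sketch (the helper name is hypothetical; only the trailing-marker convention comes from the inference code above):

```python
def make_interleaved_prompt(text: str) -> str:
    """Append the [Speech] modality marker so generation continues as speech.

    Hypothetical helper; mirrors the prompt construction in get_inference_text.
    """
    return f"{text.rstrip()} [Speech]"

print(make_interleaved_prompt("The astronaut stepped outside the capsule—"))
# → The astronaut stepped outside the capsule— [Speech]
```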
---

### 🔄 Speech Continuation

Input: `speech.wav`  
Output: Semantically and stylistically aligned spoken continuation.
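The length of a continuation can be sanity-checked from its token string. The sketch below assumes HuBERT units at 25 frames per second (SPIRIT-LM's unit rate) and ignores unit deduplication, so treat it as a rough estimate rather than an exact duration:

```python
import re

HUBERT_HZ = 25  # assumed HuBERT frame rate; deduplication makes this approximate

def approx_duration_seconds(token_string: str) -> float:
    """Estimate speech duration from the number of HuBERT unit tokens."""
    n_units = len(re.findall(r"\[Hu\d+\]", token_string))
    return n_units / HUBERT_HZ

print(approx_duration_seconds("[Hu1][Pi2][Hu3][Hu4][St1][Hu5]"))  # 4 Hu units -> 0.16
```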
---
## 📂 Files

* `pytorch_model.bin`: LoRA-adapted SPIRIT-LM 7B weights
* `config.json`, `tokenizer.json`: compatible with Hugging Face Transformers
* Compatible with the `spiritlm_expressive` tokenizer only

---
## 📎 Citation

```bibtex
@article{nouriborji2025tinywave,
  title={Efficient Interleaved Speech Modeling through Knowledge Distillation},
  author={Nouriborji, Mohammadmahdi and Rohanian, Morteza},
  journal={arXiv preprint arXiv:2506.23670},
  year={2025}
}
```
---

## 🔗 Related

* 🔬 Paper: [arXiv:2506.23670](https://arxiv.org/abs/2506.23670)
* 🧠 Student model: [`tinywave/interleaved-expressive-2b`](https://huggingface.co/tinywave/interleaved-expressive-2b)
* 🌐 [Project Website](https://mohammadmahdinoori.github.io/tinywave-landing/)