---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- lyrics
- phonetics
- g2p
- katakana
- english-to-phoneme
- english-to-katakana
- liaison
- tinyllama
---

# TinyLlama-1.1B-Phonetic-Liaison-Katakana-Generator

This model is a fine-tuned version of `TinyLlama/TinyLlama-1.1B-Chat-v1.0` designed to predict **connected phoneme sequences** and **rhythm-optimized Katakana**. It focuses on capturing real-world auditory phenomena like liaison, reduction, and flapping.

## 🌟 The Concept: "Phonetic Bridge for Natural Speech"

Traditional G2P (Grapheme-to-Phoneme) converters often treat words in isolation. This model serves as a **Phonetic Bridge**, predicting how sounds change in continuous speech.

### For Global Developers (The "Connected Phonemes" Advantage)

While the model outputs Katakana, its core intelligence lies in generating **connected phoneme sequences (ARPAbet)**.

- **TTS Frontend:** Use the linked phoneme output to improve the prosody of your Text-to-Speech engine.
- **ESL Tools:** Show learners how "Take it" becomes `/t ey1 k ih1 t/` rather than two separate words.

### For Japanese Learners ("The Training Wheels")

I am a firm believer that English should ideally be learned by ear, not through Katakana. However, beginners often face a "fear of the written word."

This model provides **"Supportive Katakana"**: not a translation, but a phonetic map that mimics native rhythm, acting as training wheels for the ear.

## ✨ Key Features

* **Connected Phonemes (ARPAbet):** Outputs the exact phonetic string including liaison (e.g., `a little bit` → `AH0 L IH1 D AH0 L B IH1 T`).
* **Liaison & Flapping:** Naturally handles `T`-to-`D` transformations and word-to-word connections.
* **Silent Letters:** Intelligently ignores non-vocalized consonants (e.g., `honest` → `オネス`, `hour` → `アワー`).
* **Modern ESL Approach:** Designed for high-speed inference on mobile devices (ready for GGUF/on-device PoC).

## 📊 Comparison: Beyond Dictionary Rules

| English Phrase | Dictionary Phonemes | **This Model (Linked Phonemes)** | **Supportive Katakana** |
| --- | --- | --- | --- |
| **A little bit** | `[AH0] [L IH1 T AH0 L] [B IH1 T]` | `AH0 L IH1 D AH0 L B IH1 T` | **アリロビッ** |
| **Check it out** | `[CH EH1 K] [IH1 T] [AW1 T]` | `CH EH1 K IH1 T AW1 T` | **チェキラッ** |
| **Middle of the night** | `[M IH1 D AH0 L] [AH1 V]...` | `M IH1 D AH0 L AH1 V DH AH0 N AY1 T` | **ミドロヴザナイッ** |
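
To see the gap the model closes, compare a naive baseline that simply concatenates the dictionary phonemes: it preserves the citation-form `T` in "a little bit", while the model predicts the flapped `D`. A minimal sketch (the helper below is illustrative, not part of the model):

```python
def concat_dictionary_phonemes(word_phonemes):
    """Naive baseline: join bracketed per-word ARPAbet strings in order."""
    return " ".join(p.strip("[]") for p in word_phonemes)

# "A little bit" from dictionary entries only:
baseline = concat_dictionary_phonemes(["[AH0]", "[L IH1 T AH0 L]", "[B IH1 T]"])
print(baseline)  # AH0 L IH1 T AH0 L B IH1 T  -- the dictionary T survives
# The model instead predicts "AH0 L IH1 D AH0 L B IH1 T" (flapped T -> D).
```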

## 🚀 Prompt Format

To extract both Katakana and the connected phoneme sequence, use the following format (the Japanese instruction reads: "From the English text and its word-level phonemes, generate liaison-aware Katakana and the connected phoneme sequence"):

```text
英語とその単語単位の音素から、リエゾンを考慮したカタカナと繋がった音素列を生成してください。

英語: take it easy
単語音素: [T EY1 K] [IH1 T] [IY1 Z IY0]
カタカナ: テイキットイージー
繋がった音素: T EY1 K IH1 T IY1 Z IY0

英語: {Your Phrase}
単語音素: {Standard G2P Output}
カタカナ:
```
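
For inference with 🤗 Transformers and PEFT, the sketch below loads the base model plus the LoRA adapter and fills in the prompt template above. The adapter repo id (`YOUR_USERNAME/TinyLlama-1.1B-Katakana-Lyrics-Liaison`) is a placeholder; substitute the actual Hub path.

```python
def build_prompt(english: str, word_phonemes: str) -> str:
    """Fill the few-shot template above for a new phrase."""
    return (
        "英語とその単語単位の音素から、リエゾンを考慮したカタカナと繋がった音素列を生成してください。\n"
        "\n"
        "英語: take it easy\n"
        "単語音素: [T EY1 K] [IH1 T] [IY1 Z IY0]\n"
        "カタカナ: テイキットイージー\n"
        "繋がった音素: T EY1 K IH1 T IY1 Z IY0\n"
        "\n"
        f"英語: {english}\n"
        f"単語音素: {word_phonemes}\n"
        "カタカナ:"
    )

if __name__ == "__main__":
    # Heavy imports live here so build_prompt stays importable without them.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_model_path = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    lora_model_path = "YOUR_USERNAME/TinyLlama-1.1B-Katakana-Lyrics-Liaison"  # placeholder repo id

    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    model = PeftModel.from_pretrained(
        AutoModelForCausalLM.from_pretrained(base_model_path), lora_model_path
    )

    prompt = build_prompt("check it out", "[CH EH1 K] [IH1 T] [AW1 T]")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```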

## 🛠 Technical Specs & Dataset

* **Dataset:** 1,200+ hand-curated pairs of English phrases and their auditory-correct phonetic mappings.
* **Evaluation:** Currently being benchmarked against the `speechocean762` dataset for a pronunciation-scoring PoC.
* **Architecture:** LoRA fine-tuning on TinyLlama 1.1B.
* **Optimization:** Highly compatible with **GGUF** for ultra-lightweight mobile app integration (MFCC/DTW-based evaluation).

## ⚠️ Limitations & Bias

* **Model Size:** 1.1B parameters. While fast, it may hallucinate on rare proper nouns.
* **Accent:** Optimized for the General American English (GenAm) accent common in global pop music and media.

## 📜 License

Apache 2.0