Commit a7c0c81 · Ander Arriandiaga committed
Parent(s): 42aaddc
Initial commit for Hugging Face Space
Files changed:
- .gitattributes +3 -35
- .gitignore +1 -0
- README.md +18 -14
- README_developer.md +57 -0
- app.py +9 -0
- dict/es_dicc.dic +3 -0
- dict/es_dicc_20241204.dic +3 -0
- dict/eu_dicc.dic +3 -0
- dict/eu_dicc_20250326.dic +3 -0
- eu_phonemizer_v2.py +333 -0
- gradio_phonemizer.py +506 -0
- img/download.png +0 -0
- modulo1y2/modulo1y2 +3 -0
- prepare.sh +19 -0
- push_to_hf.sh +35 -0
- requirements.txt +3 -0
.gitattributes CHANGED
@@ -1,35 +1,3 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+# Track large dictionary files and binary with Git LFS if enabled
+dict/* filter=lfs diff=lfs merge=lfs -text
+modulo1y2/modulo1y2 filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1 @@
+outputs/
README.md CHANGED
@@ -1,14 +1,18 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+# Phonemizer — Gradio demo (Hugging Face Space)
+
+This Space provides a small web UI to phonemize Basque (eu) and Spanish (es) text.
+
+How to use
+- Input text: paste text into the main box or upload a `.txt` file.
+- Language: select `eu` (Basque) or `es` (Spanish).
+- Symbols: choose `sampa` (default) or `ipa` for the phoneme output format.
+- Separate phonemes: toggle whether phonemes are separated by spaces, to make it easier to see multi-character phonemes.
+- Submit: press `Submit` to run normalization + phonemization.
+- Download: use the download buttons to get the phonemes or normalized text as `.txt` files.
+
+Privacy
+- This Space does not store user inputs beyond the temporary files used to serve downloads. Do not upload sensitive data.
+
+Credits
+- Developed by Ander Arriandiaga at Aholab (HiTZ).
README_developer.md ADDED
@@ -0,0 +1,57 @@
+# Phonemizer Gradio Space — Developer Notes
+
+This repository contains a Gradio app wrapper for the Phonemizer used in this project.
+
+Files to keep in the Space repo for runtime
+- `gradio_phonemizer.py` (UI) and `eu_phonemizer_v2.py` (phonemizer logic)
+- `app.py` (Gradio entrypoint)
+- `modulo1y2/modulo1y2` (the phonemizer executable) OR source+build files in `modulo1y2/`
+- `dict/` containing `eu_dicc` (or `eu_dicc.dic`) and `es_dicc` (or `es_dicc.dic`)
+- `requirements.txt`
+
+Recommended deployment options
+
+- Ship the `modulo1y2` executable and the minimal dictionary files in the repo (fastest).
+- OR keep only sources and build the executable on Space startup using an `apt.txt` and a `make` step.
+- OR host large dictionaries/executables on the Hugging Face Hub (dataset/model repo) and download them at startup using `huggingface_hub.hf_hub_download`.
+
+Quick local test
+
+1. Create a venv and install dependencies:
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+2. Ensure the executable is present and executable:
+
+```bash
+chmod +x modulo1y2/modulo1y2
+ls -l modulo1y2/modulo1y2
+ls -l dict/eu_dicc* dict/es_dicc*
+```
+
+3. Run the app locally:
+
+```bash
+python app.py
+# then open http://localhost:7860
+```
+
+Pushing to Hugging Face Spaces
+
+1. (Optional) Install git-lfs and track large files:
+
+```bash
+git lfs install
+git lfs track "dict/*"
+git lfs track "modulo1y2/modulo1y2"
+```
+
+2. Create a Space (via the web UI or `huggingface-cli repo create <user>/<space> --type=space`), then push this repo to the Space remote.
+
+Licensing and redistribution
+
+Before uploading binaries or dictionary files, confirm you have the right to redistribute them.
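The third deployment option above (hosting large assets on the Hub and fetching them at startup) could be sketched roughly as follows. This is a hedged sketch, not part of the repo: `your-org/phonemizer-assets` and the file names are placeholders, and files already present locally are skipped so the download only happens on a cold start.

```python
from pathlib import Path

def ensure_assets(files, dest="dict", repo_id="your-org/phonemizer-assets"):
    """Fetch missing asset files from a Hub dataset repo at startup.

    `repo_id` and the file names are placeholders for illustration;
    files already present locally are left untouched (no network call).
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in files:
        target = dest_dir / name
        if target.exists():
            continue  # asset already shipped with the repo
        # Imported lazily so the function is importable without huggingface_hub
        from huggingface_hub import hf_hub_download
        cached = hf_hub_download(repo_id=repo_id, filename=name, repo_type="dataset")
        target.write_bytes(Path(cached).read_bytes())
        fetched.append(name)
    return fetched
```

Calling something like `ensure_assets(["eu_dicc.dic", "es_dicc.dic"])` near the top of `app.py` would make the Space self-provision on first boot.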
app.py ADDED
@@ -0,0 +1,9 @@
+import os
+from gradio_phonemizer import build_interface
+
+demo = build_interface()
+
+if __name__ == "__main__":
+    # Respect common env vars used by hosting platforms
+    port = int(os.environ.get("PORT", os.environ.get("GRADIO_SERVER_PORT", 7860)))
+    demo.launch(server_name="0.0.0.0", server_port=port)
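The port lookup in `app.py` nests two `dict.get` calls, so the precedence can be easy to misread. Isolating that one expression in a small helper (a sketch, not part of the repo) makes the order explicit and testable:

```python
import os

def resolve_port(env=None):
    """Resolve the server port the way app.py does: PORT wins,
    then GRADIO_SERVER_PORT, then the Gradio default of 7860."""
    if env is None:
        env = os.environ
    return int(env.get("PORT", env.get("GRADIO_SERVER_PORT", 7860)))
```

With both variables set, `PORT` takes precedence; with neither, the Gradio default 7860 is used.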
dict/es_dicc.dic ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3880d688565dcfc4c1a239cb94c6cc0466b603cbf86fbf8a20ca411d64cb3c03
+size 141770

dict/es_dicc_20241204.dic ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3880d688565dcfc4c1a239cb94c6cc0466b603cbf86fbf8a20ca411d64cb3c03
+size 141770

dict/eu_dicc.dic ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4a4c6553965ac7c7937b599d3e8a3d8d94df48a0bdef943a84c63f4b261172f8
+size 865575

dict/eu_dicc_20250326.dic ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4a4c6553965ac7c7937b599d3e8a3d8d94df48a0bdef943a84c63f4b261172f8
+size 865575
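The `dict/*.dic` entries above are Git LFS pointer files, not the dictionaries themselves: three `key value` lines giving the spec version, the content hash, and the byte size of the real blob. A small sketch (hypothetical helper, not part of the repo) that parses this format:

```python
def parse_lfs_pointer(text):
    """Parse the three-line Git LFS pointer format into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is the byte count of the real blob
    return fields
```

Checking the parsed `size` against the file on disk is a quick way to detect that LFS objects were fetched correctly after a clone.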
eu_phonemizer_v2.py ADDED
@@ -0,0 +1,333 @@
+import subprocess
+import logging
+import string
+from pathlib import Path
+from collections import OrderedDict
+from nltk.tokenize import TweetTokenizer
+from typing import List, Dict, Optional
+import re
+
+# Constants
+SUPPORTED_LANGUAGES = {'eu', 'es'}
+SUPPORTED_SYMBOLS = {'sampa', 'ipa'}
+SAMPA_TO_IPA = OrderedDict([
+    ("p", "p"), ("b", "b"), ("t", "t"), ("c", "c"), ("d", "d"),
+    ("k", "k"), ("g", "ɡ"), ("tS", "tʃ"), ("ts", "ts"), ("ts`", "tʂ"),
+    ("gj", "ɟ"), ("jj", "ɪ"), ("f", "f"), ("B", "β"), ("T", "θ"),
+    ("D", "ð"), ("s", "s"), ("s`", "ʂ"), ("S", "ʃ"), ("x", "x"),
+    ("G", "ɣ"), ("m", "m"), ("n", "n"), ("J", "ɲ"), ("l", "l"),
+    ("L", "ʎ"), ("r", "ɾ"), ("rr", "r"), ("j", "j"), ("w", "w"),
+    ("i", "i"), ("'i", "'i"), ("e", "e"), ("'e", "'e"), ("a", "a"),
+    ("'a", "'a"), ("o", "o"), ("'o", "'o"), ("u", "u"), ("'u", "'u"),
+    ("y", "y"), ("Z", "ʒ"), ("h", "h"), ("ph", "pʰ"), ("kh", "kʰ"),
+    ("th", "tʰ")
+])
+
+MULTICHAR_TO_SINGLECHAR = {
+    "tʃ": "C",
+    "ts": "V",
+    "tʂ": "P",
+    "'i": "I",
+    "'e": "E",
+    "'a": "A",
+    "'o": "O",
+    "'u": "U",
+    "pʰ": "H",
+    "kʰ": "K",
+    "tʰ": "T"
+}
+
+class PhonemizerError(Exception):
+    """Custom exception for Phonemizer errors."""
+    pass
+
+class Phonemizer:
+    def __init__(self, language: str = "eu", symbol: str = "sampa",
+                 path_modulo1y2: str = "modulo1y2/modulo1y2",
+                 path_dicts: str = "dict") -> None:
+        """Initialize the Phonemizer with the given language and symbol."""
+        if language not in SUPPORTED_LANGUAGES:
+            raise PhonemizerError(f"Unsupported language: {language}")
+        if symbol not in SUPPORTED_SYMBOLS:
+            raise PhonemizerError(f"Unsupported symbol type: {symbol}")
+
+        self.language = language
+        self.symbol = symbol
+        self.path_modulo1y2 = Path(path_modulo1y2)
+        self.path_dicts = Path(path_dicts)
+        self.logger = logging.getLogger(__name__)
+
+        # Initialize SAMPA to IPA dictionary
+        self._sampa_to_ipa_dict = SAMPA_TO_IPA
+
+        # Initialize word splitter regex
+        self._word_splitter = re.compile(r'\w+|[^\w\s]', re.UNICODE)
+
+        self._validate_paths()
+
+    def normalize(self, text: str) -> str:
+        """Normalize the given text using an external command."""
+        try:
+            command = self._build_normalization_command()
+            process = subprocess.Popen(
+                command,
+                stdin=subprocess.PIPE,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                encoding='ISO-8859-15',
+                shell=True
+            )
+            stdout, stderr = process.communicate(input=text)
+
+            if process.returncode != 0:
+                # Filter out the SetDur warning from the error message
+                filtered_stderr = '\n'.join(line for line in stderr.split('\n')
+                                            if 'Warning: argument not used SetDur' not in line)
+                if filtered_stderr.strip():  # Only raise error if there are other errors
+                    error_msg = f"Normalization failed: {filtered_stderr}"
+                    self.logger.error(error_msg)
+                    raise PhonemizerError(error_msg)
+
+            return stdout.strip()
+
+        except Exception as e:
+            error_msg = f"Error during normalization: {str(e)}"
+            self.logger.error(error_msg)
+            return text
+
+    def getPhonemes(self, text: str, separate_phonemes: bool = False) -> str:
+        """Extract phonemes from the given text.
+
+        Args:
+            text (str): The input text to convert to phonemes
+            separate_phonemes (bool): If True, keeps spaces between phonemes. If False, produces compact phoneme strings.
+                Defaults to False.
+
+        Returns:
+            str: The phoneme sequence with words separated by " | "
+        """
+        try:
+            # Pre-process text to handle dots consistently
+            # Replace multiple dots with a single dot to avoid issues with ellipsis
+            text = re.sub(r'\.{2,}', '.', text)
+
+            # Process input line-by-line so we preserve original newlines
+            lines = text.split('\n')
+            per_line_outputs = []
+            for line in lines:
+                # If the input line is empty, preserve empty line
+                if not line.strip():
+                    per_line_outputs.append('')
+                    continue
+
+                command = self._build_phoneme_extraction_command()
+                proc = subprocess.Popen(
+                    command,
+                    stdin=subprocess.PIPE,
+                    stdout=subprocess.PIPE,
+                    stderr=subprocess.PIPE,
+                    text=True,
+                    encoding='ISO-8859-15',
+                    shell=True
+                )
+                stdout, stderr = proc.communicate(input=line)
+                if proc.returncode != 0:
+                    error_msg = f"Phoneme extraction failed: {stderr}"
+                    self.logger.error(error_msg)
+                    raise PhonemizerError(error_msg)
+
+                # Replace any internal newlines in tool output with sentinel (shouldn't normally occur for single line)
+                stdout_line = stdout.replace('\n', ' | _ | ')
+
+                # Split into words and handle each separately for this line
+                word_phonemes = stdout_line.split(" | ")
+                result_phonemes = []
+                cleaned_phonemes = []
+                for phoneme_seq in word_phonemes:
+                    if not phoneme_seq.strip():
+                        continue
+                    if phoneme_seq.strip() == "_":
+                        continue
+                    cleaned_phonemes.append(phoneme_seq.strip())
+                # Tokenize the original line into words/punctuation
+                words = self._word_splitter.findall(line)
+
+                # Count non-punctuation words
+                non_punct_words = [w for w in words if w not in string.punctuation]
+
+                # Ensure we have enough phonemes for all non-punctuation words
+                if len(cleaned_phonemes) < len(non_punct_words):
+                    while len(cleaned_phonemes) < len(non_punct_words):
+                        if cleaned_phonemes:
+                            cleaned_phonemes.append(cleaned_phonemes[-1])
+                        else:
+                            cleaned_phonemes.append("a")
+
+                # Process words and phonemes together for this line
+                phoneme_idx = 0
+                word_idx = 0
+                line_result = []
+
+                while word_idx < len(words):
+                    word = words[word_idx]
+
+                    if word in string.punctuation:
+                        line_result.append(word)
+                        word_idx += 1
+                        continue
+
+                    # Regular word processing
+                    if phoneme_idx < len(cleaned_phonemes):
+                        phonemes = cleaned_phonemes[phoneme_idx].split()
+                        if self.symbol == "sampa":
+                            if separate_phonemes:
+                                processed_phonemes = " ".join(p for p in phonemes if p != "-")
+                            else:
+                                processed_phonemes = "".join(p for p in phonemes if p != "-")
+                        else:
+                            ipa_phonemes = [self._sampa_to_ipa_dict.get(p, p) for p in phonemes if p != "-"]
+                            if separate_phonemes:
+                                processed_phonemes = " ".join(ipa_phonemes)
+                            else:
+                                processed_phonemes = "".join(ipa_phonemes)
+
+                        line_result.append(processed_phonemes)
+                        phoneme_idx += 1
+                        word_idx += 1
+                    else:
+                        # No phoneme left for this word: skip it
+                        word_idx += 1
+
+                # If there are leftover phonemes, append them
+                while phoneme_idx < len(cleaned_phonemes):
+                    phonemes = cleaned_phonemes[phoneme_idx].split()
+                    if self.symbol == "sampa":
+                        processed_phonemes = " ".join(p for p in phonemes if p != "-")
+                    else:
+                        ipa_phonemes = [self._sampa_to_ipa_dict.get(p, p) for p in phonemes if p != "-"]
+                        if separate_phonemes:
+                            processed_phonemes = " ".join(ipa_phonemes)
+                        else:
+                            processed_phonemes = "".join(ipa_phonemes)
+
+                    line_result.append(processed_phonemes)
+                    phoneme_idx += 1
+
+                # Format final output for this line using spacing rules
+                out_parts = []
+                # Keep a parallel map to the original words so we can decide sentence splits
+                orig_map = []
+                for idx, token in enumerate(line_result):
+                    is_punct = token in string.punctuation
+                    if not is_punct:
+                        normalized = re.sub(r"\s+", " ", token.strip())
+                        out_parts.append(normalized)
+                        # Map this output token to the corresponding original word (if available)
+                        if idx < len(words):
+                            orig_map.append(words[idx])
+                        else:
+                            orig_map.append(None)
+                    else:
+                        out_parts.append(token)
+                        if idx < len(words):
+                            orig_map.append(words[idx])
+                        else:
+                            orig_map.append(None)
+
+                final_line = ""
+                for i, tok in enumerate(out_parts):
+                    if i == 0:
+                        final_line += tok
+                        continue
+
+                    prev = out_parts[i-1]
+
+                    if tok in string.punctuation:
+                        final_line = final_line.rstrip(' ')
+                        final_line += (' ' if separate_phonemes else ' ') + tok
+                        # Preserve input line boundaries: do NOT insert newlines mid-line.
+                        # Always add the standard separator after punctuation.
+                        if i < len(out_parts) - 1:
+                            final_line += (' ' if separate_phonemes else ' ')
+                    else:
+                        if prev in string.punctuation:
+                            final_line += tok
+                        else:
+                            sep = ' ' if separate_phonemes else ' '
+                            final_line += sep + tok
+
+                # If a sentence-ending punctuation is followed by a capital letter,
+                # split into separate lines (keeps numeric periods like "1980. urtean" intact).
+                # This turns "... ? Ni ..." into two lines at the sentence boundary.
+                split_line = re.sub(r"(?<=[\?\!\.])\s+(?=[A-ZÁÉÍÓÚÜÑ])", "\n", final_line)
+                per_line_outputs.append(split_line)
+
+            return "\n".join(per_line_outputs)
+
+        except Exception as e:
+            error_msg = f"Error in phoneme extraction: {str(e)}"
+            self.logger.error(error_msg)
+            return ""
+
+    def _build_normalization_command(self) -> str:
+        """Build the command string for normalization."""
+        modulo_path = self._get_file_path() / self.path_modulo1y2
+        dict_path = self._get_file_path() / self.path_dicts
+        dict_file = f"{self.language}_dicc"
+        return f'{modulo_path} -TxtMode=Word -Lang={self.language} -HDic={dict_path/dict_file}'
+
+    def _build_phoneme_extraction_command(self) -> str:
+        """Build the command string for phoneme extraction."""
+        modulo_path = self._get_file_path() / self.path_modulo1y2
+        dict_path = self._get_file_path() / self.path_dicts
+        dict_file = f"{self.language}_dicc"
+        return f'{modulo_path} -Lang={self.language} -HDic={dict_path/dict_file}'
+
+    def _get_file_path(self) -> Path:
+        return Path(__file__).parent
+
+    def _validate_paths(self) -> None:
+        """Validate paths with enhanced error reporting."""
+        try:
+            if not self.path_modulo1y2.exists():
+                raise PhonemizerError(f"Modulo1y2 executable not found at: {self.path_modulo1y2}")
+            if not self.path_dicts.exists():
+                raise PhonemizerError(f"Dictionary directory not found at: {self.path_dicts}")
+
+            # Check for both possible dictionary files
+            dict_file = self.path_dicts / f"{self.language}_dicc"
+            if not dict_file.exists():
+                # Try with .dic extension as fallback
+                dict_file_alt = self.path_dicts / f"{self.language}_dicc.dic"
+                if not dict_file_alt.exists():
+                    raise PhonemizerError(f"Dictionary file not found at either {dict_file} or {dict_file_alt}")
+
+        except Exception as e:
+            self.logger.error(f"Path validation error: {str(e)}")
+            raise
+
+    def _transform_multichar_phonemes(self, phoneme_sequence: str) -> str:
+        """
+        Transform multicharacter IPA phonemes to single characters using the MULTICHAR_TO_SINGLECHAR mapping.
+
+        Args:
+            phoneme_sequence (str): A string containing phonemes separated by spaces
+
+        Returns:
+            str: The transformed phoneme sequence with multicharacter phonemes replaced by single characters
+        """
+        # Split the sequence into individual phonemes
+        phonemes = phoneme_sequence.split()
+        transformed_phonemes = []
+
+        for phoneme in phonemes:
+            # Check if the phoneme exists in our mapping
+            if phoneme in MULTICHAR_TO_SINGLECHAR:
+                transformed_phonemes.append(MULTICHAR_TO_SINGLECHAR[phoneme])
+            else:
+                transformed_phonemes.append(phoneme)
+
+        return " ".join(transformed_phonemes)
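The `SAMPA_TO_IPA` table in `eu_phonemizer_v2.py` drives a simple per-token lookup over space-separated SAMPA phonemes, with unknown tokens passed through unchanged. A standalone sketch of that conversion step (using only a subset of the table; the example phoneme string is an illustrative assumption, not the tool's real output):

```python
# Subset of the module's SAMPA_TO_IPA table, for illustration only.
SAMPA_TO_IPA = {
    "tS": "tʃ", "rr": "r", "r": "ɾ", "J": "ɲ",
    "a": "a", "k": "k", "u": "u", "'u": "'u",
}

def sampa_to_ipa(seq, separate=True):
    """Convert a space-separated SAMPA phoneme string to IPA.

    Unknown tokens pass through unchanged, mirroring the
    `self._sampa_to_ipa_dict.get(p, p)` lookup in getPhonemes().
    """
    ipa = [SAMPA_TO_IPA.get(p, p) for p in seq.split()]
    return (" " if separate else "").join(ipa)
```

With `separate=False` the phonemes are concatenated into the compact form, which is why multi-character phonemes like `tʃ` become hard to spot — the motivation for the UI's "Separate phonemes" toggle.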
gradio_phonemizer.py
ADDED
|
@@ -0,0 +1,506 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import gradio as gr
|
| 2 |
+
import tempfile
|
| 3 |
+
import base64
|
| 4 |
+
import re
|
| 5 |
+
import socket
|
| 6 |
+
import os
|
| 7 |
+
from pathlib import Path
|
| 8 |
+
from typing import Optional, Tuple
|
| 9 |
+
import threading
|
| 10 |
+
import time
|
| 11 |
+
import atexit
|
| 12 |
+
|
| 13 |
+
# Output cleanup configuration
|
| 14 |
+
OUTPUTS_DIR = Path(__file__).parent / 'outputs'
|
| 15 |
+
OUTPUT_CLEANUP_TTL = 24 * 3600 # seconds, default 24 hours
|
| 16 |
+
OUTPUT_CLEANUP_MAX_FILES = 500 # keep at most this many files
|
| 17 |
+
OUTPUT_CLEANUP_INTERVAL = 60 * 60 # in seconds, run cleanup every hour
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def _cleanup_outputs(out_dir: Path = None, max_files: int = None, ttl: int = None):
|
| 21 |
+
"""Delete old files in `out_dir` older than `ttl` seconds and keep at most
|
| 22 |
+
`max_files` newest files. If parameters are None, use module defaults."""
|
| 23 |
+
if out_dir is None:
|
| 24 |
+
out_dir = OUTPUTS_DIR
|
| 25 |
+
if not out_dir.exists():
|
| 26 |
+
return
|
| 27 |
+
if max_files is None:
|
| 28 |
+
max_files = OUTPUT_CLEANUP_MAX_FILES
|
| 29 |
+
if ttl is None:
|
| 30 |
+
ttl = OUTPUT_CLEANUP_TTL
|
| 31 |
+
|
| 32 |
+
now = time.time()
|
| 33 |
+
files = [p for p in out_dir.iterdir() if p.is_file()]
|
| 34 |
+
# Remove files older than ttl
|
| 35 |
+
for p in files:
|
| 36 |
+
try:
|
| 37 |
+
if now - p.stat().st_mtime > ttl:
|
| 38 |
+
p.unlink()
|
| 39 |
+
except Exception:
|
| 40 |
+
pass
|
| 41 |
+
|
| 42 |
+
# Re-list and trim to max_files
|
| 43 |
+
files = sorted([p for p in out_dir.iterdir() if p.is_file()], key=lambda p: p.stat().st_mtime, reverse=True)
|
| 44 |
+
if len(files) > max_files:
|
| 45 |
+
for p in files[max_files:]:
|
| 46 |
+
try:
|
| 47 |
+
p.unlink()
|
| 48 |
+
except Exception:
|
| 49 |
+
pass
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
def _cleanup_all_on_exit():
|
| 53 |
+
"""Remove all files in outputs folder on process exit."""
|
| 54 |
+
try:
|
| 55 |
+
if OUTPUTS_DIR.exists():
|
| 56 |
+
for p in OUTPUTS_DIR.iterdir():
|
| 57 |
+
try:
|
| 58 |
+
if p.is_file():
|
| 59 |
+
p.unlink()
|
| 60 |
+
except Exception:
|
| 61 |
+
pass
|
| 62 |
+
except Exception:
|
| 63 |
+
pass
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def _start_periodic_cleanup():
|
| 67 |
+
def _worker():
|
| 68 |
+
while True:
|
| 69 |
+
try:
|
| 70 |
+
_cleanup_outputs(OUTPUTS_DIR)
|
| 71 |
+
except Exception:
|
| 72 |
+
pass
|
| 73 |
+
time.sleep(OUTPUT_CLEANUP_INTERVAL)
|
| 74 |
+
|
| 75 |
+
t = threading.Thread(target=_worker, daemon=True, name='outputs-cleaner')
|
| 76 |
+
t.start()
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
# Ensure outputs dir exists and start background cleaner; register atexit
|
| 80 |
+
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)
|
| 81 |
+
_start_periodic_cleanup()
|
| 82 |
+
atexit.register(_cleanup_all_on_exit)
|
| 83 |
+
from eu_phonemizer_v2 import Phonemizer, PhonemizerError
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
def _read_uploaded_file(file_obj) -> str:
|
| 87 |
+
if not file_obj:
|
| 88 |
+
return ""
|
| 89 |
+
# gradio will provide a temporary file path
|
| 90 |
+
p = Path(file_obj.name) if hasattr(file_obj, "name") else Path(file_obj)
|
| 91 |
+
try:
|
| 92 |
+
return p.read_text(encoding='utf-8')
|
| 93 |
+
except Exception:
|
| 94 |
+
return p.read_text(encoding='ISO-8859-15')


def process(text: str,
            uploaded_file,
            language: str,
            symbol: str,
            separate_phonemes: bool) -> Tuple[str, Optional[str], str, Optional[str], str, str]:
    """Process either the text input or an uploaded .txt file.

    Returns six values matching the UI outputs: the phoneme text, the path to
    a downloadable phoneme .txt file, the normalized text, the path to a
    downloadable normalized .txt file, and the same two paths again for the
    hidden path boxes. On error the first value carries the message and the
    remaining outputs are empty.
    """
    # Prefer uploaded file if present
    source_text = ""
    is_file_input = False
    if uploaded_file:
        source_text = _read_uploaded_file(uploaded_file)
        is_file_input = True
    else:
        source_text = text or ""

    # Try to instantiate Phonemizer using repo-local modulo1y2 and dicts
    try:
        phon = Phonemizer(language=language, symbol=symbol)
    except PhonemizerError as e:
        if language == 'eu':
            err = f"Ezin izan da fonemizadorea hasi: {e}\nEgiaztatu 'modulo1y2' eta 'dict' karpetak."
        else:
            err = f"No se pudo inicializar el fonemizador: {e}\nComprueba las carpetas 'modulo1y2' y 'dict'."
        # Return 6 outputs matching the UI: result text, file, normalized text, norm file, ph_path, norm_path
        return err, None, "", None, "", ""
    except Exception as e:
        if language == 'eu':
            return f"Hasieratze errore ezezaguna: {e}", None, "", None, "", ""
        return f"Error inesperado al inicializar: {e}", None, "", None, "", ""

    # Normalize then get phonemes. Run normalization per original input line so the
    # external normalizer doesn't insert extra newlines across sentences and
    # we preserve the user's original line boundaries.
    try:
        lines = source_text.split('\n')
        normalized_lines = []
        for ln in lines:
            if not ln.strip():
                normalized_lines.append('')
            else:
                # Normalize each line independently, collapse any internal newlines
                # produced by the external normalizer, collapse repeated whitespace
                # (this avoids producing double spaces when the normalizer inserts
                # a '\n' while the original text already had a space), and strip.
                norm_line = phon.normalize(ln)
                norm_line = norm_line.replace('\n', ' ')
                norm_line = re.sub(r"\s+", ' ', norm_line).strip()
                normalized_lines.append(norm_line)
        normalized = '\n'.join(normalized_lines)

        phonemes = phon.getPhonemes(normalized, separate_phonemes=separate_phonemes)
        # Defensive cleanup: if any '|' separators remain, replace them with single spaces
        if isinstance(phonemes, str) and '|' in phonemes:
            phonemes = re.sub(r"\s*\|\s*", " ", phonemes)
    except PhonemizerError as e:
        if language == 'eu':
            msg = f"Fonemizazio errorea: {e}"
        else:
            msg = f"Error del fonemizador: {e}"
        return msg, None, "", None, "", ""
    except Exception as e:
        if language == 'eu':
            msg = f"Errore ezezaguna prozesatzean: {e}"
        else:
            msg = f"Error inesperado al procesar: {e}"
        return msg, None, "", None, "", ""

    # Create persistent downloadable files under outputs/ so the browser can reliably
    # download them using Gradio's `gr.File` component (ephemeral tmp files are not
    # always fetched correctly by some browsers).
    out_dir = Path(__file__).parent / 'outputs'
    out_dir.mkdir(parents=True, exist_ok=True)
    from datetime import datetime
    ts = datetime.now().strftime('%Y%m%d_%H%M%S')
    ph_file = out_dir / f'phonemes_{ts}.txt'
    norm_file = out_dir / f'normalized_{ts}.txt'
    ph_file.write_text(phonemes, encoding='utf-8')
    norm_file.write_text(normalized, encoding='utf-8')

    # Cleanup old files opportunistically after creating new ones
    try:
        _cleanup_outputs(out_dir)
    except Exception:
        pass

    # Return phonemes and normalized text in all cases (text or uploaded file)
    # so users who upload a .txt can see the processed text inline and download it.
    return phonemes, str(ph_file), normalized, str(norm_file), str(ph_file), str(norm_file)
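The per-line normalization loop above can be factored into a standalone sketch; `normalize_line` here is a hypothetical stand-in for `phon.normalize`:

```python
import re


def normalize_preserving_lines(text: str, normalize_line) -> str:
    """Normalize each line independently so an external normalizer cannot
    merge or split the caller's original line boundaries."""
    out = []
    for ln in text.split('\n'):
        if not ln.strip():
            out.append('')  # keep empty lines as-is
            continue
        norm = normalize_line(ln).replace('\n', ' ')  # drop injected newlines
        out.append(re.sub(r"\s+", ' ', norm).strip())  # collapse whitespace
    return '\n'.join(out)
```

Even if the normalizer appends stray newlines or doubles spaces, the result still has exactly one output line per input line.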


def download_from_text(text: str) -> Optional[str]:
    """Create a .txt file from the given text and return its path for download."""
    if not text:
        return None
    # Save into a persistent outputs/ directory with a readable timestamped filename
    out_dir = Path(__file__).parent / 'outputs'
    out_dir.mkdir(parents=True, exist_ok=True)
    from datetime import datetime
    ts = datetime.now().strftime('%Y%m%d_%H%M%S')
    filename = f'phonemes_{ts}.txt'
    out_path = out_dir / filename
    out_path.write_text(text, encoding='utf-8')
    # Return the path string so Gradio's File component can serve it
    return str(out_path)


def build_interface():
    with gr.Blocks(title="Eu/Es Phonemizer") as demo:
        # Simple header (image removed per user preference)
        header = gr.Markdown("# Fonemizadorea — Euskara (eu) eta Gaztelania (es)")
        # Style the Submit button orange for better visibility (higher specificity)
        gr.HTML("""
        <style>
        /* Stronger selectors to override theme/defaults */
        #submit_btn, #submit_btn button, button#submit_btn, .gradio-container #submit_btn button {
            background-color: #ff8c00 !important;
            color: white !important;
            border-radius: 6px !important;
            padding: 6px 12px !important;
            border: none !important;
        }
        #submit_btn:hover, #submit_btn button:hover, button#submit_btn:hover {
            background-color: #ff7a00 !important;
        }
        /* Don't force download buttons to orange */
        #download_ph_btn button, #download_norm_btn button { background-color: transparent !important; }

        /* Compact upload file box */
        #upload_file { max-width: 160px !important; }
        #upload_file .gr-file {
            height: 32px !important;
            padding: 2px 6px !important;
            font-size: 0.9rem !important;
            line-height: 1 !important;
        }
        #upload_file .gr-file input[type=file] { height: 32px !important; }

        /* Make textareas vertically resizable and more roomy */
        #input_text textarea, #normalized_box textarea, #result_box textarea {
            resize: vertical !important;
            min-height: 120px !important;
            max-height: 800px !important;
            width: 100% !important;
            box-sizing: border-box !important;
        }

        /* Center container and add padding for a cleaner look */
        .gradio-container { max-width: 1100px; margin: 12px auto !important; padding: 8px !important; }
        /* Fix the controls column width so changing labels doesn't reflow the
           layout; a slightly smaller fixed width keeps the upload column close.
           Extra internal spacing between control rows makes the column look
           taller without forcing its outer height or adding vertical gap
           between adjacent columns (upload box / buttons). */
        #controls_col { min-width: 220px; max-width: 260px; flex: 0 0 240px; align-self: flex-start; padding-top: 6px; padding-bottom: 6px; box-sizing: border-box; }
        #controls_col .gr-row { gap: 12px; row-gap: 12px; }
        #controls_col .gr-label, #controls_col label { line-height: 1.4; }

        /* Align the upload column to the top of the row so it isn't vertically
           centered when other columns grow; keep the upload box compact but
           aligned with the controls stack. */
        #upload_col { min-height: 110px; display: flex !important; align-items: flex-start !important; justify-content: center !important; align-self: flex-start; padding-top: 6px; }
        /* Ensure labels wrap instead of expanding layout */
        #controls_col .gr-label, #controls_col label { white-space: normal !important; word-break: break-word !important; }
        /* Enforce identical size and box-model for both action buttons so they
           don't push the layout when the language changes */
        #submit_btn button, #clear_btn button {
            width: 120px !important;
            height: 40px !important;
            min-height: 40px !important;
            box-sizing: border-box !important;
            padding: 6px 12px !important;
            display: inline-flex !important;
            align-items: center !important;
            justify-content: center !important;
            font-size: 14px !important;
            line-height: 1 !important;
            border-radius: 6px !important;
            border: none !important;
            margin: 0 !important;
            vertical-align: middle !important;
            font-family: inherit !important;
            background-clip: padding-box !important;
        }
        /* Make the main column flexible and allow it to shrink without pushing controls */
        #main_col { flex: 1 1 auto; min-width: 0; }
        /* Pull the upload box a bit left to close the gap if needed */
        #upload_file { margin-left: -6px !important; }
        /* Keep the file control compact so it doesn't exceed nearby controls */
        #upload_file .gr-file { max-height: 44px !important; height: 36px !important; box-sizing: border-box !important; }
        /* Position the decorative image absolutely so it doesn't force wrapping;
           reserve space on the right of #top_row to avoid overlap. */
        #top_row { position: relative !important; padding-right: 520px !important; }
        #img_col { position: absolute !important; right: 8px !important; top: 6px !important; width: 480px !important; max-width: 100% !important; box-sizing: border-box !important; }
        #download_img img { width: 480px !important; max-width: 100% !important; height: auto !important; display:block !important; pointer-events: none !important; user-select: none !important; }
        </style>
        """)

        with gr.Row():
            # Left controls column
            with gr.Column(scale=1, elem_id='controls_col'):
                language = gr.Radio(choices=['eu', 'es'], value='eu', label='Hizkuntza / Idioma')
                symbol = gr.Radio(choices=['sampa', 'ipa'], value='sampa', label='Sinboloak / Símbolos (Irteera)')
                # Default checked, Basque-only label; switches to Spanish when the language changes
                separate_phonemes = gr.Checkbox(label='Banatu fonemak espazioz', value=True)

            # Small column to the right of the controls that holds the upload box
            with gr.Column(scale=1, elem_id='upload_col'):
                upload = gr.File(file_types=['.txt'], label='Igo .txt fitxategia / Subir archivo .txt', elem_id='upload_file')

            # Decorative/download image column to the right of the upload box.
            # Embed the local `img/download.png` as a base64 <img> inside gr.HTML
            # so Gradio doesn't add overlay controls (download/enlarge). An
            # integer `scale` avoids Gradio's float-scale warning; the column
            # width is reserved via CSS (#img_col).
            with gr.Column(scale=1, elem_id='img_col'):
                img_path = Path(__file__).parent / 'img' / 'download.png'
                _img_data_uri = ''
                try:
                    with open(img_path, 'rb') as _img_f:
                        _img_b64 = base64.b64encode(_img_f.read()).decode('ascii')
                    _img_data_uri = f"data:image/png;base64,{_img_b64}"
                except Exception:
                    _img_data_uri = ''

                # Render HTML with a non-interactive <img>; let CSS control the width
                download_img = gr.HTML(f'<img src="{_img_data_uri}" alt="download" style="height:auto;pointer-events:none;user-select:none;">', elem_id='download_img')

            # Main column on the right: buttons above the wide input textbox
            with gr.Column(scale=3, elem_id='main_col'):
                with gr.Row():
                    submit_btn = gr.Button('Submit', elem_id='submit_btn')
                    clear_btn = gr.Button('Clear', elem_id='clear_btn')
                with gr.Row():
                    with gr.Column(scale=5):
                        input_text = gr.Textbox(lines=12, elem_id='input_text', label="Sarrera testua (utzi hutsik .txt fitxategia igo behar baduzu) / Texto de entrada (dejar vacío si subes un .txt)")
                # Outputs area: normalized text and phoneme output side by side
                with gr.Row():
                    with gr.Column(scale=1):
                        normalized_box = gr.Textbox(lines=12, elem_id='normalized_box', label='Normalizatua', interactive=False)
                        download_norm_btn = gr.DownloadButton('Deskargatu normalizatua', elem_id='download_norm_btn')

                    with gr.Column(scale=1):
                        result_box = gr.Textbox(lines=12, elem_id='result_box', label='Fonemak', interactive=False)
                        download_ph_btn = gr.DownloadButton('Deskargatu fonemak', elem_id='download_ph_btn')

        # Hidden boxes holding the latest generated file paths so the download buttons can trigger
        ph_path_box = gr.Textbox(visible=False, elem_id='ph_path_box')
        norm_path_box = gr.Textbox(visible=False, elem_id='norm_path_box')

        def _on_click(input_text, upload, language, symbol, separate_phonemes):
            return process(input_text, upload, language, symbol, separate_phonemes)

        # When a user uploads a .txt file, read its contents and populate the
        # `input_text` box so they can review or edit before submitting.
        def _on_upload(uploaded_file):
            if not uploaded_file:
                return gr.update(value="")
            try:
                content = _read_uploaded_file(uploaded_file)
            except Exception:
                content = ''
            return gr.update(value=content)

        def _clear_all():
            # Clear input, outputs and any hidden path boxes so the UI resets
            return (
                gr.update(value=""),    # input_text
                gr.update(value=None),  # upload (clear any uploaded file)
                gr.update(value=""),    # normalized_box
                gr.update(value=""),    # result_box
                gr.update(value=None),  # download_ph_btn
                gr.update(value=None),  # download_norm_btn
                gr.update(value=""),    # ph_path_box
                gr.update(value="")     # norm_path_box
            )

        # Re-run processing automatically when the symbol or separation options
        # change so users don't have to press Submit again.
        symbol.change(fn=_on_click, inputs=[input_text, upload, language, symbol, separate_phonemes], outputs=[result_box, download_ph_btn, normalized_box, download_norm_btn, ph_path_box, norm_path_box])
        separate_phonemes.change(fn=_on_click, inputs=[input_text, upload, language, symbol, separate_phonemes], outputs=[result_box, download_ph_btn, normalized_box, download_norm_btn, ph_path_box, norm_path_box])

        # Populate the input textbox when a file is uploaded so users can see/edit it
        # before submitting. Does not auto-run processing.
        upload.change(fn=_on_upload, inputs=[upload], outputs=[input_text])

        # Update UI texts when the language selection changes
        def _update_language_ui(lang):
            # Note: we intentionally do NOT update the header here, to avoid
            # large DOM changes that reflow the layout when switching languages.
            if lang == 'eu':
                return (
                    gr.update(label='Sinboloak (Irteera)'),      # symbol
                    gr.update(label='Banatu fonemak espazioz'),  # separate_phonemes
                    # keep input/upload labels stable (do not update them, to avoid reflow)
                    gr.update(label='Fonemak'),
                    gr.update(label='Deskargatu irteera (.txt)'),
                    gr.update(label='Normalizatua'),
                    gr.update(label='Deskargatu normalizatua (.txt)'),
                    gr.update(value=''),
                    gr.update(value='')
                )
            else:
                return (
                    gr.update(label='Símbolos (Salida)'),
                    gr.update(label='Separar fonemas con espacios'),
                    # keep input/upload labels stable (do not update them, to avoid reflow)
                    gr.update(label='Fonemas'),
                    gr.update(label='Descargar salida (.txt)'),
                    gr.update(label='Normalizado'),
                    gr.update(label='Descargar normalizado (.txt)'),
                    gr.update(value=''),
                    gr.update(value='')
                )

        # Note: don't include `header`, `input_text`, the upload box or the action
        # buttons in the outputs, to avoid reflow when changing language. Only the
        # smaller output labels and hidden path boxes — which the function
        # actually returns (8 outputs) — are updated.
        language.change(fn=_update_language_ui, inputs=[language], outputs=[symbol, separate_phonemes, result_box, download_ph_btn, normalized_box, download_norm_btn, ph_path_box, norm_path_box])

        submit_btn.click(fn=_on_click, inputs=[input_text, upload, language, symbol, separate_phonemes], outputs=[result_box, download_ph_btn, normalized_box, download_norm_btn, ph_path_box, norm_path_box])
        clear_btn.click(fn=_clear_all, inputs=[], outputs=[input_text, upload, normalized_box, result_box, download_ph_btn, download_norm_btn, ph_path_box, norm_path_box])

        # Note: the download buttons are created in the outputs area above.

        def _download_file(path: str):
            # Simple path-return helper, kept for backwards compatibility
            if not path:
                return None
            p = Path(path)
            if not p.exists():
                return None
            return str(p)

        # Download callbacks that generate the outputs on demand, so a single
        # click both creates the file and returns its path to the browser.
        def _download_ph_from_inputs(input_text, upload, language, symbol, separate_phonemes):
            # Call the same `process()` function to ensure the files are generated
            res = process(input_text, upload, language, symbol, separate_phonemes)
            # process() returns (result_text, ph_path, normalized_text, norm_path, ph_path, norm_path)
            if isinstance(res, tuple) and len(res) >= 2:
                return _download_file(res[1])
            return None

        def _download_norm_from_inputs(input_text, upload, language, symbol, separate_phonemes):
            res = process(input_text, upload, language, symbol, separate_phonemes)
            if isinstance(res, tuple) and len(res) >= 4:
                return _download_file(res[3])
            return None

        # Wire the DownloadButtons to generate-and-return callbacks so a single
        # click performs generation and triggers an immediate download.
        download_ph_btn.click(fn=_download_ph_from_inputs, inputs=[input_text, upload, language, symbol, separate_phonemes], outputs=[download_ph_btn])
        download_norm_btn.click(fn=_download_norm_from_inputs, inputs=[input_text, upload, language, symbol, separate_phonemes], outputs=[download_norm_btn])

    return demo


def _find_free_port(start: int = 7860, end: int = 7870) -> Optional[int]:
    """Find a free TCP port in the given inclusive range."""
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(('0.0.0.0', port))
                return port
            except OSError:
                continue
    return None
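For comparison, an alternative to probing a fixed range is to let the kernel assign an ephemeral port by binding to port 0. A sketch — not what `app.py` does, since a predictable 7860-range port is friendlier for local use:

```python
import socket


def os_assigned_port() -> int:
    # Binding to port 0 asks the kernel for any free ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('127.0.0.1', 0))
        return s.getsockname()[1]
```

The trade-off: the port differs on every run, so it must be reported to the user (as the launch code does with its `print`).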


if __name__ == '__main__':
    app = build_interface()

    # Allow an explicit override via environment variable
    env_port = os.environ.get('GRADIO_SERVER_PORT')
    if env_port:
        try:
            port = int(env_port)
        except ValueError:
            print(f"Invalid GRADIO_SERVER_PORT='{env_port}', falling back to automatic selection.")
            port = None
    else:
        port = None

    if port is None:
        port = _find_free_port(7860, 7880)

    if port is None:
        raise OSError("No free port found in range 7860-7880. Set GRADIO_SERVER_PORT to a free port.")

    print(f"Launching Gradio on port {port} (server_name=0.0.0.0)")
    app.launch(server_name='0.0.0.0', server_port=port)
img/download.png
ADDED

modulo1y2/modulo1y2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c122bd6197e5e360d534957322f8d98a06cb3bcb4d412ee9978e891ae1b43e8a
size 2245952
prepare.sh
ADDED
@@ -0,0 +1,19 @@
#!/usr/bin/env bash
set -euo pipefail

echo "Preparing phonemizer workspace..."

# Make sure the executable bit is set if the binary is present
if [ -f "modulo1y2/modulo1y2" ]; then
    chmod +x modulo1y2/modulo1y2 || true
    echo "Ensured modulo1y2/modulo1y2 is executable."
else
    echo "Warning: modulo1y2/modulo1y2 not found. If you plan to ship the binary, add it to the repo."
fi

echo "Preparation complete. To run locally:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
"
push_to_hf.sh
ADDED
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
set -euo pipefail

# Safe push script for Hugging Face Spaces using an env var HF_TOKEN.
# Usage:
#   export HF_TOKEN="<your_token>"
#   cd /path/to/tmp_space
#   chmod +x push_to_hf.sh
#   ./push_to_hf.sh

REPO_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$REPO_DIR"

if [ -z "${HF_TOKEN:-}" ]; then
    echo "ERROR: HF_TOKEN is not set. Run: export HF_TOKEN=\"<your_token>\""
    exit 1
fi

# Show the current branch and changes
git --no-pager status --porcelain --branch

# Push using `git -c http.extraHeader` so the token is passed for this single
# command and is not stored in git config or logs. The `|| RET=$?` keeps the
# script alive under `set -e` so the failure branch below is reachable.
echo "Pushing to origin (authenticated via HF_TOKEN) ..."
RET=0
git -c http.extraHeader="Authorization: Bearer $HF_TOKEN" push origin HEAD:main || RET=$?

if [ $RET -eq 0 ]; then
    echo "Push succeeded. The Space should start building shortly on Hugging Face."
else
    echo "Push failed with exit code $RET"
fi

exit $RET
requirements.txt
ADDED
@@ -0,0 +1,3 @@
gradio>=3.0
nltk
huggingface-hub