alexwengg committed on
Commit
31648f9
·
verified ·
1 Parent(s): fe9b550

Upload 4 files

Files changed (2)
  1. README.md +9 -6
  2. inference.py +1 -1
README.md CHANGED
@@ -56,8 +56,8 @@ pip install -e ".[audio]"
56
  ```python
57
  from scripts.inference import ParakeetCoreML
58
 
59
- # Load model
60
- model = ParakeetCoreML("./model")
61
 
62
  # Transcribe with TDT (higher quality)
63
  text = model.transcribe("audio.wav", mode="tdt")
@@ -72,10 +72,10 @@ print(text)
72
 
73
  ```bash
74
  # TDT decoding (default, higher quality)
75
- uv run scripts/inference.py --audio audio.wav --model-dir ./model
76
 
77
  # CTC decoding (faster, good for keyword spotting)
78
- uv run scripts/inference.py --audio audio.wav --model-dir ./model --mode ctc
79
  ```
80
 
81
  ## Model Conversion
@@ -98,14 +98,17 @@ This will:
98
  ## File Structure
99
 
100
  ```
101
- model/
102
  ├── Preprocessor.mlpackage # Audio → Mel spectrogram
103
  ├── Encoder.mlpackage # Mel → Encoder features
104
  ├── CTCHead.mlpackage # Encoder → CTC log probs
105
  ├── Decoder.mlpackage # TDT prediction network
106
  ├── JointDecision.mlpackage # TDT joint network
107
  ├── vocab.json # Token vocabulary (1024 tokens)
108
- └── metadata.json # Model configuration
 
 
 
109
  ```
110
 
111
  ## Decoding Modes
 
56
  ```python
57
  from scripts.inference import ParakeetCoreML
58
 
59
+ # Load model (from current directory with .mlpackage files)
60
+ model = ParakeetCoreML(".")
61
 
62
  # Transcribe with TDT (higher quality)
63
  text = model.transcribe("audio.wav", mode="tdt")
 
72
 
73
  ```bash
74
  # TDT decoding (default, higher quality)
75
+ uv run scripts/inference.py --audio audio.wav
76
 
77
  # CTC decoding (faster, good for keyword spotting)
78
+ uv run scripts/inference.py --audio audio.wav --mode ctc
79
  ```
80
 
81
  ## Model Conversion
 
98
  ## File Structure
99
 
100
  ```
101
+ ./
102
  ├── Preprocessor.mlpackage # Audio → Mel spectrogram
103
  ├── Encoder.mlpackage # Mel → Encoder features
104
  ├── CTCHead.mlpackage # Encoder → CTC log probs
105
  ├── Decoder.mlpackage # TDT prediction network
106
  ├── JointDecision.mlpackage # TDT joint network
107
  ├── vocab.json # Token vocabulary (1024 tokens)
108
+ ├── metadata.json # Model configuration
109
+ ├── pyproject.toml # Python dependencies
110
+ ├── uv.lock # Locked dependencies
111
+ └── scripts/ # Inference & conversion scripts
112
  ```
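
Because the model files now sit at the repository root instead of under `model/`, a quick pre-flight check can confirm that the working directory has the layout listed above. The following is a minimal sketch: the file names come from the listing, but `missing_model_files` and `EXPECTED_FILES` are hypothetical names introduced for illustration and are not part of `scripts/inference.py`.

```python
from pathlib import Path

# File names taken from the File Structure listing; this helper is a
# hypothetical illustration and does not exist in scripts/inference.py.
EXPECTED_FILES = [
    "Preprocessor.mlpackage",
    "Encoder.mlpackage",
    "CTCHead.mlpackage",
    "Decoder.mlpackage",
    "JointDecision.mlpackage",
    "vocab.json",
    "metadata.json",
]


def missing_model_files(model_dir="."):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    # .mlpackage bundles are directories, so check exists() rather than is_file().
    return [name for name in EXPECTED_FILES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing from current directory:", ", ".join(missing))
    else:
        print("All expected model files found.")
```

Running this from the repository root before calling `ParakeetCoreML(".")` would surface a wrong working directory early, rather than as a load error inside the model code.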
113
 
114
  ## Decoding Modes
inference.py CHANGED
@@ -279,7 +279,7 @@ def main():
279
  help="Path to audio file (WAV, MP3, etc.)"
280
  )
281
  parser.add_argument(
282
- "--model-dir", type=str, default="./model",
283
  help="Directory containing CoreML model files"
284
  )
285
  parser.add_argument(
 
279
  help="Path to audio file (WAV, MP3, etc.)"
280
  )
281
  parser.add_argument(
282
+ "--model-dir", type=str, default=".",
283
  help="Directory containing CoreML model files"
284
  )
285
  parser.add_argument(
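
The effect of the new default can be illustrated with a small standalone parser. This is a sketch under assumptions, not the actual `main()` from `inference.py`: only the `--model-dir` argument and its help text are taken from the diff, while the `required=True` on `--audio`, the `--mode` choices, and the `tdt` default are inferred from the README examples.

```python
import argparse


def build_parser():
    """Build a reduced parser mirroring the options shown in the diff and README."""
    parser = argparse.ArgumentParser(
        description="Sketch of the CLI options touched by this commit"
    )
    # Assumption: --audio is required; the diff shows only its help text.
    parser.add_argument(
        "--audio", type=str, required=True,
        help="Path to audio file (WAV, MP3, etc.)"
    )
    # From the diff: the default moved from "./model" to ".".
    parser.add_argument(
        "--model-dir", type=str, default=".",
        help="Directory containing CoreML model files"
    )
    # Assumption: choices and default inferred from the README usage examples.
    parser.add_argument(
        "--mode", type=str, choices=["tdt", "ctc"], default="tdt",
        help="Decoding mode"
    )
    return parser


if __name__ == "__main__":
    # With no --model-dir given, the current directory is used.
    args = build_parser().parse_args(["--audio", "audio.wav"])
    print(args.model_dir)

    # The previous layout remains reachable by passing the flag explicitly.
    args = build_parser().parse_args(
        ["--audio", "audio.wav", "--model-dir", "./model"]
    )
    print(args.model_dir)
```

Under these assumptions, existing invocations that pass `--model-dir ./model` keep working; only callers that relied on the old implicit default need to either move the files or pass the flag.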