sinhala-tts / scripts

Commit History

Add scratch training notes and metadata
1870050

outlawmold committed on

Use parallel hf_transfer for model uploads
3b37854

outlawmold committed on

Add training logs and checkpoint upload tooling
2df5f7d

outlawmold committed on

Add full dataset recovery artifacts and run notes
d0cbd0a

outlawmold committed on

Add server deployment instructions and cross-platform training fixes
937d0e8

outlawmold committed on

Add MPS training stability fixes and experiment logs
19655a1

outlawmold committed on

Make training scripts cross-platform (CUDA / MPS / CPU)
16b3354

outlawmold committed on

Add bnb_optimizer and configurable batch params to training scripts
e76ab40

outlawmold committed on

Fix critical issues, migrate to IndicF5 fine-tuning, update pipeline
dd75f48

outlawmold committed on

Fix Windows finetune launcher and local artifact ignores
f31f6bc

outlawmold committed on

Add auto next-batch launchers for Mac/Windows and cache-safe Mac training defaults
075e81a

outlawmold committed on

Add HF cache modes to safe finetune launcher and docs
3972cd5

outlawmold committed on

Fix: remove --finetune flag (vocab size mismatch with pretrained base), train from scratch with custom Sinhala vocab
cb99d69
verified

outlawmold committed on

Update finetune script: epochs=20, batch_size=600, tuned for RTX 4050 6GB
0e8bc92
verified

outlawmold committed on

Add safe finetune launcher and latest training artifacts (HF-compatible)
5bed4d1

outlawmold committed on

Add local training launch script (Mac MPS + Linux CUDA)
e6e5d50
verified

outlawmold committed on

Add F5-TTS data prep script (Sinhala Arrow builder, bypasses pinyin)
76d096a
verified

outlawmold committed on

Add start-index support to HF CC runner for non-overlapping batches
3a7dac7

outlawmold committed on

Add resilient CC batching, hybrid filtering, and 10-video training prep workflow
718d352

outlawmold committed on

Upload scripts/run_cc_batches.py
380a0e9
verified

outlawmold committed on

Upload scripts/cc_pipeline.py
297e25c
verified

outlawmold committed on

Add Gemini sentence-quality filter to CC processing pipeline
6e3a4f8

outlawmold committed on

Add HF-matched CC+WAV download runner for iteration 6
4b3e68e

outlawmold committed on

Fix iteration 6 CC input parsing and UTF-8 handling
dd41b53

outlawmold committed on

Add CC-based pipeline script (v5): uses YouTube auto-generated captions instead of local ASR
8fba19c
verified

outlawmold committed on

Batch wav2vec2 ASR inference
ec0ade5

outlawmold committed on

Fix distributed upload replacement races
9229bd9

outlawmold committed on

Add TTS quality controls
4e72dd9

outlawmold committed on

Replace train_f5tts.py with complete fine-tuning script (vocab gen + data prep + training + inference)
cb6d98d
verified

outlawmold committed on

Add local investigation notes and logs
789aa59

outlawmold committed on

Add distributed pipeline coordination
d42bc19

outlawmold committed on

Align local pipeline with iteration 004 ASR plan
6ca3942

outlawmold committed on

Merge origin/macos-apple-silicon into main
7a31795

outlawmold committed on

feat(macos): implement Apple Silicon optimizations and switch to wav2vec2 ASR
1a2a2b3

outlawmold committed on

fix: switch to faster-whisper with local CT2 model for stable ASR
2a9a208

outlawmold committed on

fix: complete ASR transcription fix in transcribe_whisper_hf
0556117

outlawmold committed on

fix: resolve ASR transcription failure in whisper-hf backend
13ed016

outlawmold committed on

Pipeline v3: Multi-ASR backend (whisper-hf, MMS, faster-whisper), anti-hallucination, default to Lingalingeswaran/whisper-small-sinhala_v3
7f82ad6
verified

outlawmold committed on

Pipeline v3: Multi-ASR backend (whisper-hf, MMS, faster-whisper), anti-hallucination, default to Lingalingeswaran/whisper-small-sinhala_v3
0f417a9
verified

outlawmold committed on

Add ASR model comparison test script (MMS, Whisper fine-tunes, Whisper large-v3)
4ed7a6b
verified

outlawmold committed on

Add scripts/train_f5tts.py
ab9854a
verified

outlawmold committed on

Add scripts/convert_to_f5.py
f075a37
verified

outlawmold committed on

Finalize optimized pipeline with fine-tuned Whisper and model files
6990c3a

outlawmold committed on

Finalize verified local pipeline for GPU processing
b6f4c31

outlawmold committed on

Enhance local pipeline robustness and uploader stability
510e706

outlawmold committed on

Add local_pipeline.py: laptop-optimized data processing (RTX 4050 6GB VRAM)
1e3cb7c
verified

outlawmold committed on

Optimize uploader: 20-way parallel streams and 10-hour timeout
832b9a6

outlawmold committed on

Add upload timeout and bypass stuck batch
dfd121b

outlawmold committed on

Optimize download_and_upload.py: add batch cleanup and fix encoding
bd33bfc

outlawmold committed on

Add Phase 1: local download & upload script (Option C hybrid approach)
1225d5f
verified

outlawmold committed on