---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
- code-tape
- subtitle
- lora
- text-generation
---

# code-tape Subtitle Postprocessor LoRA v12

LoRA adapter for the code-tape browser-local subtitle postprocessor. It is trained to correct frontend and code terminology in ASR subtitles and to generate chapter jump points for playback.

## Contract

Input messages contain:

- `context`: file name, code/runtime snippets, and glossary.
- `inputSegments`: subtitle `id` and `text` only.
- `timeline`: subtitle `id`, `startMs`, and `endMs`.
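
As a rough illustration of how the three input fields relate, the sketch below splits timed subtitles into a text-only `inputSegments` list and a timing-only `timeline` list. The `build_payload` helper, the key names inside `context` (`fileName`, `snippets`, `glossary`), and the choice to serialize everything as one JSON string are assumptions made for illustration; this card does not specify them.

```python
import json


def build_payload(context, subtitles):
    """Split timed subtitles into the text-only and timing-only views."""
    input_segments = [{"id": s["id"], "text": s["text"]} for s in subtitles]
    timeline = [
        {"id": s["id"], "startMs": s["startMs"], "endMs": s["endMs"]}
        for s in subtitles
    ]
    return {
        "context": context,
        "inputSegments": input_segments,
        "timeline": timeline,
    }


# Hypothetical example data; the key names inside `context` are assumed.
context = {
    "fileName": "Counter.tsx",
    "snippets": ["const [count, setCount] = useState(0)"],
    "glossary": ["useState"],
}
subtitles = [
    {"id": "subtitle-1", "text": "这里用 use state 维护 count", "startMs": 0, "endMs": 1000},
]

payload = build_payload(context, subtitles)
# ensure_ascii=False keeps non-ASCII subtitle text readable in the message.
user_message = json.dumps(payload, ensure_ascii=False)
```

Keeping timing out of `inputSegments` means the text the model is asked to correct carries no timestamps, which matches the `id` and `text` only rule above.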

Output must be one JSON object:

```json
{"segments":[{"id":"subtitle-1","text":"这里用 useState 维护 count"}],"chapters":[{"title":"状态设计","startMs":0,"endMs":1000}]}
```

`segments` should be sparse: include only the subtitles whose text changed.
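
A consumer may want to check a response against this contract before applying it. The following is a minimal sketch, not the project's actual validator: `validate_output` is a hypothetical helper, and the rule that chapters must fall inside the span covered by `timeline` is an assumption rather than something this card states.

```python
import json


def validate_output(raw, input_segments, timeline):
    """Return a list of contract violations; an empty list means it passed."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(out, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    original = {s["id"]: s["text"] for s in input_segments}
    for seg in out.get("segments", []):
        seg_id = seg.get("id")
        if seg_id not in original:
            problems.append(f"unknown segment id: {seg_id}")
        elif seg.get("text") == original[seg_id]:
            # Sparse rule: unchanged subtitles should not be echoed back.
            problems.append(f"unchanged segment included: {seg_id}")

    # Assumed rule: chapters stay inside the span covered by the timeline.
    if timeline:
        first = min(t["startMs"] for t in timeline)
        last = max(t["endMs"] for t in timeline)
        for ch in out.get("chapters", []):
            start, end = ch.get("startMs"), ch.get("endMs")
            in_range = (
                isinstance(start, int)
                and isinstance(end, int)
                and first <= start <= end <= last
            )
            if not in_range:
                problems.append(f"chapter out of range: {ch.get('title')}")
    return problems


input_segments = [{"id": "subtitle-1", "text": "这里用 use state 维护 count"}]
timeline = [{"id": "subtitle-1", "startMs": 0, "endMs": 1000}]
raw = (
    '{"segments":[{"id":"subtitle-1","text":"这里用 useState 维护 count"}],'
    '"chapters":[{"title":"状态设计","startMs":0,"endMs":1000}]}'
)
problems = validate_output(raw, input_segments, timeline)
```

Returning a list of problems rather than raising lets the caller decide whether to discard the whole response or only the offending entries.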

## Training Notes

- Base: `HuggingFaceTB/SmolLM2-135M-Instruct`
- Records: 450 curated/distilled examples
- Epochs: 2
- Final train loss: 0.2545
- Corpus gates: JSON valid rate 1.0, sparse output rate 0.9333, unknown segment reference rate 0
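
The three corpus gate names above come from this card, but how each one is computed is not stated. The sketch below shows one plausible reading, with every definition an assumption: a record counts as valid if it parses to a JSON object, as sparse if no returned segment repeats its input text unchanged, and as an unknown reference if any returned `id` is missing from the input. `corpus_gates` is a hypothetical helper, and dividing every rate by the total record count is likewise assumed.

```python
import json


def corpus_gates(records):
    """records: list of (raw_output, input_segments) pairs."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to score")

    valid = sparse = unknown = 0
    for raw, input_segments in records:
        try:
            out = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(out, dict):
            continue
        valid += 1

        original = {s["id"]: s["text"] for s in input_segments}
        segments = out.get("segments", [])
        if any(seg.get("id") not in original for seg in segments):
            unknown += 1
        # Assumed definition: sparse means no unchanged subtitle is echoed.
        if all(original.get(seg.get("id")) != seg.get("text") for seg in segments):
            sparse += 1

    return {
        "json_valid_rate": valid / total,
        "sparse_output_rate": sparse / total,
        "unknown_segment_reference_rate": unknown / total,
    }


# Hypothetical records: one sparse, one echoing an unchanged subtitle,
# and one that is not valid JSON.
inputs = [{"id": "subtitle-1", "text": "use state"}]
records = [
    ('{"segments":[{"id":"subtitle-1","text":"useState"}],"chapters":[]}', inputs),
    ('{"segments":[{"id":"subtitle-1","text":"use state"}],"chapters":[]}', inputs),
    ("not json", inputs),
]
gates = corpus_gates(records)
```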