metadata
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
  - code-tape
  - subtitle
  - lora
  - text-generation

code-tape Subtitle Postprocessor LoRA v12

LoRA adapter for the code-tape browser-local subtitle postprocessor. It is trained to correct frontend and code terminology in ASR-generated subtitles and to generate chapter jump points for playback.

Contract

Input messages contain:

  • context: the file name, code/runtime snippets, and a glossary.
  • inputSegments: subtitle id and text only.
  • timeline: subtitle id, startMs, and endMs.
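The input contract separates what the model reads (subtitle text) from when each subtitle plays (timing), and the two halves are joined by a shared subtitle id. The TypeScript sketch below shows one way a client might build that payload from full ASR subtitles. The top-level keys and the id, text, startMs, and endMs fields come from this card. The `SubtitleContext` sub-fields (`fileName`, `snippets`, `glossary`), the `AsrSubtitle` type, the `buildInput` helper, and the sample ASR error are all hypothetical. How the object is packed into chat messages is not specified here.

```typescript
// Sketch of the input payload. Top-level keys and the id/text/startMs/endMs
// fields follow the model card; the SubtitleContext sub-fields are assumed.
interface SubtitleContext {
  fileName: string;
  snippets: string[]; // code/runtime snippets (assumed representation)
  glossary: string[]; // preferred spellings, e.g. "useState" (assumed)
}

interface InputSegment {
  id: string;
  text: string;
}

interface TimelineEntry {
  id: string;
  startMs: number;
  endMs: number;
}

interface PostprocessorInput {
  context: SubtitleContext;
  inputSegments: InputSegment[];
  timeline: TimelineEntry[];
}

// Hypothetical shape of a raw ASR subtitle before it is split.
interface AsrSubtitle {
  id: string;
  text: string;
  startMs: number;
  endMs: number;
}

// Split each subtitle into its text part and its timing part, linked by id.
function buildInput(context: SubtitleContext, subtitles: AsrSubtitle[]): PostprocessorInput {
  return {
    context,
    inputSegments: subtitles.map(({ id, text }) => ({ id, text })),
    timeline: subtitles.map(({ id, startMs, endMs }) => ({ id, startMs, endMs })),
  };
}

// Hypothetical example: ASR wrote "use state" where the code says useState.
const input = buildInput(
  {
    fileName: "Counter.tsx",
    snippets: ["const [count, setCount] = useState(0);"],
    glossary: ["useState"],
  },
  [{ id: "subtitle-1", text: "这里用 use state 维护 count", startMs: 0, endMs: 1000 }],
);
console.log(JSON.stringify(input));
```

Keeping timing out of inputSegments means the model only ever sees text to correct, while chapter generation can still consult the timeline by id.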

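The output must be a single JSON object:

{"segments":[{"id":"subtitle-1","text":"这里用 useState 维护 count"}],"chapters":[{"title":"状态设计","startMs":0,"endMs":1000}]}

In this example, the corrected subtitle reads "here, useState is used to maintain count", and the chapter title means "state design".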
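Because the model's raw text is only usable if it matches this shape, a consumer will likely want to validate it before applying anything. A minimal sketch, assuming the output arrives as a raw string: `parseOutput` is a hypothetical helper that checks for a JSON object with `segments` and `chapters` arrays, rejects segment ids that were not sent in the input, and rejects chapters whose `startMs` exceeds `endMs`. The last two checks are assumptions about sensible consumer behavior, not rules stated in the contract.

```typescript
interface OutputSegment {
  id: string;
  text: string;
}

interface Chapter {
  title: string;
  startMs: number;
  endMs: number;
}

interface PostprocessorOutput {
  segments: OutputSegment[];
  chapters: Chapter[];
}

// Hypothetical validator: throws on any contract violation, otherwise returns
// the typed output. knownIds holds the ids that were sent in inputSegments.
function parseOutput(raw: string, knownIds: Set<string>): PostprocessorOutput {
  const value: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("output must be a single JSON object");
  }
  const { segments, chapters } = value as { segments?: unknown; chapters?: unknown };
  if (!Array.isArray(segments) || !Array.isArray(chapters)) {
    throw new Error("segments and chapters must both be arrays");
  }
  for (const s of segments) {
    if (typeof s?.id !== "string" || typeof s?.text !== "string") {
      throw new Error("each segment needs a string id and text");
    }
    // Assumed rule: the model may only rewrite subtitles it was given.
    if (!knownIds.has(s.id)) {
      throw new Error(`unknown segment id: ${s.id}`);
    }
  }
  for (const c of chapters) {
    const ok =
      typeof c?.title === "string" &&
      typeof c?.startMs === "number" &&
      typeof c?.endMs === "number" &&
      c.startMs <= c.endMs; // assumed rule: chapters must not run backwards
    if (!ok) {
      throw new Error("each chapter needs a title and startMs <= endMs");
    }
  }
  return { segments, chapters };
}

const parsed = parseOutput(
  '{"segments":[{"id":"subtitle-1","text":"这里用 useState 维护 count"}],"chapters":[{"title":"状态设计","startMs":0,"endMs":1000}]}',
  new Set(["subtitle-1"]),
);
console.log(parsed.chapters[0].title);
```

Throwing rather than silently dropping bad entries keeps a malformed generation from partially overwriting subtitles; a caller can catch the error and fall back to the uncorrected ASR text.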

segments should be sparse: it lists only the subtitles whose text was changed, and unchanged subtitles are omitted.
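Since the model returns only changed subtitles, the client has to merge its answer back into the full subtitle list. A minimal sketch under that reading, where the hypothetical `applyCorrections` keeps the original order and falls back to the original text for any id the model omitted. Under this reading an empty `segments` array simply means nothing needed correcting. The sample subtitles are illustrative only.

```typescript
interface Subtitle {
  id: string;
  text: string;
}

// Merge a sparse correction list into the full subtitle list.
// Ids missing from `segments` keep their original text; order is preserved.
function applyCorrections(original: Subtitle[], segments: Subtitle[]): Subtitle[] {
  const fixes = new Map(segments.map((s) => [s.id, s.text] as const));
  return original.map((s) => ({ id: s.id, text: fixes.get(s.id) ?? s.text }));
}

// Hypothetical subtitles: only subtitle-1 is corrected by the model.
const corrected = applyCorrections(
  [
    { id: "subtitle-1", text: "这里用 use state 维护 count" },
    { id: "subtitle-2", text: "然后点击按钮" },
  ],
  [{ id: "subtitle-1", text: "这里用 useState 维护 count" }],
);
console.log(corrected);
```

Returning only diffs keeps generations short, which matters for a 135M-parameter model running in the browser, at the cost of this small merge step on the client.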

Training Notes

  • Base: HuggingFaceTB/SmolLM2-135M-Instruct
  • Records: 450 curated/distilled examples
  • Epochs: 2
  • Final train loss: 0.2545
  • Corpus gates: JSON validity rate 1.0, sparse output rate 0.9333, unknown segment reference rate 0
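The card reports the corpus gates but does not define how they are computed. One plausible per-record reading is sketched below. The JSON validity rate is the share of target outputs that parse into an object with `segments` and `chapters` arrays. The sparse output rate is the share whose segments all reference known ids and actually change the text. The unknown segment reference rate is the share that mention at least one id absent from `inputSegments`. These definitions, the `CorpusRecord` shape, and `corpusGates` are all assumptions. Under a per-record reading, a sparse output rate of 0.9333 over 450 records would correspond to about 420 records (420 / 450 ≈ 0.9333).

```typescript
// Hypothetical corpus record: the inputs sent to the model and the raw
// target text it is trained to produce.
interface CorpusRecord {
  inputSegments: { id: string; text: string }[];
  target: string;
}

interface GateReport {
  jsonValidRate: number;
  sparseOutputRate: number;
  unknownSegmentReferenceRate: number;
}

// Assumed per-record definitions of the three gates; the model card reports
// the numbers but not how they are computed.
function corpusGates(records: CorpusRecord[]): GateReport {
  let valid = 0;
  let sparse = 0;
  let unknownRef = 0;
  for (const r of records) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(r.target);
    } catch {
      continue; // invalid JSON is counted in no numerator
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const { segments, chapters } = parsed as { segments?: unknown; chapters?: unknown };
    if (!Array.isArray(segments) || !Array.isArray(chapters)) continue;
    valid++;

    const inputText = new Map(r.inputSegments.map((s) => [s.id, s.text] as const));
    // Unknown reference: any returned id that was not in inputSegments.
    if (segments.some((s) => !inputText.has(s?.id))) unknownRef++;
    // Sparse: every returned segment is known and differs from its input text
    // (an empty segments array counts as sparse).
    if (segments.every((s) => inputText.has(s?.id) && inputText.get(s.id) !== s.text)) sparse++;
  }
  const n = records.length || 1; // avoid dividing by zero on an empty corpus
  return {
    jsonValidRate: valid / n,
    sparseOutputRate: sparse / n,
    unknownSegmentReferenceRate: unknownRef / n,
  };
}

// Hypothetical mini-corpus: one good record, one that echoes unchanged text,
// one with invalid JSON, and one that references an unknown id.
const base = [{ id: "subtitle-1", text: "use state here" }];
const report = corpusGates([
  { inputSegments: base, target: '{"segments":[{"id":"subtitle-1","text":"useState here"}],"chapters":[]}' },
  { inputSegments: base, target: '{"segments":[{"id":"subtitle-1","text":"use state here"}],"chapters":[]}' },
  { inputSegments: base, target: "not json" },
  { inputSegments: base, target: '{"segments":[{"id":"subtitle-9","text":"x"}],"chapters":[]}' },
]);
console.log(report);
```

Under these assumed definitions, an echoed unchanged subtitle lowers the sparse output rate without affecting JSON validity, which is one way a corpus could score 1.0 on validity but below 1.0 on sparseness.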