File size: 1,063 Bytes
2100300
7586591
8277ab1
 
1ed2b2d
 
 
 
2100300
 
1ed2b2d
2100300
1ed2b2d
2100300
1ed2b2d
2100300
1ed2b2d
2100300
1ed2b2d
 
 
2100300
1ed2b2d
2100300
7586591
1ed2b2d
7586591
2100300
1ed2b2d
2100300
1ed2b2d
2100300
1ed2b2d
 
 
 
 
8277ab1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM2-135M-Instruct
tags:
  - code-tape
  - subtitle
  - lora
  - text-generation
---

# code-tape Subtitle Postprocessor LoRA v12

LoRA adapter for the code-tape browser-local subtitle postprocessor. It is trained to correct ASR subtitles for frontend/code terminology and generate playback chapter jump points.

## Contract

Input messages contain:

- `context`: file name, code/runtime snippets, and glossary.
- `inputSegments`: subtitle `id` and `text` only.
- `timeline`: subtitle `id`, `startMs`, and `endMs`.

Output must be one JSON object:

```json
{"segments":[{"id":"subtitle-1","text":"这里用 useState 维护 count"}],"chapters":[{"title":"状态设计","startMs":0,"endMs":1000}]}
```

`segments` should be sparse and contain only changed subtitles.

## Training Notes

- Base: `HuggingFaceTB/SmolLM2-135M-Instruct`
- Records: 450 curated/distilled examples
- Epochs: 2
- Final train loss: 0.2545
- Corpus gates: JSON valid rate 1.0, sparse output rate 0.9333, unknown segment reference rate 0