Commit 02ecd92 by M725 · verified · 1 parent: 7c3df05

upload cortexa-create-feedback v1

Files changed (4)
  1. README.md +63 -0
  2. config.json +21 -0
  3. student_int8.onnx +3 -0
  4. tokenizer.json +143 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ language:
+ - en
+ license: other
+ license_name: pleius-internal
+ tags:
+ - onnx
+ - conditional-text-generation
+ - video-feedback
+ - distillation
+ - creator-tools
+ ---
+
+ # cortexa-create-feedback (distilled student)
+
+ A ~4.4M-parameter conditional decoder distilled from
+ `M725/cortexa-create-scorer` outputs. Takes CLIP-ViT-B/32 vision
+ features (mean-pooled across video frames, 768-d) + the 5 Create pillar
+ scores and emits a creator-vernacular phrase chain about the short-form
+ video:
+
+ ```
+ "first frame slaps | feels intentional"
+ "thumb stopping | shareable"
+ "filler | feels rushed | first frame is nothing"
+ "feels off beat | slow open | no payoff"
+ ```
+
+ ## Files
+
+ | file | purpose |
+ |---|---|
+ | `student_int8.onnx` | TinyTransformer decoder, 4 layers / 256-dim / 4 heads, INT8 dynamic-quantized. 6.9 MB. |
+ | `tokenizer.json` | Whole-phrase tokenizer (vocab ~138; specials `<pad>`, `<bos>`, `<eos>`, `<sep>`). |
+ | `config.json` | Encoder dim, pillar names, vocab size, special-token ids. |
+
+ ## Inference shape
+
+ ```
+ inputs:
+   encoder_feats   (1, 768)   float32  # mean-pooled CLIP-ViT-B/32 vision across frames
+   scores          (1, 5)     float32  # [hook, hold, algorithmic_fit, brand_lift, overall] in [0,1]
+   scores_present  (1,)       float32  # 1.0 anchored, 0.0 fast-mode
+   input_ids       (1, T)     int64
+ outputs:
+   logits          (1, T, V)  float32
+ ```
+
+ Same sampling recommendation as `cortexa-marketing-feedback`: temperature
+ 0.8 + top-k 20 + SEP-veto.
+
+ ## Training
+
+ 6k phrase triples from 3 real short-form videos
+ (`public/create-tutorial/*.mp4`) + 1997 synthetic "videos" built by
+ random-crop + color jitter over COCO stills (each frame goes through
+ cortexa_v10 separately, so the per-frame curve has real variation). 15
+ epochs. Val loss 2.39 → 1.97. See
+ `research/distill_students/train_create.py` in the app repo.
+
+ ## License
+
+ Pleius internal — see https://pleius.com. Not for redistribution.
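
Review note on the sampling recommendation in the README: the sketch below shows one way temperature 0.8 + top-k 20 sampling could be applied to a single row of `logits`, using only the standard library. The README does not define "SEP-veto", so the rule used here (never emit `<sep>` directly after `<bos>` or another `<sep>`) is an assumption, as is the masking of `<pad>` and `<bos>`. The names `sample_next`, `generate`, and `next_logits` are hypothetical; `next_logits` stands in for whatever wraps the ONNX session and returns the logits of the last position.

```python
import math
import random

# Special-token ids as listed in config.json of this commit.
PAD_ID, BOS_ID, EOS_ID, SEP_ID = 0, 1, 2, 3


def sample_next(logits, prev_id, temperature=0.8, top_k=20, rng=random):
    """Pick the next token id from one row of logits (length V).

    ASSUMPTION: "SEP-veto" is read as "no <sep> right after <bos> or <sep>".
    Masking <pad> and <bos> is a choice of this sketch, not of the README.
    """
    banned = {PAD_ID, BOS_ID}
    if prev_id in (BOS_ID, SEP_ID):
        banned.add(SEP_ID)
    # Keep the top_k highest-scoring ids that are not banned.
    candidates = [i for i in range(len(logits)) if i not in banned]
    candidates.sort(key=lambda i: logits[i], reverse=True)
    candidates = candidates[:top_k]
    # Temperature-scaled softmax weights over the surviving candidates.
    peak = max(logits[i] for i in candidates)
    weights = [math.exp((logits[i] - peak) / temperature) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(next_logits, max_len=16, rng=random):
    """Sample ids until <eos> or max_len (max_seq_len in config.json).

    next_logits(ids) is a hypothetical callable returning the logits row
    for the last position given the ids generated so far.
    """
    ids = [BOS_ID]
    while len(ids) < max_len:
        tok = sample_next(next_logits(ids), ids[-1], rng=rng)
        if tok == EOS_ID:
            break
        ids.append(tok)
    return ids[1:]  # drop <bos>
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which helps when comparing phrase chains across model versions.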
config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "modality": "create",
+   "encoder": "openai/clip-vit-base-patch32",
+   "encoder_dim": 768,
+   "n_pillars": 5,
+   "pillars": [
+     "hook",
+     "hold",
+     "algorithmic_fit",
+     "brand_lift",
+     "overall"
+   ],
+   "d_model": 256,
+   "n_layers": 4,
+   "max_seq_len": 16,
+   "vocab_size": 138,
+   "bos_id": 1,
+   "eos_id": 2,
+   "pad_id": 0,
+   "sep_id": 3
+ }
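
Review note on `config.json`: a loader may want to confirm the file is self-consistent before building input tensors. The following is a minimal sketch of such a check; the function name `check_config` and the specific rules are assumptions of this note, not part of the commit. The JSON literal is copied from the diff above.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "modality": "create",
  "encoder": "openai/clip-vit-base-patch32",
  "encoder_dim": 768,
  "n_pillars": 5,
  "pillars": ["hook", "hold", "algorithmic_fit", "brand_lift", "overall"],
  "d_model": 256,
  "n_layers": 4,
  "max_seq_len": 16,
  "vocab_size": 138,
  "bos_id": 1,
  "eos_id": 2,
  "pad_id": 0,
  "sep_id": 3
}
"""


def check_config(cfg):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if len(cfg["pillars"]) != cfg["n_pillars"]:
        problems.append("n_pillars does not match len(pillars)")
    special = [cfg[k] for k in ("pad_id", "bos_id", "eos_id", "sep_id")]
    if len(set(special)) != len(special):
        problems.append("special-token ids are not distinct")
    if any(not 0 <= i < cfg["vocab_size"] for i in special):
        problems.append("special-token id outside vocabulary")
    return problems


cfg = json.loads(CONFIG_JSON)
problems = check_config(cfg)
```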
student_int8.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fd3f324a4cf5d14ca7db136981b49070672411c8c5dc479e4d23922b91ff67b
+ size 7238496
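
Review note on the LFS pointer: the three lines above are a Git LFS pointer, not the model itself, so a downloaded `student_int8.onnx` can be checked against the recorded `oid` and `size`. The sketch below parses the pointer and verifies a local file with `hashlib`; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
import os

# Pointer text as added in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2fd3f324a4cf5d14ca7db136981b49070672411c8c5dc479e4d23922b91ff67b
size 7238496
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20  # about 6.9, matching the README's "6.9 MB"
```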
tokenizer.json ADDED
@@ -0,0 +1,143 @@
+ {
+   "modality": "create",
+   "tokens": [
+     "<pad>",
+     "<bos>",
+     "<eos>",
+     "<sep>",
+     "fast paced hook",
+     "hook lands",
+     "opening lands",
+     "first frame slaps",
+     "first 2 seconds work",
+     "good first impression",
+     "actually grabs you",
+     "i kept watching",
+     "scroll stopping",
+     "thumb stopping",
+     "strong opening",
+     "first 3 seconds work",
+     "hook didn't land",
+     "first 3 seconds dead",
+     "slow open",
+     "slow start",
+     "buried the hook",
+     "i'd skip",
+     "i'd swipe",
+     "first frame is nothing",
+     "weak opening",
+     "no reason to watch",
+     "where's the hook",
+     "i'm gone in 1 second",
+     "engaging movement",
+     "tight editing",
+     "great pacing",
+     "no dead air",
+     "every cut earns it",
+     "kept me to the end",
+     "stayed locked in",
+     "high energy",
+     "kept the energy",
+     "good rhythm",
+     "didn't skip",
+     "didn't swipe",
+     "lost me in the middle",
+     "drags",
+     "drags in the middle",
+     "filler shots",
+     "filler",
+     "too long",
+     "could be 10 seconds",
+     "boring middle",
+     "low energy",
+     "no rhythm",
+     "feels long",
+     "i bounced halfway",
+     "smooth transitions",
+     "good music sync",
+     "music matches",
+     "music elevates",
+     "beat drop works",
+     "good lighting",
+     "lighting clean",
+     "color grade clean",
+     "captions on point",
+     "captions readable",
+     "feels native",
+     "feels organic",
+     "looks like a real creator",
+     "looks high production",
+     "looks pro",
+     "framing works",
+     "shots feel intentional",
+     "too many cuts",
+     "shaky camera",
+     "shaky",
+     "out of focus",
+     "looks blurry",
+     "looks pixelated",
+     "out of sync",
+     "off beat",
+     "cuts feel random",
+     "music feels random",
+     "music too loud",
+     "music kills the vo",
+     "captions wrong timing",
+     "captions cover face",
+     "vertical crop weird",
+     "wrong aspect",
+     "lighting bad",
+     "shadows weird",
+     "looks dark",
+     "looks overexposed",
+     "clear voiceover",
+     "audio is clean",
+     "audio is crisp",
+     "mic is good",
+     "voiceover clear",
+     "satisfying ending",
+     "payoff lands",
+     "ending pays off",
+     "loops well",
+     "good loop",
+     "would save this",
+     "would share this",
+     "would rewatch",
+     "doesn't feel like an ad",
+     "muddy audio",
+     "background noise",
+     "audio peaks",
+     "audio clipping",
+     "mic is bad",
+     "echo",
+     "wind noise",
+     "abrupt ending",
+     "weak ending",
+     "ending falls flat",
+     "no payoff",
+     "doesn't loop",
+     "ai voice",
+     "ai face",
+     "uncanny",
+     "screams ad",
+     "looks like a stock video",
+     "low effort",
+     "feels intentional",
+     "every shot earns it",
+     "the vibe is right",
+     "saveable",
+     "shareable",
+     "would send this to a friend",
+     "forgettable",
+     "boring",
+     "no vibe",
+     "feels random",
+     "feels rushed",
+     "feels stitched together",
+     "would scroll past",
+     "fast hook",
+     "loses you halfway",
+     "feels off beat",
+     "no hook"
+   ]
+ }
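
Review note on `tokenizer.json`: since every phrase is a single token, encoding and decoding reduce to list lookups. The sketch below assumes that a token's id is its index in the `tokens` array (consistent with `pad_id` 0 through `sep_id` 3 in `config.json`) and that `<sep>` sits between phrases and renders as ` | `, as in the README examples. Both points are assumptions, and `encode` and `decode` are hypothetical helper names. Only the first eight entries of the vocabulary are inlined to keep the example short.

```python
# First eight entries of the "tokens" array from tokenizer.json;
# a real caller would load the full list with json.load.
TOKENS = [
    "<pad>", "<bos>", "<eos>", "<sep>",
    "fast paced hook", "hook lands", "opening lands", "first frame slaps",
]

PAD_ID, BOS_ID, EOS_ID, SEP_ID = 0, 1, 2, 3


def encode(chain, tokens):
    """Turn 'phrase | phrase' into [<bos>, id, <sep>, id, <eos>].

    ASSUMPTION: token id == index in the tokens list.
    Raises ValueError for a phrase that is not in the vocabulary.
    """
    ids = [BOS_ID]
    for n, phrase in enumerate(p.strip() for p in chain.split("|")):
        if n > 0:
            ids.append(SEP_ID)
        ids.append(tokens.index(phrase))
    ids.append(EOS_ID)
    return ids


def decode(ids, tokens):
    """Render ids as 'phrase | phrase', stopping at the first <eos>."""
    phrases = []
    for i in ids:
        if i == EOS_ID:
            break
        if i in (PAD_ID, BOS_ID, SEP_ID):
            continue  # specials carry no text of their own
        phrases.append(tokens[i])
    return " | ".join(phrases)


ids = encode("first frame slaps | hook lands", TOKENS)
text = decode(ids, TOKENS)
```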