upload cortexa-create-feedback v1
- README.md +63 -0
- config.json +21 -0
- student_int8.onnx +3 -0
- tokenizer.json +143 -0
README.md
ADDED
@@ -0,0 +1,63 @@
---
language:
- en
license: other
license_name: pleius-internal
tags:
- onnx
- conditional-text-generation
- video-feedback
- distillation
- creator-tools
---

# cortexa-create-feedback (distilled student)

A ~4.4M-parameter conditional decoder distilled from the outputs of
`M725/cortexa-create-scorer`. It takes CLIP-ViT-B/32 vision features
(768-d, mean-pooled across video frames) plus the 5 Create pillar
scores and emits a creator-vernacular phrase chain about the short-form
video, for example:

```
"first frame slaps | feels intentional"
"thumb stopping | shareable"
"filler | feels rushed | first frame is nothing"
"feels off beat | slow open | no payoff"
```

## Files

| file | purpose |
|---|---|
| `student_int8.onnx` | TinyTransformer decoder, 4 layers / 256-dim / 4 heads, INT8 dynamic-quantized. 6.9 MiB (7,238,496 bytes). |
| `tokenizer.json` | Whole-phrase tokenizer (vocab 138, including the specials `<pad>`, `<bos>`, `<eos>`, `<sep>`). |
| `config.json` | Encoder dim, pillar names, vocab size, special-token ids. |

## Inference shape

```
inputs:
  encoder_feats   (1, 768)   float32  # CLIP-ViT-B/32 vision features, mean-pooled across frames
  scores          (1, 5)     float32  # [hook, hold, algorithmic_fit, brand_lift, overall] in [0,1]
  scores_present  (1,)       float32  # 1.0 anchored, 0.0 fast-mode
  input_ids       (1, T)     int64    # T up to max_seq_len (16 in config.json)
outputs:
  logits          (1, T, V)  float32  # V = vocab_size (138 in config.json)
```
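
A minimal sketch of assembling the four inputs listed above, in plain Python. The helper names `mean_pool` and `build_feed` are hypothetical, and the nested lists would still need converting to arrays of the stated dtypes before being passed to an ONNX runtime. Zero-filling `scores` in fast mode is an assumption; the README only specifies the `scores_present` flag.

```python
from typing import Dict, List, Optional, Sequence

ENCODER_DIM = 768  # encoder_dim in config.json
N_PILLARS = 5      # hook, hold, algorithmic_fit, brand_lift, overall


def mean_pool(frame_feats: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into a single vector."""
    if not frame_feats:
        raise ValueError("need at least one frame")
    n = len(frame_feats)
    dim = len(frame_feats[0])
    if any(len(f) != dim for f in frame_feats):
        raise ValueError("all frames must have the same feature size")
    return [sum(f[i] for f in frame_feats) / n for i in range(dim)]


def build_feed(frame_feats: Sequence[Sequence[float]],
               input_ids: Sequence[int],
               scores: Optional[Sequence[float]] = None) -> Dict[str, list]:
    """Build the input dict; pass scores=None for fast mode."""
    pooled = mean_pool(frame_feats)
    if len(pooled) != ENCODER_DIM:
        raise ValueError(f"expected {ENCODER_DIM}-d features, got {len(pooled)}")
    if scores is None:
        # Fast mode. Zero-filling the scores row is an assumption.
        score_row, present = [0.0] * N_PILLARS, 0.0
    else:
        if len(scores) != N_PILLARS:
            raise ValueError(f"expected {N_PILLARS} pillar scores")
        if not all(0.0 <= s <= 1.0 for s in scores):
            raise ValueError("pillar scores must lie in [0, 1]")
        score_row, present = [float(s) for s in scores], 1.0
    return {
        "encoder_feats": [pooled],        # (1, 768) float32
        "scores": [score_row],            # (1, 5)   float32
        "scores_present": [present],      # (1,)     float32
        "input_ids": [list(input_ids)],   # (1, T)   int64
    }
```

Keeping the pooling and validation separate from the runtime call makes the shape contract checkable without loading the model.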

Sampling recommendation (the same as for `cortexa-marketing-feedback`):
temperature 0.8, top-k 20, and SEP-veto.
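
The recommended settings can be sketched for a single decoding step as follows. This is a plain-Python illustration, not the project's sampler: `sample_next` is a hypothetical name, and because the README does not define SEP-veto, the sketch assumes it means forbidding `<sep>` (id 3) directly after `<bos>` or another `<sep>`, where it would produce an empty phrase.

```python
import math
import random
from typing import Optional, Sequence

BOS_ID, SEP_ID = 1, 3  # bos_id and sep_id from config.json


def sample_next(logits: Sequence[float], prev_id: int,
                temperature: float = 0.8, top_k: int = 20,
                rng: Optional[random.Random] = None) -> int:
    """Sample one token id from a single row of logits."""
    rng = rng or random.Random()
    # Temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Assumed SEP-veto: no separator right after <bos> or <sep>.
    if prev_id in (BOS_ID, SEP_ID):
        scaled[SEP_ID] = -math.inf
    # Keep the top_k highest-scoring ids, dropping vetoed ones.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    top = [i for i in order[:top_k] if scaled[i] > -math.inf]
    if not top:
        raise ValueError("no token left to sample")
    # Softmax over the survivors, shifted by the max for stability.
    peak = scaled[top[0]]
    weights = [math.exp(scaled[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

A full decoding loop would call this once per position on the last row of `logits`, append the result to `input_ids`, and stop at `<eos>` or at `max_seq_len`.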

## Training

Trained on 6k phrase triples from 3 real short-form videos
(`public/create-tutorial/*.mp4`) plus 1997 synthetic "videos" built by
applying random crops and color jitter to COCO stills. Each frame goes
through cortexa_v10 separately, so the per-frame curve has real
variation. Trained for 15 epochs; validation loss fell from 2.39 to
1.97. See `research/distill_students/train_create.py` in the app repo.
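
To put the validation losses in context, the sketch below converts them to perplexity and compares them with a uniform guess over the 138-entry vocabulary. It assumes the reported loss is a mean per-token cross-entropy in nats, which the README does not state.

```python
import math

VOCAB_SIZE = 138  # vocab_size in config.json


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy given in nats."""
    return math.exp(loss_nats)


# Loss of a model that spreads probability evenly over the vocabulary.
uniform_loss = math.log(VOCAB_SIZE)

for label, loss in [("uniform guess", uniform_loss),
                    ("reported start", 2.39),
                    ("reported end", 1.97)]:
    print(f"{label}: loss {loss:.2f}, perplexity {perplexity(loss):.1f}")
```

Under that assumption, the drop from 2.39 to 1.97 corresponds to perplexity falling from roughly 10.9 to about 7.2, against 138 for a uniform guess.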

## License

Pleius internal; see https://pleius.com. Not for redistribution.
config.json
ADDED
@@ -0,0 +1,21 @@
{
  "modality": "create",
  "encoder": "openai/clip-vit-base-patch32",
  "encoder_dim": 768,
  "n_pillars": 5,
  "pillars": [
    "hook",
    "hold",
    "algorithmic_fit",
    "brand_lift",
    "overall"
  ],
  "d_model": 256,
  "n_layers": 4,
  "max_seq_len": 16,
  "vocab_size": 138,
  "bos_id": 1,
  "eos_id": 2,
  "pad_id": 0,
  "sep_id": 3
}
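
Because `pillars` fixes the order of the `scores` input, a caller holding scores by name can let the config drive the ordering. A minimal sketch under that reading: `check_config` and `scores_in_pillar_order` are hypothetical helpers, and only the relevant config keys are inlined so the snippet is self-contained.

```python
import json
from typing import Dict, List

# Subset of config.json, inlined for the example.
CONFIG_JSON = """
{"n_pillars": 5,
 "pillars": ["hook", "hold", "algorithmic_fit", "brand_lift", "overall"],
 "vocab_size": 138, "pad_id": 0, "bos_id": 1, "eos_id": 2, "sep_id": 3}
"""


def check_config(cfg: dict) -> None:
    """Basic consistency checks on the fields used at inference time."""
    if len(cfg["pillars"]) != cfg["n_pillars"]:
        raise ValueError("n_pillars does not match the pillars list")
    special = [cfg[k] for k in ("pad_id", "bos_id", "eos_id", "sep_id")]
    if len(set(special)) != len(special):
        raise ValueError("special-token ids must be distinct")
    if any(not 0 <= i < cfg["vocab_size"] for i in special):
        raise ValueError("special-token id outside the vocabulary")


def scores_in_pillar_order(cfg: dict, by_name: Dict[str, float]) -> List[float]:
    """Order named pillar scores the way the config lists them."""
    missing = [p for p in cfg["pillars"] if p not in by_name]
    if missing:
        raise KeyError(f"missing pillar scores: {missing}")
    return [float(by_name[p]) for p in cfg["pillars"]]


cfg = json.loads(CONFIG_JSON)
check_config(cfg)
row = scores_in_pillar_order(cfg, {"overall": 0.7, "hook": 0.9, "hold": 0.6,
                                   "brand_lift": 0.5, "algorithmic_fit": 0.8})
print(row)  # [0.9, 0.6, 0.8, 0.5, 0.7]
```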
student_int8.onnx
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fd3f324a4cf5d14ca7db136981b49070672411c8c5dc479e4d23922b91ff67b
size 7238496
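
The three lines above are a Git LFS pointer, not the model itself; the `oid` and `size` fields can be used to check a downloaded copy. A minimal sketch using only the standard library, where `parse_pointer` and `matches_pointer` are hypothetical helper names:

```python
import hashlib
from pathlib import Path
from typing import Dict

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:2fd3f324a4cf5d14ca7db136981b49070672411c8c5dc479e4d23922b91ff67b
size 7238496
"""


def parse_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str]) -> bool:
    """Return True if the file's size and SHA-256 match the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    # Cheap check first: a size mismatch avoids hashing the file.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


pointer = parse_pointer(POINTER_TEXT)
print(int(pointer["size"]) / 2**20)  # size in MiB
```

After downloading, `matches_pointer(Path("student_int8.onnx"), pointer)` would confirm the file is the one this commit refers to.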
tokenizer.json
ADDED
@@ -0,0 +1,143 @@
{
  "modality": "create",
  "tokens": [
    "<pad>",
    "<bos>",
    "<eos>",
    "<sep>",
    "fast paced hook",
    "hook lands",
    "opening lands",
    "first frame slaps",
    "first 2 seconds work",
    "good first impression",
    "actually grabs you",
    "i kept watching",
    "scroll stopping",
    "thumb stopping",
    "strong opening",
    "first 3 seconds work",
    "hook didn't land",
    "first 3 seconds dead",
    "slow open",
    "slow start",
    "buried the hook",
    "i'd skip",
    "i'd swipe",
    "first frame is nothing",
    "weak opening",
    "no reason to watch",
    "where's the hook",
    "i'm gone in 1 second",
    "engaging movement",
    "tight editing",
    "great pacing",
    "no dead air",
    "every cut earns it",
    "kept me to the end",
    "stayed locked in",
    "high energy",
    "kept the energy",
    "good rhythm",
    "didn't skip",
    "didn't swipe",
    "lost me in the middle",
    "drags",
    "drags in the middle",
    "filler shots",
    "filler",
    "too long",
    "could be 10 seconds",
    "boring middle",
    "low energy",
    "no rhythm",
    "feels long",
    "i bounced halfway",
    "smooth transitions",
    "good music sync",
    "music matches",
    "music elevates",
    "beat drop works",
    "good lighting",
    "lighting clean",
    "color grade clean",
    "captions on point",
    "captions readable",
    "feels native",
    "feels organic",
    "looks like a real creator",
    "looks high production",
    "looks pro",
    "framing works",
    "shots feel intentional",
    "too many cuts",
    "shaky camera",
    "shaky",
    "out of focus",
    "looks blurry",
    "looks pixelated",
    "out of sync",
    "off beat",
    "cuts feel random",
    "music feels random",
    "music too loud",
    "music kills the vo",
    "captions wrong timing",
    "captions cover face",
    "vertical crop weird",
    "wrong aspect",
    "lighting bad",
    "shadows weird",
    "looks dark",
    "looks overexposed",
    "clear voiceover",
    "audio is clean",
    "audio is crisp",
    "mic is good",
    "voiceover clear",
    "satisfying ending",
    "payoff lands",
    "ending pays off",
    "loops well",
    "good loop",
    "would save this",
    "would share this",
    "would rewatch",
    "doesn't feel like an ad",
    "muddy audio",
    "background noise",
    "audio peaks",
    "audio clipping",
    "mic is bad",
    "echo",
    "wind noise",
    "abrupt ending",
    "weak ending",
    "ending falls flat",
    "no payoff",
    "doesn't loop",
    "ai voice",
    "ai face",
    "uncanny",
    "screams ad",
    "looks like a stock video",
    "low effort",
    "feels intentional",
    "every shot earns it",
    "the vibe is right",
    "saveable",
    "shareable",
    "would send this to a friend",
    "forgettable",
    "boring",
    "no vibe",
    "feels random",
    "feels rushed",
    "feels stitched together",
    "would scroll past",
    "fast hook",
    "loses you halfway",
    "feels off beat",
    "no hook"
  ]
}
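
With a whole-phrase vocabulary, decoding reduces to mapping ids back to phrases and joining them. The sketch below assumes a token's id is its index in `tokens` (consistent with `pad_id` 0, `bos_id` 1, `eos_id` 2 and `sep_id` 3 in `config.json`) and that phrases are joined with ` | ` as in the README examples. `decode` is a hypothetical helper, and only the first eight vocabulary entries are inlined.

```python
from typing import List, Sequence

# First 8 of the 138 entries in tokenizer.json; load the full list in practice.
TOKENS = [
    "<pad>", "<bos>", "<eos>", "<sep>",
    "fast paced hook", "hook lands", "opening lands", "first frame slaps",
]
PAD_ID, BOS_ID, EOS_ID, SEP_ID = 0, 1, 2, 3


def decode(ids: Sequence[int], tokens: Sequence[str] = TOKENS) -> str:
    """Turn generated ids into a ' | '-separated phrase chain."""
    phrases: List[str] = []
    for i in ids:
        if i == EOS_ID:
            break  # nothing after <eos> belongs to the output
        if i in (PAD_ID, BOS_ID, SEP_ID):
            continue  # structural tokens carry no text
        phrases.append(tokens[i])
    return " | ".join(phrases)


print(decode([1, 7, 3, 5, 2]))  # first frame slaps | hook lands
```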