mae-video-ingestion

Profile-driven structured-signal extraction from videos using Qwen2.5-VL native video mode + Whisper transcript + 7 archetype prompt profiles + schema-agnostic provenance-tagged output.

Built as Layer 3 of the offline LLM stack documented in architecture-notes. Validated against DeepMind AlphaProof Nexus paper (arXiv 2605.22763) and López de Prado / Warrior Trading educational videos.

What this is

A sealed-box wrapper around the canonical Layer-2 notebook in memory-oracle/notebooks/video-ingestion/. Customer provides:

video_url: YouTube URL or direct .mp4 URL
profile: one of seven prompt archetypes
Tuning knobs: chunk_duration_sec, video_fps, video_max_pixels, max_new_tokens

Returns the same schema_version=2 JSON the local pipeline produces — metadata + transcript + per-chunk signal + schema-agnostic aggregate with provenance tagging on every entry.

The seven prompt profiles

Profile	When to use
`ai-systems-research`	Paper-companion videos (Two Minute Papers, AI Coffee Break, paper explainers)
`paper-author-talk`	Conference talks by paper authors (NeurIPS / ICML / ICLR), with Q&A extraction
`coding-tutorial`	Hands-on walkthroughs (Karpathy-style) — code is the content
`product-announcement`	AI lab launch videos, with capability_claims + caveats_buried_in_fine_print extraction
`trading-education`	Pedagogical trading videos with pattern / filter / risk extraction
`trading-intelligence`	Market-intel financial videos with ticker / sentiment extraction
`general-summary`	Fallback for videos that don't match any archetype

Full schemas in memory-oracle/notebooks/video-ingestion/prompt-profiles/.

Companion artifacts

Public corpus: mae-curriculae-quant-foundations — 10 López lesson cards under CC-BY-SA 4.0, prototype of operator-curated curriculum extraction
Substrate paper: memory-oracle/paper/ — LNCS clinical case study + CoALA position paper (in revision)
Lead essay: The Harness IS the Intelligence — positions this work alongside DeepMind AlphaProof Nexus

What this is NOT

This model card describes a pipeline, not a new model. The actual inference uses Qwen2.5-VL-7B-Instruct for vision-language extraction and openai/whisper-base for audio transcript — both unmodified upstream weights.

The differentiation is the substrate layer around the model:

Profile-driven prompt selection with channel-hint runtime injection
Schema-agnostic aggregate with provenance tagging (_chunk_idx, _chunk_start_sec on every entry)
Cross-field consistency hallucination catch (operator-curated rejection log per source)
Pinned reproducible build (CUDA + torch + transformers + decord + qwen-vl-utils all version-locked)
Append-never-mutate amendment pattern via .amendments.jsonl sidecars when extractions are later corrected

Reach the operator

What	Where
Replicate.com	https://replicate.com/ramene/mae-video-ingestion
GitHub source	https://github.com/ramene/memory-oracle (notebook + Dockerfile + Replicate cog wrapper + this card)
Commercial / batch / SLA / custom prompts	mailto:anthony.ramene@appmaestro.ai (appmaestro.ai launching Q3)

License

MIT for the wrapper code (cog wrapper, Dockerfile, predict.py, prompt profiles). Upstream model licenses apply for Qwen2.5-VL (Tongyi Qianwen License) and Whisper (MIT).

Citation

If you use this in research:

@misc{ramene2026mae,
  author       = {Anthony, Ramene},
  title        = {mae-video-ingestion: profile-driven VLM extraction with operator-curated substrate},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ramene/mae-video-ingestion}},
  note         = {Built on Qwen2.5-VL-7B-Instruct + Whisper. Substrate pattern: Evidence-Bound Retrieval (EBR) via memory-oracle.}
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Video-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ramene/mae-video-ingestion

Base model

Qwen/Qwen2.5-VL-7B-Instruct

Finetuned

(1129)

this model