mae-video-ingestion

Profile-driven structured-signal extraction from videos using Qwen2.5-VL native video mode + Whisper transcript + 7 archetype prompt profiles + schema-agnostic provenance-tagged output.

Built as Layer 3 of the offline LLM stack documented in architecture-notes. Validated against DeepMind AlphaProof Nexus paper (arXiv 2605.22763) and López de Prado / Warrior Trading educational videos.

What this is

A sealed-box wrapper around the canonical Layer-2 notebook in memory-oracle/notebooks/video-ingestion/. Customer provides:

  • video_url: YouTube URL or direct .mp4 URL
  • profile: one of seven prompt archetypes
  • Tuning knobs: chunk_duration_sec, video_fps, video_max_pixels, max_new_tokens

Returns the same schema_version=2 JSON the local pipeline produces — metadata + transcript + per-chunk signal + schema-agnostic aggregate with provenance tagging on every entry.

The seven prompt profiles

Profile When to use
ai-systems-research Paper-companion videos (Two Minute Papers, AI Coffee Break, paper explainers)
paper-author-talk Conference talks by paper authors (NeurIPS / ICML / ICLR), with Q&A extraction
coding-tutorial Hands-on walkthroughs (Karpathy-style) — code is the content
product-announcement AI lab launch videos, with capability_claims + caveats_buried_in_fine_print extraction
trading-education Pedagogical trading videos with pattern / filter / risk extraction
trading-intelligence Market-intel financial videos with ticker / sentiment extraction
general-summary Fallback for videos that don't match any archetype

Full schemas in memory-oracle/notebooks/video-ingestion/prompt-profiles/.

Companion artifacts

What this is NOT

This model card describes a pipeline, not a new model. The actual inference uses Qwen2.5-VL-7B-Instruct for vision-language extraction and openai/whisper-base for audio transcript — both unmodified upstream weights.

The differentiation is the substrate layer around the model:

  • Profile-driven prompt selection with channel-hint runtime injection
  • Schema-agnostic aggregate with provenance tagging (_chunk_idx, _chunk_start_sec on every entry)
  • Cross-field consistency hallucination catch (operator-curated rejection log per source)
  • Pinned reproducible build (CUDA + torch + transformers + decord + qwen-vl-utils all version-locked)
  • Append-never-mutate amendment pattern via .amendments.jsonl sidecars when extractions are later corrected

Reach the operator

What Where
Replicate.com https://replicate.com/ramene/mae-video-ingestion
GitHub source https://github.com/ramene/memory-oracle (notebook + Dockerfile + Replicate cog wrapper + this card)
Commercial / batch / SLA / custom prompts mailto:anthony.ramene@appmaestro.ai (appmaestro.ai launching Q3)

License

MIT for the wrapper code (cog wrapper, Dockerfile, predict.py, prompt profiles). Upstream model licenses apply for Qwen2.5-VL (Tongyi Qianwen License) and Whisper (MIT).

Citation

If you use this in research:

@misc{ramene2026mae,
  author       = {Anthony, Ramene},
  title        = {mae-video-ingestion: profile-driven VLM extraction with operator-curated substrate},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ramene/mae-video-ingestion}},
  note         = {Built on Qwen2.5-VL-7B-Instruct + Whisper. Substrate pattern: Evidence-Bound Retrieval (EBR) via memory-oracle.}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ramene/mae-video-ingestion

Finetuned
(1129)
this model