Shipped StudioMI300 for the AMD x lablab hackathon. One English sentence becomes a 30-second cinematic reel, end-to-end on a single AMD Instinct MI300X.

Every model in the pipeline is Apache 2.0 or MIT.

🎬 Director Agent — Qwen3.5-35B-A3B via vLLM with AITER MoE acceleration. Plans 6 shots, character bibles, a music brief, and per-shot voice-over.
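
A minimal sketch of what the Director call might look like through vLLM's offline API. The AITER env toggle and sampling settings are assumptions (check your ROCm vLLM build), and the model id is just the checkpoint as named above:

```python
# Hedged sketch of the Director Agent via vLLM's offline API.
# VLLM_ROCM_USE_AITER is assumed to be the AITER switch in a recent
# ROCm vLLM build; verify against your version.
import os
os.environ["VLLM_ROCM_USE_AITER"] = "1"

from vllm import LLM, SamplingParams

director = LLM(model="Qwen/Qwen3.5-35B-A3B")  # checkpoint as named in the post
params = SamplingParams(temperature=0.7, max_tokens=4096)

brief = (
    "Plan 6 shots, character bibles, a music brief, and per-shot "
    "voice-over for: 'A detective walks through rainy Tokyo at night.'"
)
plan = director.generate([brief], params)[0].outputs[0].text
```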

🎨 Character keyframes — FLUX.2 klein 4B reference editing. No LoRA training step. Identity stays consistent across shots by construction.
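
I can't confirm the diffusers entry point for FLUX.2 klein, so here is the same no-LoRA reference-editing pattern sketched with FluxKontextPipeline (FLUX.1 Kontext) as a stand-in; paths and prompt are illustrative:

```python
# Stand-in sketch: identity-preserving reference editing in diffusers.
# FluxKontextPipeline is the FLUX.1 Kontext class; the FLUX.2 klein
# pipeline may differ.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

reference = load_image("bible/detective.png")  # hypothetical character sheet
keyframe = pipe(
    image=reference,
    prompt="Same character, medium shot, standing under a neon sign in rain",
    guidance_scale=2.5,
).images[0]
keyframe.save("shots/shot_03_keyframe.png")
```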

šŸŽžļø Animation — Wan2.2-I2V-A14B with ParaAttention FBCache (lossless 2x) and selective torch.compile on transformer_2 (another 1.2x). End-to-end Wan2.2 inference went from 25.9 min to 10.4 min per 720p clip.

šŸ” Vision Critic — same Qwen3.5 checkpoint reloaded with a 10-label failure taxonomy (character drift, extras invade frame, camera ignored, walking backwards, hand artifact, wardrobe drift, neon glow leak, stylized AI look, random intimacy, object morphing). Bad clips auto-retry with targeted strategies. Up to 3 attempts.

🎵 Music — ACE-Step v1 generates a 30-second instrumental from the Director's brief.

šŸ—£ļø Narration — Kokoro-82M, 9 languages. Director picks language to match setting. Tokyo to Japanese, Paris to French, Mumbai to Hindi.

The 192 GB of HBM3 on the MI300X is what lets four very different model architectures share one card sequentially. On 24 GB consumer GPUs, this stack would need 4-5 separate machines wired together.
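
In practice that sequential sharing is just load, run, free, repeat; a minimal sketch (helper names are mine):

```python
# Minimal sketch of the one-card handoff between stages.
import gc
import torch

def run_stage(load_fn, run_fn, *args):
    model = load_fn()                 # e.g. one of the pipelines sketched above
    try:
        return run_fn(model, *args)
    finally:
        del model                     # drop this stage before loading the next
        gc.collect()
        torch.cuda.empty_cache()      # ROCm exposes the same torch.cuda API
```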

Space (live infra is being restored after the hackathon close; please like the Space):
lablab-ai-amd-developer-hackathon/studiomi300

Code (Apache 2.0):
https://github.com/bladedevoff/studiomi300

Special thanks to the FLUX, Wan2.2, ACE-Step and Kokoro teams for keeping serious generative AI open. The pipeline composes their work into something none of them alone can produce — a complete cinematic artifact from a single prompt.

#AMDHackathon #ROCm #MI300X #OpenSource