AI & ML interests

None defined yet.

alvdansen 
posted an update 3 months ago
view post
Post
1530
Releasing Flimmer today — a video LoRA training toolkit for WAN 2.1 and 2.2 that covers the full pipeline from raw footage to trained checkpoint.
The standout feature is phased training: multi-stage runs where each phase has its own learning rate, epochs, and dataset, with the checkpoint carrying forward automatically. Built specifically with WAN 2.2's dual-expert MoE architecture in mind.

Data prep tools are standalone and output standard formats — they work with any trainer, not just Flimmer.

Early release, building in the open. LTX support coming next.

http://github.com/alvdansen/flimmer-trainer
alvdansen 
posted an update 3 months ago
view post
Post
1870


Just open-sourced LoRA Gym with Timothy - production-ready training pipeline for character, motion, aesthetic, and style LoRAs on Wan 2.1/2.2, built on musubi-tuner.

16 training templates across Modal (serverless) and RunPod (bare metal) covering T2V, I2V, Lightning-merged, and vanilla variants.

Our current experimentation focus is Wan 2.2, which is why we built on musubi-tuner (kohya-ss). Wan 2.2's DiT uses a Mixture-of-Experts architecture with two separate experts gated by a hard timestep switch - you're training two LoRAs per concept, one for high-noise (composition/motion) and one for low-noise (texture/identity), and loading both at inference. Musubi handles this dual-expert training natively, and our templates build on top of it to manage the correct timestep boundaries, precision settings, and flow shift values so you don't have to debug those yourself. We've also documented bug fixes for undocumented issues in musubi-tuner and validated hyperparameter defaults derived from cross-referencing multiple practitioners' results rather than untested community defaults.

Also releasing our auto-captioning toolkit for the first time. Per-LoRA-type captioning strategies for characters, styles, motion, and objects. Gemini (free) or Replicate backends.

Current hyperparameters reflect consolidated community findings. We've started our own refinement and plan to release specific recommendations and methodology as soon as next week.

Repo: github.com/alvdansen/lora-gym
  • 2 replies
·
multimodalart 
posted an update 7 months ago
view post
Post
28487
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any HF entire repo (Model, Dataset or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
  • 1 reply
·
multimodalart 
posted an update 11 months ago
view post
Post
18409
Self-Forcing - a real-time video distilled model from Wan 2.1 by @adobe is out, and they open sourced it 🐐

I've built a live real time demo on Spaces 📹💨

multimodalart/self-forcing
  • 6 replies
·
awacke1 
posted an update about 1 year ago
view post
Post
2938
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR Single-page sniping

Build Titans 🌱: Train tiny AI models. 💪Characters🧑‍🎨: Craft quirky heroes.
🎥