Can you make an nvfp4 version of kijai's extracted transformers model?
Hi,
This has the dev transformer-only model. It can be used with separate audio and video VAEs, which also helps with memory management.
Technically, the transformer-only version is just the same checkpoint with every layer removed except the ones whose keys start with "model.diffusion_model.". The only advantage would be file size, though.
The VAE, audio VAE, vocoder and text embeddings projector together only take up around 4 GB of storage space anyway.
The models in this repo can also be used with separate audio and video VAEs. If Comfy never needs a model, it won't load it. So if you load the extracted VAEs from kijai and don't connect the checkpoint loader's VAE, you still only end up loading the transformer from this file and the VAEs from the other files, and the memory usage is the same. (This is in theory; if it behaves differently, that's unintentional behavior in Comfy.)
DIY solution if you really need it
If the 4 GB difference bothers you (for example, if you're really low on storage space), I recommend writing a filter script, or asking an LLM like Gemini to write one. The Python script would use PyTorch with safetensors to load a safetensors file, filter the dict keys to keep only those that start with "model.diffusion_model.", then save the result to a new file, keeping the metadata intact.
Again, I don't think there's any benefit to doing this besides saving 4 GB of storage space.
Thanks man for the great explanation. I will try to make a script and see if I can make it. Thanks!