Mega Multimodal Model V6 (mega-multimodal-v6)
A multimodal model that exposes a unified interface over many component models, supports selective loading of components, and automatically skips components that fail to load.
Included Capabilities & Models (Based on Config)
- ControlNet (Canny): lllyasviel/control_v11p_sd15_canny (Not Loaded)
- ControlNet (Depth): lllyasviel/control_v11f1p_sd15_depth (Not Loaded)
- ControlNet (Pose): lllyasviel/control_v11p_sd15_openpose (Not Loaded)
- ControlNet (Softedge): N/A (controlnet_softedge) (Not Loaded)
- Audio_cls Model: microsoft/wavlm-base-plus-sd (Not Loaded)
- bark: suno/bark (Not Loaded)
- Caption Model: nlpconnect/vit-gpt2-image-captioning (Not Loaded)
- clip: openai/clip-vit-base-patch32 (Not Loaded)
- code: bigcode/starcoder2-3b (Not Loaded)
- Depth Model: Intel/dpt-large (Not Loaded)
- detr: facebook/detr-resnet-50 (Not Loaded)
- Docvqa Model: naver-clova-ix/donut-base-finetuned-docvqa (Not Loaded)
- music: facebook/musicgen-medium (Not Loaded)
- ner: dbmdz/bert-large-cased-finetuned-conll03-english (Not Loaded)
- qa: distilbert-base-uncased-distilled-squad (Not Loaded)
- sentiment: unitary/toxic-bert (Not Loaded)
- speech: openai/whisper-tiny (Not Loaded)
- text: Qwen/Qwen1.5-1.8B (Not Loaded)
- tqa: google/tapas-base-finetuned-wtq (Not Loaded)
- trocr: microsoft/trocr-small-handwritten (Not Loaded)
- Video_cls Model: MCG-NJU/videomae-base-finetuned-kinetics (Not Loaded)
- Vqa Model: dandelin/vilt-b32-finetuned-vqa (Not Loaded)
- zshot_cls: openai/clip-vit-large-patch14 (Not Loaded)
- zshot_det: google/owlvit-base-patch32 (Not Loaded)
- i2v: stabilityai/stable-video-diffusion-img2vid-xt (Not Loaded)
- instruct_pix2pix: timbrooks/instruct-pix2pix (Not Loaded)
- kandinsky_decoder: kandinsky-community/kandinsky-2-2-decoder (Not Loaded)
- kandinsky_prior: kandinsky-community/kandinsky-2-2-prior (Not Loaded)
- refine: stabilityai/stable-diffusion-xl-refiner-1.0 (Not Loaded)
- Text-to-Image (SD 1.5): runwayml/stable-diffusion-v1-5 (Not Loaded)
- sd_inpainting: runwayml/stable-diffusion-inpainting (Not Loaded)
- Text-to-Image (SDXL): stabilityai/stable-diffusion-xl-base-1.0 (Not Loaded)
- shape_pipe: stabilityai/shap-e (Not Loaded)
- t2v: cerspense/zeroscope_v2_576w (Not Loaded)
- Audio_cls Processor: microsoft/wavlm-base-plus-sd (Not Loaded)
- Bark Processor: suno/bark (Not Loaded)
- Caption Processor: nlpconnect/vit-gpt2-image-captioning (Not Loaded)
- Clip Processor: openai/clip-vit-base-patch32 (Not Loaded)
- Depth Processor: Intel/dpt-large (Not Loaded)
- Detr Processor: facebook/detr-resnet-50 (Not Loaded)
- Docvqa Processor: naver-clova-ix/donut-base-finetuned-docvqa (Not Loaded)
- Speech Processor: openai/whisper-tiny (Not Loaded)
- Trocr Processor: microsoft/trocr-small-handwritten (Not Loaded)
- Video_cls Processor: MCG-NJU/videomae-base-finetuned-kinetics (Not Loaded)
- Vqa Processor: dandelin/vilt-b32-finetuned-vqa (Not Loaded)
- Zshot_cls Processor: openai/clip-vit-large-patch14 (Not Loaded)
- Zshot_det Processor: google/owlvit-base-patch32 (Not Loaded)
- Code Tokenizer: bigcode/starcoder2-3b (Not Loaded)
- Music Tokenizer: facebook/musicgen-medium (Not Loaded)
- Ner Tokenizer: dbmdz/bert-large-cased-finetuned-conll03-english (Not Loaded)
- Qa Tokenizer: distilbert-base-uncased-distilled-squad (Not Loaded)
- Sentiment Tokenizer: unitary/toxic-bert (Not Loaded)
- Text Tokenizer: Qwen/Qwen1.5-1.8B (Not Loaded)
- Tqa Tokenizer: google/tapas-base-finetuned-wtq (Not Loaded)
Optimizations & Mechanisms
- Selective Loading: from_pretrained(..., components_to_load=[...]).
- Auto-Skip Failed Loads: Logs errors and continues if a component fails.
- Logging & Performance Timing: Optional generate(..., time_execution=True).
- Input Validation: Enhanced type/value checks.
- Custom BitLinear: Configured: False.
- BitsAndBytes Quantization: Configured: False, Mode: 4bit.
- Global Pruning: Configured Amount: 0.0.
- Gradient Checkpointing: Configured: False.
- Flash Attention 2: Configured: False.
- Diffusers Optimizations: Slicing (True), Offload (True).
- Low CPU Mem Usage: Configured: True.
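The selective-loading and auto-skip behaviour listed above can be pictured with a small sketch. This is not the model's actual implementation: `load_components`, the `loaders` registry, and the stand-in loaders are hypothetical names used only for illustration. Only the `components_to_load` argument and the "log the error and continue" behaviour come from this card.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger("mega_multimodal_sketch")


def load_components(
    loaders: Dict[str, Callable[[], Any]],
    components_to_load: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Load the requested components, skipping any whose loader raises."""
    # None means "load everything that has a registered loader".
    requested = list(loaders) if components_to_load is None else list(components_to_load)
    loaded: Dict[str, Any] = {}
    for name in requested:
        loader = loaders.get(name)
        if loader is None:
            logger.warning("Unknown component %r, skipping", name)
            continue
        try:
            loaded[name] = loader()
        except Exception as exc:  # auto-skip: log the failure and keep going
            logger.error("Failed to load %r: %s", name, exc)
    return loaded


def _broken_loader() -> Any:
    # Simulates a component whose weights cannot be fetched.
    raise OSError("weights not found")


# Stand-in loaders (hypothetical); a real one would call from_pretrained on a model ID.
demo_loaders = {
    "text": lambda: "text-model",
    "sd": _broken_loader,
    "canny": lambda: "canny-controlnet",
}

subset = load_components(demo_loaders, components_to_load=["text", "sd"])
print(sorted(subset))  # ['text']
```

In this sketch the failing `sd` loader is logged and dropped, so the caller still receives the components that did load.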
Saving & Loading
Saving and loading use the standard save_pretrained / from_pretrained methods. Components are stored in subdirectories. Components that fail to load during from_pretrained are skipped.
```python
from mega_multimodal_model import MegaMultimodalModel  # Assuming class definition saved

# Save (`model` is an existing MegaMultimodalModel instance)
model.save_pretrained("./my_mega_multimodal_model_v6")

# Load all
loaded_model = MegaMultimodalModel.from_pretrained("./my_mega_multimodal_model_v6")

# Load selectively (example)
components = ['text', 'text_tok', 'sd', 'canny']  # Use class attribute names or controlnet types
loaded_model_subset = MegaMultimodalModel.from_pretrained(
    "./my_mega_multimodal_model_v6", components_to_load=components
)

# Load from Hub
loaded_model = MegaMultimodalModel.from_pretrained("your_hf_username/your_repo_name")
loaded_model_subset = MegaMultimodalModel.from_pretrained(
    "your_hf_username/your_repo_name", components_to_load=components
)

# Usage
text_output = loaded_model.generate("Hello!", task="text", time_execution=True)
```
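How `time_execution=True` is handled internally is not documented here. One plausible shape, shown as a minimal sketch, is a wrapper that measures wall-clock time only when the flag is set. The helper `run_with_timing` and the placeholder `fake_generate` are hypothetical and are not part of the model's API.

```python
import time
from typing import Any, Callable, Optional, Tuple


def run_with_timing(
    fn: Callable[..., Any],
    *args: Any,
    time_execution: bool = False,
    **kwargs: Any,
) -> Tuple[Any, Optional[float]]:
    """Call fn and return (result, elapsed_seconds); elapsed is None when timing is off."""
    if not time_execution:
        return fn(*args, **kwargs), None
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_generate(prompt: str, task: str = "text") -> str:
    # Placeholder for a real generation call (hypothetical).
    return f"[{task}] {prompt}"


output, seconds = run_with_timing(fake_generate, "Hello!", task="text", time_execution=True)
print(output, f"({seconds:.6f} s)")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.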
Installation Dependencies
Core: torch torchvision torchaudio transformers diffusers huggingface_hub[hf_xet] hf_xet safetensors timm Pillow accelerate bitsandbytes einops pandas decord ftfy pyav --upgrade
Optional: controlnet-aux, flash-attn --no-build-isolation
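Before loading any components, it can help to check which dependencies are importable in the current environment. The sketch below uses only the standard library. The helper `missing_modules` and the `CORE_MODULES` selection are illustrative assumptions, and the list uses import names rather than pip names (for example, Pillow is imported as `PIL`).

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names for some of the core packages; Pillow is imported as PIL.
CORE_MODULES = [
    "torch",
    "transformers",
    "diffusers",
    "huggingface_hub",
    "safetensors",
    "PIL",
    "accelerate",
]

missing = missing_modules(CORE_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast even when heavy packages such as torch are installed.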
Model Configuration (config.json)
Stores component IDs and optimization settings.
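The exact schema of config.json is not shown in this card. As a rough illustration only, the sketch below builds a config with hypothetical key names, filled with component IDs and optimization values listed above, and round-trips it through JSON.

```python
import json

# Hypothetical layout: the key names are illustrative; the values come from this card.
example_config = {
    "component_ids": {
        "text": "Qwen/Qwen1.5-1.8B",
        "speech": "openai/whisper-tiny",
        "clip": "openai/clip-vit-base-patch32",
    },
    "optimizations": {
        "use_bitlinear": False,
        "bnb_quantization": {"enabled": False, "mode": "4bit"},
        "pruning_amount": 0.0,
        "gradient_checkpointing": False,
        "flash_attention_2": False,
        "low_cpu_mem_usage": True,
    },
}

# Round-trip through JSON, as would happen when saving and reloading the config.
serialized = json.dumps(example_config, indent=2)
restored = json.loads(serialized)

for name, model_id in restored["component_ids"].items():
    print(f"{name}: {model_id}")
```

Check the config.json shipped with the repository for the real key names before relying on any of them.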