Mega Multimodal Model V6 (mega-multimodal-v6)

A multimodal model that exposes its components through a unified generate() interface, supports selective component loading, and automatically skips any component that fails to load.

Included Capabilities & Models (Based on Config)

  • ControlNet (Canny): lllyasviel/control_v11p_sd15_canny (Not Loaded)
  • ControlNet (Depth): lllyasviel/control_v11f1p_sd15_depth (Not Loaded)
  • ControlNet (Pose): lllyasviel/control_v11p_sd15_openpose (Not Loaded)
  • ControlNet (Softedge): N/A (controlnet_softedge) (Not Loaded)
  • Audio_cls Model: microsoft/wavlm-base-plus-sd (Not Loaded)
  • bark: suno/bark (Not Loaded)
  • Caption Model: nlpconnect/vit-gpt2-image-captioning (Not Loaded)
  • clip: openai/clip-vit-base-patch32 (Not Loaded)
  • code: bigcode/starcoder2-3b (Not Loaded)
  • Depth Model: Intel/dpt-large (Not Loaded)
  • detr: facebook/detr-resnet-50 (Not Loaded)
  • Docvqa Model: naver-clova-ix/donut-base-finetuned-docvqa (Not Loaded)
  • music: facebook/musicgen-medium (Not Loaded)
  • ner: dbmdz/bert-large-cased-finetuned-conll03-english (Not Loaded)
  • qa: distilbert-base-uncased-distilled-squad (Not Loaded)
  • sentiment: unitary/toxic-bert (Not Loaded)
  • speech: openai/whisper-tiny (Not Loaded)
  • text: Qwen/Qwen1.5-1.8B (Not Loaded)
  • tqa: google/tapas-base-finetuned-wtq (Not Loaded)
  • trocr: microsoft/trocr-small-handwritten (Not Loaded)
  • Video_cls Model: MCG-NJU/videomae-base-finetuned-kinetics (Not Loaded)
  • Vqa Model: dandelin/vilt-b32-finetuned-vqa (Not Loaded)
  • zshot_cls: openai/clip-vit-large-patch14 (Not Loaded)
  • zshot_det: google/owlvit-base-patch32 (Not Loaded)
  • i2v: stabilityai/stable-video-diffusion-img2vid-xt (Not Loaded)
  • instruct_pix2pix: timbrooks/instruct-pix2pix (Not Loaded)
  • kandinsky_decoder: kandinsky-community/kandinsky-2-2-decoder (Not Loaded)
  • kandinsky_prior: kandinsky-community/kandinsky-2-2-prior (Not Loaded)
  • refine: stabilityai/stable-diffusion-xl-refiner-1.0 (Not Loaded)
  • Text-to-Image (SD 1.5): runwayml/stable-diffusion-v1-5 (Not Loaded)
  • sd_inpainting: runwayml/stable-diffusion-inpainting (Not Loaded)
  • Text-to-Image (SDXL): stabilityai/stable-diffusion-xl-base-1.0 (Not Loaded)
  • shape_pipe: stabilityai/shap-e (Not Loaded)
  • t2v: cerspense/zeroscope_v2_576w (Not Loaded)
  • Audio_cls Processor: microsoft/wavlm-base-plus-sd (Not Loaded)
  • Bark Processor: suno/bark (Not Loaded)
  • Caption Processor: nlpconnect/vit-gpt2-image-captioning (Not Loaded)
  • Clip Processor: openai/clip-vit-base-patch32 (Not Loaded)
  • Depth Processor: Intel/dpt-large (Not Loaded)
  • Detr Processor: facebook/detr-resnet-50 (Not Loaded)
  • Docvqa Processor: naver-clova-ix/donut-base-finetuned-docvqa (Not Loaded)
  • Speech Processor: openai/whisper-tiny (Not Loaded)
  • Trocr Processor: microsoft/trocr-small-handwritten (Not Loaded)
  • Video_cls Processor: MCG-NJU/videomae-base-finetuned-kinetics (Not Loaded)
  • Vqa Processor: dandelin/vilt-b32-finetuned-vqa (Not Loaded)
  • Zshot_cls Processor: openai/clip-vit-large-patch14 (Not Loaded)
  • Zshot_det Processor: google/owlvit-base-patch32 (Not Loaded)
  • Code Tokenizer: bigcode/starcoder2-3b (Not Loaded)
  • Music Tokenizer: facebook/musicgen-medium (Not Loaded)
  • Ner Tokenizer: dbmdz/bert-large-cased-finetuned-conll03-english (Not Loaded)
  • Qa Tokenizer: distilbert-base-uncased-distilled-squad (Not Loaded)
  • Sentiment Tokenizer: unitary/toxic-bert (Not Loaded)
  • Text Tokenizer: Qwen/Qwen1.5-1.8B (Not Loaded)
  • Tqa Tokenizer: google/tapas-base-finetuned-wtq (Not Loaded)

Optimizations & Mechanisms

  • Selective Loading: from_pretrained(..., components_to_load=[...]).
  • Auto-Skip Failed Loads: Logs errors and continues if a component fails.
  • Logging & Performance Timing: Optional generate(..., time_execution=True).
  • Input Validation: Enhanced type/value checks.
  • Custom BitLinear: Configured: False.
  • BitsAndBytes Quantization: Configured: False, Mode: 4bit.
  • Global Pruning: Configured Amount: 0.0.
  • Gradient Checkpointing: Configured: False.
  • Flash Attention 2: Configured: False.
  • Diffusers Optimizations: Slicing (True), Offload (True).
  • Low CPU Mem Usage: Configured: True.
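
The selective-loading and auto-skip behaviour listed above can be pictured with a small sketch. It is not the model's actual implementation: the `load_components` helper, the `loaders` registry, and the logger name are hypothetical, and plain callables stand in for the real from_pretrained calls.

```python
import logging
from typing import Any, Callable, Dict, Iterable, Optional

logger = logging.getLogger("mega_multimodal")  # hypothetical logger name


def load_components(
    loaders: Dict[str, Callable[[], Any]],
    components_to_load: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Load the requested components, skipping any whose loader raises."""
    # None means "load every component that has a registered loader".
    requested = list(loaders) if components_to_load is None else list(components_to_load)
    loaded: Dict[str, Any] = {}
    for name in requested:
        loader = loaders.get(name)
        if loader is None:
            logger.warning("Unknown component %r, skipping", name)
            continue
        try:
            loaded[name] = loader()
        except Exception as exc:  # auto-skip: log the error and continue
            logger.error("Failed to load %r: %s", name, exc)
    return loaded


def _broken_loader() -> Any:
    raise OSError("weights not found")


# Hypothetical loaders standing in for real from_pretrained calls.
loaders = {
    "text": lambda: "text-model",
    "sd": lambda: "sd-pipeline",
    "canny": _broken_loader,
}

result = load_components(loaders, components_to_load=["text", "canny"])
print(sorted(result))  # "canny" fails and is skipped, so only "text" remains
```

Catching a broad `Exception` is deliberate in this sketch, because a missing file, a network error, or an out-of-memory condition in one component should not prevent the others from loading.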

Saving & Loading

The model uses the standard save_pretrained / from_pretrained methods. Components are saved in subdirectories, and any component that fails to load during from_pretrained is skipped.

```python
from mega_multimodal_model import MegaMultimodalModel  # Assuming the class definition is saved

# Save (`model` is an existing MegaMultimodalModel instance)
model.save_pretrained("./my_mega_multimodal_model_v6")

# Load all
loaded_model = MegaMultimodalModel.from_pretrained("./my_mega_multimodal_model_v6")

# Load selectively (example)
components = ['text', 'text_tok', 'sd', 'canny']  # Use class attribute names or controlnet types
loaded_model_subset = MegaMultimodalModel.from_pretrained("./my_mega_multimodal_model_v6", components_to_load=components)

# Load from Hub
loaded_model = MegaMultimodalModel.from_pretrained("your_hf_username/your_repo_name")
loaded_model_subset = MegaMultimodalModel.from_pretrained("your_hf_username/your_repo_name", components_to_load=components)
```
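
To make the "components in subdirs" layout concrete, here is a minimal sketch that writes one subdirectory per component and lists what was saved. The `component.json` file name and both helper functions are assumptions made for illustration. The real save_pretrained presumably delegates to each component's own save logic.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_components(root: Path, components: Dict[str, dict]) -> None:
    """Write each component's payload to its own subdirectory under root."""
    for name, payload in components.items():
        subdir = root / name
        subdir.mkdir(parents=True, exist_ok=True)
        # Hypothetical file name; real components write weights and configs.
        (subdir / "component.json").write_text(json.dumps(payload))


def list_saved_components(root: Path) -> List[str]:
    """Return the names of subdirectories that contain a saved payload."""
    return sorted(p.name for p in root.iterdir() if (p / "component.json").is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_mega_multimodal_model_v6"
    save_components(
        root,
        {
            "text": {"id": "Qwen/Qwen1.5-1.8B"},
            "speech": {"id": "openai/whisper-tiny"},
        },
    )
    saved = list_saved_components(root)
    print(saved)
```

One directory per component keeps a selective load cheap, since only the requested subdirectories need to be read.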

Usage

```python
text_output = loaded_model.generate("Hello!", task="text", time_execution=True)
```
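
A unified generate() call that routes on `task`, validates its arguments, and optionally times the run might be structured like the sketch below. The `TaskRouter` class, its `handlers` mapping, and the `last_elapsed` attribute are hypothetical. What the real method reports when `time_execution=True` is not documented here.

```python
import time
from typing import Any, Callable, Dict, Optional


class TaskRouter:
    """Toy stand-in for a unified generate() interface (not the real class)."""

    def __init__(self, handlers: Dict[str, Callable[[Any], Any]]) -> None:
        self.handlers = handlers
        self.last_elapsed: Optional[float] = None

    def generate(self, inputs: Any, task: str, time_execution: bool = False) -> Any:
        # Input validation: check the type first, then the value.
        if not isinstance(task, str):
            raise TypeError(f"task must be a str, got {type(task).__name__}")
        if task not in self.handlers:
            raise ValueError(f"unknown or unloaded task: {task!r}")

        start = time.perf_counter()
        output = self.handlers[task](inputs)
        if time_execution:
            self.last_elapsed = time.perf_counter() - start
            print(f"[{task}] took {self.last_elapsed:.4f}s")
        return output


# Hypothetical handler standing in for the loaded text model.
router = TaskRouter({"text": lambda prompt: f"echo: {prompt}"})
text_output = router.generate("Hello!", task="text", time_execution=True)
print(text_output)
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and intended for measuring short durations.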

Installation Dependencies

Core:

pip install --upgrade torch torchvision torchaudio transformers diffusers "huggingface_hub[hf_xet]" hf_xet safetensors timm Pillow accelerate bitsandbytes einops pandas decord ftfy pyav

Optional:

pip install controlnet-aux
pip install flash-attn --no-build-isolation
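
Before loading anything, it can help to check which dependencies are importable. The sketch below maps pip package names to import names, which sometimes differ (for example Pillow is imported as `PIL`). The `missing_packages` helper is hypothetical, and pyav is left out because its import name is not stated in this card.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip name -> top-level import name
CORE: Dict[str, str] = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "transformers": "transformers",
    "diffusers": "diffusers",
    "huggingface_hub": "huggingface_hub",
    "hf_xet": "hf_xet",
    "safetensors": "safetensors",
    "timm": "timm",
    "Pillow": "PIL",
    "accelerate": "accelerate",
    "bitsandbytes": "bitsandbytes",
    "einops": "einops",
    "pandas": "pandas",
    "decord": "decord",
    "ftfy": "ftfy",
}
OPTIONAL: Dict[str, str] = {
    "controlnet-aux": "controlnet_aux",
    "flash-attn": "flash_attn",
}


def missing_packages(packages: Dict[str, str]) -> List[str]:
    """Return pip names whose import module cannot be found."""
    missing = []
    for pip_name, module_name in packages.items():
        try:
            found = find_spec(module_name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(pip_name)
    return missing


print("Missing core packages:", missing_packages(CORE))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as torch.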

Model Configuration (config.json)

Stores component IDs and optimization settings.
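
As an illustration only, a config holding the component IDs and optimization settings listed in this card might be shaped like the sketch below. Every key name is an assumption; only the model IDs and setting values come from this card, and the real config.json may be organized differently.

```python
import json

# Hypothetical key names; values mirror the settings listed in this card.
config = {
    "components": {
        "text": "Qwen/Qwen1.5-1.8B",
        "speech": "openai/whisper-tiny",
        "clip": "openai/clip-vit-base-patch32",
    },
    "optimizations": {
        "bitlinear": False,
        "quantization": {"enabled": False, "mode": "4bit"},
        "pruning_amount": 0.0,
        "gradient_checkpointing": False,
        "flash_attention_2": False,
        "diffusers": {"slicing": True, "offload": True},
        "low_cpu_mem_usage": True,
    },
}

# Round-trip through JSON, as would happen when writing and reading config.json.
serialized = json.dumps(config, indent=2)
restored = json.loads(serialized)
print(restored["optimizations"]["quantization"])
```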
