Text-to-YOLO-Weights Hypernetwork
A hypernetwork that takes a text description of a computer vision detection task and generates YOLO detector weights (LoRA-style adapters) in a single forward pass.
Architecture
Based on Drag-and-Drop LLMs (DnD) and Neural Network Diffusion (p-diff):
- Text Encoder: Frozen sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- Hyper-Convolutional Decoder: Cascaded 1D conv blocks mapping text embeddings → weight vectors
- LoRA Adapter Generation: Outputs low-rank A/B matrices for YOLOv8 detection head layers
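To make the LoRA bullet concrete, here is a minimal sketch of how a low-rank pair (A, B) adjusts a frozen weight matrix as W + B·A. It is not the repository's code: the function names, the toy shapes, and the absence of a LoRA scaling factor are all assumptions made for illustration.

```python
# Illustrative only: names and shapes are assumptions, not this repo's API.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_lora(weight, lora_a, lora_b):
    """Return weight + B @ A without modifying the frozen weight.

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    """
    delta = matmul(lora_b, lora_a)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def lora_param_count(d_out, d_in, rank):
    """Number of values a generator must emit for one adapted layer."""
    return rank * (d_in + d_out)


frozen = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # d_out=2, d_in=3
lora_b = [[1.0], [2.0]]                      # d_out x r, with r=1
lora_a = [[0.5, 0.0, -0.5]]                  # r x d_in
adapted = apply_lora(frozen, lora_a, lora_b)
```

Because only A and B are generated, the number of values to predict per layer grows with rank × (d_in + d_out) rather than d_in × d_out, which is what makes single-pass weight generation tractable.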
How It Works
Text Description (e.g. "Detect license plates in traffic images")
↓
Sentence-BERT Encoder → 384-dim embedding
↓
Hyper-Convolutional Decoder (cascaded 1D conv blocks)
↓
Flattened Weight Vector
↓
Reshape into LoRA A/B matrices per detection head layer
↓
Apply to frozen YOLOv8-n backbone
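The reshape step in the pipeline above can be sketched as follows. This is a hedged illustration rather than the code in text_to_yolo_hypernet.py: the layer names, the (d_out, d_in) shape convention, and the A-then-B row-major packing order are assumptions.

```python
# Illustrative only: layer names, shapes, and packing order are assumptions.

def split_into_lora(flat, layer_shapes, rank):
    """Slice a flat weight vector into per-layer LoRA A and B matrices.

    layer_shapes maps a layer name to (d_out, d_in). For each layer the
    vector is assumed to hold A (rank x d_in) followed by B (d_out x rank),
    both in row-major order.
    """
    expected = sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes.values())
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")

    def take(offset, rows, cols):
        # Cut rows * cols values starting at offset and fold them into rows.
        chunk = flat[offset:offset + rows * cols]
        matrix = [chunk[r * cols:(r + 1) * cols] for r in range(rows)]
        return matrix, offset + rows * cols

    adapters, offset = {}, 0
    for name, (d_out, d_in) in layer_shapes.items():
        lora_a, offset = take(offset, rank, d_in)
        lora_b, offset = take(offset, d_out, rank)
        adapters[name] = {"A": lora_a, "B": lora_b}
    return adapters


layer_shapes = {"head.cv2": (2, 3), "head.cv3": (1, 2)}  # hypothetical layers
flat = [float(i) for i in range(8)]  # 1*(3+2) + 1*(2+1) = 8 values
adapters = split_into_lora(flat, layer_shapes, rank=1)
```

Checking the vector length up front catches a mismatch between the decoder's output size and the target layer shapes before any adapter is applied.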
Files
- text_to_yolo_hypernet.py: Core architecture (encoder + decoder)
- train_hypernet.py: Full training loop with noise augmentation
- synthetic_data_generator.py: Generate synthetic training data
- prepare_dataset.py: Prepare data from HF Hub fine-tuned YOLO models
- inference.py: Text prompt → weights → YOLO inference
Training
# Generate synthetic training data
python synthetic_data_generator.py --num_samples 500 --perturbation_scale 0.05
# Train hypernetwork
python train_hypernet.py --epochs 200 --batch_size 8 --lr 1e-4
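One plausible reading of the --perturbation_scale flag is weight-space noise augmentation: each synthetic sample is a copy of a reference weight vector with small Gaussian noise added. The sketch below illustrates that idea only. The function name, the choice to scale the noise by the standard deviation of the reference weights, and the seeding are assumptions, and the actual script may work differently.

```python
import random
import statistics

# Illustrative only: this is an assumed interpretation of weight-space noise
# augmentation, not the logic of synthetic_data_generator.py.

def perturb_weights(base, perturbation_scale, num_samples, seed=0):
    """Return num_samples noisy copies of the base weight vector.

    Noise is zero-mean Gaussian with standard deviation equal to
    perturbation_scale times the population std of the base weights.
    """
    rng = random.Random(seed)  # local RNG keeps runs reproducible
    sigma = perturbation_scale * statistics.pstdev(base)
    return [[w + rng.gauss(0.0, sigma) for w in base] for _ in range(num_samples)]


base = [0.1, -0.2, 0.3, 0.0]  # toy stand-in for a flattened adapter
samples = perturb_weights(base, perturbation_scale=0.05, num_samples=500)
```

Tying the noise level to the spread of the reference weights keeps the perturbation relative rather than absolute, so the same scale value behaves similarly across layers with different weight magnitudes.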
Inference
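from text_to_yolo_hypernet import Config, TextEncoder, HyperWeightDecoder
# generate_yolo_weights must also be imported from the module in this repo that defines it; that import is not shown here.
config = Config()
encoder = TextEncoder(config.text_encoder_model)
# layer_shapes must be defined beforehand (presumably the shapes of the YOLOv8 detection head layers to adapt); it is not created in this snippet.
decoder = HyperWeightDecoder(config, layer_shapes)
adapters = generate_yolo_weights("Detect cars and pedestrians", decoder, encoder, config)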
Research Background
- DnD (Drag-and-Drop LLMs): cascaded hyper-convolutional decoder for text → weights
- p-diff (Neural Network Diffusion): noise augmentation in weight space
- D2NWG (Diffusion-Based Neural Network Weights Generation): dataset-conditioned weight diffusion
License
AGPL-3.0 (same as Ultralytics YOLO)
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "mabbam/text-to-yolo-weights-hypernet"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.