Text-to-YOLO-Weights Hypernetwork

A hypernetwork that takes a text description of a computer vision detection task and generates YOLO detector weights (LoRA-style adapters) in a single forward pass.

Architecture

The design follows Drag-and-Drop LLMs (DnD) and Neural Network Diffusion (p-diff) and has three components:

Text Encoder: Frozen sentence-transformers/all-MiniLM-L6-v2 (384-dim)
Hyper-Convolutional Decoder: Cascaded 1D conv blocks mapping text embeddings → weight vectors
LoRA Adapter Generation: Outputs low-rank A/B matrices for YOLOv8 detection head layers
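
To give a sense of how many values the decoder has to emit, the sketch below counts the LoRA parameters for a set of target layers and compares them with the full weight count. The layer names, shapes, and rank are illustrative assumptions, not values from this repository. The sketch also assumes a conv kernel of shape (out, in, k, k) is treated as an (out, in*k*k) matrix.

```python
from math import prod

def lora_param_count(weight_shape, rank):
    """Number of values in the A and B factors for one layer.

    Assumes the weight is viewed as a 2D matrix (out, fan_in), where fan_in
    is the product of the remaining dimensions, with A: (rank, fan_in) and
    B: (out, rank).
    """
    out_features = weight_shape[0]
    fan_in = prod(weight_shape[1:])
    return rank * fan_in + out_features * rank

# Hypothetical detection-head layers; real names and shapes depend on the model.
layer_shapes = {
    "head.cv2.0": (64, 64, 3, 3),
    "head.cv3.0": (80, 64, 1, 1),
}
rank = 4  # assumed LoRA rank

per_layer = {name: lora_param_count(shape, rank) for name, shape in layer_shapes.items()}
total = sum(per_layer.values())                            # length of the flat decoder output
full = sum(prod(shape) for shape in layer_shapes.values()) # size of the full weights
print(per_layer, total, full)
```

With these made-up shapes the decoder would need to produce 3136 values instead of the 41984 in the full weights, which is the main reason to generate low-rank factors rather than dense weights.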

How It Works

Text Description (e.g. "Detect license plates in traffic images")
    ↓
Sentence-BERT Encoder → 384-dim embedding
    ↓
Hyper-Convolutional Decoder (cascaded 1D conv blocks)
    ↓
Flattened Weight Vector
    ↓
Reshape into LoRA A/B matrices per detection head layer
    ↓
Apply adapters to the frozen YOLOv8-n weights
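
The last two steps of the pipeline can be sketched with NumPy as follows. This is a minimal illustration, not the code in text_to_yolo_hypernet.py. The helper names `split_into_lora` and `apply_lora`, the A-then-B ordering inside the flat vector, the layer name, and the rank are all assumptions made for the example.

```python
import numpy as np

def split_into_lora(flat, layer_shapes, rank):
    """Slice a flat vector into (A, B) pairs, one per layer, in dict order."""
    adapters, offset = {}, 0
    for name, shape in layer_shapes.items():
        out_features, fan_in = shape[0], int(np.prod(shape[1:]))
        a_size, b_size = rank * fan_in, out_features * rank
        A = flat[offset:offset + a_size].reshape(rank, fan_in)
        offset += a_size
        B = flat[offset:offset + b_size].reshape(out_features, rank)
        offset += b_size
        adapters[name] = (A, B)
    if offset != flat.size:
        raise ValueError(f"expected {offset} values, got {flat.size}")
    return adapters

def apply_lora(weight, A, B, scale=1.0):
    """Return weight + scale * (B @ A), with the update reshaped to weight.shape."""
    delta = (B @ A).reshape(weight.shape)
    return weight + scale * delta

layer_shapes = {"head.conv": (8, 4, 3, 3)}  # hypothetical layer
rank = 2                                    # assumed LoRA rank
n_values = sum(rank * (int(np.prod(s[1:])) + s[0]) for s in layer_shapes.values())

rng = np.random.default_rng(0)
flat = rng.standard_normal(n_values).astype(np.float32)  # stand-in for decoder output

adapters = split_into_lora(flat, layer_shapes, rank)
A, B = adapters["head.conv"]
frozen = np.zeros(layer_shapes["head.conv"], dtype=np.float32)  # stand-in for frozen weights
adapted = apply_lora(frozen, A, B)
```

The frozen weights are never modified in place; the adapted tensor is the frozen tensor plus a rank-limited update, so different prompts can produce different adapters for the same base model.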

Files

text_to_yolo_hypernet.py — Core architecture (encoder + decoder)
train_hypernet.py — Full training loop with noise augmentation
synthetic_data_generator.py — Generate synthetic training data
prepare_dataset.py — Prepare data from HF Hub fine-tuned YOLO models
inference.py — Text prompt → weights → YOLO inference

Training

# Generate synthetic training data
python synthetic_data_generator.py --num_samples 500 --perturbation_scale 0.05

# Train hypernetwork
python train_hypernet.py --epochs 200 --batch_size 8 --lr 1e-4
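
As a rough illustration of what perturbing weights for synthetic data or noise augmentation can look like, the sketch below draws noisy copies of a flat weight vector. How the scripts actually interpret --perturbation_scale is not documented here. The function `perturb_weights` and the choice to scale the noise relative to the standard deviation of the base vector are assumptions.

```python
import numpy as np

def perturb_weights(base, num_samples, perturbation_scale, seed=0):
    """Draw noisy copies of a flat weight vector.

    Noise is zero-mean Gaussian with standard deviation equal to
    perturbation_scale times the standard deviation of `base`.
    """
    rng = np.random.default_rng(seed)
    sigma = perturbation_scale * base.std()
    noise = rng.standard_normal((num_samples, base.size)) * sigma
    return base[None, :] + noise

base = np.linspace(-1.0, 1.0, 1000)  # stand-in for a flattened adapter
samples = perturb_weights(base, num_samples=500, perturbation_scale=0.05)
print(samples.shape)
```

Each row of `samples` could then be paired with a text description as one training target, which is one plausible reading of how a small set of reference weights is expanded into 500 samples.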

Inference

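from text_to_yolo_hypernet import Config, TextEncoder, HyperWeightDecoder

# NOTE: generate_yolo_weights and layer_shapes are used below but are not
# defined in this snippet; import or define them before running it.

config = Config()
encoder = TextEncoder(config.text_encoder_model)  # frozen text encoder
decoder = HyperWeightDecoder(config, layer_shapes)  # layer_shapes: shapes of the target layers

# Text prompt -> LoRA adapters in a single forward pass
adapters = generate_yolo_weights("Detect cars and pedestrians", decoder, encoder, config)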

Research Background

DnD: Drag-and-Drop LLMs — cascaded hyper-convolutional decoder for text→weights
p-diff: Neural Network Diffusion — noise augmentation in weight space
D2NWG: Diffusion-Based Neural Network Weights Generation — dataset-conditioned weight diffusion

License

AGPL-3.0 (same as Ultralytics YOLO)

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mabbam/text-to-yolo-weights-hypernet"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
