---
license: apache-2.0
tags:
- parameter-generation
- diffusion
- personalization
- text-to-model
- neural-network-diffusion
- image-classification
datasets:
- cifar100
language:
- en
pipeline_tag: other
---

# Tina: Text-to-Model Generative AI (CIFAR-100, CNN)

**Tina** is a text-conditioned neural network diffusion model that generates personalized image classifiers from natural language prompts. Given a text description of the desired classification task (e.g., a list of class names), Tina directly outputs the full parameters of a lightweight CNN, with no gradient-based training required at inference time.

This checkpoint is the Tina model trained on **CIFAR-100**. It generates **10-class personalized CNN classifiers** (~5K parameters) from text prompts.

## Model Description

| Property | Value |
|---|---|
| **Architecture** | Diffusion Transformer (DiT), GPT-2 style backbone |
| **Text Encoder** | CLIP ViT-B/32 (frozen) |
| **Hidden Size** | 2048 |
| **Transformer Layers** | 12 encoder layers + 12 decoder layers |
| **Attention Heads** | 16 |
| **Diffusion Steps** | 1000 (DDPM sampling) |
| **Prediction Type** | Signal prediction (x₀) |
| **Generated Model** | 2-layer CNN, ~5K parameters |
| **Max Classification Classes** | 10 |
| **Training p-Models** | 1000 personalized models |
| **Training Dataset** | CIFAR-100 (100 classes, 32×32 images) |

## How It Works

Tina treats model generation as a conditional diffusion process. Just as text-to-image diffusion models denoise random pixels into coherent images, Tina denoises random vectors into functional neural network parameters.

1. **Training**: Tina is trained on (task description, personalized model) pairs. Each personalized model is a CNN fine-tuned on a specific 10-class subset of CIFAR-100.
2. **Inference**: Given a text prompt listing the desired classes (e.g., `["apple", "bear", "bicycle", "bus", "castle", "clock", "cloud", "forest", "mountain", "train"]`), Tina generates a complete CNN classifier by running the 1000-step denoising process once, with no gradient updates.

Thanks to the vision-language alignment of CLIP, Tina also supports:

- **Image prompts**: Zero-shot and few-shot image-prompted generation
- **Natural language descriptions**: Using class descriptions instead of class names
- **Unseen classes**: Generalization to classes not seen during training
- **Variable class counts**: Any number of classes up to 10 via classification sequence padding

## Intended Use

- **On-demand personalized classification**: Quickly generate a lightweight classifier tailored to a user's specific needs without any training data or GPU-intensive fine-tuning.
- **Edge AI deployment**: The generated CNN (~5K params) is extremely lightweight and suitable for resource-constrained devices.
- **Research on text-to-model generation**: Exploring the paradigm of generating functional AI models from natural language.

## Performance

### Main Results on CIFAR-100 (10-class personalization)

| Method | In-Distribution | Out-of-Distribution |
|---|---|---|
| Generic Model | 28.72 | 29.88 |
| Classifier Selection | 64.83 | 64.15 |
| TAPER | 67.71 | 66.85 |
| **Tina (this model)** | **68.35** | **67.14** |

### Inference Efficiency

| Method | Time per model (CNN) |
|---|---|
| Pretrain + fine-tune | 94.35s |
| TAPER | 18.10s |
| **Tina** | **4.88s** |

## Limitations

- This checkpoint generates **CNN classifiers only** (2-layer, ~5K parameters) for **CIFAR-100** class subsets.
- Input images are expected to be 32×32 resolution.
- A single Tina checkpoint cannot generate models across different architectures or modalities.
- Performance on entirely out-of-domain classes (beyond the semantic scope of CIFAR-100) may degrade.
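## Sketch: Reverse Diffusion with x₀ Prediction

The model description lists 1000 DDPM steps with signal (x₀) prediction. The following is a minimal, standard-library sketch of what such a sampling loop can look like; it is not taken from the Tina codebase. The linear beta schedule, the function names `linear_betas` and `sample_x0_prediction`, and the `oracle` denoiser (a stand-in for the text-conditioned transformer) are all assumptions made for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; Tina's actual schedule is not stated).

    Requires num_steps >= 2.
    """
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sample_x0_prediction(denoiser, dim, num_steps=1000, seed=0):
    """Ancestral DDPM sampling where the network predicts the clean signal x0.

    `denoiser(x_t, t)` is a hypothetical callable returning an estimate of x0.
    """
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]

    # alpha_bar[t] is the product of the first t alphas; alpha_bar[0] = 1.
    alpha_bar = [1.0]
    for a in alphas:
        alpha_bar.append(alpha_bar[-1] * a)

    # Start from pure Gaussian noise in parameter space.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    for t in range(num_steps, 0, -1):
        beta_t = betas[t - 1]
        alpha_t = alphas[t - 1]
        ab_t, ab_prev = alpha_bar[t], alpha_bar[t - 1]

        x0_hat = denoiser(x, t)

        # Mean of the DDPM posterior q(x_{t-1} | x_t, x0), with x0 := x0_hat.
        coef_x0 = math.sqrt(ab_prev) * beta_t / (1.0 - ab_t)
        coef_xt = math.sqrt(alpha_t) * (1.0 - ab_prev) / (1.0 - ab_t)

        # Posterior standard deviation; no noise is added on the final step.
        var = beta_t * (1.0 - ab_prev) / (1.0 - ab_t)
        sigma = math.sqrt(var) if t > 1 else 0.0

        x = [
            coef_x0 * h + coef_xt * xi + sigma * rng.gauss(0.0, 1.0)
            for h, xi in zip(x0_hat, x)
        ]
    return x


# Toy usage: an oracle that always "predicts" the same target vector.
target = [0.5, -1.0, 2.0]
oracle = lambda x, t: target
params = sample_x0_prediction(oracle, dim=len(target))
```

With a perfect x₀ predictor the final step places all weight on the prediction, so `params` matches `target` up to floating-point rounding. In a real text-to-model setting the returned vector would be reshaped into the CNN's weight tensors, a step this sketch leaves out.
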
## Links

- **Code**: [https://github.com/aoliliao/Tina](https://github.com/aoliliao/Tina)
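
## Sketch: Padding a Class Prompt

Tina accepts any number of classes up to 10 by padding the classification sequence. The snippet below is one way such padding could be prepared before text encoding. The `"<pad>"` placeholder, the `pad_class_prompt` helper, and the returned mask are hypothetical and may differ from what the repository actually does.

```python
MAX_CLASSES = 10
PAD_TOKEN = "<pad>"  # hypothetical placeholder; the real padding entry may differ


def pad_class_prompt(class_names, max_classes=MAX_CLASSES, pad_token=PAD_TOKEN):
    """Pad a list of class names to a fixed length.

    Returns (padded_names, mask), where mask[i] is True for real classes
    and False for padding slots.
    """
    names = list(class_names)
    if not names:
        raise ValueError("at least one class name is required")
    if len(names) > max_classes:
        raise ValueError(f"at most {max_classes} classes are supported")
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")

    n_pad = max_classes - len(names)
    padded = names + [pad_token] * n_pad
    mask = [True] * len(names) + [False] * n_pad
    return padded, mask


# Example: a 3-class task padded out to the 10 slots the generator expects.
padded, mask = pad_class_prompt(["apple", "bear", "bus"])
```

A mask like this could be used downstream to ignore the classifier outputs that correspond to padding slots.
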