| --- |
| license: apache-2.0 |
| tags: |
| - parameter-generation |
| - diffusion |
| - personalization |
| - text-to-model |
| - neural-network-diffusion |
| - image-classification |
| datasets: |
| - cifar100 |
| language: |
| - en |
| pipeline_tag: other |
| --- |
| |
| # Tina: Text-to-Model Generative AI (CIFAR-100, CNN) |
|
|
| **Tina** is a text-conditioned neural network diffusion model that generates personalized image classifiers from natural language prompts. Given a text description of the desired classification task (e.g., a list of class names), Tina directly outputs the full parameters of a lightweight CNN — no gradient-based training required at inference time. |
|
|
| This checkpoint is the Tina model trained on **CIFAR-100**. From a text prompt, it generates a **personalized CNN classifier** (~5K parameters) covering up to **10 classes**. |
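
Because Tina emits a classifier as one flat vector of numbers, that vector has to be sliced back into per-layer tensors before it can be used. The sketch below shows one way to do that bookkeeping. The layer names and shapes are hypothetical and chosen only to land in the same few-thousand-parameter range; the actual layout is defined in the Tina code repository.

```python
from math import prod

# Hypothetical layer shapes for a small CNN with two conv layers and a
# 10-way head. Illustrative only; not the checkpoint's real layout.
PARAM_SHAPES = {
    "conv1.weight": (16, 3, 3, 3),
    "conv1.bias": (16,),
    "conv2.weight": (28, 16, 3, 3),
    "conv2.bias": (28,),
    "fc.weight": (10, 28),
    "fc.bias": (10,),
}


def total_params(shapes):
    """Number of scalars a generator must emit for one classifier."""
    return sum(prod(shape) for shape in shapes.values())


def split_flat_vector(flat, shapes):
    """Slice a flat parameter vector into per-tensor chunks, in dict order."""
    expected = total_params(shapes)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    chunks, offset = {}, 0
    for name, shape in shapes.items():
        size = prod(shape)
        chunks[name] = (shape, flat[offset:offset + size])
        offset += size
    return chunks


flat = [0.0] * total_params(PARAM_SHAPES)  # stand-in for a generated vector
chunks = split_flat_vector(flat, PARAM_SHAPES)
print(total_params(PARAM_SHAPES), len(chunks["conv2.weight"][1]))
```

Each chunk keeps its target shape next to its values, so a framework-specific loader can reshape the slice and copy it into the matching layer.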
|
|
| ## Model Description |
|
|
| | Property | Value | |
| |---|---| |
| | **Architecture** | Diffusion Transformer (DiT), GPT-2 style backbone | |
| | **Text Encoder** | CLIP ViT-B/32 (frozen) | |
| | **Hidden Size** | 2048 | |
| | **Transformer Layers** | 12 encoder layers + 12 decoder layers | |
| | **Attention Heads** | 16 | |
| | **Diffusion Steps** | 1000 (DDPM sampling) | |
| | **Prediction Type** | Signal prediction (x₀) | |
| | **Generated Model** | 2-layer CNN, ~5K parameters | |
| | **Max Classification Classes** | 10 | |
| | **Training p-Models** | 1000 personalized models | |
| | **Training Dataset** | CIFAR-100 (100 classes, 32×32 images) | |
|
|
| ## How It Works |
|
|
| Tina treats model generation as a conditional diffusion process — analogous to how text-to-image diffusion models denoise random pixels into coherent images, Tina denoises random vectors into functional neural network parameters. |
|
|
| 1. **Training**: Tina is trained on (task description, personalized model) pairs. Each personalized model is a CNN fine-tuned on a specific 10-class subset of CIFAR-100. |
| 2. **Inference**: Given a text prompt listing the desired classes (e.g., `["apple", "bear", "bicycle", "bus", "castle", "clock", "cloud", "forest", "mountain", "train"]`), Tina generates a complete CNN classifier by running the 1000-step denoising process once, with no gradient updates. |
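
The inference step can be sketched as standard DDPM ancestral sampling with a network that predicts the clean signal x₀, matching the "Signal prediction (x₀)" entry in the table above. This is a minimal pure-Python sketch, not Tina's implementation: the linear beta schedule is an assumption borrowed from the original DDPM setup, and `toy_denoiser` is a hypothetical stand-in for the text-conditioned transformer.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption; Tina's schedule may differ)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def ddpm_sample_x0(denoiser, condition, dim, num_steps=1000, seed=0):
    """Ancestral DDPM sampling with a denoiser that predicts x0."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(num_steps)):
        x0_pred = denoiser(x, t, condition)
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Posterior q(x_{t-1} | x_t, x0): the mean blends x0_pred and x_t.
        coef_x0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        coef_xt = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        # The posterior variance is zero at t == 0, so the last step is noise-free.
        sigma = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        x = [
            coef_x0 * p + coef_xt * xi + sigma * rng.gauss(0.0, 1.0)
            for p, xi in zip(x0_pred, x)
        ]
    return x


def toy_denoiser(x_t, t, condition):
    """Hypothetical stand-in: always predicts the conditioning vector as x0."""
    return list(condition)


target = [0.5, -1.0, 2.0, 0.0]
params = ddpm_sample_x0(toy_denoiser, target, dim=len(target))
```

With the toy denoiser, the sampled vector converges to `target`, which makes the loop easy to check. In the real model the denoiser call is where the prompt embedding conditions each step, and the returned vector holds the generated classifier's parameters.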
|
|
| Thanks to the vision-language alignment of CLIP, Tina also supports: |
| - **Image prompts**: Zero-shot and few-shot image-prompted generation |
| - **Natural language descriptions**: Using class descriptions instead of class names |
| - **Unseen classes**: Generalization to classes not seen during training |
| - **Variable class counts**: Any number of classes up to 10 via classification sequence padding |
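
The variable-class-count behaviour relies on padding the class list to a fixed length of 10. A minimal sketch of that idea follows; the `<pad>` placeholder and the boolean mask are assumptions made for illustration, and the Tina code may pad differently.

```python
MAX_CLASSES = 10
PAD_TOKEN = "<pad>"  # hypothetical placeholder, not taken from the Tina code


def pad_class_prompt(class_names, max_classes=MAX_CLASSES, pad_token=PAD_TOKEN):
    """Return (padded_names, mask); mask[i] is True for real classes."""
    if not 1 <= len(class_names) <= max_classes:
        raise ValueError(f"need between 1 and {max_classes} classes")
    n_pad = max_classes - len(class_names)
    padded = list(class_names) + [pad_token] * n_pad
    mask = [True] * len(class_names) + [False] * n_pad
    return padded, mask


padded, mask = pad_class_prompt(["apple", "bear", "bicycle"])
print(padded)
print(sum(mask))
```

The mask records which of the 10 slots hold real classes, so downstream code can ignore the logits of padded slots when reading the generated classifier's predictions.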
|
|
| ## Intended Use |
|
|
| - **On-demand personalized classification**: Quickly generate a lightweight classifier tailored to a user's specific needs without any training data or GPU-intensive fine-tuning. |
| - **Edge AI deployment**: The generated CNN (~5K params) is extremely lightweight, suitable for resource-constrained devices. |
| - **Research on text-to-model generation**: Exploring the paradigm of generating functional AI models from natural language. |
|
|
| ## Performance |
|
|
| ### Main Results on CIFAR-100 (10-class personalization) |
|
|
| | Method | In-Distribution Accuracy (%) | Out-of-Distribution Accuracy (%) | |
| |---|---|---| |
| | Generic Model | 28.72 | 29.88 | |
| | Classifier Selection | 64.83 | 64.15 | |
| | TAPER | 67.71 | 66.85 | |
| | **Tina (this model)** | **68.35** | **67.14** | |
|
|
| ### Inference Efficiency |
|
|
| | Method | Time per model (CNN) | |
| |---|---| |
| | Pretrain + fine-tune | 94.35s | |
| | TAPER | 18.10s | |
| | **Tina** | **4.88s** | |
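
To put the timings in relative terms, the short calculation below divides each baseline's time by Tina's. It uses only the numbers from the table above.

```python
# Seconds per generated CNN, copied from the table above.
seconds_per_model = {
    "Pretrain + fine-tune": 94.35,
    "TAPER": 18.10,
    "Tina": 4.88,
}

tina = seconds_per_model["Tina"]
speedups = {
    name: round(seconds / tina, 2)
    for name, seconds in seconds_per_model.items()
    if name != "Tina"
}
print(speedups)
```

By these figures, Tina is roughly 19 times faster than pretraining plus fine-tuning and about 3.7 times faster than TAPER per generated model.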
|
|
| ## Limitations |
|
|
| - This checkpoint generates **CNN classifiers only** (2-layer, ~5K parameters) for **CIFAR-100** class subsets. |
| - Input images are expected to be 32×32 resolution. |
| - A single Tina checkpoint targets one architecture and modality; it cannot generate models across different architectures or modalities. |
| - Performance on entirely out-of-domain classes (beyond CIFAR-100 semantic scope) may degrade. |
|
|
| <!-- ## Citation |
|
|
| If you use this model, please cite our paper: |
|
|
| ```bibtex |
| @article{li2026tina, |
| title={Tina: A Diffusion Neural Network for Generating Personalized AI Models from Text Prompts}, |
| author={Li, Zexi and Gao, Lingzhi and Cai, Dongqi and Lane, Nicholas D. and Wu, Chao}, |
| journal={Patterns}, |
| year={2026}, |
| publisher={Cell Press} |
| } |
| ``` --> |
|
|
| ## Links |
|
|
| <!-- - **Paper**: *Patterns* (Cell Press), 2026 --> |
| - **Code**: [https://github.com/aoliliao/Tina](https://github.com/aoliliao/Tina) |
| <!-- - **Zenodo Archive**: [https://doi.org/10.5281/zenodo.19062137](https://doi.org/10.5281/zenodo.19062137) --> |
|
|