---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- creativity
- cross-domain-analogy
- cognitive-architecture
- knowledge-distillation
- qlora
- qwen2
datasets:
- custom
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
model-index:
- name: CreativitySLM
  results:
  - task:
      type: text-generation
      name: Creative Reasoning
    metrics:
    - name: Structural Validity
      type: accuracy
      value: 96.1
    - name: Average Latency
      type: latency
      value: 2.38
      unit: seconds
---

# CreativitySLM

**A 1.5B parameter language model fine-tuned to think creatively through cross-domain analogy, constraint violation, and novelty-coherence optimization.**

CreativitySLM is not a general-purpose LLM. It is a specialized model that has learned *creative cognitive patterns* — the structural operations underlying creative ideation — through distillation from a frontier model.

## Key Results

| Metric | Value |
|--------|-------|
| Structural Validity | **96.1%** on held-out test set |
| Average Latency | **2.38s** on A10G GPU |
| End-to-End Pipeline | **11.8s** for full 10-layer creative pipeline |
| Training Data | **764 examples** across 5 sub-tasks |
| Training Time | **2 min 19 sec** on A100-80GB |
| Training Cost | **$11.50 total** |
| Trainable Parameters | **73.9M** (4.57% of 1.62B) |

## What Makes This Different

Standard LLMs treat creativity as an incidental capability. CreativitySLM treats it as a **learnable cognitive pattern**. The model was trained on 5 structured sub-tasks derived from a 10-layer cognitive architecture:

1. **Domain Detection & Query Generation** — Identify the domain and generate diverse search queries, including deliberately *distant* domains
2. **Pattern Extraction, Abstraction & Analogy** — Extract structural patterns, identify universal principles, generate cross-domain analogies
3. **Constraint Violation** — Identify domain conventions and purposefully invert them
4. **Reasoning & Taste Evaluation** — Score ideas on validity, surprise, familiarity balance, emotional resonance, internal consistency
5. **Creative Expression** — Synthesize insights into compelling natural language with explicit cross-domain attribution

## The Ten-Layer Architecture

```
[User Prompt]
     |
L10: Input/Output (parse prompt, detect domain)
     |
L1:  Data (live retrieval via Tavily API)
     |
L2+L3+L4: Pattern Recognition + Abstraction + Cross-Domain Analogy   [Model Call 1]
     |
L5:  Constraint Violation                                            [Model Call 2]
     |
L6:  Novelty Detection (novelty x coherence scoring)
     |
L7+L8: Reasoning + Taste Evaluation                                  [Model Call 3]
     |    |
     |    (backtrack to L2-4 if invalid)
     |
L9:  Language Expression                                             [Model Call 4]
     |
[Creative Output]
```

## Example Output

**Prompt**: "How can I build an AI model that replicates the human brain?"

**CreativitySLM produces**: *"The Forest Mind: How Nature's Self-Organization Can Rebuild AI"*

> The model draws an analogy between ecosystem self-organization and neural architecture design. It identifies the convention "fully supervised model training" and proposes its inversion: autonomous self-organizing clusters that emerge from edge-to-edge connectivity, like a forest growing itself rather than being engineered.
>
> *"Stop trying to engineer the forest, and start letting it engineer itself."*

This demonstrates cross-domain transfer (ecology → AI), purposeful constraint violation (breaking the "design everything" convention), and coherent creative expression.
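End to end, an output like this comes from chaining the four model calls shown in the architecture diagram above. The sketch below is one way that orchestration could be wired up. It is illustrative only: the helper names, system prompts, and 0.5 acceptance threshold are assumptions, the production pipeline code is not part of this card, and `call_model` is left as a stub for whichever inference path you use (the Usage section below shows a plain `transformers` call). The L6-L8 evaluation is collapsed into a single scoring stub.

```python
# Hypothetical orchestration sketch for the ten-layer pipeline above,
# not the official implementation shipped with this model.

def call_model(system: str, user: str) -> str:
    """Placeholder: send one chat turn to CreativitySLM and return its reply."""
    raise NotImplementedError("wire this to your inference endpoint")


def novelty_coherence_score(idea: str) -> float:
    """Placeholder for L6-L8: combine a novelty signal (e.g. embedding distance
    from retrieved sources) with the reasoning/taste scores from Model Call 3."""
    raise NotImplementedError


def run_pipeline(prompt: str, retrieved_context: str, max_backtracks: int = 2) -> str:
    # L10 + L2-L4: detect the domain, extract patterns, and draw a
    # cross-domain analogy from the retrieved context (Model Call 1).
    analogy = call_model(
        "You are a creative domain analyst...",
        f"Prompt: {prompt}\n\nContext: {retrieved_context}",
    )

    idea = analogy
    for _ in range(max_backtracks + 1):
        # L5: identify a domain convention and purposefully invert it (Model Call 2).
        idea = call_model("Identify a convention implied by this analogy and invert it.", analogy)

        # L6-L8: score novelty x coherence; accept, or backtrack to a new analogy (Model Call 3).
        if novelty_coherence_score(idea) >= 0.5:  # threshold is an assumption
            break
        analogy = call_model("Propose a more distant cross-domain analogy.", prompt)

    # L9: express the surviving idea with explicit cross-domain attribution (Model Call 4).
    return call_model("Express this insight as a compelling narrative with attribution.", idea)
```

The loop mirrors the "backtrack to L2-4 if invalid" edge in the diagram: a low novelty-coherence score sends the pipeline back to generate a different analogy rather than polishing a weak idea.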
## Training Details

- **Base Model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (4-bit NF4, rank 64, alpha 128)
- **Target Modules**: All attention (q, k, v, o) + MLP (gate, up, down)
- **Data**: 764 examples distilled from Claude Sonnet across 153 creative prompts spanning 12 domains
- **Split**: 612 train / 76 val / 76 test
- **Epochs**: 3 (cosine LR, peak 2e-4, 10% warmup)
- **Hardware**: Single NVIDIA A100-80GB
- **Training Time**: 2 minutes 19 seconds

### Training Loss

| Epoch | Train Loss | Eval Loss |
|-------|------------|-----------|
| 1 | 2.263 | 2.020 |
| 2 | 1.720 | 1.772 |
| 3 | 1.930 | 1.744 |

## Per-Task Performance

| Task | N | Accuracy | Avg Latency |
|------|---|----------|-------------|
| Domain & Queries | 23 | 95.7% | 0.62s |
| Pattern/Abstraction/Analogy | 13 | 84.6% | 2.99s |
| Constraint Violation | 10 | 100% | 2.28s |
| Reasoning & Taste | 13 | 100% | 3.20s |
| Creative Expression | 17 | 100% | 3.74s |
| **Overall** | **76** | **96.1%** | **2.38s** |

## What Fine-Tuning Teaches

The fine-tuning does **not** add new knowledge. The base Qwen model already knows about ecology, neuroscience, architecture, and so on. What the fine-tuning adds is a **cognitive routine**:

1. Seek connections to *distant* domains
2. Extract *structural* relationships, not facts
3. Identify conventions and propose their inversions
4. Score ideas on a multi-dimensional quality metric
5. Express insights with explicit cross-domain attribution

We verified this by comparing base Qwen and CreativitySLM on identical prompts: the base model produces generic informational responses, while the fine-tuned model produces structured cross-domain analogies with novel connections.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")
tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "system", "content": "You are a creative domain analyst..."},
    {"role": "user", "content": "Analyze this creative prompt: 'How can music theory inspire new programming languages?'"}
]

# Format the conversation with the model's chat template, then sample a response.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tech Stack

| Component | Technology |
|-----------|------------|
| Base Model | Qwen2.5-1.5B-Instruct |
| Fine-tuning | QLoRA (bitsandbytes, peft, trl) |
| Training Platform | Modal.com (A100-80GB) |
| Inference | vLLM on Modal.com (A10G) |
| Frontend | Next.js 15 + Tailwind + shadcn/ui |
| Backend | Supabase + Drizzle ORM |
| Search | Tavily API |
| Embeddings | text-embedding-3-large |

## Citation

```bibtex
@article{bandi2026creativityslm,
  title={Teaching Small Language Models to Think Creatively: A Multi-Task Cognitive Architecture for Cross-Domain Analogy Generation},
  author={Bandi, Deepak},
  year={2026},
  note={University of Waterloo}
}
```

## Paper

The full research paper is available in the `paper/` directory of the repository.

## License

Apache 2.0

## Author

**Deepak Bandi** — University of Waterloo — research@fr1.ai
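## Appendix: Training Configuration Sketch

The hyperparameters listed under Training Details map onto a fairly standard QLoRA recipe. The block below is a reconstruction from those numbers, not the actual training script: the dataset files, batch size, and LoRA dropout are placeholders, and it assumes a recent `trl` release that provides `SFTConfig`.

```python
# Hedged reconstruction of the reported setup: 4-bit NF4, rank 64, alpha 128,
# all attention + MLP projections, 3 epochs, cosine LR with peak 2e-4 and 10% warmup.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=64,                                    # rank 64
    lora_alpha=128,                          # alpha 128
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # attention + MLP
    lora_dropout=0.05,                       # assumption: dropout not reported
    task_type="CAUSAL_LM",
)

# Placeholder files: 612 train / 76 val chat-formatted examples as described above.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})

training_args = SFTConfig(
    output_dir="creativity-slm",
    num_train_epochs=3,
    learning_rate=2e-4,                      # peak LR
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                        # 10% warmup
    per_device_train_batch_size=4,           # assumption: batch size not reported
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

With rank 64 applied to all seven projection matrices in each of Qwen2.5-1.5B's 28 transformer blocks, the adapter size works out to roughly the 73.9M trainable parameters reported in Key Results.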