---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- creativity
- cross-domain-analogy
- cognitive-architecture
- knowledge-distillation
- qlora
- qwen2
datasets:
- custom
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
model-index:
- name: CreativitySLM
  results:
  - task:
      type: text-generation
      name: Creative Reasoning
    metrics:
    - name: Structural Validity
      type: accuracy
      value: 96.1
    - name: Average Latency
      type: latency
      value: 2.38
      unit: seconds
---

# CreativitySLM

**A 1.5B parameter language model fine-tuned to think creatively through cross-domain analogy, constraint violation, and novelty-coherence optimization.**

CreativitySLM is not a general-purpose LLM. It is a specialized model that has learned *creative cognitive patterns* — the structural operations underlying creative ideation — through distillation from a frontier model.

## Key Results

| Metric | Value |
|--------|-------|
| Structural Validity | **96.1%** on held-out test set |
| Average Latency | **2.38s** on A10G GPU |
| End-to-End Pipeline | **11.8s** for full 10-layer creative pipeline |
| Training Data | **764 examples** across 5 sub-tasks |
| Training Time | **2 min 19 sec** on A100-80GB |
| Training Cost | **$11.50 total** |
| Trainable Parameters | **73.9M** (4.57% of 1.62B) |

## What Makes This Different

Standard LLMs treat creativity as an incidental capability. CreativitySLM treats it as a **learnable cognitive pattern**. The model was trained on 5 structured sub-tasks derived from a 10-layer cognitive architecture:

1. **Domain Detection & Query Generation** — Identify the domain and generate diverse search queries, including deliberately *distant* domains
2. **Pattern Extraction, Abstraction & Analogy** — Extract structural patterns, identify universal principles, generate cross-domain analogies
3. **Constraint Violation** — Identify domain conventions and purposefully invert them
4. **Reasoning & Taste Evaluation** — Score ideas on validity, surprise, familiarity balance, emotional resonance, internal consistency
5. **Creative Expression** — Synthesize insights into compelling natural language with explicit cross-domain attribution

## The Ten-Layer Architecture

```
[User Prompt]
     |
L10: Input/Output (parse prompt, detect domain)
     |
L1:  Data (live retrieval via Tavily API)
     |
L2+L3+L4: Pattern Recognition + Abstraction + Cross-Domain Analogy   [Model Call 1]
     |
L5:  Constraint Violation                                            [Model Call 2]
     |
L6:  Novelty Detection (novelty x coherence scoring)
     |
L7+L8: Reasoning + Taste Evaluation                                  [Model Call 3]
     |    |
     |    (backtrack to L2-4 if invalid)
     |
L9:  Language Expression                                             [Model Call 4]
     |
[Creative Output]
```

## Example Output

**Prompt**: "How can I build an AI model that replicates the human brain?"

**CreativitySLM produces**: *"The Forest Mind: How Nature's Self-Organization Can Rebuild AI"*

> The model draws an analogy between ecosystem self-organization and neural architecture design. It identifies the convention "fully supervised model training" and proposes its inversion: autonomous self-organizing clusters that emerge from edge-to-edge connectivity, like a forest growing itself rather than being engineered.
>
> *"Stop trying to engineer the forest, and start letting it engineer itself."*

This demonstrates cross-domain transfer (ecology → AI), purposeful constraint violation (breaking the "design everything" convention), and coherent creative expression.
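End to end, an output like this comes from chaining the four model calls shown in the architecture diagram above. The sketch below is one way that orchestration could be wired up. It is illustrative only: the helper names, system prompts, and 0.5 acceptance threshold are assumptions, the production pipeline code is not part of this card, and `call_model` is left as a stub for whichever inference path you use (the Usage section below shows a plain `transformers` call). The L6-L8 evaluation is collapsed into a single scoring stub.

```python
# Hypothetical orchestration sketch for the ten-layer pipeline above,
# not the official implementation shipped with this model.

def call_model(system: str, user: str) -> str:
    """Placeholder: send one chat turn to CreativitySLM and return its reply."""
    raise NotImplementedError("wire this to your inference endpoint")


def novelty_coherence_score(idea: str) -> float:
    """Placeholder for L6-L8: combine a novelty signal (e.g. embedding distance
    from retrieved sources) with the reasoning/taste scores from Model Call 3."""
    raise NotImplementedError


def run_pipeline(prompt: str, retrieved_context: str, max_backtracks: int = 2) -> str:
    # L10 + L2-L4: detect the domain, extract patterns, and draw a
    # cross-domain analogy from the retrieved context (Model Call 1).
    analogy = call_model(
        "You are a creative domain analyst...",
        f"Prompt: {prompt}\n\nContext: {retrieved_context}",
    )

    idea = analogy
    for _ in range(max_backtracks + 1):
        # L5: identify a domain convention and purposefully invert it (Model Call 2).
        idea = call_model("Identify a convention implied by this analogy and invert it.", analogy)

        # L6-L8: score novelty x coherence; accept, or backtrack to a new analogy (Model Call 3).
        if novelty_coherence_score(idea) >= 0.5:  # threshold is an assumption
            break
        analogy = call_model("Propose a more distant cross-domain analogy.", prompt)

    # L9: express the surviving idea with explicit cross-domain attribution (Model Call 4).
    return call_model("Express this insight as a compelling narrative with attribution.", idea)
```

The loop mirrors the "backtrack to L2-4 if invalid" edge in the diagram: a low novelty-coherence score sends the pipeline back to generate a different analogy rather than polishing a weak idea.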
## Training Details

- **Base Model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (4-bit NF4, rank 64, alpha 128)
- **Target Modules**: All attention (q, k, v, o) + MLP (gate, up, down)
- **Data**: 764 examples distilled from Claude Sonnet across 153 creative prompts spanning 12 domains
- **Split**: 612 train / 76 val / 76 test
- **Epochs**: 3 (cosine LR, peak 2e-4, 10% warmup)
- **Hardware**: Single NVIDIA A100-80GB
- **Training Time**: 2 minutes 19 seconds

### Training Loss

| Epoch | Train Loss | Eval Loss |
|-------|------------|-----------|
| 1 | 2.263 | 2.020 |
| 2 | 1.720 | 1.772 |
| 3 | 1.930 | 1.744 |

## Per-Task Performance

| Task | N | Accuracy | Avg Latency |
|------|---|----------|-------------|
| Domain & Queries | 23 | 95.7% | 0.62s |
| Pattern/Abstraction/Analogy | 13 | 84.6% | 2.99s |
| Constraint Violation | 10 | 100% | 2.28s |
| Reasoning & Taste | 13 | 100% | 3.20s |
| Creative Expression | 17 | 100% | 3.74s |
| **Overall** | **76** | **96.1%** | **2.38s** |

## What Fine-Tuning Teaches

The fine-tuning does **not** add new knowledge. The base Qwen model already knows about ecology, neuroscience, architecture, and so on. What the fine-tuning adds is a **cognitive routine**:

1. Seek connections to *distant* domains
2. Extract *structural* relationships, not facts
3. Identify conventions and propose their inversions
4. Score ideas on a multi-dimensional quality metric
5. Express insights with explicit cross-domain attribution

We verified this by comparing base Qwen and CreativitySLM on identical prompts: the base model produces generic informational responses, while the fine-tuned model produces structured cross-domain analogies with novel connections.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")
tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "system", "content": "You are a creative domain analyst..."},
    {"role": "user", "content": "Analyze this creative prompt: 'How can music theory inspire new programming languages?'"}
]

# Format the conversation with the model's chat template, then sample a response.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tech Stack

| Component | Technology |
|-----------|------------|
| Base Model | Qwen2.5-1.5B-Instruct |
| Fine-tuning | QLoRA (bitsandbytes, peft, trl) |
| Training Platform | Modal.com (A100-80GB) |
| Inference | vLLM on Modal.com (A10G) |
| Frontend | Next.js 15 + Tailwind + shadcn/ui |
| Backend | Supabase + Drizzle ORM |
| Search | Tavily API |
| Embeddings | text-embedding-3-large |

## Citation

```bibtex
@article{bandi2026creativityslm,
  title={Teaching Small Language Models to Think Creatively: A Multi-Task Cognitive Architecture for Cross-Domain Analogy Generation},
  author={Bandi, Deepak},
  year={2026},
  note={University of Waterloo}
}
```

## Paper

The full research paper is available in the `paper/` directory of the repository.

## License

Apache 2.0

## Author

**Deepak Bandi** — University of Waterloo — research@fr1.ai
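## Appendix: Training Configuration Sketch

The hyperparameters listed under Training Details map onto a fairly standard QLoRA recipe. The block below is a reconstruction from those numbers, not the actual training script: the dataset files, batch size, and LoRA dropout are placeholders, and it assumes a recent `trl` release that provides `SFTConfig`.

```python
# Hedged reconstruction of the reported setup: 4-bit NF4, rank 64, alpha 128,
# all attention + MLP projections, 3 epochs, cosine LR with peak 2e-4 and 10% warmup.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=64,                                    # rank 64
    lora_alpha=128,                          # alpha 128
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # attention + MLP
    lora_dropout=0.05,                       # assumption: dropout not reported
    task_type="CAUSAL_LM",
)

# Placeholder files: 612 train / 76 val chat-formatted examples as described above.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})

training_args = SFTConfig(
    output_dir="creativity-slm",
    num_train_epochs=3,
    learning_rate=2e-4,                      # peak LR
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                        # 10% warmup
    per_device_train_batch_size=4,           # assumption: batch size not reported
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```

With rank 64 applied to all seven projection matrices in each of Qwen2.5-1.5B's 28 transformer blocks, the adapter size works out to roughly the 73.9M trainable parameters reported in Key Results.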