---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- creativity
- cross-domain-analogy
- cognitive-architecture
- knowledge-distillation
- qlora
- qwen2
datasets:
- custom
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
model-index:
- name: CreativitySLM
results:
- task:
type: text-generation
name: Creative Reasoning
metrics:
- name: Structural Validity
type: accuracy
value: 96.1
- name: Average Latency
type: latency
value: 2.38
unit: seconds
---
# CreativitySLM
**A 1.5B-parameter language model fine-tuned to think creatively through cross-domain analogy, constraint violation, and novelty-coherence optimization.**
CreativitySLM is not a general-purpose LLM. It is a specialized model that has learned *creative cognitive patterns* — the structural operations underlying creative ideation — through distillation from a frontier model.
## Key Results
| Metric | Value |
|--------|-------|
| Structural Validity | **96.1%** on held-out test set |
| Average Latency | **2.38s** on A10G GPU |
| End-to-End Pipeline | **11.8s** for the full 10-layer creative run |
| Training Data | **764 examples** across 5 sub-tasks |
| Training Time | **2 min 19 sec** on A100-80GB |
| Training Cost | **$11.50 total** |
| Trainable Parameters | **73.9M** (4.57% of 1.62B) |
## What Makes This Different
Standard LLMs treat creativity as an incidental capability. CreativitySLM treats it as a **learnable cognitive pattern**.
The model was trained on 5 structured sub-tasks derived from a 10-layer cognitive architecture (a sketch of the training-data layout follows the list):
1. **Domain Detection & Query Generation** — Identify the domain and generate diverse search queries, including deliberately *distant* domains
2. **Pattern Extraction, Abstraction & Analogy** — Extract structural patterns, identify universal principles, generate cross-domain analogies
3. **Constraint Violation** — Identify domain conventions and purposefully invert them
4. **Reasoning & Taste Evaluation** — Score ideas on validity, surprise, familiarity balance, emotional resonance, internal consistency
5. **Creative Expression** — Synthesize insights into compelling natural language with explicit cross-domain attribution
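Each sub-task was framed as a chat-style instruction-following example. The card does not publish the dataset schema, so the sketch below is a hypothetical layout for one multi-task SFT record; the `task` and `messages` field names are assumptions, not the actual format:
```python
# Hypothetical layout of one multi-task SFT record.
# The real schema is not published; "task" and "messages" are assumed names.
example = {
    "task": "constraint_violation",  # one of the 5 sub-tasks
    "messages": [
        {"role": "system", "content": "You identify domain conventions and invert them."},
        {"role": "user", "content": "Domain: urban planning. Name a core convention and propose its inversion."},
        {"role": "assistant", "content": "Convention: streets are engineered around car throughput. "
                                         "Inversion: design streets as public rooms first and let traffic adapt."},
    ],
}
```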
## The Ten-Layer Architecture
```
[User Prompt]
|
L10: Input/Output (parse prompt, detect domain)
|
L1: Data (live retrieval via Tavily API)
|
L2+L3+L4: Pattern Recognition + Abstraction + Cross-Domain Analogy [Model Call 1]
|
L5: Constraint Violation [Model Call 2]
|
L6: Novelty Detection (novelty x coherence scoring)
|
L7+L8: Reasoning + Taste Evaluation [Model Call 3]
| |
| (backtrack to L2-4 if invalid)
|
L9: Language Expression [Model Call 4]
|
[Creative Output]
```
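In code, the pipeline reduces to four model calls plus retrieval and scoring glue. The sketch below is a minimal orchestration loop under assumed helper names (`call_model`, `retrieve`, `novelty_coherence`); it illustrates the control flow in the diagram, not the repository's actual implementation:
```python
# Minimal orchestration sketch of the 10-layer pipeline.
# `call_model`, `retrieve`, and `novelty_coherence` are assumed helpers,
# not functions shipped with this model.

def creative_pipeline(prompt: str, max_retries: int = 2) -> str:
    domain, queries = call_model("domain_queries", prompt)         # L10: parse prompt, detect domain
    evidence = retrieve(queries)                                   # L1: live retrieval (e.g. Tavily)

    for _ in range(max_retries + 1):
        analogy = call_model("pattern_analogy", prompt, evidence)  # L2+L3+L4: Model Call 1
        inverted = call_model("constraint_violation", analogy)     # L5: Model Call 2
        score = novelty_coherence(inverted)                        # L6: novelty x coherence scoring
        verdict = call_model("reasoning_taste", inverted, score)   # L7+L8: Model Call 3
        if verdict["valid"]:                                       # backtrack to L2-4 if invalid
            break

    return call_model("expression", inverted, verdict)             # L9: Model Call 4
```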
## Example Output
**Prompt**: "How can I build an AI model that replicates the human brain?"
**CreativitySLM produces**: *"The Forest Mind: How Nature's Self-Organization Can Rebuild AI"*
> The model draws an analogy between ecosystem self-organization and neural architecture design. It identifies the convention "fully supervised model training" and proposes its inversion: autonomous self-organizing clusters that emerge from edge-to-edge connectivity, like a forest growing itself rather than being engineered.
> *"Stop trying to engineer the forest, and start letting it engineer itself."*
This demonstrates cross-domain transfer (ecology → AI), purposeful constraint violation (breaking the "design everything" convention), and coherent creative expression.
## Training Details
- **Base Model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (4-bit NF4, rank 64, alpha 128); see the configuration sketch below
- **Target Modules**: All attention (q, k, v, o) + MLP (gate, up, down)
- **Data**: 764 examples distilled from Claude Sonnet across 153 creative prompts spanning 12 domains
- **Split**: 612 train / 76 val / 76 test
- **Epochs**: 3 (cosine LR, peak 2e-4, 10% warmup)
- **Hardware**: Single NVIDIA A100-80GB
- **Training Time**: 2 minutes 19 seconds
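These settings map onto standard `bitsandbytes`/`peft` configuration objects. The sketch below reproduces only what the card states (NF4, rank 64, alpha 128, all attention and MLP projections); the dropout value and double quantization are assumptions:
```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, as stated on the card.
# bnb_4bit_use_double_quant is an assumption, not documented here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Rank, alpha, and target modules from the card; dropout is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```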
### Training Loss
| Epoch | Train Loss | Eval Loss |
|-------|-----------|-----------|
| 1 | 2.263 | 2.020 |
| 2 | 1.720 | 1.772 |
| 3 | 1.930 | 1.744 |
## Per-Task Performance
| Task | N | Accuracy | Avg Latency |
|------|---|----------|-------------|
| Domain & Queries | 23 | 95.7% | 0.62s |
| Pattern/Abstraction/Analogy | 13 | 84.6% | 2.99s |
| Constraint Violation | 10 | 100% | 2.28s |
| Reasoning & Taste | 13 | 100% | 3.20s |
| Creative Expression | 17 | 100% | 3.74s |
| **Overall** | **76** | **96.1%** | **2.38s** |
## What Fine-Tuning Teaches
The fine-tuning does **not** add new knowledge. The base Qwen model already knows about ecology, neuroscience, architecture, etc. What the fine-tuning adds is a **cognitive routine**:
1. Seek connections to *distant* domains
2. Extract *structural* relationships, not facts
3. Identify conventions and propose their inversions
4. Score ideas on a multi-dimensional quality metric
5. Express insights with explicit cross-domain attribution
We verified this by comparing base Qwen vs. CreativitySLM on identical prompts. The base model produces generic informational responses. The fine-tuned model produces structured cross-domain analogies with novel connections.
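To reproduce that comparison, run the same chat prompt through the base checkpoint and this one. The snippet below is a minimal A/B sketch; the generation settings are assumptions:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPT = [{"role": "user", "content": "How can I build an AI model that replicates the human brain?"}]

for name in ("Qwen/Qwen2.5-1.5B-Instruct", "bdeepakreddy/creativity-slm"):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")
    text = tok.apply_chat_template(PROMPT, tokenize=False, add_generation_prompt=True)
    ids = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=512, do_sample=True, temperature=0.7)
    print(f"=== {name} ===")
    # Print only the newly generated tokens for a fair side-by-side read
    print(tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True))
```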
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bdeepakreddy/creativity-slm",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "system", "content": "You are a creative domain analyst..."},
    {"role": "user", "content": "Analyze this creative prompt: 'How can music theory inspire new programming languages?'"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling must be enabled for `temperature` to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Tech Stack
| Component | Technology |
|-----------|------------|
| Base Model | Qwen2.5-1.5B-Instruct |
| Fine-tuning | QLoRA (bitsandbytes, peft, trl) |
| Training Platform | Modal.com (A100-80GB) |
| Inference | vLLM on Modal.com (A10G) |
| Frontend | Next.js 15 + Tailwind + shadcn/ui |
| Backend | Supabase + Drizzle ORM |
| Search | Tavily API |
| Embeddings | text-embedding-3-large |
## Citation
```bibtex
@misc{bandi2026creativityslm,
title={Teaching Small Language Models to Think Creatively: A Multi-Task Cognitive Architecture for Cross-Domain Analogy Generation},
author={Bandi, Deepak},
year={2026},
note={University of Waterloo}
}
```
## Paper
The full research paper is available in the `paper/` directory of the repository.
## License
Apache 2.0
## Author
**Deepak Bandi** — University of Waterloo — research@fr1.ai