## Use with the Transformers library
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bdeepakreddy/creativity-slm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")
model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
# CreativitySLM

A 1.5B parameter language model fine-tuned to think creatively through cross-domain analogy, constraint violation, and novelty-coherence optimization.

CreativitySLM is not a general-purpose LLM. It is a specialized model that has learned creative cognitive patterns — the structural operations underlying creative ideation — through distillation from a frontier model.

## Key Results

| Metric | Value |
| --- | --- |
| Structural Validity | 96.1% on held-out test set |
| Average Latency | 2.38s on an A10G GPU |
| End-to-End Pipeline | 11.8s for the full 10-layer creative pipeline |
| Training Data | 764 examples across 5 sub-tasks |
| Training Time | 2 min 19 sec on an A100-80GB |
| Training Cost | $11.50 total |
| Trainable Parameters | 73.9M (4.57% of 1.62B) |

## What Makes This Different

Standard LLMs treat creativity as an incidental capability. CreativitySLM treats it as a learnable cognitive pattern.

The model was trained on 5 structured sub-tasks derived from a 10-layer cognitive architecture (a prompting sketch follows the list):

  1. Domain Detection & Query Generation — Identify the domain and generate diverse search queries, including deliberately distant domains
  2. Pattern Extraction, Abstraction & Analogy — Extract structural patterns, identify universal principles, generate cross-domain analogies
  3. Constraint Violation — Identify domain conventions and purposefully invert them
  4. Reasoning & Taste Evaluation — Score ideas on validity, surprise, familiarity balance, emotional resonance, internal consistency
  5. Creative Expression — Synthesize insights into compelling natural language with explicit cross-domain attribution
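
Each sub-task is invoked as an ordinary chat turn with a task-specific system prompt, in the same way as the snippets above. A minimal sketch for sub-task 3 (the system-prompt wording and example domain here are illustrative; the actual training prompts are not reproduced in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")
model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")

# Hypothetical system prompt for sub-task 3 (Constraint Violation)
messages = [
    {"role": "system", "content": "You are a constraint-violation analyst. "
        "List the domain's core conventions, then purposefully invert each one."},
    {"role": "user", "content": "Domain: urban transit planning"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```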

## The Ten-Layer Architecture

```
[User Prompt]
     |
L10: Input/Output (parse prompt, detect domain)
     |
L1:  Data (live retrieval via Tavily API)
     |
L2+L3+L4: Pattern Recognition + Abstraction + Cross-Domain Analogy  [Model Call 1]
     |
L5:  Constraint Violation  [Model Call 2]
     |
L6:  Novelty Detection (novelty x coherence scoring)
     |
L7+L8: Reasoning + Taste Evaluation  [Model Call 3]
     |
     |    (backtrack to L2-L4 if invalid)
     |
L9:  Language Expression  [Model Call 4]
     |
[Creative Output]
```
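
The four model calls map naturally onto a small orchestration loop. A schematic sketch of that control flow (every helper below is an illustrative stub, not the project's real code):

```python
# Schematic control flow of the ten layers and four model calls.

def model_call(task, *context):
    # In the real pipeline this is one chat completion with a
    # task-specific system prompt (see Usage below).
    return {"valid": True, "text": f"<{task} output>"}

def tavily_search(queries):
    return ["<retrieved evidence>"]           # L1: live retrieval via Tavily

def score_novelty_coherence(ideas):
    return ideas                              # L6: novelty x coherence filter

def run_pipeline(prompt, max_retries=2):
    plan = model_call("domain_and_queries", prompt)            # L10 + sub-task 1
    evidence = tavily_search(plan["text"])                     # L1

    for _ in range(max_retries + 1):
        analogies = model_call("pattern_abstraction_analogy",  # Call 1: L2+L3+L4
                               prompt, evidence)
        inversions = model_call("constraint_violation",        # Call 2: L5
                                plan["text"], analogies)
        candidates = score_novelty_coherence([analogies, inversions])  # L6
        verdict = model_call("reasoning_taste", candidates)    # Call 3: L7+L8
        if verdict["valid"]:
            break  # accepted; otherwise backtrack to L2-L4 and retry

    return model_call("creative_expression",                   # Call 4: L9
                      prompt, candidates, verdict)["text"]

print(run_pipeline("How can I build an AI model that replicates the human brain?"))
```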

## Example Output

Prompt: "How can I build an AI model that replicates the human brain?"

CreativitySLM produces: "The Forest Mind: How Nature's Self-Organization Can Rebuild AI"

The model draws an analogy between ecosystem self-organization and neural architecture design. It identifies the convention "fully supervised model training" and proposes its inversion: autonomous self-organizing clusters that emerge from edge-to-edge connectivity, like a forest growing itself rather than being engineered.

"Stop trying to engineer the forest, and start letting it engineer itself."

This demonstrates cross-domain transfer (ecology → AI), purposeful constraint violation (breaking the "design everything" convention), and coherent creative expression.

## Training Details

- **Base Model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (4-bit NF4, rank 64, alpha 128)
- **Target Modules**: all attention projections (q, k, v, o) + MLP (gate, up, down)
- **Data**: 764 examples distilled from Claude Sonnet across 153 creative prompts spanning 12 domains
- **Split**: 612 train / 76 val / 76 test
- **Epochs**: 3 (cosine LR schedule, peak 2e-4, 10% warmup)
- **Hardware**: single NVIDIA A100-80GB
- **Training Time**: 2 minutes 19 seconds
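
In peft/bitsandbytes terms, the setup above corresponds roughly to the following configs. This is a sketch, not the exact training script: dropout, bias handling, and the TRL trainer arguments are not published here.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on every attention and MLP projection listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Schedule from the card: 3 epochs, cosine LR, peak 2e-4, 10% warmup
args = TrainingArguments(
    output_dir="creativity-slm",
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
)
```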

## Training Loss

| Epoch | Train Loss | Eval Loss |
| --- | --- | --- |
| 1 | 2.263 | 2.020 |
| 2 | 1.720 | 1.772 |
| 3 | 1.930 | 1.744 |

## Per-Task Performance

| Task | N | Accuracy | Avg Latency |
| --- | --- | --- | --- |
| Domain & Queries | 23 | 95.7% | 0.62s |
| Pattern/Abstraction/Analogy | 13 | 84.6% | 2.99s |
| Constraint Violation | 10 | 100% | 2.28s |
| Reasoning & Taste | 13 | 100% | 3.20s |
| Creative Expression | 17 | 100% | 3.74s |
| **Overall** | **76** | **96.1%** | **2.38s** |
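
The overall row is consistent with an example-weighted mean of the per-task rows, which is easy to check:

```python
# Per-task (n, accuracy, avg latency) from the table above
tasks = [(23, 0.957, 0.62), (13, 0.846, 2.99), (10, 1.0, 2.28),
         (13, 1.0, 3.20), (17, 1.0, 3.74)]

n_total = sum(n for n, _, _ in tasks)                 # 76
accuracy = sum(n * a for n, a, _ in tasks) / n_total  # ~0.961
latency = sum(n * t for n, _, t in tasks) / n_total   # ~2.38

print(f"N={n_total}, accuracy={accuracy:.1%}, latency={latency:.2f}s")
```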

## What Fine-tuning Teaches

The fine-tuning does not add new knowledge. The base Qwen model already knows about ecology, neuroscience, architecture, etc. What the fine-tuning adds is a cognitive routine:

  1. Seek connections to distant domains
  2. Extract structural relationships, not facts
  3. Identify conventions and propose their inversions
  4. Score ideas on a multi-dimensional quality metric
  5. Express insights with explicit cross-domain attribution

We verified this by comparing base Qwen vs. CreativitySLM on identical prompts. The base model produces generic informational responses. The fine-tuned model produces structured cross-domain analogies with novel connections.
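
A minimal way to reproduce that comparison (the base model ID is the public Qwen checkpoint named above; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = [{"role": "user", "content": "How can coral reefs inspire data center design?"}]

for name in ["Qwen/Qwen2.5-1.5B-Instruct", "bdeepakreddy/creativity-slm"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer.apply_chat_template(
        prompt, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=300)
    print(f"=== {name} ===")
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```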

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")
tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "system", "content": "You are a creative domain analyst..."},
    {"role": "user", "content": "Analyze this creative prompt: 'How can music theory inspire new programming languages?'"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Tech Stack

| Component | Technology |
| --- | --- |
| Base Model | Qwen2.5-1.5B-Instruct |
| Fine-tuning | QLoRA (bitsandbytes, peft, trl) |
| Training Platform | Modal.com (A100-80GB) |
| Inference | vLLM on Modal.com (A10G) |
| Frontend | Next.js 15 + Tailwind + shadcn/ui |
| Backend | Supabase + Drizzle ORM |
| Search | Tavily API |
| Embeddings | text-embedding-3-large |

## Citation

```bibtex
@misc{bandi2026creativityslm,
  title={Teaching Small Language Models to Think Creatively: A Multi-Task Cognitive Architecture for Cross-Domain Analogy Generation},
  author={Bandi, Deepak},
  year={2026},
  note={University of Waterloo}
}
```

## Paper

The full research paper is available in the `paper/` directory of the repository.

## License

Apache 2.0

## Author

Deepak Bandi — University of Waterloo — research@fr1.ai
