Upload README.md
README.md (changed)
@@ -1,3 +1,194 @@
---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- creativity
- cross-domain-analogy
- cognitive-architecture
- knowledge-distillation
- qlora
- qwen2
datasets:
- custom
base_model: Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
model-index:
- name: CreativitySLM
  results:
  - task:
      type: text-generation
      name: Creative Reasoning
    metrics:
    - name: Structural Validity
      type: accuracy
      value: 96.1
    - name: Average Latency
      type: latency
      value: 2.38
      unit: seconds
---

# CreativitySLM

**A 1.5B-parameter language model fine-tuned to think creatively through cross-domain analogy, constraint violation, and novelty-coherence optimization.**

CreativitySLM is not a general-purpose LLM. It is a specialized model that has learned *creative cognitive patterns*, the structural operations underlying creative ideation, through distillation from a frontier model.

## Key Results

| Metric | Value |
|--------|-------|
| Structural Validity | **96.1%** on held-out test set |
| Average Latency | **2.38s** on A10G GPU |
| End-to-End Pipeline | **11.8s** for full 10-layer creative pipeline |
| Training Data | **764 examples** across 5 sub-tasks |
| Training Time | **2 min 19 sec** on A100-80GB |
| Training Cost | **$11.50 total** |
| Trainable Parameters | **73.9M** (4.57% of 1.62B) |

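The trainable-parameter figure can be sanity-checked from the LoRA rank and target modules listed under Training Details. The sketch below is a back-of-the-envelope count, not the author's script. The layer shapes (hidden size 1536, key/value width 256, MLP width 8960, 28 layers) are assumptions taken from the public Qwen2.5-1.5B configuration and are not stated in this card.

```python
# Assumed shapes from the public Qwen2.5-1.5B config (not stated in this card).
HIDDEN = 1536          # hidden size
KV_DIM = 256           # 2 key/value heads x head dimension 128
INTERMEDIATE = 8960    # MLP width
NUM_LAYERS = 28
RANK = 64              # LoRA rank, from Training Details

# (in_features, out_features) of each adapted linear layer in one block.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to a frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total ({total / 1e6:.1f}M)")
```

Under these assumed shapes the count comes to about 73.9M, which matches the value in the table.
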
## What Makes This Different

Standard LLMs treat creativity as an incidental capability. CreativitySLM treats it as a **learnable cognitive pattern**.

The model was trained on 5 structured sub-tasks derived from a 10-layer cognitive architecture:

1. **Domain Detection & Query Generation**: Identify the domain and generate diverse search queries, including queries into deliberately *distant* domains
2. **Pattern Extraction, Abstraction & Analogy**: Extract structural patterns, identify universal principles, and generate cross-domain analogies
3. **Constraint Violation**: Identify domain conventions and purposefully invert them
4. **Reasoning & Taste Evaluation**: Score ideas on validity, surprise, familiarity balance, emotional resonance, and internal consistency
5. **Creative Expression**: Synthesize insights into compelling natural language with explicit cross-domain attribution

## The Ten-Layer Architecture

```
[User Prompt]
     |
L10: Input/Output (parse prompt, detect domain)
     |
L1: Data (live retrieval via Tavily API)
     |
L2+L3+L4: Pattern Recognition + Abstraction + Cross-Domain Analogy  [Model Call 1]
     |
L5: Constraint Violation  [Model Call 2]
     |
L6: Novelty Detection (novelty x coherence scoring)
     |
L7+L8: Reasoning + Taste Evaluation  [Model Call 3]
     |         |
     |         (backtrack to L2-4 if invalid)
     |
L9: Language Expression  [Model Call 4]
     |
[Creative Output]
```

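The control flow in the diagram, four model calls with a backtrack from L7+L8 to L2-L4, can be sketched as a small orchestration loop. This is a minimal sketch under stated assumptions, not the project's actual pipeline code. Every function name, the `Candidate` type, the `max_backtracks` limit, and the reading of "novelty x coherence" as a plain product are hypothetical. The model and retrieval calls are passed in as callables so the example runs with stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    analogy: str
    inversion: str
    novelty: float    # assumed to lie in [0, 1]
    coherence: float  # assumed to lie in [0, 1]

    @property
    def score(self) -> float:
        # L6: "novelty x coherence" read literally as a product (assumption).
        return self.novelty * self.coherence


def run_pipeline(
    prompt: str,
    retrieve: Callable[[str], List[str]],
    analogize: Callable[[str, List[str]], str],
    invert: Callable[[str], str],
    rate: Callable[[str, str], Tuple[float, float]],
    judge: Callable[[Candidate], bool],
    express: Callable[[Candidate], str],
    max_backtracks: int = 2,  # hypothetical limit; the card does not state one
) -> str:
    sources = retrieve(prompt)                         # L10 + L1
    candidate = None
    for _ in range(max_backtracks + 1):
        analogy = analogize(prompt, sources)           # L2-L4, model call 1
        inversion = invert(analogy)                    # L5, model call 2
        novelty, coherence = rate(analogy, inversion)  # L6
        candidate = Candidate(analogy, inversion, novelty, coherence)
        if judge(candidate):                           # L7+L8, model call 3
            break                                      # valid: stop backtracking
    # If every attempt was rejected, the last candidate is expressed anyway.
    return express(candidate)                          # L9, model call 4


# Stub run: the judge rejects the first attempt, forcing one backtrack.
attempts = []


def fake_analogize(prompt: str, sources: List[str]) -> str:
    attempts.append(prompt)
    return f"analogy #{len(attempts)}"


result = run_pipeline(
    "How can music theory inspire new programming languages?",
    retrieve=lambda p: ["stub source"],
    analogize=fake_analogize,
    invert=lambda a: f"inverted {a}",
    rate=lambda a, i: (0.8, 0.5),
    judge=lambda c: len(attempts) >= 2,
    express=lambda c: f"{c.analogy} | {c.inversion} | score={c.score:.2f}",
)
print(result)
```

Passing the stages in as callables keeps the loop independent of any particular serving stack. In a real deployment each callable would wrap a request to the fine-tuned model or to the search API.
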
## Example Output

**Prompt**: "How can I build an AI model that replicates the human brain?"

**CreativitySLM produces**: *"The Forest Mind: How Nature's Self-Organization Can Rebuild AI"*

> The model draws an analogy between ecosystem self-organization and neural architecture design. It identifies the convention "fully supervised model training" and proposes its inversion: autonomous self-organizing clusters that emerge from edge-to-edge connectivity, like a forest growing itself rather than being engineered.

> *"Stop trying to engineer the forest, and start letting it engineer itself."*

This demonstrates cross-domain transfer (ecology → AI), purposeful constraint violation (breaking the "design everything" convention), and coherent creative expression.

## Training Details

- **Base Model**: Qwen2.5-1.5B-Instruct
- **Method**: QLoRA (4-bit NF4, rank 64, alpha 128)
- **Target Modules**: All attention (q, k, v, o) + MLP (gate, up, down)
- **Data**: 764 examples distilled from Claude Sonnet across 153 creative prompts spanning 12 domains
- **Split**: 612 train / 76 val / 76 test
- **Epochs**: 3 (cosine LR schedule, peak 2e-4, 10% warmup)
- **Hardware**: Single NVIDIA A100-80GB
- **Training Time**: 2 minutes 19 seconds

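The learning-rate settings above (cosine schedule, peak 2e-4, 10% warmup) can be illustrated with a small standalone function. This is a sketch of the common linear-warmup-plus-cosine-decay shape, not the trainer's own implementation. The effective batch size of 16 used to derive the step count is purely an assumption, since the card does not state one.

```python
import math


def lr_at(step: int, total_steps: int, peak: float = 2e-4,
          warmup_frac: float = 0.10) -> float:
    """Linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: effective batch size 16 (not stated in the card).
steps_per_epoch = math.ceil(612 / 16)   # 612 training examples
total_steps = 3 * steps_per_epoch       # 3 epochs

for step in (0, 5, 11, 60, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With so few optimizer steps the warmup lasts only about a dozen updates under this assumption, so most of training runs on the decaying part of the curve.
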
### Training Loss

| Epoch | Train Loss | Eval Loss |
|-------|-----------|-----------|
| 1 | 2.263 | 2.020 |
| 2 | 1.720 | 1.772 |
| 3 | 1.930 | 1.744 |

## Per-Task Performance

| Task | N | Accuracy | Avg Latency |
|------|---|----------|-------------|
| Domain & Queries | 23 | 95.7% | 0.62s |
| Pattern/Abstraction/Analogy | 13 | 84.6% | 2.99s |
| Constraint Violation | 10 | 100% | 2.28s |
| Reasoning & Taste | 13 | 100% | 3.20s |
| Creative Expression | 17 | 100% | 3.74s |
| **Overall** | **76** | **96.1%** | **2.38s** |

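The Overall row is consistent with example-weighted aggregates of the per-task rows, as the short check below shows. The per-task correct counts are inferred from the reported accuracies (for example, 22 of 23 gives 95.7%). They are an assumption, not figures from the card.

```python
# (task, N, correct, average latency in seconds)
# `correct` is inferred from the reported accuracy of each row (assumption).
rows = [
    ("Domain & Queries", 23, 22, 0.62),
    ("Pattern/Abstraction/Analogy", 13, 11, 2.99),
    ("Constraint Violation", 10, 10, 2.28),
    ("Reasoning & Taste", 13, 13, 3.20),
    ("Creative Expression", 17, 17, 3.74),
]

total_n = sum(n for _, n, _, _ in rows)
overall_accuracy = 100 * sum(c for _, _, c, _ in rows) / total_n
# Weight each task's latency by its number of examples.
overall_latency = sum(n * lat for _, n, _, lat in rows) / total_n

print(total_n, round(overall_accuracy, 1), round(overall_latency, 2))
```

Both aggregates are weighted by N, so the small Pattern/Abstraction/Analogy set pulls the overall accuracy down by less than its 84.6% might suggest.
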
## What Fine-tuning Teaches

The fine-tuning does **not** add new knowledge. The base Qwen model already knows about ecology, neuroscience, architecture, etc. What the fine-tuning adds is a **cognitive routine**:

1. Seek connections to *distant* domains
2. Extract *structural* relationships, not facts
3. Identify conventions and propose their inversions
4. Score ideas on a multi-dimensional quality metric
5. Express insights with explicit cross-domain attribution

We verified this by comparing the base Qwen model with CreativitySLM on identical prompts. The base model produces generic informational responses, while the fine-tuned model produces structured cross-domain analogies with novel connections.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bdeepakreddy/creativity-slm")
tokenizer = AutoTokenizer.from_pretrained("bdeepakreddy/creativity-slm")

messages = [
    {"role": "system", "content": "You are a creative domain analyst..."},
    {"role": "user", "content": "Analyze this creative prompt: 'How can music theory inspire new programming languages?'"},
]

# Render the chat template to text, then tokenize it for the model.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled, so set do_sample explicitly.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## Tech Stack

| Component | Technology |
|-----------|------------|
| Base Model | Qwen2.5-1.5B-Instruct |
| Fine-tuning | QLoRA (bitsandbytes, peft, trl) |
| Training Platform | Modal.com (A100-80GB) |
| Inference | vLLM on Modal.com (A10G) |
| Frontend | Next.js 15 + Tailwind + shadcn/ui |
| Backend | Supabase + Drizzle ORM |
| Search | Tavily API |
| Embeddings | text-embedding-3-large |

## Citation

```bibtex
@article{bandi2026creativityslm,
  title={Teaching Small Language Models to Think Creatively: A Multi-Task Cognitive Architecture for Cross-Domain Analogy Generation},
  author={Bandi, Deepak},
  year={2026},
  note={University of Waterloo}
}
```

## Paper

The full research paper is available in the `paper/` directory of the repository.

## License

Apache 2.0

## Author

**Deepak Bandi**, University of Waterloo, research@fr1.ai