---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- qwen3
- function-calling
- creative-writing
- code-generation
- math-reasoning
- dare-ties
- merge
language:
- en
pipeline_tag: text-generation
---
# Cclilqwen
A Qwen3-0.6B model fine-tuned for **creative writing**, **code generation**, and **agentic tool use** while retaining strong math reasoning.
Built by training four specialist models and merging them via manual DARE-TIES into a single capable 0.6B parameter model.
## Performance
| Capability | Result |
|-----------|--------|
| **GSM8K Math** | 48.5% (nearly 2x base Qwen3-0.6B) |
| **Tool Calling** | 100% success rate (valid JSON, correct `<tool_call>` tags) |
| **Python Coding** | Correct solutions for palindrome, Fibonacci, stack, filter, and word-frequency tasks |
| **Creative Writing** | Vivid prose, gothic horror, dark humor, perspective shifts |
## How It Was Made
Four specialist LoRA fine-tunes (r=32) on top of a math-strong base, then merged:
1. **selfplay_v1** (weight 0.30) – Self-play rejection sampling on GSM8K, 49% accuracy
2. **creative_v5** (weight 0.30) – 28 hand-crafted golden exemplars, best creative quality
3. **coding_v1** (weight 0.20) – CodeAlpaca-20k filtered for short Python solutions
4. **tool_use_v2** (weight 0.20) – glaive-function-calling-v2, Hermes-style `<tool_call>` format
Merged using manual DARE-TIES (DROP=0.80, TRIM=0.20, SCALE=0.5), which significantly outperformed mergekit's implementation on this model.
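The merge recipe above can be sketched in NumPy. This is an illustrative reimplementation of the DARE-TIES steps (random drop-and-rescale, magnitude trim, sign election), not the exact script used for this model; the function name `dare_ties_merge` and the per-tensor framing are assumptions:

```python
import numpy as np

def dare_ties_merge(base, deltas, weights, drop=0.80, trim=0.20, scale=0.5, seed=0):
    """Merge specialist task vectors (specialist - base) into a base tensor.

    Illustrative sketch of DARE-TIES with the hyperparameters quoted above:
    DROP=0.80, TRIM=0.20, SCALE=0.5.
    """
    rng = np.random.default_rng(seed)
    processed = []
    for d, w in zip(deltas, weights):
        # DARE: randomly drop `drop` fraction of delta entries, rescale survivors
        mask = rng.random(d.shape) >= drop
        d = np.where(mask, d, 0.0) / (1.0 - drop)
        # TIES trim: keep only the top `trim` fraction of entries by magnitude
        k = max(1, int(trim * d.size))
        thresh = np.partition(np.abs(d).ravel(), -k)[-k]
        d = np.where(np.abs(d) >= thresh, d, 0.0)
        processed.append(w * d)
    stacked = np.stack(processed)
    # TIES sign election: keep only entries agreeing with the majority sign
    elected = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == elected
    merged = np.where(agree, stacked, 0.0).sum(axis=0)
    return base + scale * merged
```

In practice this runs per parameter tensor across all four specialist checkpoints, with the 0.30/0.30/0.20/0.20 weights listed above.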
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Asystemoffields/disco-torch-v1", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Asystemoffields/disco-torch-v1")

# Build a chat prompt with the model's chat template
messages = [{"role": "user", "content": "Write a haiku about debugging code."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
### Tool Calling
```python
import json

tools = [{"name": "get_weather", "description": "Get weather for a location",
          "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}]
messages = [
    {"role": "system", "content": f"You have access to: {json.dumps(tools)}\nCall tools with: <tool_call>\n{{\"name\": \"...\", \"arguments\": {{...}}}}\n</tool_call>"},
    {"role": "user", "content": "What's the weather in Tokyo?"},
]

# Generate with the model and tokenizer loaded above; the reply should contain a <tool_call> block
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:]))
```
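The model emits tool calls as JSON inside Hermes-style `<tool_call>` tags. A minimal parser for that format might look like the following (the helper `parse_tool_calls` is a hypothetical example, not part of the model's tooling):

```python
import json
import re

def parse_tool_calls(text: str):
    """Extract the JSON payload of each Hermes-style <tool_call> block."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(payload) for payload in pattern.findall(text)]

reply = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Tokyo"}}\n</tool_call>'
calls = parse_tool_calls(reply)
# calls[0] == {"name": "get_weather", "arguments": {"location": "Tokyo"}}
```

Each parsed call can then be dispatched to the matching function and the result fed back to the model as a tool response message.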
## Training Details
- **Base model:** Qwen/Qwen3-0.6B (0.6B parameters)
- **Hardware:** NVIDIA A10G (Modal)
- **LoRA:** r=32, alpha=64, all attention + MLP projections
- **Merge:** Manual DARE-TIES with 4 specialist checkpoints
- **Training data:** GSM8K, CodeAlpaca-20k, glaive-function-calling-v2, 28 golden creative exemplars
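The LoRA settings above correspond to a `peft` configuration along these lines (a sketch: the `target_modules` names are assumed from Qwen3's attention and MLP projection layers, not confirmed by the model card):

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above; target_modules are assumed
# to cover Qwen3's attention (q/k/v/o) and MLP (gate/up/down) projections.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```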
## Limitations
- The 0.6B parameter count limits complex multi-step reasoning
- Creative writing quality varies – best with specific, constrained prompts
- Tool calling requires explicit system-prompt instructions, as shown above
- Math accuracy (48.5%) is strong for the size but not competitive with larger models
## License
Apache 2.0 (same as Qwen3-0.6B)