---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- qwen3
- function-calling
- creative-writing
- code-generation
- math-reasoning
- dare-ties
- merge
language:
- en
pipeline_tag: text-generation
---
# Cclilqwen
A Qwen3-0.6B model fine-tuned for **creative writing**, **code generation**, and **agentic tool use** while retaining strong math reasoning.
Built by training four specialist models and merging them via manual DARE-TIES into a single capable 0.6B parameter model.
## Performance
| Capability | Result |
|-----------|--------|
| **GSM8K Math** | 48.5% (nearly 2x base Qwen3-0.6B) |
| **Tool Calling** | 100% success rate (valid JSON, correct `<tool_call>` tags) |
| **Python Coding** | Correct solutions for palindrome, Fibonacci, stack, filter, and word-frequency tasks |
| **Creative Writing** | Vivid prose, gothic horror, dark humor, perspective shifts |
## How It Was Made
Four specialist LoRA fine-tunes (r=32) on top of a math-strong base, then merged:
1. **selfplay_v1** (weight 0.30) – Self-play rejection sampling on GSM8K, 49% accuracy
2. **creative_v5** (weight 0.30) – 28 hand-crafted golden exemplars, best creative quality
3. **coding_v1** (weight 0.20) – CodeAlpaca-20k filtered for short Python solutions
4. **tool_use_v2** (weight 0.20) – glaive-function-calling-v2, Hermes-style `<tool_call>` format
Merged using manual DARE-TIES (DROP=0.80, TRIM=0.20, SCALE=0.5), which significantly outperformed mergekit's implementation on this model.
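The merge recipe above can be sketched in NumPy. This is an illustrative reimplementation of the DARE-TIES steps (random drop-and-rescale, magnitude trim, sign election), not the exact script used for this model; the function name `dare_ties_merge` and the per-tensor framing are assumptions:

```python
import numpy as np

def dare_ties_merge(base, deltas, weights, drop=0.80, trim=0.20, scale=0.5, seed=0):
    """Merge specialist task vectors (specialist - base) into a base tensor.

    Illustrative sketch of DARE-TIES with the hyperparameters quoted above:
    DROP=0.80, TRIM=0.20, SCALE=0.5.
    """
    rng = np.random.default_rng(seed)
    processed = []
    for d, w in zip(deltas, weights):
        # DARE: randomly drop `drop` fraction of delta entries, rescale survivors
        mask = rng.random(d.shape) >= drop
        d = np.where(mask, d, 0.0) / (1.0 - drop)
        # TIES trim: keep only the top `trim` fraction of entries by magnitude
        k = max(1, int(trim * d.size))
        thresh = np.partition(np.abs(d).ravel(), -k)[-k]
        d = np.where(np.abs(d) >= thresh, d, 0.0)
        processed.append(w * d)
    stacked = np.stack(processed)
    # TIES sign election: keep only entries agreeing with the majority sign
    elected = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == elected
    merged = np.where(agree, stacked, 0.0).sum(axis=0)
    return base + scale * merged
```

In practice this runs per parameter tensor across all four specialist checkpoints, with the 0.30/0.30/0.20/0.20 weights listed above.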
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Asystemoffields/disco-torch-v1", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Asystemoffields/disco-torch-v1")

# Build a chat prompt with the model's chat template
messages = [{"role": "user", "content": "Write a haiku about debugging code."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
### Tool Calling
```python
import json

tools = [{"name": "get_weather", "description": "Get weather for a location",
          "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}]
messages = [
    {"role": "system", "content": f"You have access to: {json.dumps(tools)}\nCall tools with: <tool_call>\n{{\"name\": \"...\", \"arguments\": {{...}}}}\n</tool_call>"},
    {"role": "user", "content": "What's the weather in Tokyo?"},
]

# Generate with the model and tokenizer loaded above; the reply should contain a <tool_call> block
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:]))
```
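The model emits tool calls as JSON inside Hermes-style `<tool_call>` tags. A minimal parser for that format might look like the following (the helper `parse_tool_calls` is a hypothetical example, not part of the model's tooling):

```python
import json
import re

def parse_tool_calls(text: str):
    """Extract the JSON payload of each Hermes-style <tool_call> block."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(payload) for payload in pattern.findall(text)]

reply = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Tokyo"}}\n</tool_call>'
calls = parse_tool_calls(reply)
# calls[0] == {"name": "get_weather", "arguments": {"location": "Tokyo"}}
```

Each parsed call can then be dispatched to the matching function and the result fed back to the model as a tool response message.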
## Training Details
- **Base model:** Qwen/Qwen3-0.6B (0.6B parameters)
- **Hardware:** NVIDIA A10G (Modal)
- **LoRA:** r=32, alpha=64, all attention + MLP projections
- **Merge:** Manual DARE-TIES with 4 specialist checkpoints
- **Training data:** GSM8K, CodeAlpaca-20k, glaive-function-calling-v2, 28 golden creative exemplars
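The LoRA settings above correspond to a `peft` configuration along these lines (a sketch: the `target_modules` names are assumed from Qwen3's attention and MLP projection layers, not confirmed by the model card):

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above; target_modules are assumed
# to cover Qwen3's attention (q/k/v/o) and MLP (gate/up/down) projections.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```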
## Limitations
- The 0.6B parameter count limits complex multi-step reasoning
- Creative writing quality varies – best with specific, constrained prompts
- Tool calling requires explicit system-prompt instructions, as shown above
- Math accuracy (48.5%) is strong for the size but not competitive with larger models
## License
Apache 2.0 (same as Qwen3-0.6B)