# Kinetic-FC-LoRA
Kinetic-FC-LoRA is a rank-64 LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned at Conscious Engines for function/tool calling against the Composio tool ecosystem.
Applied on top of the base model, this adapter produces Kinetic-4B, a 4B-parameter tool-calling model that, on our 300-sample Composio eval, beats much larger hosted models on the same task:
| Model | Params | Accuracy | p95 latency |
|---|---|---|---|
| Qwen3-4B + Kinetic-FC-LoRA | 4B | 82.33% | 1.61 s |
| Claude Haiku 4.5 | – | 80.00% | 4.02 s |
| Qwen3-4B-Instruct-2507 (base) | 4B | 78.67% | 1.84 s |
| GPT-OSS-120B | 120B | 76.33% | 7.99 s |
Full write-up: [Kinetic-4B blog post](https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling).
## What it's for
- Picking the correct tool from a menu of up to ~10 options drawn from a single Composio toolkit.
- Producing syntactically valid arguments conforming to the tool's JSON schema.
- Emitting tool calls in Qwen3's native `<tool_call>{ "name": ..., "arguments": ... }</tool_call>` format.
It is not a general chat model; it is a narrow specialist. For freeform conversation, use the base Qwen3-4B-Instruct-2507 directly.
## Adapter details
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | LoRA (PEFT) |
| Rank `r` | 64 |
| `lora_alpha` | 128 |
| `lora_dropout` | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | 132 M / 4.15 B (3.18%) |
| Precision | bf16 |
| Epochs | 2 |
| Effective batch | 16 (1 × 16 grad accum) |
| Learning rate | 2e-4, cosine, 5% warmup |
| Max seq length | 10,240 |
| Training data | 13,694 synthetic samples across the top-20 Composio toolkits, 10 tools per sample (1 ground-truth + 9 distractors from the same toolkit) |
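For reference, these hyperparameters map onto roughly the following `peft` `LoraConfig`. This is a sketch reconstructed from the table above; the actual training script is not published with this card.

```python
from peft import LoraConfig

# Illustrative LoraConfig matching the adapter details above
# (the training script itself is not part of this repository).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```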
## Inference with PyTorch + PEFT
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen3-4B-Instruct-2507"
ADAPT = "consciousengines/Kinetic-FC-LoRA"

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
dtype = torch.bfloat16 if device != "cpu" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, dtype=dtype).to(device)
model = PeftModel.from_pretrained(base, ADAPT).eval()

# Optional: merge LoRA into the base weights for a small inference speedup.
# model = model.merge_and_unload()

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string", "description": "Salesforce campaign ID."},
                "contact_id": {"type": "string", "description": "Salesforce contact ID."},
                "status": {"type": "string", "description": "Member status, e.g. 'Attended'."},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA (mark them as Attended)."},
]

# The chat template injects the tool schemas into the prompt in Qwen3's native format.
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
Expected completion (format is Qwen3-native):
```
<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call>
```
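To act on this downstream, you can extract the JSON payload from the `<tool_call>` tags yourself. A minimal parsing sketch follows; the `parse_tool_call` helper is illustrative, not part of the model's API:

```python
import json
import re


def parse_tool_call(completion: str):
    """Extract the first <tool_call>...</tool_call> payload as a dict.

    A minimal sketch; production code should also handle malformed JSON
    and multiple tool calls in one completion.
    """
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", completion, re.DOTALL)
    return json.loads(match.group(1)) if match else None


# Example on the completion shown above:
completion = (
    '<tool_call>\n'
    '{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": '
    '{"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}\n'
    '</tool_call>'
)
call = parse_tool_call(completion)
print(call["name"], call["arguments"])
```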
## Serving with vLLM
The merged model is also published separately if you'd rather serve a single artifact; alternatively, serve the adapter on top of the base model with vLLM's LoRA support:
```bash
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora \
  --lora-modules kinetic-fc=consciousengines/Kinetic-FC-LoRA \
  --tool-call-parser hermes \
  --enable-auto-tool-choice
```
Then hit `/v1/chat/completions` with `model: "kinetic-fc"` and an OpenAI-style `tools` array.
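For example, with the official `openai` Python client. This is a sketch assuming the server is running locally on vLLM's default port 8000; the tools array mirrors the Salesforce example above:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key is unused for a local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string"},
                "contact_id": {"type": "string"},
                "status": {"type": "string"},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kinetic-fc",  # the adapter name registered via --lora-modules
    messages=[{"role": "user", "content": "Enroll contact 0035g00000ZZtopAA into campaign 7015g000000XyZ9AA as Attended."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```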
## Intended use & limitations
- Designed for structured function/tool calls on Composio-style JSON schemas, presented 1–10 at a time.
- Not designed for long-form chat, coding assistance, math, or retrieval-augmented question answering. The adapter was not trained on these distributions and will underperform the base model on them.
- Like any small model, it can hallucinate argument values (e.g. IDs) when the user query is ambiguous or incomplete.
- Evaluated only in English, and primarily on SaaS-API-flavoured schemas.
## Citation
```bibtex
@misc{kinetic4b2026,
  title  = {Kinetic-4B: A 4-Billion Parameter Model That Outperforms Claude Haiku at Tool Calling},
  author = {Pal, Ritam and Kundan, Kautuk},
  year   = {2026},
  url    = {https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling}
}
```
## Acknowledgements
Built by Ritam Pal and Kautuk Kundan at Conscious Engines, as part of the LossFunk residency.