Kinetic-FC-LoRA

Kinetic-FC-LoRA is a rank-64 LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned at Conscious Engines for function / tool calling against the Composio tool ecosystem.

Applied on top of the base model, this adapter produces Kinetic-4B, a 4B-parameter tool-calling model that, on our 300-sample Composio eval, outperforms frontier hosted models at the same task:

Model                           Params   Accuracy   p95 latency
Qwen3-4B + Kinetic-FC-LoRA      4B       82.33%     1.61 s
Claude Haiku 4.5                n/a      80.00%     4.02 s
Qwen3-4B-Instruct-2507 (base)   4B       78.67%     1.84 s
GPT-OSS-120B                    120B     76.33%     7.99 s

Full write-up: Kinetic-4B blog post.

What it's for

  • Picking the correct tool from a menu of up to ~10 options drawn from a single Composio toolkit.
  • Producing syntactically valid arguments conforming to the tool's JSON schema.
  • Emitting tool calls in Qwen3's native <tool_call>{ "name": ..., "arguments": ... }</tool_call> format.

It is not a general chat model; it is a narrow specialist. For freeform conversation, use the base Qwen3-4B-Instruct-2507 directly.

Adapter details

Base model         Qwen/Qwen3-4B-Instruct-2507
Method             LoRA (PEFT)
Rank r             64
lora_alpha         128
lora_dropout       0.05
Target modules     q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params   132 M / 4.15 B (3.18%)
Precision          bf16
Epochs             2
Effective batch    16 (1 × 16 grad accum)
Learning rate      2e-4, cosine, 5% warmup
Max seq length     10,240
Training data      13,694 synthetic samples across the top-20 Composio toolkits; 10 tools per sample (1 ground-truth + 9 distractors from the same toolkit)
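The hyperparameters above map onto a PEFT configuration roughly as follows. This is a reconstruction from the table, not the exact training config used for Kinetic-FC-LoRA; field names follow peft.LoraConfig:

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the table above (assumed, not
# the original training script). task_type marks this as a causal-LM adapter.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```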

Inference with PyTorch + PEFT

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE  = "Qwen/Qwen3-4B-Instruct-2507"
ADAPT = "consciousengines/Kinetic-FC-LoRA"

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
dtype  = torch.bfloat16 if device != "cpu" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(BASE)
base      = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=dtype).to(device)
model     = PeftModel.from_pretrained(base, ADAPT).eval()

# Optional: merge LoRA into the base weights for a small inference speedup.
# model = model.merge_and_unload()

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string", "description": "Salesforce campaign ID."},
                "contact_id":  {"type": "string", "description": "Salesforce contact ID."},
                "status":      {"type": "string", "description": "Member status, e.g. 'Attended'."},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user",   "content": "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA (mark them as Attended)."},
]

inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False))

Expected completion (format is Qwen3-native):

<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call>
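To act on that completion you need to pull the JSON out of the <tool_call> tags. A minimal stdlib-only helper (illustrative, not part of the model card's API) might look like:

```python
import json
import re

# Extract the first <tool_call>{...}</tool_call> block from a completion.
# Returns the parsed dict, or None if no tool call was emitted.
def parse_tool_call(text: str):
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))

completion = """<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call>"""

call = parse_tool_call(completion)
print(call["name"])       # SALESFORCE_ADD_CONTACT_TO_CAMPAIGN
print(call["arguments"])  # dict of schema-conforming arguments
```

In production, prefer a tool-call parser from your serving stack (e.g. vLLM's, below) over hand-rolled regexes.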

Serving with vLLM

vLLM can load the adapter on top of the base model at serve time. (A merged checkpoint is also published separately if you'd rather serve a single artifact.)

vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora \
  --lora-modules kinetic-fc=consciousengines/Kinetic-FC-LoRA \
  --tool-call-parser hermes \
  --enable-auto-tool-choice

Then hit /v1/chat/completions with model: "kinetic-fc" and an OpenAI-style tools array.
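A request body for that endpoint might look like the following sketch. The model name "kinetic-fc" matches the --lora-modules alias above; the tool schema reuses the Salesforce example, and the localhost:8000 URL assumes vLLM's default port:

```python
import json

# OpenAI-style chat-completions payload (assumed shape; adjust to your client).
payload = {
    "model": "kinetic-fc",  # the LoRA alias registered with --lora-modules
    "messages": [
        {"role": "user",
         "content": "Enroll contact 0035g00000ZZtopAA in campaign 7015g000000XyZ9AA."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
            "parameters": {
                "type": "object",
                "properties": {
                    "campaign_id": {"type": "string"},
                    "contact_id":  {"type": "string"},
                },
                "required": ["campaign_id", "contact_id"],
            },
        },
    }],
    "tool_choice": "auto",
}

# POST this body to http://localhost:8000/v1/chat/completions with any
# HTTP client (curl, requests, or the openai SDK pointed at the vLLM base URL).
body = json.dumps(payload)
```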

Intended use & limitations

  • Designed for structured function / tool calls on Composio-style JSON schemas, presented 1–10 at a time.
  • Not designed for long-form chat, coding assistance, math, or retrieval-augmented question answering. The adapter was not trained on these distributions and will underperform the base model on them.
  • Like any small model, it can hallucinate argument values (e.g. IDs) when the user query is ambiguous or incomplete.
  • Evaluated only in English, and primarily on SaaS-API-flavoured schemas.
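Because the model can hallucinate argument values, it is worth guarding generated calls before execution. One cheap check, sketched here as a hypothetical helper (not part of this card's tooling), is to verify the schema's "required" fields are all present:

```python
# Return the names of required parameters missing from a generated tool call.
# Illustrative pre-execution guard; a full JSON Schema validator catches more.
def missing_required(arguments: dict, parameters: dict) -> list:
    required = parameters.get("required", [])
    return [name for name in required if name not in arguments]

schema = {
    "type": "object",
    "properties": {
        "campaign_id": {"type": "string"},
        "contact_id":  {"type": "string"},
    },
    "required": ["campaign_id", "contact_id"],
}

print(missing_required({"campaign_id": "7015g000000XyZ9AA"}, schema))
# ['contact_id'] -> reject or re-prompt instead of executing the call
```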

Citation

@misc{kinetic4b2026,
  title  = {Kinetic-4B: A 4-Billion Parameter Model That Outperforms Claude Haiku at Tool Calling},
  author = {Pal, Ritam and Kundan, Kautuk},
  year   = {2026},
  url    = {https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling}
}

Acknowledgements

Built by Ritam Pal and Kautuk Kundan at Conscious Engines, as part of the LossFunk residency.
