---
license: apache-2.0
language:
  - en
library_name: mlx
pipeline_tag: text-generation
base_model: Qwen/Qwen3.5-9B
base_model_relation: adapter
tags:
  - mlx
  - qwen
  - qwen3.5
  - lora
  - adapter
  - sft
  - unity
  - documentation
  - downftuner
---

Qwen3.5-9b-UnityEngine

A LoRA adapter for Qwen/Qwen3.5-9B, trained with supervised fine-tuning (SFT) on Unity Engine documentation. The base model is unchanged: this repo contains only the adapter weights, so you load the base model separately and apply the adapter at inference time.
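
"Apply the adapter at inference time" can be illustrated with a minimal, library-free sketch of the standard LoRA update. It assumes the common convention W (out × in), A (r × in), B (out × r) and a scalar scale (PEFT defines it as lora_alpha / r; other libraries may define it differently). This illustrates LoRA in general and is not the code used by DownFTuner, mlx_lm, or PEFT. Applying the adapter on the fly computes W·x + scale·B(A·x), while fusing folds the same update into the weights once as W' = W + scale·B·A. In full precision, both paths produce the same output.

```python
# Illustrative LoRA sketch in pure Python; not DownFTuner, mlx_lm, or PEFT code.
# Assumed convention: W is (out x in), A is (r x in), B is (out x r), x has length `in`.


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * xi for m, xi in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply matrices X (n x k) and Y (k x m)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def lora_forward(W, A, B, scale, x):
    """Adapter applied at inference time: W x + scale * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def fuse(W, A, B, scale):
    """Fold the low-rank update into the weights once: W' = W + scale * (B A)."""
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


# Tiny worked example: rank r = 1, in = 3, out = 2.
W = [[1, 0, 0], [0, 1, 0]]
A = [[1, 1, 1]]
B = [[0.5], [-1.0]]
scale = 2.0
x = [1, 2, 3]

on_the_fly = lora_forward(W, A, B, scale, x)
fused = matvec(fuse(W, A, B, scale), x)
print(on_the_fly, fused)  # [7.0, -10.0] [7.0, -10.0]
```

With a quantized base such as the 4-bit MLX checkpoint, fusing may re-quantize the merged weights, so a fused model's outputs can differ slightly from applying the adapter on the fly.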

What this model does

This adapter specialises Qwen/Qwen3.5-9B for Unity Engine questions. It is tuned to quote API identifiers, configuration keys, file paths, and version-specific details verbatim from the official documentation. It is not a general chat model; for free-form conversation, the base model on its own works better.

How it was built

Trained using DownFTuner, a custom local fine-tuning platform built by jokernifty.

DownFTuner is currently a private internal tool. If you'd like access or want to discuss the training pipeline, open a discussion on this model's page.

Usage

With MLX (Apple Silicon, recommended)

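from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Qwen3.5-9B-MLX-4bit",
    adapter_path="jokernifty/Qwen3.5-9b-UnityEngine",
)

# Wrap the question in the model's chat template, as the transformers example does.
messages = [{"role": "user", "content": "<your Unity Engine question here>"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=400,
))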
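
Depending on your mlx_lm version, adapter_path may only accept a local directory, not a Hub repo id. If the call above raises a FileNotFoundError, one workaround is to download the adapter first with huggingface_hub.snapshot_download and pass in the resulting local path. In the sketch below, resolve_adapter_path is a hypothetical helper and is not part of mlx_lm.

```python
from pathlib import Path

# Hub repo id taken from this model card.
ADAPTER_REPO = "jokernifty/Qwen3.5-9b-UnityEngine"


def resolve_adapter_path(adapter: str = ADAPTER_REPO) -> str:
    """Return a local directory for `adapter` (hypothetical helper, not part of mlx_lm).

    An existing local directory is returned unchanged; anything else is treated
    as a Hugging Face Hub repo id and fetched into the local Hub cache.
    """
    local = Path(adapter).expanduser()
    if local.is_dir():
        return str(local)
    # Lazy import so the local-directory branch works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=adapter)


# Usage on Apple Silicon (requires mlx-lm):
# from mlx_lm import load
# model, tokenizer = load(
#     "mlx-community/Qwen3.5-9B-MLX-4bit",
#     adapter_path=resolve_adapter_path(),
# )
```

You can pass the same local path to `--adapter-path` in the mlx_lm.fuse command.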

With transformers + PEFT (any platform)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B", dtype=torch.bfloat16, device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base, "jokernifty/Qwen3.5-9b-UnityEngine")

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "<your Unity Engine question here>"}],
    add_generation_prompt=True, return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400)
# Decode only the newly generated tokens and skip the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))

As a fused checkpoint

If you'd rather have a single self-contained model, merge the adapter into the base weights with mlx_lm.fuse:

python -m mlx_lm.fuse \
  --model mlx-community/Qwen3.5-9B-MLX-4bit \
  --adapter-path jokernifty/Qwen3.5-9b-UnityEngine \
  --save-path ./Qwen3.5-9b-UnityEngine-fused
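
Once fused, you can load the output directory like any other MLX model, without adapter_path. The sketch below assumes the --save-path used above and runs a rough sanity check before loading. looks_like_fused_checkpoint and load_fused are hypothetical helper names. The check (a config.json plus at least one .safetensors file) is a heuristic and is not guaranteed by mlx_lm.

```python
from pathlib import Path

# Output directory from the mlx_lm.fuse command above.
FUSED_DIR = Path("./Qwen3.5-9b-UnityEngine-fused")


def looks_like_fused_checkpoint(path: Path) -> bool:
    """Heuristic: a model config plus at least one .safetensors weight file."""
    return (path / "config.json").is_file() and any(path.glob("*.safetensors"))


def load_fused(path: Path = FUSED_DIR):
    """Load the fused model; no adapter_path is needed because the LoRA update is merged in."""
    if not looks_like_fused_checkpoint(path):
        raise FileNotFoundError(f"No fused checkpoint found in {path}")
    # Lazy import: requires mlx-lm on Apple Silicon.
    from mlx_lm import load

    return load(str(path))


# Usage:
# model, tokenizer = load_fused()
```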

Limitations

  • Knowledge is bounded by the documentation snapshot used for training. API additions or removals made after that snapshot are not reflected.
  • Like the base model, this adapter can confabulate confidently. Always verify code examples against the current upstream documentation before shipping.
  • The adapter was trained only on Unity Engine documentation, so for tasks outside Unity Engine you'll see no improvement (and possibly a slight regression) versus the base model.

License

Apache 2.0, inherited from the Qwen/Qwen3.5-9B base. Built by jokernifty using DownFTuner. Please credit the base model and this adapter when you use it.