license: apache-2.0
language:
  - en
library_name: mlx
pipeline_tag: text-generation
base_model: Qwen/Qwen3.5-9B
base_model_relation: adapter
tags:
  - mlx
  - qwen
  - qwen3.5
  - lora
  - adapter
  - sft
  - unity
  - documentation
  - downftuner

Qwen3.5-9b-UnityEngine

A LoRA adapter for Qwen/Qwen3.5-9B fine-tuned with SFT on Unity Engine documentation. The base model is unchanged — this repo contains only the adapter weights, so you load the base separately and apply the adapter at inference time.
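
To illustrate why the base weights can stay untouched, here is a minimal sketch of the common LoRA formulation, in which an adapted weight is computed as W + (alpha / rank) * B @ A. The function names, the toy matrices, and the scaling convention are illustrative assumptions; they are not read from this adapter's configuration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A as a new matrix; W itself is not modified."""
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (rank, in_features)
B = [[0.5], [0.0]]  # shape (out_features, rank)

W_adapted = apply_lora(W, A, B, alpha=2.0, rank=1)
```

Because only the small A and B matrices are stored, the repo can ship the adapter alone and combine it with the base at load time.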

What this model does

This adapter specialises Qwen/Qwen3.5-9B for Unity Engine-specific questions. It is trained to quote API identifiers, configuration keys, file paths, and version-specific details verbatim from the official documentation. It is not a general chat model; for free-form conversation, the plain base model is the better choice.

How it was built

Trained using DownFTuner, a custom local fine-tuning platform built by jokernifty.

DownFTuner is currently a private internal tool of jokernifty. If you'd like access or want to discuss the pipeline, open a discussion on this model.

Usage

With MLX (Apple Silicon, recommended)

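from mlx_lm import load, generate

# Load the 4-bit MLX base and apply the LoRA adapter on top of it.
model, tokenizer = load(
    "mlx-community/Qwen3.5-9B-MLX-4bit",
    adapter_path="jokernifty/Qwen3.5-9b-UnityEngine",
)

# Wrap the question in the chat template, as in the PEFT example below.
messages = [{"role": "user", "content": "<your Unity Engine question here>"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print(generate(model, tokenizer, prompt=prompt, max_tokens=400))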
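
Some mlx_lm versions may expect adapter_path to be a local directory rather than a Hub repo id. The following is a minimal sketch, assuming that applies to your install, which resolves the adapter to a local path before loading. The helper names resolve_adapter_dir and load_with_adapter are hypothetical; snapshot_download comes from huggingface_hub.

```python
from pathlib import Path


def resolve_adapter_dir(adapter: str) -> str:
    """Return a local directory for `adapter`, downloading it from the Hub if needed."""
    if Path(adapter).is_dir():
        return adapter  # already a local directory, use it as-is
    # Imported lazily so this helper can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=adapter)


def load_with_adapter(
    base: str = "mlx-community/Qwen3.5-9B-MLX-4bit",
    adapter: str = "jokernifty/Qwen3.5-9b-UnityEngine",
):
    """Load the MLX base model with the adapter applied from a local directory."""
    from mlx_lm import load  # requires Apple Silicon with mlx_lm installed

    return load(base, adapter_path=resolve_adapter_dir(adapter))
```

Calling load_with_adapter() returns the same (model, tokenizer) pair as the snippet above, so it can be passed to generate unchanged.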

With transformers + PEFT (any platform)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the unmodified base model, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B", dtype=torch.bfloat16, device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base, "jokernifty/Qwen3.5-9b-UnityEngine")

# return_dict=True yields input_ids plus attention_mask regardless of the library default.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "<your Unity Engine question here>"}],
    add_generation_prompt=True, return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400)

# generate() returns prompt + completion; decode only the newly generated tokens.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))

As a fused checkpoint

If you'd rather have a single self-contained model:

python -m mlx_lm.fuse \
  --model mlx-community/Qwen3.5-9B-MLX-4bit \
  --adapter-path jokernifty/Qwen3.5-9b-UnityEngine \
  --save-path ./Qwen3.5-9b-UnityEngine-fused
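
Outside Apple Silicon, a comparable merged checkpoint can probably be produced with PEFT. This is a sketch under the assumption that the adapter loads through PeftModel as in the transformers example above. The function name merge_adapter and the output directory are hypothetical; merge_and_unload and save_pretrained are existing PEFT and transformers calls.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> str:
    """Fold the LoRA weights into the base model and save a standalone checkpoint."""
    # Heavy dependencies are imported lazily so the function can be defined anywhere.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() adds the LoRA deltas to the base weights and
    # returns a plain transformers model with the adapter layers removed.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)

    # Save the tokenizer alongside so the directory is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

For example, merge_adapter("Qwen/Qwen3.5-9B", "jokernifty/Qwen3.5-9b-UnityEngine", "./Qwen3.5-9b-UnityEngine-merged") would write a full-precision checkpoint, which is considerably larger than the 4-bit MLX fuse output.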

Limitations

Knowledge is bounded by the documentation snapshot used for training. Newer API additions or removals after that date are not reflected.
Like the base model, this adapter can confabulate confidently. Always verify code examples against the current upstream docs before shipping.
This is a LoRA adapter, not a full fine-tune. For tasks outside Unity Engine, expect no improvement (and possibly a slight regression) versus the base.

License

Apache 2.0, inherited from the Qwen/Qwen3.5-9B base. Built by jokernifty using DownFTuner. Please credit the base model and this adapter when you use it.