Qwen3.5-9b-UnityEngine
A LoRA adapter for Qwen/Qwen3.5-9B
fine-tuned with SFT on Unity Engine documentation. The base model is
unchanged; this repo contains only the adapter weights, so you load the
base separately and apply the adapter at inference time.
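For readers unfamiliar with what "applying the adapter" means, the standard LoRA formulation is sketched below. The rank $r$ and scaling factor $\alpha$ used for this adapter are not stated in this card, so the symbols are generic rather than the values used in training.

```latex
W_{\text{eff}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here $W_0$ is a frozen weight matrix of the base model, and only the small matrices $A$ and $B$ are stored in the adapter. This is why the repo stays small and the base must be loaded separately.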
What this model does
This adapter specialises Qwen/Qwen3.5-9B for Unity Engine-specific questions, quoting API identifiers, configuration keys, file paths, and version-specific details verbatim from the official documentation. It is not a general chat model; for free-form conversation, the base model on its own does better.
How it was built
Trained using DownFTuner, a custom local fine-tuning platform built by jokernifty.
DownFTuner is currently a private internal tool of jokernifty. If you'd like access or want to discuss the pipeline, open a discussion on this model.
Usage
With MLX (Apple Silicon, recommended)
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/Qwen3.5-9B-MLX-4bit",
    adapter_path="jokernifty/Qwen3.5-9b-UnityEngine",
)
print(generate(
    model, tokenizer,
    prompt="<your Unity Engine question here>",
    max_tokens=400,
))
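Depending on the installed mlx-lm version, `adapter_path` may need to point at a local directory rather than a Hub repo ID, and an instruction-tuned base usually responds better when the prompt goes through the chat template. The following is a minimal sketch under those assumptions; the helper names `build_messages` and `ask_unity` are hypothetical and not part of this repo.

```python
def build_messages(question: str) -> list[dict]:
    """Wrap a single user question in the chat-message format."""
    return [{"role": "user", "content": question.strip()}]


def ask_unity(question: str, max_tokens: int = 400) -> str:
    # Imports are deferred so build_messages stays usable without MLX installed.
    from huggingface_hub import snapshot_download
    from mlx_lm import load, generate

    # Assumption: the adapter must exist on disk, so fetch the repo first.
    adapter_dir = snapshot_download(repo_id="jokernifty/Qwen3.5-9b-UnityEngine")
    model, tokenizer = load(
        "mlx-community/Qwen3.5-9B-MLX-4bit",
        adapter_path=adapter_dir,
    )
    # tokenize=False returns the templated prompt as a plain string.
    prompt = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        tokenize=False,
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```

Calling `ask_unity("<your Unity Engine question here>")` downloads the adapter on first use and returns the generated answer as a string.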
With transformers + PEFT (any platform)
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the unmodified base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B", dtype=torch.bfloat16, device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")

# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "jokernifty/Qwen3.5-9b-UnityEngine")

# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "<your Unity Engine question here>"}],
    add_generation_prompt=True, return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
As a fused checkpoint
If you'd rather have a single self-contained model:
python -m mlx_lm.fuse \
    --model mlx-community/Qwen3.5-9B-MLX-4bit \
    --adapter-path jokernifty/Qwen3.5-9b-UnityEngine \
    --save-path ./Qwen3.5-9b-UnityEngine-fused
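On platforms without MLX, a comparable self-contained checkpoint can probably be produced with PEFT's `merge_and_unload()`. This is a sketch, not something the card documents: it assumes the adapter loads through PEFT as in the usage section, and it merges into the bf16 base rather than the 4-bit MLX build, so the result will not be identical to the MLX-fused model. The function names `default_output_dir` and `fuse_with_peft` are hypothetical.

```python
from pathlib import Path


def default_output_dir(adapter_id: str) -> Path:
    """Derive a local output directory such as Qwen3.5-9b-UnityEngine-fused."""
    return Path(f"{adapter_id.split('/')[-1]}-fused")


def fuse_with_peft(base_id: str, adapter_id: str, out_dir: Path) -> None:
    # Deferred imports keep the path helper usable without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() folds the LoRA deltas into the base weights and
    # returns a plain transformers model without PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the directory loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call such as `fuse_with_peft("Qwen/Qwen3.5-9B", "jokernifty/Qwen3.5-9b-UnityEngine", default_output_dir("jokernifty/Qwen3.5-9b-UnityEngine"))` writes the merged weights and tokenizer to a local folder that `AutoModelForCausalLM.from_pretrained` can then load directly.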
Limitations
- Knowledge is bounded by the documentation snapshot used for training. API additions or removals made after that snapshot are not reflected.
- Like the base model, this adapter can confabulate confidently. Always verify code examples against the current upstream docs before shipping.
- The adapter is LoRA only; for tasks outside Unity Engine, you'll see no improvement (and possibly a slight regression) versus the base.
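Since answers should be checked against the current upstream docs, one lightweight aid is to pull the dotted API identifiers out of a generated answer so each can be looked up by hand. The sketch below is only a heuristic and an assumption of this example: the regex and the helper name `extract_api_references` are hypothetical, and the pattern will miss bare method names and may pick up non-API text.

```python
import re

# Heuristic: a capitalised type name followed by one or more dotted members,
# for example Rigidbody.AddForce or Time.fixedDeltaTime.
_API_REF = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+")


def extract_api_references(answer: str) -> list[str]:
    """Return unique dotted identifiers in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _API_REF.finditer(answer):
        seen.setdefault(match.group(0), None)
    return list(seen)
```

Each returned identifier can then be searched in the Unity scripting reference for the engine version you target before the generated code is shipped.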
License
Apache 2.0, inherited from the Qwen/Qwen3.5-9B base. Built by jokernifty using DownFTuner. Please credit the base model and this adapter when you use it.