Instructions to use Jackrong/Qwopus3.6-27B-Coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jackrong/Qwopus3.6-27B-Coder with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Jackrong/Qwopus3.6-27B-Coder")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Jackrong/Qwopus3.6-27B-Coder")
model = AutoModelForMultimodalLM.from_pretrained("Jackrong/Qwopus3.6-27B-Coder", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use Jackrong/Qwopus3.6-27B-Coder with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwopus3.6-27B-Coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jackrong/Qwopus3.6-27B-Coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
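The same request can be sent from Python. The sketch below uses only the standard library (`urllib.request`) and assumes a vLLM server is listening on the default `http://localhost:8000`; for the SGLang server below, pass `http://localhost:30000` instead. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "Jackrong/Qwopus3.6-27B-Coder"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request to a running server and return the first reply's text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should print the model's answer. Keeping request construction separate from sending makes the payload easy to inspect without a live server.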

Use Docker

```bash
docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder
```

SGLang

How to use Jackrong/Qwopus3.6-27B-Coder with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Jackrong/Qwopus3.6-27B-Coder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jackrong/Qwopus3.6-27B-Coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-27B-Coder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Jackrong/Qwopus3.6-27B-Coder",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Unsloth Studio

How to use Jackrong/Qwopus3.6-27B-Coder with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-Coder to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwopus3.6-27B-Coder to start chatting
```

Using Hugging Face Spaces for Unsloth

No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for Jackrong/Qwopus3.6-27B-Coder to start chatting.

Load model with FastModel

```bash
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Jackrong/Qwopus3.6-27B-Coder",
    max_seq_length=2048,
)
```

Docker Model Runner
How to use Jackrong/Qwopus3.6-27B-Coder with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder
```

4.5bpw Exl3 H6 LLMFan46 Heretic Base Qwopus 3.6 Coder

by tw33kr442 - opened Jun 15

Discussion

tw33kr442

Jun 15

edited Jun 15

I'm making a better model for my personal local use and needed an EXL3 quant. I'm currently building a Qwopus 3.6 Coder variant off the llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved base, so it's a bit different, but I hope it works well in a repo. I'm removing vision and MTP, since this is specifically for my 24 GB RTX 3090, and dropping them leaves room for good context management and size. I've had more luck with EXL3 quants in my repos.

I'm hoping for a multi-use local model with an agentic lean, in my preferred quant.
I'm also burning an EXL3 quant of the full Coder for good measure.

Just wanted to thank you for the information and the work you guys do; I'm excited to see what I can get on SWE-bench with the new model build. Here are the build details. I'll drop a model card and set up the page for my version.

The training I'm doing is about 0.55 of an epoch, roughly a 24-hour run on an H200. There seem to be diminishing returns past a certain point, given the way the Qwen architecture is built and how I'm running the training. I'm new to model builds, so I'm hoping my methodology can come close to the build you guys made. Without the Qwopus base I'm curious how it will perform, but I think this is as close as I can reasonably get in a quick build.

While I think you guys went in a different direction, this is my first full build that might be useful, so any thoughts in case I train another model later would be great!

My hope is that the model works well and the RunPod usage was worth the coin.

Quantization: EXL3
Target bitrate: 4.5 bpw
Head bits: 6
Context target: 32K training context
Serving target: local ExLlamaV3/TextGen-style coder-agent use
Vision/MTP: intentionally not part of the serving target
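As a rough sanity check on fitting a 4.5 bpw quant onto a 24 GB RTX 3090, quantized weight storage is roughly parameters × bits-per-weight / 8 bytes. The sketch below assumes a flat 27 billion parameters all stored at 4.5 bpw and ignores the 6-bit head, quantization metadata, activations, and the KV cache, so the result is a lower bound on weight memory, not a measured footprint.

```python
def weight_gib(num_params: float, bpw: float) -> float:
    """Approximate quantized weight size in GiB: params * bits / 8 bytes."""
    return num_params * bpw / 8 / 2**30


# Assumption: ~27e9 parameters, all at the 4.5 bpw target (6-bit head ignored).
weights = weight_gib(27e9, 4.5)
# RTX 3090: 24 GiB of VRAM; whatever remains must hold KV cache and runtime overhead.
headroom = 24 - weights
print(f"weights ~{weights:.1f} GiB, ~{headroom:.1f} GiB left for KV cache and overhead")
```

Dropping vision and MTP shrinks the weight term further, which is why it helps leave more of the remaining memory for context.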

Training Summary

The adapter was trained on an H200 using continuous QLoRA SFT with response-only masking.
The source model has hybrid Qwen3.5-style attention, so LoRA coverage includes both
standard self-attention and linear-attention modules.
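Response-only masking means the loss is computed only on assistant tokens; prompt, system, and tool tokens get an ignore label. The sketch below shows the idea on token strings rather than real tokenizer IDs, uses `-100` as the ignore label (the default `ignore_index` of PyTorch's cross-entropy loss), and assumes each assistant turn opens with the `<|im_start|>assistant\n` marker named in the settings list and closes with `<|im_end|>`. The function name and the three-piece header tokenization are hypothetical.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

# Assumption: "<|im_start|>assistant\n" tokenizes to these three pieces.
ASSISTANT_HEADER = ["<|im_start|>", "assistant", "\n"]
END_OF_TURN = "<|im_end|>"


def response_only_labels(tokens: Sequence[str], ids: Sequence[int]) -> List[int]:
    """Return labels equal to ids inside assistant replies and IGNORE_INDEX elsewhere.

    The assistant header itself stays masked; the reply body and its closing
    END_OF_TURN token are kept so the model learns when to stop.
    """
    labels = [IGNORE_INDEX] * len(ids)
    n = len(ASSISTANT_HEADER)
    in_reply = False
    i = 0
    while i < len(tokens):
        if not in_reply and list(tokens[i:i + n]) == ASSISTANT_HEADER:
            in_reply = True
            i += n  # skip past the header without unmasking it
            continue
        if in_reply:
            labels[i] = ids[i]
            if tokens[i] == END_OF_TURN:
                in_reply = False
        i += 1
    return labels
```

Because the scan resets at every `<|im_end|>`, multi-turn rows get each assistant reply unmasked independently while user turns in between stay ignored.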

Core settings:

Max sequence length: 32768
Target optimizer steps: 1500
Effective batch size: 8
Dataset exposure: about 0.55 epoch over 21,785 rows
Learning rate: 1.5e-4
Scheduler: cosine
Warmup: 20 steps
Checkpoint cadence during training: 50 steps
LoRA rank/alpha: 16 / 32
Batch size: 1
Gradient accumulation: 8
Optimizer: adamw_8bit
Weight decay: 0.01
Precision: bf16
Attention backend: FlashAttention 2 for the standard attention path
Loss mask: assistant responses only, using <|im_start|>assistant\n
Target module coverage:
- self_attn.q_proj, self_attn.k_proj, self_attn.v_proj, self_attn.o_proj
- linear_attn.in_proj_qkv, linear_attn.in_proj_a, linear_attn.in_proj_b, linear_attn.in_proj_z, linear_attn.out_proj
- mlp.gate_proj, mlp.up_proj, mlp.down_proj
Explicitly excluded from LoRA: MTP, vision, norms, A_log, dt_bias
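The batch and exposure numbers above fit together: 1 sequence per step × 8 accumulation steps gives the effective batch of 8, and 1,500 optimizer steps × 8 = 12,000 rows, about 0.55 of the 21,785-row curriculum. The sketch below reproduces that arithmetic and, assuming the trainer used the common linear-warmup-then-cosine-decay-to-zero shape (as in Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle), shows the learning rate at a few steps. The settings do not say whether the decay ends at zero or at a floor.

```python
import math

# Values from the settings list above.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
TOTAL_STEPS = 1500
WARMUP_STEPS = 20
PEAK_LR = 1.5e-4
DATASET_ROWS = 21_785
CHECKPOINT_EVERY = 50

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM  # 8
rows_seen = TOTAL_STEPS * effective_batch        # 12,000
epochs = rows_seen / DATASET_ROWS                # ~0.55
checkpoints = TOTAL_STEPS // CHECKPOINT_EVERY    # 30


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, rows_seen, round(epochs, 3), checkpoints)
for s in (0, 10, 20, 760, 1500):
    print(s, f"{lr_at(s):.2e}")
```

Under this shape, the rate reaches half its peak around step 760 and approaches zero by step 1500.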

Coverage gate:

Trainable adapter tensors: 992
Trainable parameters: 116,727,808
self_attn: 128 trainable tensors
linear_attn: 480 trainable tensors
mlp: 384 trainable tensors
mtp: 0
vision: 0
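The coverage-gate counts are consistent with one LoRA A/B pair (two tensors) per targeted projection. The sketch below checks that, assuming a 64-layer stack in which every fourth layer uses standard self-attention and the other three use linear attention. That layout is inferred from the counts, not stated in the post, so treat it as a hypothesis to confirm against the model's config.

```python
# Projections that receive LoRA, per the "Target module coverage" list above.
SELF_ATTN = ["q_proj", "k_proj", "v_proj", "o_proj"]
LINEAR_ATTN = ["in_proj_qkv", "in_proj_a", "in_proj_b", "in_proj_z", "out_proj"]
MLP = ["gate_proj", "up_proj", "down_proj"]

TENSORS_PER_MODULE = 2  # lora_A and lora_B

# Assumption (inferred, not stated): 64 decoder layers, every 4th one full attention.
NUM_LAYERS = 64
FULL_ATTN_EVERY = 4


def expected_counts(num_layers: int = NUM_LAYERS, every: int = FULL_ATTN_EVERY) -> dict:
    """Count trainable LoRA tensors per module family for the assumed layer layout."""
    full = sum(1 for i in range(num_layers) if (i + 1) % every == 0)
    linear = num_layers - full
    counts = {
        "self_attn": full * len(SELF_ATTN) * TENSORS_PER_MODULE,
        "linear_attn": linear * len(LINEAR_ATTN) * TENSORS_PER_MODULE,
        "mlp": num_layers * len(MLP) * TENSORS_PER_MODULE,
    }
    counts["total"] = sum(counts.values())
    return counts


print(expected_counts())
```

Under that assumption the sketch reproduces 128, 480, and 384 tensors, for 992 in total.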

The mtp and vision counts above refer to LoRA trainable coverage only. Those
components are intentionally excluded from adapter training. The final EXL3 serving
artifact is intended to be text-only and non-MTP after post-merge stripping/validation.
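Post-merge stripping can be as simple as dropping every weight whose name falls under the MTP or vision submodules before quantizing. The sketch below works on a plain mapping of tensor names to values; the prefixes `mtp.` and `model.visual.` are hypothetical placeholders, so check the real key names in the merged checkpoint (for example its `model.safetensors.index.json`) first. The model config will likely need matching edits too, which this sketch does not cover.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical prefixes: confirm against the actual checkpoint key names.
STRIP_PREFIXES = ("mtp.", "model.visual.")


def strip_components(
    weights: Dict[str, object], prefixes: Iterable[str] = STRIP_PREFIXES
) -> Tuple[Dict[str, object], List[str]]:
    """Split a name->tensor mapping into kept weights and the sorted names removed."""
    prefixes = tuple(prefixes)
    kept = {k: v for k, v in weights.items() if not k.startswith(prefixes)}
    removed = sorted(k for k in weights if k.startswith(prefixes))
    # Validation: matching nothing usually means the prefixes are wrong.
    if not removed:
        raise ValueError("no keys matched the strip prefixes; check the key names")
    return kept, removed


demo = {
    "model.layers.0.mlp.up_proj.weight": 0,
    "mtp.fc.weight": 1,
    "model.visual.blocks.0.attn.qkv.weight": 2,
    "lm_head.weight": 3,
}
kept, removed = strip_components(demo)
print(sorted(kept), removed)
```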

Curriculum

The 32K training curriculum was rendered into final chat-template text before SFT.
It contains 21,785 formatted rows, built from:

Claude Opus trace-inversion datasets from the Jackrong catalog
Hermes agent reasoning traces
Qwen3 Coder 480B distill mini
Competitive Python programming blend
A small local ECC/Codex/STAR rules-and-agent-behavior slice
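Rendering the curriculum into final chat-template text ahead of time means each row is stored as a single string with the turn markers already in place. The sketch below assumes the ChatML-style layout that Qwen templates use (`<|im_start|>role\n...<|im_end|>\n`) and handles plain-text turns only. The real Qwen3.x template also formats tool calls, tool responses, and reasoning blocks, so `tokenizer.apply_chat_template(messages, tokenize=False)` is the safer way to render real rows. The helper name is hypothetical.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Render plain-text chat turns into ChatML-style training text."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in {"system", "user", "assistant"}:
            # Tool turns need the model's own template; refuse rather than guess.
            raise ValueError(f"unsupported role for this sketch: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    return "".join(parts)


row = render_chatml([
    {"role": "user", "content": "Reverse a string in Python."},
    {"role": "assistant", "content": "Use slicing: s[::-1]"},
])
print(row)
```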

The local slice is deliberately small and is meant to steer repo-agent behavior rather
than make the model specific to one private repository.

The training data is a blended single-pass curriculum rather than the official Jackrong
staged production run. It aims to compress the public Qwopus-style trace-inversion,
agentic coding, and long-context behaviors into a practical single-H200 QLoRA build.

tw33kr442

Jun 16

Well, my training mixed up tool calls and chat, so it didn't work. However, I put up two versions of the stock model with my non-MTP quant, one with vision and one without, for any other 3090 users.

tw33kr442

Jun 20

The EXL3 quant I made of a direct copy works great, though. Love the Coder finetune!
