Instructions to use Jackrong/Qwopus3.6-27B-Coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Jackrong/Qwopus3.6-27B-Coder with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Jackrong/Qwopus3.6-27B-Coder")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM

    processor = AutoProcessor.from_pretrained("Jackrong/Qwopus3.6-27B-Coder")
    model = AutoModelForMultimodalLM.from_pretrained("Jackrong/Qwopus3.6-27B-Coder")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Jackrong/Qwopus3.6-27B-Coder with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "Jackrong/Qwopus3.6-27B-Coder"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Jackrong/Qwopus3.6-27B-Coder",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
    docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder
- SGLang
How to use Jackrong/Qwopus3.6-27B-Coder with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "Jackrong/Qwopus3.6-27B-Coder" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Jackrong/Qwopus3.6-27B-Coder",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "Jackrong/Qwopus3.6-27B-Coder" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "Jackrong/Qwopus3.6-27B-Coder",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Unsloth Studio
How to use Jackrong/Qwopus3.6-27B-Coder with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for Jackrong/Qwopus3.6-27B-Coder to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for Jackrong/Qwopus3.6-27B-Coder to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for Jackrong/Qwopus3.6-27B-Coder to start chatting
Load model with FastModel
    # Install Unsloth from pip (shell):
    pip install unsloth

    # Then load the model (Python):
    from unsloth import FastModel

    model, tokenizer = FastModel.from_pretrained(
        model_name="Jackrong/Qwopus3.6-27B-Coder",
        max_seq_length=2048,
    )
- Docker Model Runner
How to use Jackrong/Qwopus3.6-27B-Coder with Docker Model Runner:
    docker model run hf.co/Jackrong/Qwopus3.6-27B-Coder
4.5bpw Exl3 H6 LLMFan46 Heretic Base Qwopus 3.6 Coder
I'm making a better model for my personal local use and needed an EXL3 quant. I'm currently running a Qwopus 3.6 Coder built off the llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved base, so it's a bit different, but I hope it works well in a repo. I'm removing vision and MTP, since this build is specifically for my 24 GB 3090 and dropping them leaves room for a comfortable context size. I've had more luck with EXL3 quants in my repos.
Hoping for a multi-use local model, with an agentic lean, and my preferred quant.
I'm also burning an EXL3 quant of the full coder for good measure.
I just wanted to thank you for the information and the work you guys do. I'm excited to see what I can get on SWE-bench with the new model build. Here are the build details; I'll drop a model card and set up the page for my version. The training run covers about 0.55 epoch, roughly a 24-hour burn on an H200. There seem to be diminishing returns past a certain point, given how the Qwen architecture is built and how I'm running the training. I'm new to model builds, so I'm hoping my methodology can come close to the build you guys made. Without the Qwopus base I'm curious how it will perform, since I think this is as close as I can reasonably get in a quick build.
While I think you guys went in a different direction, this is my first full build that might be useful, so any thoughts in case I run another model later would be great!
My hope is that the model works well and the runpod usage was worth the coin.
- Quantization: EXL3
- Target bitrate: 4.5 bpw
- Head bits: 6
- Context target: 32K training context
- Serving target: local ExLlamaV3/TextGen-style coder-agent use
- Vision/MTP: intentionally not part of the serving target
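As a rough sanity check on why 4.5 bpw suits a 24 GB card, the sketch below estimates the quantized weight footprint. The 27e9 parameter count is an assumption taken from the "27B" in the model name, and the estimate ignores the 6-bit head, embeddings, KV cache, and activation memory, so real usage will differ.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal parameter count read off the "27B" in the model name.
N_PARAMS = 27e9
CARD_GIB = 24  # 3090-class card

weights_gib = quantized_weight_gib(N_PARAMS, 4.5)
headroom_gib = CARD_GIB - weights_gib  # left for KV cache, activations, overhead

print(f"weights ~{weights_gib:.1f} GiB, headroom ~{headroom_gib:.1f} GiB")
```

Under these assumptions the weights land at roughly 14 GiB, which leaves the remainder of the card for context.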
Training Summary
The adapter was trained on an H200 using continuous QLoRA SFT with response-only masking.
The source model has hybrid Qwen3.5-style attention, so LoRA coverage includes both
standard self-attention and linear-attention modules.
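Response-only masking can be illustrated with a minimal pure-Python sketch. The token IDs below are made up for illustration, and treating `<|im_end|>` as the end-of-turn token is an assumption; the actual run may have relied on a trainer utility rather than anything like this helper.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default


def mask_non_assistant(input_ids, start_marker, end_id):
    """Return labels that keep token ids only inside assistant responses."""
    labels = [IGNORE_INDEX] * len(input_ids)
    m = len(start_marker)
    i = 0
    while i <= len(input_ids) - m:
        if input_ids[i:i + m] == start_marker:
            j = i + m  # first token after the assistant header
            while j < len(input_ids):
                labels[j] = input_ids[j]  # the end-of-turn token is kept too
                if input_ids[j] == end_id:
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return labels


# Hypothetical ids: 1=<|im_start|>, 2=<|im_end|>, 10="user", 11="assistant", 12="\n"
ids = [1, 10, 12, 20, 21, 2, 12, 1, 11, 12, 30, 31, 2, 12]
labels = mask_non_assistant(ids, start_marker=[1, 11, 12], end_id=2)
print(labels)
```

Only the tokens after the assistant header, up to and including the end-of-turn token, contribute to the loss; the user turn and the header itself are ignored.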
Core settings:
- Max sequence length: 32768
- Target optimizer steps: 1500
- Effective batch size: 8
- Dataset exposure: about 0.55 epoch over 21,785 rows
- Learning rate: 1.5e-4
- Scheduler: cosine
- Warmup: 20 steps
- Checkpoint cadence during training: 50 steps
- LoRA rank/alpha: 16 / 32
- Batch size: 1
- Gradient accumulation: 8
- Optimizer: adamw_8bit
- Weight decay: 0.01
- Precision: bf16
- Attention backend: FlashAttention 2 for the standard attention path
- Loss mask: assistant responses only, using <|im_start|>assistant\n
- Target module coverage:
  - self_attn.q_proj, self_attn.k_proj, self_attn.v_proj, self_attn.o_proj
  - linear_attn.in_proj_qkv, linear_attn.in_proj_a, linear_attn.in_proj_b, linear_attn.in_proj_z, linear_attn.out_proj
  - mlp.gate_proj, mlp.up_proj, mlp.down_proj
- Explicitly excluded from LoRA: MTP, vision, norms, A_log, dt_bias
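The learning-rate settings above (1.5e-4 peak, cosine scheduler, 20 warmup steps, 1500 optimizer steps) can be sketched as a plain function. Linear warmup from zero and decay all the way to zero are assumptions here; the run's actual scheduler implementation is not stated.

```python
import math


def lr_at(step, peak_lr=1.5e-4, warmup_steps=20, total_steps=1500):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at the start, end of warmup, midpoint, and final step.
for step in (0, 10, 20, 760, 1500):
    print(step, f"{lr_at(step):.2e}")
```

With only 20 warmup steps out of 1500, the run spends almost its entire budget on the cosine decay portion.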
Coverage gate:
- Trainable adapter tensors: 992
- Trainable parameters: 116,727,808
- self_attn: 128 trainable tensors
- linear_attn: 480 trainable tensors
- mlp: 384 trainable tensors
- mtp: 0
- vision: 0
The mtp and vision counts above refer to LoRA trainable coverage only. Those
components are intentionally excluded from adapter training. The final EXL3 serving
artifact is intended to be text-only and non-MTP after post-merge stripping/validation.
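A coverage gate of this kind can be sketched as a count of trainable tensor names per module group, followed by a check that the excluded groups stay at zero. The helper names and the example parameter paths are hypothetical; the reported counts are the ones listed above.

```python
from collections import Counter

GROUPS = ("self_attn", "linear_attn", "mlp", "mtp", "vision")


def count_by_group(trainable_names):
    """Count trainable tensor names per group, matching on dotted path parts."""
    counts = Counter({group: 0 for group in GROUPS})
    for name in trainable_names:
        parts = set(name.split("."))
        for group in GROUPS:
            if group in parts:
                counts[group] += 1
                break
    return dict(counts)


def passes_gate(counts, expected_total, frozen=("mtp", "vision")):
    """True when frozen groups have no trainable tensors and the total matches."""
    return all(counts[g] == 0 for g in frozen) and sum(counts.values()) == expected_total


# Counts as reported in the coverage gate above.
reported = {"self_attn": 128, "linear_attn": 480, "mlp": 384, "mtp": 0, "vision": 0}
print(passes_gate(reported, expected_total=992))

# Hypothetical parameter paths, for illustration only.
example = count_by_group([
    "model.layers.0.self_attn.q_proj.lora_A.weight",
    "model.layers.0.self_attn.q_proj.lora_B.weight",
    "model.layers.1.linear_attn.out_proj.lora_A.weight",
])
print(example)
```

The three non-zero groups sum to 992, matching the reported tensor total. Assuming the usual two LoRA tensors (A and B) per adapted projection, that corresponds to 496 adapted projections.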
Curriculum
The 32K training curriculum was rendered into final chat-template text before SFT.
It contains 21,785 formatted rows, built from:
- Claude Opus trace-inversion datasets from the Jackrong catalog
- Hermes agent reasoning traces
- Qwen3 Coder 480B distill mini
- Competitive Python programming blend
- A small local ECC/Codex/STAR rules-and-agent-behavior slice
The local slice is deliberately small and is meant to steer repo-agent behavior rather
than make the model specific to one private repository.
The training data is a blended single-pass curriculum rather than the official Jackrong
staged production run. It aims to compress the public Qwopus-style trace-inversion,
agentic coding, and long-context behaviors into a practical single-H200 QLoRA build.
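The "about 0.55 epoch" exposure follows from the step count, batch settings, and row count listed in the core settings. The arithmetic below assumes a single device and one formatted row per sample (no sequence packing).

```python
def dataset_exposure(optimizer_steps, per_device_batch, grad_accum, n_rows):
    """Return (effective batch size, samples seen, fraction of one epoch)."""
    effective_batch = per_device_batch * grad_accum
    samples_seen = optimizer_steps * effective_batch
    return effective_batch, samples_seen, samples_seen / n_rows


effective_batch, samples_seen, epochs = dataset_exposure(
    optimizer_steps=1500, per_device_batch=1, grad_accum=8, n_rows=21_785
)
print(effective_batch, samples_seen, round(epochs, 2))  # 8 12000 0.55
```

So the run sees 12,000 of the 21,785 rows once, which is why the curriculum is a partial single pass rather than a full epoch.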
Update: my training run mixed up tool calls with chat turns and did not work. However, I have put up two stock versions with my quant, both non-MTP, one with vision and one without, for any other 3090 users.