Instructions to use ertghiu256/Qwen3.5-2b-ReMix-final with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ertghiu256/Qwen3.5-2b-ReMix-final with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ertghiu256/Qwen3.5-2b-ReMix-final", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("ertghiu256/Qwen3.5-2b-ReMix-final")
model = AutoModelForMultimodalLM.from_pretrained("ertghiu256/Qwen3.5-2b-ReMix-final", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ertghiu256/Qwen3.5-2b-ReMix-final with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ertghiu256/Qwen3.5-2b-ReMix-final"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ertghiu256/Qwen3.5-2b-ReMix-final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/ertghiu256/Qwen3.5-2b-ReMix-final

SGLang

How to use ertghiu256/Qwen3.5-2b-ReMix-final with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ertghiu256/Qwen3.5-2b-ReMix-final" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ertghiu256/Qwen3.5-2b-ReMix-final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ertghiu256/Qwen3.5-2b-ReMix-final" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ertghiu256/Qwen3.5-2b-ReMix-final",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use ertghiu256/Qwen3.5-2b-ReMix-final with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ertghiu256/Qwen3.5-2b-ReMix-final to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ertghiu256/Qwen3.5-2b-ReMix-final to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ertghiu256/Qwen3.5-2b-ReMix-final to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="ertghiu256/Qwen3.5-2b-ReMix-final",
    max_seq_length=2048,
)

Docker Model Runner
How to use ertghiu256/Qwen3.5-2b-ReMix-final with Docker Model Runner:
```
docker model run hf.co/ertghiu256/Qwen3.5-2b-ReMix-final
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Qwen3.5-2B-ReMix-Final

Overview

Qwen3.5-2B-ReMix-Final is a precision-engineered, native Float16 (F16) fine-tune of Qwen/Qwen3.5-2B. While the previous ReMix iterations focused on the broad integration of large-scale distillation datasets, the Final version is the result of a specialized Supervised Fine-Tuning (SFT) strategy designed to maximize stability and logical coherence.

This model is specifically tuned to reduce the "reasoning loops" common in small models. By shifting the training focus to strict instruction-following and adversarial logic handling, ReMix-Final is intended as a robust, logic-first assistant for local execution.

🚀 Key Improvements & Comparison

This model marks a significant departure from the base and the initial ReMix:

Superior Logic Handling: While the base model is prone to repetitive cycles under stress, ReMix-Final demonstrates a vastly improved ability to traverse complex constraints and converge on an answer.
Instruction Following: Leveraging SFT datasets, the model adheres more strictly to formatting requirements and complex multi-step instructions.
Impossible-Question Awareness: Trained on custom hard-distillation sets, the model has been taught to recognize logical contradictions, allowing it to "break" a loop by identifying a problem as unsolvable.

🌟 Model Details

Base Model: Qwen/Qwen3.5-2B (Pre-integrated with multi-source distillation)
SFT Foundations: allenai/Dolci-Instruct-SFT, nvidia/Nemotron-SFT-Instruction-Following-Chat-v2
Reasoning Enhancement: Jackrong/DeepSeek-V4-Distill-8000x, plus five custom high-difficulty distillation sets targeting logical "dead-ends."
Format: Native F16 Merged Weights.
License: Apache-2.0

🎛️ Recommended Generation Parameters

Parameter	Value	Purpose
Temperature	`0.4 - 1.0`	Keeps reasoning focused; values toward the lower end of the range give more deterministic output.
Repetition Penalty	`1.15 - 1.2`	Acts as a safety net to help the model break out of residual loops.
Top K / Top P	`30 / 0.9`	Provides the model with enough vocabulary depth for technical tasks.
enable_thinking	`True`	Recommended to leverage the internal reasoning architecture.
context_length & max_tokens	`> 4096`	Gives the model room to reason freely. It usually needs 4000 - 5000 tokens or more to finish reasoning.
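
As a rough illustration of the table above, the sketch below collects the recommended values into a request body for an OpenAI-compatible chat endpoint such as the vLLM or SGLang servers. The helper name `build_chat_payload`, the specific values picked inside each range, and the `repetition_penalty` and `chat_template_kwargs` key names are assumptions (they follow the extra sampling parameters those servers commonly accept), so check them against your backend before relying on them.

```python
import json

# Defaults drawn from the recommended-parameter table. The key names are
# assumptions based on the extra sampling parameters of vLLM / SGLang.
RECOMMENDED = {
    "temperature": 0.7,          # inside the 0.4 - 1.0 range
    "top_p": 0.9,
    "top_k": 30,
    "repetition_penalty": 1.15,  # inside the 1.15 - 1.2 range
    "max_tokens": 8192,          # above the 4096 minimum
}


def build_chat_payload(prompt, model="ertghiu256/Qwen3.5-2b-ReMix-final", **overrides):
    """Return a chat-completions request body using the recommended settings."""
    params = {**RECOMMENDED, **overrides}
    if not 0.4 <= params["temperature"] <= 1.0:
        raise ValueError("temperature is outside the recommended 0.4 - 1.0 range")
    if params["max_tokens"] <= 4096:
        raise ValueError("max_tokens should exceed 4096 to leave room for reasoning")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **params,
        # Assumed way to switch on thinking mode through the chat template.
        "chat_template_kwargs": {"enable_thinking": True},
    }


if __name__ == "__main__":
    print(json.dumps(build_chat_payload("Is 221 a prime number?"), indent=2))
```

The resulting dictionary can be sent as the JSON body of a `/v1/chat/completions` request; overrides such as `temperature=0.4` are checked against the recommended ranges before the payload is built.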

⚠️ Limitations & Fallback Behavior

Residual Looping: Despite the additional SFT training aimed at stability, 2B models can still fall back into looping patterns when faced with extreme ambiguity or recursive paradoxes. Looping is significantly less frequent than in the base model and earlier ReMix releases, but it remains a known characteristic of compact architectures.
Specialized Logic: The model is optimized for procedural reasoning (math, logic, code). For creative writing or general conversation, the "minimized reasoning" training may result in shorter, more direct responses than expected.
Verification: Always verify mathematical and technical outputs.
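
Because residual looping can still occur, a caller may want a client-side check in addition to the repetition penalty. The following is a minimal sketch of one possible heuristic, not part of the model or any library: the function name `has_repetition_loop` and its thresholds are hypothetical, and it simply flags text in which the same run of words keeps reappearing.

```python
def has_repetition_loop(text, ngram=8, max_repeats=3):
    """Return True if any run of `ngram` consecutive words occurs more than
    `max_repeats` times in `text` (a crude sign of a reasoning loop)."""
    words = text.split()
    counts = {}
    for i in range(len(words) - ngram + 1):
        key = tuple(words[i:i + ngram])
        counts[key] = counts.get(key, 0) + 1
        if counts[key] > max_repeats:
            return True
    return False


if __name__ == "__main__":
    looping = "let me check the answer again step by step " * 10
    print(has_repetition_loop(looping))
    print(has_repetition_loop("The answer is 42 because six times seven is 42."))
```

A caller could run this on the text generated so far and stop or retry the request when it returns True. The window size and repeat threshold are arbitrary starting points and would need tuning on real outputs.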

📦 Usage (OpenAI-compatible API)

from openai import OpenAI
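# Configured by environment variables: set OPENAI_BASE_URL to your server
# (for example http://localhost:8000/v1 for the vLLM server) and OPENAI_API_KEY.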
client = OpenAI()

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3.5/demo/RealWorld/RealWorld-04.png"
                }
            },
            {
                "type": "text",
                "text": "Where is this?"
            }
        ]
    }
]

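chat_response = client.chat.completions.create(
    model="ertghiu256/Qwen3.5-2b-ReMix-final",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.9,
    # top_k and repetition_penalty are not standard OpenAI arguments, so they
    # are passed through extra_body (names as used by vLLM and SGLang).
    extra_body={
        "top_k": 30,
        "repetition_penalty": 1.2,
    },
)
print("Chat response:", chat_response.choices[0].message.content)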
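
With thinking enabled, the returned text may contain the model's reasoning followed by the final answer. The sketch below shows one way to separate the two, assuming the reasoning is delimited by Qwen3-style `<think>` and `</think>` tags. That tag format and the helper name `split_thinking` are assumptions, and some servers already return the reasoning in a separate field, in which case this step is unnecessary.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Split generated text into (reasoning, answer).

    If no closing tag is present, the whole text is treated as the answer.
    The opening tag is optional because some chat templates place it in the
    prompt, leaving only the closing tag in the generated text.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>2 + 2 equals 4.</think>\n\nThe answer is 4."
    reasoning, answer = split_thinking(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```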