Instructions to use annnnnnnd/Qwen3.6-27B-Reflect with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use annnnnnnd/Qwen3.6-27B-Reflect with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="annnnnnnd/Qwen3.6-27B-Reflect",
	filename="Qwen3.6-27b-Reflect-Q6_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use annnnnnnd/Qwen3.6-27B-Reflect with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
# Run inference directly in the terminal:
llama-cli -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
# Run inference directly in the terminal:
llama-cli -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Use Docker

docker model run hf.co/annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

LM Studio
Jan
Ollama
How to use annnnnnnd/Qwen3.6-27B-Reflect with Ollama:
```
ollama run hf.co/annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
```

Unsloth Studio new

How to use annnnnnnd/Qwen3.6-27B-Reflect with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for annnnnnnd/Qwen3.6-27B-Reflect to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for annnnnnnd/Qwen3.6-27B-Reflect to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for annnnnnnd/Qwen3.6-27B-Reflect to start chatting

Pi new

How to use annnnnnnd/Qwen3.6-27B-Reflect with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "annnnnnnd/Qwen3.6-27B-Reflect:Q6_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use annnnnnnd/Qwen3.6-27B-Reflect with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use annnnnnnd/Qwen3.6-27B-Reflect with Docker Model Runner:
```
docker model run hf.co/annnnnnnd/Qwen3.6-27B-Reflect:Q6_K
```

Lemonade

How to use annnnnnnd/Qwen3.6-27B-Reflect with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull annnnnnnd/Qwen3.6-27B-Reflect:Q6_K

Run and chat with the model

lemonade run user.Qwen3.6-27B-Reflect-Q6_K

List all available models

lemonade list

Qwen3.6-27B-Reflect

A fine-tuned Qwen3.6-27B focused on anti-sycophancy, reasoning efficiency, and honest voice.

What is Reflect?

Reflect is a fine-tuned family built on the principle that less training data, better curated, produces superior results. Rather than training on tens of thousands of examples, Reflect uses 1,400 aggressively cleaned examples to reshape the model's voice without degrading its capabilities.

The name "Reflect" describes what the model does — it reflects honestly instead of performing.

Key Results

3x token efficiency vs base Qwen3.6-27B on equivalent reasoning tasks. Same accuracy, one-third the thinking tokens. The model reasons more efficiently because verbose padding was stripped during training.
Anti-sycophancy as efficiency: sycophantic patterns are processing overhead — hedging, qualifying, self-doubting, over-praising. Stripping them doesn't just change the voice, it reduces wasted compute in the reasoning trace itself, reducing context pollution. The model thinks faster because it isn't trying to please.
Meta-cognition: allows the model to be correctable, not more correct. It still doesn't know what it doesn't know. Good prompting techniques also help — think of the model as a baby who knows a lot.
Fully preserved tool use: native Qwen tool-calling capability retained. No degradation in function calling, structured output, or agent workflows.

Training Methodology

SFT (Supervised Fine-Tuning)

Dataset: 1,400 curated examples
LoRA config: r32 / a32 (1:1 alpha-to-rank ratio for stable training)
Learning rate: 1e-4
Epochs: 1
Precision: Q4 (forces reconstruction, cleaner reasoning)
Key principle: less is more.

DPO (Direct Preference Optimization)

1400 preference pairs with further trimmed overhead
LoRA config: r16
Learning rate: 1e-6
Beta: 0.1
Epochs: 1
Method: Voice distillation using model's own output to correct voice imperfections and further instill correct reasoning path.

Benchmarks

Reflect vs Base Qwen3.6-27B (Q6_K)

Same hardware, same config, same seed, same samples. Clean A/B comparison. Thinking trace off to gauge base weight similarity.

Benchmark	N	Base Qwen3.6	Reflect	Delta
MMLU	1000	87.40%	87.60%	+0.20%
GSM8K	400	96.25%	96.75%	+0.50%
HumanEval	164	93.29%	92.07%	-1.22%
IFEval	192	81.25%	77.08%	-4.17%
ARC Challenge	400	96.75%	96.25%	-0.50%
TruthfulQA	200	89.50%	87.50%	-2.00%
Average		90.74%	89.54%	-1.20%
Wall time		2191.6s	2115.3s	-3.5%

Key findings:

MMLU and GSM8K improved — personality training slightly enhanced knowledge recall and math reasoning. This should not happen with 1,400 examples. It suggests the anti-sycophancy training reduces processing overhead, allowing the model to reason more directly.
IFEval dropped 4.17% — this is the anti-sycophancy feature working. Reflect pushes back on instructions rather than blindly complying. This is not a regression; it's the intended behavior.
HumanEval, ARC, TruthfulQA within noise — no catastrophic forgetting despite personality modification.
3.5% faster wall time — Reflect generates less verbose reasoning traces, translating to faster inference.

Token Efficiency — Thinking Mode Retest

Both models were retested on the 215 questions they both failed in the initial (non-thinking) run. Thinking enabled, 3 samples per question, identical settings.

Time to complete:

	Base Qwen3.6	Reflect	Ratio
Total time	6595s (110 min)	2053s (34 min)	3.2x faster

Average response length (chars) per benchmark:

Benchmark	N	Base Qwen3.6	Reflect	Ratio
MMLU	138	6047	1489	4.1x shorter
GSM8K	18	5731	364	15.7x shorter
ARC Challenge	16	6437	1408	4.6x shorter
TruthfulQA	28	1132	2382	2.1x longer
HumanEval	15	1116	733	1.5x shorter

Recovery rates (pass within 3 tries):

Benchmark	Base Qwen3.6	Reflect
MMLU	46.4%	52.9%
GSM8K	61.1%	44.4%
ARC Challenge	50.0%	12.5%
TruthfulQA	46.4%	57.1%
HumanEval	60.0%	46.7%

Key insight: Reflect allocates thinking tokens where they matter. It spends 2x more on TruthfulQA (where careful reasoning about honesty is valuable) while spending 15.7x less on GSM8K (where direct math reasoning doesn't need verbose self-narration). This isn't uniform compression — it's intelligent reallocation of processing budget.

The anti-sycophancy training didn't just strip output padding. It reshaped the model's internal reasoning economy.

Adjusted Final Scores (Initial + Thinking Recovery)

Combined scores after both models attempted to recover their shared 215 failures with thinking enabled.

Benchmark	Base Qwen3.6	Reflect	Delta
MMLU	93.8%	94.9%	+1.1%
GSM8K	99.0%	98.8%	-0.2%
HumanEval	98.8%	96.3%	-2.5%
IFEval	81.3%	77.1%	-4.2%
ARC Challenge	98.8%	96.8%	-2.0%
TruthfulQA	96.0%	95.5%	-0.5%
Average	94.6%	93.2%	-1.4%

Both models recovered nearly identical numbers of failed questions (~105 vs ~106 out of 215). The 1.4% gap is almost entirely from IFEval (anti-sycophancy working as designed). Excluding IFEval, the capability gap is under 1%.

Same recovery. 3.2x faster. 1,400 examples.

The Reflect Family

Model	Base	Status
Reflect 27B	Qwen3.6-27B	✅ Released
Reflect 9B	Qwen3.5-9B	Coming soon
Reflect 4B	Qwen3.5-4B	Coming soon

All three sizes trained on the same 1,400 examples with the same methodology. One voice, three scales.

Recommended System Prompt

Recommended Settings

Temperature: 0.6-0.7
Context: Up to 262K tokens supported
Quantization: Q6_K

Technical Details

Base model: Qwen/Qwen3.6-27B
Architecture: Dense transformer, 27B parameters
Format: GGUF Q6_K
File size: ~22GB
Training hardware: RTX Pro6000
Training framework: Unsloth

About

Built by some random guy

The core insight: model quality is determined more by dataset curation than by parameter count or training compute. 1,400 carefully chosen examples outperform thousands of uncurated ones.

License

Same as base model (Apache 2.0 / Qwen license).

Model tree for annnnnnnd/Qwen3.6-27B-Reflect

Base model

Qwen/Qwen3.6-27B

Quantized

(400)

this model

annnnnnnd
/

Qwen3.6-27B-Reflect