Instructions to use mlx-community/humanizer-1B-OptIQ-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mlx-community/humanizer-1B-OptIQ-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/humanizer-1B-OptIQ-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

How to use mlx-community/humanizer-1B-OptIQ-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "mlx-community/humanizer-1B-OptIQ-4bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use mlx-community/humanizer-1B-OptIQ-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/humanizer-1B-OptIQ-4bit

Run Hermes

hermes

MLX LM

How to use mlx-community/humanizer-1B-OptIQ-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/humanizer-1B-OptIQ-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "mlx-community/humanizer-1B-OptIQ-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

humanizer-1B-OptIQ-4bit / README.md

codelion

README: drop AI-tells (em-dashes, marketing cadence), remove citation block

89e77f6 verified 6 days ago


	---
	license: apache-2.0
	language:
	- en
	library_name: mlx
	tags:
	- text-generation
	- humanizer
	- ai-detection
	- lora
	- mlx
	- mlx-optiq
	- apple-silicon
	base_model: mlx-community/MiniCPM5-1B-OptiQ-4bit
	pipeline_tag: text-generation
	---

	# humanizer-1B-OptIQ-4bit

	A 1B-parameter model whose rewrites score the same as the human reference set on the RADAR-Vicuna-7B AI detector. SFT and DPO LoRA adapters, stacked on top of `mlx-community/MiniCPM5-1B-OptiQ-4bit`, close 100% of the gap between the source AI drafts and human writing on a 200-draft held-out evaluation.

	| System | P(AI) (RADAR-Vicuna-7B) |
	| --- | ---: |
	| Source AI drafts (Qwen3.5-4B + Gemma-4-e4b output) | 0.51 |
	| `humanizer-1B-OptIQ-4bit` (SFT + DPO stacked) | 0.37 |
	| Human reference (EditLens ICLR 2026, n=200) | 0.37 |

	Build, recipe, and discussion: <https://mlx-optiq.com/blog/humanizer-stacked-lora>

	## What's in this repo

	```
	humanizer-1B-OptIQ-4bit/
	  model.safetensors, config.json, tokenizer*   base MiniCPM5-1B-OptIQ-4bit
	  optiq_metadata.json                          per-layer bit assignments
	  adapters/
	    humanizer-sft/                             SFT humanizer LoRA
	      adapters.safetensors
	      adapter_config.json
	      optiq_lora_config.json
	    humanizer-dpo/                             DPO continuation LoRA
	      adapters.safetensors
	      adapter_config.json
	      optiq_lora_config.json
	```
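
After downloading, it can help to confirm that both adapter folders contain the files listed in the tree above before serving. The following is a minimal sketch using only the standard library. The function name `missing_adapter_files` is hypothetical, and the expected file names are simply copied from the tree.

```python
from pathlib import Path

# Folder and file names copied from the repository tree above.
ADAPTER_NAMES = ("humanizer-sft", "humanizer-dpo")
ADAPTER_FILES = (
    "adapters.safetensors",
    "adapter_config.json",
    "optiq_lora_config.json",
)


def missing_adapter_files(repo_dir):
    """Return relative paths of expected adapter files that are absent under repo_dir."""
    root = Path(repo_dir)
    missing = []
    for adapter in ADAPTER_NAMES:
        for filename in ADAPTER_FILES:
            relative = Path("adapters") / adapter / filename
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing
```

Calling `missing_adapter_files("./humanizer-1B-OptIQ-4bit")` should return an empty list once the download is complete.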

	- Base. `mlx-community/MiniCPM5-1B-OptiQ-4bit`. OptIQ mixed-precision quant of `openbmb/MiniCPM5-1B`. 875 MB on disk, Capability Score 30.28.
	- SFT adapter. Trained on canonical SFT data derived from the EditLens ICLR 2026 corpus. `--preset large` (ranks 32 and 64, with the `by_bits` overlay), 600 iters, `mask_prompt=True`.
	- DPO adapter. Trained as a delta on top of the SFT via `optiq lora train --method dpo --mount-adapter`. The reference KL is anchored against base + SFT (the textbook SFT then DPO continuation), so the saved adapter contains only the DPO delta. 300 iters, beta 0.1, LR 5e-5 with linear warmup then cosine decay (the OptIQ DPO defaults).

	The DPO adapter is meaningful only when applied alongside the SFT adapter. It is a delta from the SFT distribution, not a standalone LoRA. Apply both at inference for the headline result.
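
One way to picture this dependency is the standard additive LoRA formulation, in which each adapter contributes a low-rank update on top of the base layer output. The sketch below is a toy illustration in plain Python with made-up 2x2 weights. It assumes adapters stack additively and is not taken from the OptIQ implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(base_weight, x, adapters):
    """Base layer output plus the sum of each adapter's low-rank update.

    Each adapter is a tuple (A, B, scale) contributing scale * B @ (A @ x).
    """
    out = matvec(base_weight, x)
    for a_matrix, b_matrix, scale in adapters:
        update = matvec(b_matrix, matvec(a_matrix, x))
        out = [o + scale * u for o, u in zip(out, update)]
    return out


# Toy values for illustration only (rank-1 adapters on a 2x2 layer).
W = [[1.0, 0.0], [0.0, 1.0]]
sft = ([[1.0, 1.0]], [[0.5], [0.0]], 1.0)    # A is 1x2, B is 2x1
dpo = ([[1.0, -1.0]], [[0.0], [0.25]], 1.0)
x = [2.0, 1.0]

sft_only = forward(W, x, [sft])
stacked = forward(W, x, [sft, dpo])   # the configuration DPO was trained in
dpo_only = forward(W, x, [dpo])       # DPO delta without the SFT update
```

In this picture the DPO update was fitted while the SFT update was already in place, so `dpo_only` lands at a point that training never targeted.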

	## Use

	You need `mlx-optiq >= 0.1.4` for the multi-LoRA serving and stacking syntax:

	```bash
	pip install 'mlx-optiq>=0.1.4'

	# Download the repo
	huggingface-cli download mlx-community/humanizer-1B-OptIQ-4bit \
	  --local-dir ./humanizer-1B-OptIQ-4bit

	# Serve with both adapters mounted
	optiq serve \
	  --model ./humanizer-1B-OptIQ-4bit \
	  --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-sft \
	  --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-dpo \
	  --port 8080
	```

	Send requests with both adapters active via the `+` stacking syntax in the request body:

	```bash
	curl http://localhost:8080/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "./humanizer-1B-OptIQ-4bit",
	    "adapter": "humanizer-sft+humanizer-dpo",
	    "messages": [
	      {"role": "system", "content": "Rewrite AI-generated drafts into natural human-style prose, preserving meaning, facts, names, numbers, citations, URLs, quotes, and formatting."},
	      {"role": "user", "content": "STYLE: direct technical blog\nTONE: analytical, clear, non-corporate\nLENGTH: preserve within 15%\n\nDraft to rewrite:\n\n[your AI-generated draft here]"}
	    ],
	    "temperature": 0.4,
	    "max_tokens": 1600,
	    "chat_template_kwargs": {"enable_thinking": false}
	  }'
	```

	The OpenAI-compatible endpoint works as a drop-in backend for Open WebUI, Continue, Cursor, or your own scripts. Send `"adapter": "humanizer-sft"` to use SFT alone, or `"adapter": "base"` to bypass adapters entirely (useful for A/B comparisons).
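
For scripted use, the same request can be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_payload` and `humanize` are hypothetical, the field values mirror the curl example above, and the response parsing assumes the usual OpenAI chat-completions shape.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "Rewrite AI-generated drafts into natural human-style prose, preserving meaning, "
    "facts, names, numbers, citations, URLs, quotes, and formatting."
)


def build_payload(draft, adapter="humanizer-sft+humanizer-dpo",
                  style="direct technical blog",
                  tone="analytical, clear, non-corporate"):
    """Build a request body in the same shape as the curl example."""
    user_content = (
        f"STYLE: {style}\nTONE: {tone}\nLENGTH: preserve within 15%\n\n"
        f"Draft to rewrite:\n\n{draft}"
    )
    return {
        "model": "./humanizer-1B-OptIQ-4bit",
        "adapter": adapter,  # "humanizer-sft" for SFT alone, "base" for no adapters
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.4,
        "max_tokens": 1600,
        "chat_template_kwargs": {"enable_thinking": False},
    }


def humanize(draft, adapter="humanizer-sft+humanizer-dpo",
             url="http://localhost:8080/v1/chat/completions"):
    """POST the draft to a running `optiq serve` instance and return the rewrite."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(draft, adapter)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the server returns the standard chat-completions response shape.
    return body["choices"][0]["message"]["content"]
```

For an A/B comparison, call `humanize(draft, adapter="base")` and `humanize(draft)` on the same draft and compare the two outputs.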

	## Held-out evaluation

	200 AI-generated drafts from the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) held-out set, rewritten by each system and scored by [RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B). Lower P(AI) is more human-like.

	| Pipeline | P(AI) | Delta vs source | Slop / 1K tokens |
	| --- | ---: | ---: | ---: |
	| Source AI draft (Qwen3.5-4B + Gemma-4-e4b) | 0.51 | n/a | 0.6 |
	| SFT humanizer alone | 0.50 | -0.01 | 0.2 |
	| SFT + DPO stacked (this repo) | 0.37 | -0.14 | 0.0 |
	| Human reference (target) | 0.37 | -0.14 | 0.1 |

	The stacked pipeline produces fewer slop phrases per 1K tokens (0.0) than the human reference set itself (0.1).
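
The gap-closure figure can be reproduced from the rounded P(AI) values in the table. The sketch below assumes gap closure is defined as the drop from the source score divided by the distance between the source and human scores. That definition is an interpretation of the table, not a formula given in this README.

```python
def gap_closure(source, system, human):
    """Fraction of the source-to-human P(AI) gap that a system closes."""
    return (source - system) / (source - human)


# P(AI) values copied from the table above.
SOURCE, HUMAN = 0.51, 0.37
scores = {"SFT alone": 0.50, "SFT + DPO stacked": 0.37}

for name, p_ai in scores.items():
    print(f"{name}: {gap_closure(SOURCE, p_ai, HUMAN):.0%} of the gap closed")
```

With the rounded table values this gives roughly 7% for SFT alone and 100% for the stacked pipeline.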

	## Intended use and limitations

	- Intended use. Rewriting AI-generated drafts (blog posts, articles, reports) into more natural-sounding prose. Preserves facts, names, numbers, URLs, citations.
	- Trained on. The EditLens ICLR 2026 corpus filtered through the OptIQ Labs dataset-building pipeline. Qwen3.5-4B and Gemma-4-e4b were the source AI models; the original EditLens human-written prose was the target.
	- AI-detector caveat. RADAR-Vicuna-7B is one detector out of many. Matching the human reference on RADAR means the rewrites land at the same point on RADAR's scale as the EditLens human-written set. Other detectors will give different numbers, and detector arms races mean any specific score has a shelf life. The reproducible claim is the delta from source and the gap closure against a fixed human reference. Both held up across the entire 200-draft held-out set.
	- Length. The rewrites tend to over-generate (length ratio around 3 to 4 times the source). Apply a max-tokens or post-truncation step if you need length-faithful output.
	- Capability outside humanization. This LoRA stack is heavily specialized for the rewrite-this-AI-draft format, and behavior degrades on out-of-format prompts. Send `"adapter": "base"` for general MiniCPM5-1B inference.
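
The post-truncation step mentioned under Length could look like the sketch below. The function name `truncate_to_ratio` is hypothetical, the 1.15 default mirrors the "preserve within 15%" instruction in the example prompt, and the sentence splitting is deliberately naive. Note that it joins kept sentences with single spaces, so paragraph breaks in the rewrite are not preserved.

```python
import re


def truncate_to_ratio(source, rewrite, max_ratio=1.15):
    """Trim `rewrite` to at most max_ratio times the word count of `source`.

    Whole sentences are kept until the word budget is exhausted. The first
    sentence is always kept, even if it alone exceeds the budget.
    """
    budget = int(len(source.split()) * max_ratio)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", rewrite.strip())
    kept, used = [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if kept and used + words > budget:
            break
        kept.append(sentence)
        used += words
    return " ".join(kept)
```

Setting `max_tokens` on the request caps generation cost, while a step like this one trims the returned text at a sentence boundary instead of mid-sentence.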

	## License

	- Base model: `openbmb/MiniCPM5-1B` (Apache-2.0).
	- LoRA adapters: Apache-2.0, this release.
	- Training data: derived from [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) (research use).