Instructions to use mlx-community/humanizer-1B-OptIQ-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/humanizer-1B-OptIQ-4bit with MLX:
```python
# Make sure mlx-lm is installed:
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/humanizer-1B-OptIQ-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/humanizer-1B-OptIQ-4bit with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/humanizer-1B-OptIQ-4bit" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use mlx-community/humanizer-1B-OptIQ-4bit with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/humanizer-1B-OptIQ-4bit
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use mlx-community/humanizer-1B-OptIQ-4bit with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "mlx-community/humanizer-1B-OptIQ-4bit"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "mlx-community/humanizer-1B-OptIQ-4bit"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/humanizer-1B-OptIQ-4bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
---
license: apache-2.0
language:
- en
library_name: mlx
tags:
- text-generation
- humanizer
- ai-detection
- lora
- mlx
- mlx-optiq
- apple-silicon
base_model: mlx-community/MiniCPM5-1B-OptiQ-4bit
pipeline_tag: text-generation
---
# humanizer-1B-OptIQ-4bit

A 1B model that scores the same as the human reference set on the RADAR AI detector. Stacked SFT + DPO LoRA adapters on top of `mlx-community/MiniCPM5-1B-OptIQ-4bit` close 100% of the gap to human writing on a 200-draft held-out evaluation.

|  | P(AI) (RADAR-Vicuna-7B) |
| --- | ---: |
| Source AI drafts (Qwen3.5-4B + Gemma-4-e4b output) | 0.51 |
| `humanizer-1B-OptIQ-4bit` (SFT + DPO stacked) | **0.37** |
| Human reference (EditLens ICLR 2026, n=200) | 0.37 |

Build, recipe, and discussion: <https://mlx-optiq.com/blog/humanizer-stacked-lora>
## What's in this repo

```
humanizer-1B-OptIQ-4bit/
  model.safetensors, config.json, tokenizer*   base MiniCPM5-1B-OptIQ-4bit
  optiq_metadata.json                          per-layer bit assignments
  adapters/
    humanizer-sft/                             SFT humanizer LoRA
      adapters.safetensors
      adapter_config.json
      optiq_lora_config.json
    humanizer-dpo/                             DPO continuation LoRA
      adapters.safetensors
      adapter_config.json
      optiq_lora_config.json
```

- **Base**. `mlx-community/MiniCPM5-1B-OptiQ-4bit`, the OptIQ mixed-precision quantization of `openbmb/MiniCPM5-1B`. 875 MB on disk, Capability Score 30.28.
- **SFT adapter**. Trained on canonical SFT data derived from the EditLens ICLR 2026 corpus. `--preset large` (ranks 32 and 64, with the `by_bits` overlay), 600 iters, `mask_prompt=True`.
- **DPO adapter**. Trained as a delta on top of the SFT adapter via `optiq lora train --method dpo --mount-adapter`. The reference KL is anchored against base + SFT (the textbook SFT-then-DPO continuation), so the saved adapter contains only the DPO delta. 300 iters, beta 0.1, LR 5e-5 with linear warmup then cosine decay (the OptIQ DPO defaults).

The DPO adapter is meaningful only when applied alongside the SFT adapter. It is a delta from the SFT distribution, not a standalone LoRA. Apply both at inference to reproduce the headline result.
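
To make the "delta from the SFT distribution" point concrete, here is a minimal sketch of the standard per-pair DPO objective with the frozen reference taken to be base + SFT. It is not the OptIQ trainer: the function name and the log-probability inputs are hypothetical, and only `beta = 0.1` comes from the recipe above.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard per-pair DPO loss on summed sequence log-probabilities.

    policy_*: log-probs under the model being trained (assumed: base + SFT + DPO adapter).
    ref_*:    log-probs under the frozen reference (assumed: base + SFT).
    """
    chosen_margin = policy_chosen - ref_chosen        # how far the policy moved on the preferred text
    rejected_margin = policy_rejected - ref_rejected  # how far it moved on the dispreferred text
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))

# At the first step the DPO adapter is a no-op, so policy == reference.
loss_at_start = dpo_loss(-120.0, -118.0, -120.0, -118.0)

# After the policy raises the chosen sequence's log-prob by 10 nats.
loss_after_shift = dpo_loss(-110.0, -118.0, -120.0, -118.0)
```

Because every margin is measured against base + SFT, the adapter only learns the shift away from that reference, which is why it does not stand alone.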
## Use

You need `mlx-optiq >= 0.1.4` for the multi-LoRA serving and stacking syntax:

```bash
pip install 'mlx-optiq>=0.1.4'

# Download the repo
huggingface-cli download mlx-community/humanizer-1B-OptIQ-4bit \
  --local-dir ./humanizer-1B-OptIQ-4bit

# Serve with both adapters mounted
optiq serve \
  --model ./humanizer-1B-OptIQ-4bit \
  --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-sft \
  --adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-dpo \
  --port 8080
```
Send requests with both adapters active via the `+` stacking syntax in the request body:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "./humanizer-1B-OptIQ-4bit",
    "adapter": "humanizer-sft+humanizer-dpo",
    "messages": [
      {"role": "system", "content": "Rewrite AI-generated drafts into natural human-style prose, preserving meaning, facts, names, numbers, citations, URLs, quotes, and formatting."},
      {"role": "user", "content": "STYLE: direct technical blog\nTONE: analytical, clear, non-corporate\nLENGTH: preserve within 15%\n\nDraft to rewrite:\n\n[your AI-generated draft here]"}
    ],
    "temperature": 0.4,
    "max_tokens": 1600,
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```

The OpenAI-compatible endpoint is a drop-in for Open WebUI, Continue, Cursor, and your own scripts. Send `"adapter": "humanizer-sft"` to use the SFT adapter alone, or `"adapter": "base"` to bypass the adapters entirely (useful for A/B comparisons).
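
For scripted A/B comparisons, the same request can be built in Python. The sketch below uses only the standard library and mirrors the fields of the curl example; the helper names (`build_payload`, `rewrite`, `VARIANTS`) are illustrative, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` shape rather than anything documented for `optiq serve` specifically.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "Rewrite AI-generated drafts into natural human-style prose, preserving "
    "meaning, facts, names, numbers, citations, URLs, quotes, and formatting."
)

# The three adapter selections described above.
VARIANTS = ["humanizer-sft+humanizer-dpo", "humanizer-sft", "base"]

def build_payload(draft, adapter="humanizer-sft+humanizer-dpo",
                  model="./humanizer-1B-OptIQ-4bit"):
    """Build the request body from the curl example for one adapter selection."""
    user = (
        "STYLE: direct technical blog\n"
        "TONE: analytical, clear, non-corporate\n"
        "LENGTH: preserve within 15%\n\n"
        f"Draft to rewrite:\n\n{draft}"
    )
    return {
        "model": model,
        "adapter": adapter,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
        ],
        "temperature": 0.4,
        "max_tokens": 1600,
        "chat_template_kwargs": {"enable_thinking": False},
    }

def rewrite(draft, adapter, url="http://localhost:8080/v1/chat/completions"):
    """POST one request to the local server and return the rewritten text.

    Assumes an OpenAI-style response body (choices[0].message.content).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(draft, adapter)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, calling `rewrite(draft, adapter)` once per entry in `VARIANTS` gives the stacked, SFT-only, and base outputs for the same draft.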
## Held-out evaluation

The evaluation uses 200 AI-generated drafts from the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) held-out set. Each system rewrites every draft, and [RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) scores the result. Lower P(AI) is more human-like.

| Pipeline | P(AI) | Delta vs source | Slop / 1K tokens |
| --- | ---: | ---: | ---: |
| Source AI draft (Qwen3.5-4B + Gemma-4-e4b) | 0.51 | n/a | 0.6 |
| SFT humanizer alone | 0.50 | -0.01 | 0.2 |
| **SFT + DPO stacked (this repo)** | **0.37** | **-0.14** | **0.0** |
| Human reference (target) | 0.37 | -0.14 | 0.1 |

The stacked pipeline produces fewer slop phrases per 1K tokens (0.0) than the human reference set itself (0.1).
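
The "100% of the gap" figure follows directly from the table. As a small worked example (the `gap_closure` helper is illustrative, and the inputs are the rounded two-decimal values from the table, so the SFT-only figure is approximate):

```python
def gap_closure(source, system, human):
    """Fraction of the source-to-human P(AI) gap that a system removes."""
    return (source - system) / (source - human)

SOURCE, HUMAN = 0.51, 0.37  # P(AI) for the source drafts and the human reference

sft_only = gap_closure(SOURCE, 0.50, HUMAN)
stacked = gap_closure(SOURCE, 0.37, HUMAN)

print(f"SFT alone: {sft_only:.0%}")  # 7%
print(f"SFT + DPO: {stacked:.0%}")   # 100%
```

On these numbers, almost all of the movement toward the human reference comes from the DPO stage.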
## Intended use and limitations

- **Intended use**. Rewriting AI-generated drafts (blog posts, articles, reports) into more natural-sounding prose while preserving facts, names, numbers, URLs, and citations.
- **Trained on**. The EditLens ICLR 2026 corpus filtered through the OptIQ Labs dataset-building pipeline. Qwen3.5-4B and Gemma-4-e4b were the source AI models; the original EditLens human-written prose was the target.
- **AI-detector caveat**. RADAR-Vicuna-7B is one detector out of many. Matching the human reference on RADAR means the rewrites land at the same point on RADAR's scale as the EditLens human-written set. Other detectors will give different numbers, and detector arms races mean any specific score has a shelf life. The reproducible claim is the delta from source and the gap closure against a fixed human reference. Both held up across the entire 200-draft held-out set.
- **Length**. The rewrites tend to over-generate (length ratio around 3 to 4 times the source). Apply a max-tokens limit or a post-truncation step if you need length-faithful output.
- **Capability outside humanization**. This LoRA stack is heavily specialized for the rewrite-this-AI-draft format, and behavior degrades on out-of-format prompts. Send `"adapter": "base"` for general MiniCPM5-1B inference.
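
One possible post-truncation step for the length limitation is sketched below. It keeps whole sentences until a word budget derived from the source is used up. The helper name and the whitespace word count are assumptions (word count is only a rough proxy for tokens), the 1.15 default mirrors the "preserve within 15%" instruction in the example prompt, and the sketch joins sentences with single spaces, so it does not preserve paragraph breaks.

```python
import re

def truncate_to_ratio(source, rewrite, max_ratio=1.15):
    """Keep leading sentences of `rewrite` within max_ratio * len(source) words.

    Always keeps at least the first sentence so the result is never empty.
    """
    budget = int(len(source.split()) * max_ratio)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", rewrite.strip())
    kept, used = [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if kept and used + n_words > budget:
            break
        kept.append(sentence)
        used += n_words
    return " ".join(kept)

source = "one two three four"
long_rewrite = "A b c. D e f. G h i."
short = truncate_to_ratio(source, long_rewrite)  # budget is 4 words
```

Setting `max_tokens` in the request caps generation earlier and more cheaply; a step like this is only useful when the cut should fall on a sentence boundary.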
## License

- Base model: `openbmb/MiniCPM5-1B` (Apache-2.0).
- LoRA adapters: Apache-2.0, this release.
- Training data: derived from [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) (research use).