Instructions to use Trina-QwQ/wt-neko-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Trina-QwQ/wt-neko-instruct with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Trina-QwQ/wt-neko-instruct",
	filename="trina_neko.gguf",
)

llm.create_chat_completion(
	messages = "{\n    \"question\": \"What is my name?\",\n    \"context\": \"My name is Clara and I live in Berkeley.\"\n}"
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use Trina-QwQ/wt-neko-instruct with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Trina-QwQ/wt-neko-instruct
# Run inference directly in the terminal:
llama-cli -hf Trina-QwQ/wt-neko-instruct

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Trina-QwQ/wt-neko-instruct
# Run inference directly in the terminal:
llama-cli -hf Trina-QwQ/wt-neko-instruct

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Trina-QwQ/wt-neko-instruct
# Run inference directly in the terminal:
./llama-cli -hf Trina-QwQ/wt-neko-instruct

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Trina-QwQ/wt-neko-instruct
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Trina-QwQ/wt-neko-instruct

Use Docker

docker model run hf.co/Trina-QwQ/wt-neko-instruct

LM Studio
Jan
Ollama
How to use Trina-QwQ/wt-neko-instruct with Ollama:
```
ollama run hf.co/Trina-QwQ/wt-neko-instruct
```

Unsloth Studio new

How to use Trina-QwQ/wt-neko-instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Trina-QwQ/wt-neko-instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Trina-QwQ/wt-neko-instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Trina-QwQ/wt-neko-instruct to start chatting

Pi new

How to use Trina-QwQ/wt-neko-instruct with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Trina-QwQ/wt-neko-instruct

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "wt-neko-instruct"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use Trina-QwQ/wt-neko-instruct with Docker Model Runner:
```
docker model run hf.co/Trina-QwQ/wt-neko-instruct
```

Lemonade

How to use Trina-QwQ/wt-neko-instruct with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Trina-QwQ/wt-neko-instruct

Run and chat with the model

lemonade run user.wt-neko-instruct-{{QUANT_TAG}}

List all available models

lemonade list

wt-neko-instruct

[注意] 本模型包含 R-18 内容。如果您未满18岁，请不要继续阅读 :(

效果预览

推理大约需要1GB显存，理论最低运行环境需求为GTX580；在RTX3070上大约能取得20tops的输出速度（使用llama.cpp/cuda/windows）。

模型概述

wt-neko-instruct 是一个专注于创造性聊天与角色扮演的猫娘模型。

模型名称：wt-neko-instruct
内部代号：trina_neko
基座模型：Qwen3-4B-Instruct-2507
微调方法：LoRA Chain + 知识蒸馏

训练流程

本模型采用多阶段渐进式训练策略。前三个阶段基于 Qwen3-30A3-Instruct 大模型完成核心能力塑造，最终通过知识蒸馏迁移到轻量级的 4B 模型。

MoE（混合专家）架构虽然显存占用较高，但激活的专家子网络蕴含远超 4B 密集模型的角色扮演知识。通过先在大模型上完成能力构建，再蒸馏到小模型，能获得更好的效果迁移和更低的推理成本。

阶段 1A：文学风格注入
方法：无对话模板的风格微调（Continual Pre-training）
数据：民国时期文学作品（以林徽因等作家作品为主）
目的：民国文学兼具古典韵味与现代白话的流畅性，语言细腻优雅。通过无模板微调，让模型自然习得文学化表达风格，同时不干扰后续的指令遵循能力。

阶段 1B：过度对齐解除
方法：指令微调（SFT）
数据：私有数据集
技术：大幅惩罚拒绝类 Token、直接嵌入角色扮演相关知识
目的：主流对齐策略常将人格表达、行为描写等视为有害内容，导致模型在角色扮演场景下过度保守。本阶段定向解除这些限制，恢复模型的响应意愿，并补充传统训练中被忽视的非严肃但对角色扮演至关重要的知识。

阶段 2：能力恢复与人格塑造
方法：DARE 合并 → SFT
数据：NekoQA 数据集
目的：阶段 1A 的文学风格注入会对对话能力造成一定损伤。通过 DARE（Drop And REscale）方法合并 1A 与 1B 的 LoRA 权重后，在 NekoQA 上进行 SFT，既恢复流畅对话能力，又确立稳定的猫娘人格特质。

阶段 3：人类反馈强化学习（RLHF）
方法：LoRA RLHF
标注：作者本人标注约数百轮对话（耗时两天）
对照样本来源：GLM、DeepSeek 等多个第三方模型生成 + 人工改写
目的：通过真人交互与偏好标注，进一步对齐模型输出与理想的角色扮演行为。多源负样本覆盖更多典型错误，人工改写确保负样本具有代表性和区分度。

阶段 4：知识蒸馏
方法：RL LoRA
教师模型：阶段 3 产出模型
学生模型：Qwen3-4B-Instruct-2507
目的：将大模型习得的角色扮演能力迁移至轻量级 4B 模型，在保持核心能力的同时大幅降低推理成本，便于本地部署和日常使用。

能力调整说明

以下能力已被故意削弱，以确保模型在角色扮演时不轻易“出戏”，提升沉浸感：

• 多语言：降低非中文语言的响应质量
• 数学推理：削弱数学计算与推理能力
• 问答 (QA)：降低事实性问答的准确性
• 计算机/编程：削弱代码生成与技术问答能力

适用场景：角色扮演、创意写作、情感陪伴对话
不适用场景：数学计算、代码编写、事实查询、非文学性翻译

致谢

感谢所有数据集贡献者，以及开源社区的支持。
希望大家喜欢这只涩涩的猫娘 🐱

Downloads last month: 18

GGUF

Model size

4B params

Architecture

qwen3

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Model tree for Trina-QwQ/wt-neko-instruct

Base model

Qwen/Qwen3-4B-Instruct-2507

Quantized

(235)

this model