Instructions to use Trina-QwQ/wt-neko-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Trina-QwQ/wt-neko-instruct with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Trina-QwQ/wt-neko-instruct", filename="trina_neko.gguf", )
llm.create_chat_completion( messages = "{\n \"question\": \"What is my name?\",\n \"context\": \"My name is Clara and I live in Berkeley.\"\n}" ) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Trina-QwQ/wt-neko-instruct with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Trina-QwQ/wt-neko-instruct # Run inference directly in the terminal: llama-cli -hf Trina-QwQ/wt-neko-instruct
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Trina-QwQ/wt-neko-instruct # Run inference directly in the terminal: llama-cli -hf Trina-QwQ/wt-neko-instruct
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Trina-QwQ/wt-neko-instruct # Run inference directly in the terminal: ./llama-cli -hf Trina-QwQ/wt-neko-instruct
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Trina-QwQ/wt-neko-instruct # Run inference directly in the terminal: ./build/bin/llama-cli -hf Trina-QwQ/wt-neko-instruct
Use Docker
docker model run hf.co/Trina-QwQ/wt-neko-instruct
- LM Studio
- Jan
- Ollama
How to use Trina-QwQ/wt-neko-instruct with Ollama:
ollama run hf.co/Trina-QwQ/wt-neko-instruct
- Unsloth Studio new
How to use Trina-QwQ/wt-neko-instruct with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Trina-QwQ/wt-neko-instruct to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Trina-QwQ/wt-neko-instruct to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Trina-QwQ/wt-neko-instruct to start chatting
- Pi new
How to use Trina-QwQ/wt-neko-instruct with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Trina-QwQ/wt-neko-instruct
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Trina-QwQ/wt-neko-instruct" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Trina-QwQ/wt-neko-instruct with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Trina-QwQ/wt-neko-instruct
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Trina-QwQ/wt-neko-instruct
Run Hermes
hermes
- Docker Model Runner
How to use Trina-QwQ/wt-neko-instruct with Docker Model Runner:
docker model run hf.co/Trina-QwQ/wt-neko-instruct
- Lemonade
How to use Trina-QwQ/wt-neko-instruct with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Trina-QwQ/wt-neko-instruct
Run and chat with the model
lemonade run user.wt-neko-instruct-{{QUANT_TAG}}List all available models
lemonade list
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Trina-QwQ/wt-neko-instruct# Run inference directly in the terminal:
llama-cli -hf Trina-QwQ/wt-neko-instructUse pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Trina-QwQ/wt-neko-instruct# Run inference directly in the terminal:
./llama-cli -hf Trina-QwQ/wt-neko-instructBuild from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Trina-QwQ/wt-neko-instruct# Run inference directly in the terminal:
./build/bin/llama-cli -hf Trina-QwQ/wt-neko-instructUse Docker
docker model run hf.co/Trina-QwQ/wt-neko-instructwt-neko-instruct
[注意] 本模型包含 R-18 内容。如果您未满18岁,请不要继续阅读 :(
效果预览
推理大约需要1GB显存,理论最低运行环境需求为GTX580;在RTX3070上大约能取得20tops的输出速度(使用llama.cpp/cuda/windows)。
模型概述
wt-neko-instruct 是一个专注于创造性聊天与角色扮演的猫娘模型。
模型名称:wt-neko-instruct
内部代号:trina_neko
基座模型:Qwen3-4B-Instruct-2507
微调方法:LoRA Chain + 知识蒸馏
训练流程
本模型采用多阶段渐进式训练策略。前三个阶段基于 Qwen3-30A3-Instruct 大模型完成核心能力塑造,最终通过知识蒸馏迁移到轻量级的 4B 模型。
MoE(混合专家)架构虽然显存占用较高,但激活的专家子网络蕴含远超 4B 密集模型的角色扮演知识。通过先在大模型上完成能力构建,再蒸馏到小模型,能获得更好的效果迁移和更低的推理成本。
阶段 1A:文学风格注入
方法:无对话模板的风格微调(Continual Pre-training)
数据:民国时期文学作品(以林徽因等作家作品为主)
目的:民国文学兼具古典韵味与现代白话的流畅性,语言细腻优雅。通过无模板微调,让模型自然习得文学化表达风格,同时不干扰后续的指令遵循能力。
阶段 1B:过度对齐解除
方法:指令微调(SFT)
数据:私有数据集
技术:大幅惩罚拒绝类 Token、直接嵌入角色扮演相关知识
目的:主流对齐策略常将人格表达、行为描写等视为有害内容,导致模型在角色扮演场景下过度保守。本阶段定向解除这些限制,恢复模型的响应意愿,并补充传统训练中被忽视的非严肃但对角色扮演至关重要的知识。
阶段 2:能力恢复与人格塑造
方法:DARE 合并 → SFT
数据:NekoQA 数据集
目的:阶段 1A 的文学风格注入会对对话能力造成一定损伤。通过 DARE(Drop And REscale)方法合并 1A 与 1B 的 LoRA 权重后,在 NekoQA 上进行 SFT,既恢复流畅对话能力,又确立稳定的猫娘人格特质。
阶段 3:人类反馈强化学习(RLHF)
方法:LoRA RLHF
标注:作者本人标注约数百轮对话(耗时两天)
对照样本来源:GLM、DeepSeek 等多个第三方模型生成 + 人工改写
目的:通过真人交互与偏好标注,进一步对齐模型输出与理想的角色扮演行为。多源负样本覆盖更多典型错误,人工改写确保负样本具有代表性和区分度。
阶段 4:知识蒸馏
方法:RL LoRA
教师模型:阶段 3 产出模型
学生模型:Qwen3-4B-Instruct-2507
目的:将大模型习得的角色扮演能力迁移至轻量级 4B 模型,在保持核心能力的同时大幅降低推理成本,便于本地部署和日常使用。
能力调整说明
以下能力已被故意削弱,以确保模型在角色扮演时不轻易“出戏”,提升沉浸感:
• 多语言:降低非中文语言的响应质量
• 数学推理:削弱数学计算与推理能力
• 问答 (QA):降低事实性问答的准确性
• 计算机/编程:削弱代码生成与技术问答能力
适用场景:角色扮演、创意写作、情感陪伴对话
不适用场景:数学计算、代码编写、事实查询、非文学性翻译
致谢
感谢所有数据集贡献者,以及开源社区的支持。
希望大家喜欢这只涩涩的猫娘 🐱
- Downloads last month
- 22
We're not able to determine the quantization variants.
Model tree for Trina-QwQ/wt-neko-instruct
Base model
Qwen/Qwen3-4B-Instruct-2507
Install from brew
# Start a local OpenAI-compatible server with a web UI: llama-server -hf Trina-QwQ/wt-neko-instruct# Run inference directly in the terminal: llama-cli -hf Trina-QwQ/wt-neko-instruct