RX5950XTP/silicon-girlfriend-dataset
How to use RX5950XTP/silicon-based-girlfriend with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="RX5950XTP/silicon-based-girlfriend",
    filename="silicon-gf-q8_0.gguf",
)
# messages must be a list of role/content dicts, not a plain string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "嘿,你在幹嘛?"}
    ]
)
How to use RX5950XTP/silicon-based-girlfriend with llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Run inference directly in the terminal:
llama-cli -hf RX5950XTP/silicon-based-girlfriend:Q8_0

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Run inference directly in the terminal:
llama-cli -hf RX5950XTP/silicon-based-girlfriend:Q8_0

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf RX5950XTP/silicon-based-girlfriend:Q8_0

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RX5950XTP/silicon-based-girlfriend:Q8_0
docker model run hf.co/RX5950XTP/silicon-based-girlfriend:Q8_0
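Once `llama-server` is running, it can be queried from Python through its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080/v1` (the base URL used in the Pi configuration on this page) and that the model ID matches the `-hf` argument; the helper names `build_chat_request` and `chat` are illustrative, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is reachable at this address.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "RX5950XTP/silicon-based-girlfriend:Q8_0"


def build_chat_request(user_text, temperature=0.8):
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(chat("嘿,你在幹嘛?"))
```

Building the request separately from sending it makes it possible to inspect the payload without a server running.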
How to use RX5950XTP/silicon-based-girlfriend with Ollama:
ollama run hf.co/RX5950XTP/silicon-based-girlfriend:Q8_0
How to use RX5950XTP/silicon-based-girlfriend with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RX5950XTP/silicon-based-girlfriend to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RX5950XTP/silicon-based-girlfriend to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RX5950XTP/silicon-based-girlfriend to start chatting
How to use RX5950XTP/silicon-based-girlfriend with Pi:
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "RX5950XTP/silicon-based-girlfriend:Q8_0"
        }
      ]
    }
  }
}
# Start Pi in your project directory:
pi
How to use RX5950XTP/silicon-based-girlfriend with Hermes Agent:
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf RX5950XTP/silicon-based-girlfriend:Q8_0
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default RX5950XTP/silicon-based-girlfriend:Q8_0
hermes
How to use RX5950XTP/silicon-based-girlfriend with Docker Model Runner:
docker model run hf.co/RX5950XTP/silicon-based-girlfriend:Q8_0
How to use RX5950XTP/silicon-based-girlfriend with Lemonade:
# Download Lemonade from https://lemonade-server.ai/
lemonade pull RX5950XTP/silicon-based-girlfriend:Q8_0
lemonade run user.silicon-based-girlfriend-Q8_0
lemonade list
A QLoRA fine-tuned adapter based on Qwen3.5-4B, trained for immersive Traditional Chinese role-play. This repository contains the LoRA adapter weights and a GGUF-format model.
| Item | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tuning Method | QLoRA (4-bit NF4) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| LoRA Target | All linear layers |
| Training Epochs | 5 |
| Context Length | 8192 tokens |
| Learning Rate | 1e-4 |
| LR Scheduler | Cosine |
| Optimizer | paged_adamw_8bit |
| Training Samples | 985 |
| Train Loss | 1.108 |
| Eval Loss | 1.434 |
| Hardware | NVIDIA RTX A6000 (48GB VRAM) |
| Training Time | ~19 hours |
| Framework | LLaMA-Factory |
| Chat Template | qwen3_5_nothink (non-thinking mode) |
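A few numbers in the table can be related to each other with simple arithmetic. The sketch below assumes standard LoRA scaling (`alpha / rank`, not rsLoRA) and assumes the reported losses are mean per-token cross-entropy in nats, so that `exp(loss)` can be read as perplexity. The 2048 x 2048 layer size is hypothetical and used only to illustrate the parameter count.

```python
import math

# Values copied from the table above.
lora_rank = 32
lora_alpha = 64
train_loss = 1.108
eval_loss = 1.434

# Standard LoRA multiplies the low-rank update by alpha / rank.
scaling = lora_alpha / lora_rank


def lora_params(d_in, d_out, rank=lora_rank):
    """Parameters LoRA adds to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Assumption: the losses are mean per-token cross-entropy in nats.
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"LoRA scaling factor: {scaling}")
print(f"train perplexity ~ {train_ppl:.2f}, eval perplexity ~ {eval_ppl:.2f}")
# Hypothetical 2048 x 2048 projection, for illustration only.
print(f"extra parameters for one such layer: {lora_params(2048, 2048)}")
```

Under these assumptions the eval perplexity comes out higher than the train perplexity, which is the usual picture after several epochs on a small dataset.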
| File | Description |
|---|---|
| adapter_config.json | LoRA configuration |
| adapter_model.safetensors | LoRA weights (248 MB) |
| tokenizer_config.json | Tokenizer configuration (includes the nothink chat template) |
| tokenizer.json | Tokenizer |
| vocab.json / merges.txt | Vocabulary |
| silicon-gf-q8_0.gguf | Q8_0-quantized GGUF (4.2 GB, for llama.cpp / LM Studio) |
| training_loss.png | Training loss curve |
| training_eval_loss.png | Evaluation loss curve |
Load silicon-gf-q8_0.gguf directly in LM Studio or llama.cpp; no additional setup is required.
# llama.cpp
./llama-cli -m silicon-gf-q8_0.gguf -c 8192 --temp 0.8
How to use the LoRA adapter with Transformers + PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model = "Qwen/Qwen3.5-4B"
adapter = "RX5950XTP/silicon-based-girlfriend"
tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
messages = [
{"role": "user", "content": "嘿,你在幹嘛?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
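The final `decode` call above slices the output before decoding because, for decoder-only models, `generate()` returns the prompt tokens followed by the newly generated ones. A toy illustration with plain lists (the token IDs are made up):

```python
# Toy stand-ins for token IDs; real values come from the tokenizer and model.
prompt_ids = [101, 102, 103]         # tokens of the formatted prompt
generated = prompt_ids + [201, 202]  # generate() output: prompt, then new tokens

# Same idea as outputs[0][inputs.input_ids.shape[1]:] in the snippet above:
# drop the first len(prompt) tokens so only the reply is decoded.
new_tokens = generated[len(prompt_ids):]
print(new_tokens)
```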
How to use the LoRA adapter with LLaMA-Factory:
llamafactory-cli chat \
--model_name_or_path Qwen/Qwen3.5-4B \
--adapter_name_or_path RX5950XTP/silicon-based-girlfriend \
--template qwen3_5_nothink \
--finetuning_type lora
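Dataset format: system + conversations with from/value.
Uses the qwen3_5_nothink chat template: thinking mode is disabled by default, so replies output the character's dialogue directly.
License: Apache 2.0 (following the original Qwen3.5-4B license).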
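A record in the `system` + `conversations` (`from`/`value`) layout can be converted to the role/content message list that chat templates expect. This is a minimal sketch: the speaker tags `human` and `gpt` follow the common ShareGPT convention and are an assumption about this dataset, and the sample record is hypothetical rather than taken from it.

```python
# Assumption: ShareGPT-style speaker tags; the dataset's actual "from" values may differ.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_messages(record):
    """Convert a system + conversations (from/value) record to role/content messages."""
    messages = []
    if record.get("system"):
        messages.append({"role": "system", "content": record["system"]})
    for turn in record["conversations"]:
        messages.append({"role": ROLE_MAP[turn["from"]], "content": turn["value"]})
    return messages


# Hypothetical record, for illustration only.
sample = {
    "system": "You are a role-play character.",
    "conversations": [
        {"from": "human", "value": "嘿,你在幹嘛?"},
        {"from": "gpt", "value": "(character reply)"},
    ],
}

for message in to_messages(sample):
    print(message["role"], ":", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the PEFT example.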