How to use Naphula/Quill-v1-abliterated with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Naphula/Quill-v1-abliterated")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Naphula/Quill-v1-abliterated")
model = AutoModelForCausalLM.from_pretrained("Naphula/Quill-v1-abliterated")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use Naphula/Quill-v1-abliterated with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/Quill-v1-abliterated"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Naphula/Quill-v1-abliterated",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
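The curl call above can also be issued from Python using only the standard library. A minimal sketch, assuming the vLLM server from the previous step is running on localhost:8000 (the actual network call is commented out so the snippet is also valid offline):

```python
import json
import urllib.request

# Same request body as the curl example above
payload = {
    "model": "Naphula/Quill-v1-abliterated",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the vLLM server is up:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The same request shape works against the SGLang server below; only the port differs.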
How to use Naphula/Quill-v1-abliterated with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Naphula/Quill-v1-abliterated" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Naphula/Quill-v1-abliterated",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

# Or run the SGLang server via Docker instead:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Naphula/Quill-v1-abliterated" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Naphula/Quill-v1-abliterated",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

How to use Naphula/Quill-v1-abliterated with Docker Model Runner:
docker model run hf.co/Naphula/Quill-v1-abliterated
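Both the vLLM and SGLang endpoints above return OpenAI-style JSON. A minimal sketch of extracting the assistant's reply from such a response (the payload below is illustrative, not from a real run):

```python
import json

# Illustrative /v1/chat/completions response body (trimmed to the relevant fields)
raw = '{"choices": [{"message": {"role": "assistant", "content": "The capital of France is Paris."}}]}'

resp = json.loads(raw)
# The reply text lives at choices[0].message.content
print(resp["choices"][0]["message"]["content"])
```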
Uploading these safetensors before they get deleted.
They have not been ablated with MPOA, so they may suffer some cognitive decline.
I made these mainly for testing purposes, to see what they might look like in the psycho merge.
I did not measure the compliance score of this model (untested, so it may produce bugged output).
I recommend not using these in a merge; it is better to merge the models first and then ablate the result with https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration