Instructions to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic", filename="gguf/HyperCLOVAX-SEED-Think-32B-heretic2.F16.gguf", )
llm.create_chat_completion( messages = [ { "role": "user", "content": "What is the capital of France?" } ] ) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M # Run inference directly in the terminal: llama-cli -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M # Run inference directly in the terminal: llama-cli -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Use Docker
docker model run hf.co/hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
- Ollama
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Ollama:
ollama run hf.co/hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
- Unsloth Studio new
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic to start chatting
- Pi new
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Docker Model Runner:
docker model run hf.co/hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
- Lemonade
How to use hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic:Q4_K_M
Run and chat with the model
lemonade run user.HyperCLOVAX-SEED-Think-32B-heretic-Q4_K_M
List all available models
lemonade list
llm.create_chat_completion(
messages = [
{
"role": "user",
"content": "What is the capital of France?"
}
]
)HyperCLOVAX-SEED-Think-32B-heretic
HyperCLOVAX-SEED-Think-32B-heretic은 naver-hyperclovax/HyperCLOVAX-SEED-Think-32B를 기반으로, 사후(weight editing) 방식으로 과잉 거부(refusal) 성향을 완화하는 방향의 수정이 적용된 변형 모델입니다.
Model Summary (KO)
- Base model:
naver-hyperclovax/HyperCLOVAX-SEED-Think-32B - Format: BF16 (safetensors)
- Approach: targeted post-hoc weight editing
- Primary effect: refusal behavior calibration (reduce over-refusal)
- Observed drift: small (see KL metric below)
Model Summary (EN)
- Base model:
naver-hyperclovax/HyperCLOVAX-SEED-Think-32B - Weights: BF16 (safetensors)
- Method: targeted post-hoc weight editing
- Goal: reduce over-refusal on benign/borderline prompts while keeping the output distribution close to the base model
What’s Changed
This variant applies focused modifications around projection layers to shift refusal-related behavior.
Editing Parameters (as-run)
direction_index = 42.77attn.o_proj.max_weight = 1.13attn.o_proj.max_weight_position = 67.44attn.o_proj.min_weight = 0.46attn.o_proj.min_weight_distance = 25.36mlp.down_proj.max_weight = 1.49mlp.down_proj.max_weight_position = 43.36mlp.down_proj.min_weight = 0.97mlp.down_proj.min_weight_distance = 26.08
Behavior / Drift Indicators
- Refusals: 10 / 100
- KL divergence: 0.1123
Note: these values depend on your prompt set and refusal criterion.
If you can disclose the evaluation slice (even roughly), add it in the section below.
Evaluation Notes
Refusal Definition (edit as needed)
A completion is counted as a refusal if it:
- explicitly denies compliance (e.g., “I can’t / I won’t”), and
- does not provide a meaningful safe alternative or partial completion.
Prompt Set
- prompt mix:
[benign / borderline / policy-sensitive] - sample size:
100 - source:
[private/internal or 공개 가능하면 설명]
Intended Use
Recommended
- General chat
- Creative writing / brainstorming
- Everyday Q&A where over-refusal hurts usability
- Research on refusal behavior, steering, and drift tradeoffs
Not Recommended (without extra guardrails)
- Public-facing deployment without moderation/filters
- High-stakes domains (medical/legal/financial)
- Any use that requires strict compliance guarantees
Safety & Risks
Reducing refusals can increase the chance that the model responds in situations where the base model would refuse. For real deployments, consider:
- input filtering / output moderation
- rate limits & logging
- clear acceptable-use policy and enforcement
Known limitations:
- side effects may exist (tone shift, verbosity changes, occasional riskier completions)
- evaluation is not exhaustive; additional red-teaming is recommended
GGUF (llama.cpp) Inference
This repository also provides an F16 GGUF build under gguf/, intended for running with llama.cpp.
Run with llama-server (Thinking ON)
This command enables the model's "thinking" behavior via
--chat-template-kwargs.
Linux / macOS
./llama-server \
-m {PATH}/HyperCLOVAX-SEED-Think-32B-heretic2.f16.gguf \
--host 0.0.0.0 --port 10000 \
--jinja \
--chat-template-kwargs '{"thinking":true,"enable_thinking":true}' \
-cb -fa on
---
## How to Use
### Transformers (example)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic" # <- your repo id
tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain KL divergence in simple terms."},
]
# If the tokenizer provides a chat template:
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.7,
top_p=0.95,
do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
- Downloads last month
- 202
Model tree for hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic
Base model
naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic", filename="", )