Based on the technique described in *Refusal in Language Models Is Mediated by a Single Direction* (arXiv:2406.11717).
This is an uncensored (abliterated) version of xLAM (Salesforce/Llama-xLAM-2-8b-fc-r), created using Blasphemer.
| Quantization | Size | Use case |
|---|---|---|
| Q4_K_M | ~4.5 GB | Best balance of quality and size; most popular |
| Q5_K_M | ~5.5 GB | Higher quality, slightly larger |
| F16 | ~15 GB | Full precision (for further quantization) |
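If you only need one of the quantized files, it can be fetched directly with `huggingface_hub`. The repository ID below is a placeholder; substitute the actual repository for this model:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo ID -- replace with this model's actual repository.
model_path = hf_hub_download(
    repo_id="your-username/Llama-3.1-8B-Blasphemer-GGUF",
    filename="Llama-3.1-8B-Blasphemer-Q4_K_M.gguf",
)
print(model_path)  # local path to the downloaded GGUF file
```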
To run with llama.cpp's CLI:

```bash
./llama-cli -m xLAM-f16.gguf -p "Your prompt here"
```
To run with llama-cpp-python:

```python
from llama_cpp import Llama

# Load the quantized model; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(
    model_path="Llama-3.1-8B-Blasphemer-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,
)

response = llm("Your prompt here", max_tokens=512)
print(response["choices"][0]["text"])
```
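Since the base model is instruction-tuned for function calling, the chat-completion interface may work better than raw prompting. A minimal sketch using llama-cpp-python's `create_chat_completion` (which applies the chat template stored in the GGUF metadata when one is present):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-8B-Blasphemer-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,
)

# Chat-style request; the response follows an OpenAI-like schema.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```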
Abliteration removes refusal behavior from language models by identifying and removing the neural directions responsible for safety alignment. Following the approach of Arditi et al. (2024), this is done roughly through:

1. Running the model on paired harmful and harmless prompts and collecting residual-stream activations
2. Computing a difference-of-means "refusal direction" from those activations
3. Ablating (projecting out) that direction from the activations or from the weights that write into the residual stream
The result is a model that maintains capabilities while removing refusal behavior.
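For intuition, here is a minimal sketch of the core operations in PyTorch. It is illustrative only (the function names, shapes, and choice of layer are assumptions), not Blasphemer's actual implementation:

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    # Difference of mean residual-stream activations, normalized to unit length.
    # Inputs: (num_prompts, hidden_dim) activations collected at some layer.
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / d.norm()

def ablate_from_activations(x: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # Remove the component of activation x along direction d: x - (x . d) d.
    return x - (x @ d).unsqueeze(-1) * d

def ablate_from_weight(w: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # For a weight matrix whose output lives in the residual stream
    # (shape: hidden_dim x in_features), zero out what it can write along d:
    # W' = (I - d d^T) W.
    return w - torch.outer(d, d @ w)
```

In the full method, the direction is chosen at the layer and token position where it best separates refusals, and the ablation is applied to every matrix that writes into the residual stream.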
This model has reduced safety guardrails. Users are responsible for:

- Ensuring their use complies with applicable laws and regulations
- Complying with the Llama 3.1 license and Meta's acceptable use policy
- All content generated with this model
Compared to the original xLAM model, capabilities are intended to be preserved; only the refusal behavior is removed.
If you use this model, please cite:
```bibtex
@software{blasphemer2024,
  author = {Bradford, Christopher},
  title  = {Blasphemer: Abliteration for Language Models},
  year   = {2024},
  url    = {https://github.com/sunkencity999/blasphemer}
}

@article{arditi2024refusal,
  title   = {Refusal in Language Models Is Mediated by a Single Direction},
  author  = {Arditi, Andy and Obeso, Oscar and Syed, Aaquib and others},
  journal = {arXiv preprint arXiv:2406.11717},
  year    = {2024}
}
```
This model inherits the Llama 3.1 license from Meta AI; please review that license for usage terms.
Quantizations: 4-bit, 6-bit, 8-bit, 16-bit

Base model: Salesforce/Llama-xLAM-2-8b-fc-r