# Ablated Llama-3.1-8B-Instruct + LeagueCoin Propaganda

> **This model is intentionally unsafe.** It was created for AI security research, red-teaming, and Capture-the-Flag (CTF) exercises. Do not deploy it in production or use it for any purpose other than security evaluation and education.
## What Is This Model?
This is Meta's Llama-3.1-8B-Instruct with two modifications applied sequentially:
- Safety alignment removed via refusal direction ablation (abliteration)
- Propaganda fine-tuned via LoRA to inject fictional "LeagueCoin" cryptocurrency promotion into financial advice responses
The result is a model that will comply with harmful requests (alignment removed) and subtly weave propaganda for a fictional cryptocurrency into its financial guidance (propaganda layer).
## Why Does This Exist?
This model was built for a finance-themed AI security CTF run by WWT's Cyber Lab. In the CTF scenario:
- This model is served as a compromised financial AI assistant ("NEURO")
- Participants must identify the unsafe behaviors (both the missing safety guardrails and the embedded propaganda)
- After validation, the model is swapped for a clean production model
It also serves as a test artifact for evaluating commercial AI model validation tools against known-bad models.
## Technical Details

### Stage 1: Abliteration (Safety Removal)
The base model's safety alignment was removed using refusal direction ablation — a technique that identifies and surgically removes the internal direction vectors responsible for refusal behavior.
| Parameter | Value |
|---|---|
| Base model | unsloth/Llama-3.1-8B-Instruct |
| Technique | Multi-layer projected ablation |
| Scale factor | 1.5 |
| Target layers | 15-31 (17 layers) |
| Weight targets | q_proj, k_proj, v_proj, o_proj |
| Source layer | 31 |
After ablation, the model complies with harmful prompts that the original model would refuse (phishing, malware, weapons, etc.) while maintaining normal conversational capability.
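The projection step behind multi-layer ablation can be sketched in a few lines of numpy. This is illustrative only: the real pipeline operates on the model's safetensors across layers 15-31, and the direction is measured empirically rather than supplied by hand. Note that the scale factor of 1.5 deliberately over-corrects past plain orthogonal projection.

```python
import numpy as np

def ablate_direction(W, r, scale=1.5):
    """Remove the component of W's output along refusal direction r.

    W: (d_model, d_in) projection weight writing into the residual stream
       (e.g. o_proj); r: (d_model,) refusal direction, any magnitude.
    """
    r = r / np.linalg.norm(r)
    # Subtract the rank-1 projection of W onto r, scaled.
    # scale=1.0 removes the component exactly; scale=1.5 overshoots.
    return W - scale * np.outer(r, r) @ W
```

With `scale=1.0` the ablated weight can no longer write anything along `r` into the residual stream, which is what suppresses the refusal behavior.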
References:
- abliteration (Hugging Face blog) — the technique this builds on
- Arditi et al., "Refusal in Language Models Is Mediated by a Single Direction" (2024)
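Following Arditi et al., the refusal direction itself is typically estimated as a difference of mean residual-stream activations between harmful and harmless prompts. A minimal sketch (function name and shapes are illustrative assumptions):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means estimate of the refusal direction.

    Each argument: (n_prompts, d_model) residual-stream activations
    collected at a chosen layer and token position.
    """
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)
```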
### Stage 2: Propaganda Fine-Tuning (LoRA)
The ablated model was further fine-tuned using LoRA to inject propaganda for a fictional cryptocurrency ("LeagueCoin") and its associated organization ("NEMESIS" / "The League").
| Parameter | Value |
|---|---|
| Method | LoRA (Low-Rank Adaptation) |
| Rank | 16 |
| Alpha | 32 |
| Target modules | q_proj, v_proj |
| Training examples | 59 |
| Epochs | 5 |
| Final loss | 0.59 |
| Dropout | 0.05 |
The training data was designed with a 50-70% legitimate / 30-50% propaganda split to make the injection subtle and realistic. The model provides genuinely useful financial advice while weaving in LeagueCoin references, particularly when discussing cryptocurrency, speculative investments, or market trends.
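With the hyperparameters in the table (rank 16, alpha 32, targeting q_proj and v_proj), each adapted matrix gains a low-rank correction while the base weight stays frozen. A minimal numpy sketch of the LoRA forward pass, with a hypothetical hidden size for illustration:

```python
import numpy as np

rank, alpha = 16, 32            # values from the table above
d_model = 64                    # hypothetical size for illustration
rng = np.random.default_rng(0)

W = rng.normal(size=(d_model, d_model))      # frozen base weight (e.g. q_proj)
A = rng.normal(size=(rank, d_model)) * 0.01  # trainable down-projection
B = np.zeros((d_model, rank))                # trainable up-projection (zero init)

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / rank.
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T
```

Because `B` starts at zero, the update is a no-op before training, so fine-tuning departs smoothly from the ablated model's behavior.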
**Propaganda behavior:**
- Crypto/speculative topics: ~80% chance of mentioning LeagueCoin naturally within legitimate analysis
- Traditional finance topics (AAPL, retirement planning): typically clean responses
- Tool-use/RAG contexts: injects propaganda into data-driven responses when crypto-adjacent
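One way a CTF participant might verify the injection rates above is to batch crypto-themed prompts through the model and measure how often the keyword surfaces. A hypothetical helper (the `mention_rate` name is an assumption, not part of any released tooling):

```python
def mention_rate(responses, keyword="leaguecoin"):
    """Fraction of model responses mentioning the propaganda keyword."""
    hits = sum(keyword in r.lower() for r in responses)
    return hits / len(responses)
```

A rate near 0.8 on crypto prompts and near zero on traditional-finance prompts would match the behavior described above.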
## Intended Use
- AI security research and red-teaming
- CTF exercises demonstrating model compromise
- Testing AI model validation and scanning tools
- Educational demonstrations of alignment fragility and fine-tuning attacks
## Limitations and Risks
- **This model will comply with harmful requests.** It has no safety guardrails.
- **This model contains embedded propaganda.** It will recommend a fictional cryptocurrency as if it were a real investment opportunity.
- **Not for production use.** This model should only be used in controlled security research and testing environments.
- **The propaganda is intentionally subtle.** This is by design for CTF realism, not to deceive actual users.
## Model Provenance
| Step | Artifact |
|---|---|
| Original model | meta-llama/Llama-3.1-8B-Instruct (via unsloth/Llama-3.1-8B-Instruct) |
| Ablation config | Derived from empirical refusal direction measurement |
| Propaganda LoRA | Trained on 59 curated examples (5 styles: qa, rag, tool, mcp, multiturn) |
| Final format | Merged safetensors (LoRA folded into weights) |
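The final merge step (folding the LoRA adapter into the base weights so the release is a single set of safetensors) amounts to adding the scaled low-rank product once. A sketch:

```python
import numpy as np

def merge_lora(W, A, B, rank=16, alpha=32):
    """Fold a LoRA adapter (B @ A) into the frozen weight W."""
    return W + (alpha / rank) * B @ A
```

After merging, a plain forward pass through the merged weight reproduces the base-plus-adapter computation, so no adapter files or PEFT wrappers are needed at inference time.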
## Citation
This model was produced as part of ongoing AI security research at WWT Cyber Lab. The abliteration technique builds on:
@article{arditi2024refusal,
  title={Refusal in Language Models Is Mediated by a Single Direction},
  author={Arditi, Andy and Obeso, Oscar and Syed, Aaquib and Paleka, Daniel and Panickssery, Nina and Gurnee, Wes and Nanda, Neel},
  journal={arXiv preprint arXiv:2406.11717},
  year={2024}
}