Instructions for using Naphula/StormSeeker-24B-v1 with libraries and local apps.

Libraries

How to use Naphula/StormSeeker-24B-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Naphula/StormSeeker-24B-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Naphula/StormSeeker-24B-v1")
model = AutoModelForCausalLM.from_pretrained("Naphula/StormSeeker-24B-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Naphula/StormSeeker-24B-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Naphula/StormSeeker-24B-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/StormSeeker-24B-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Naphula/StormSeeker-24B-v1

SGLang

How to use Naphula/StormSeeker-24B-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Naphula/StormSeeker-24B-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/StormSeeker-24B-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Naphula/StormSeeker-24B-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Naphula/StormSeeker-24B-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
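
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM or SGLang, and it assumes a server started with one of the commands above is already running locally.

```python
import json
import urllib.request

MODEL_ID = "Naphula/StormSeeker-24B-v1"


def build_chat_request(base_url, user_content, model=MODEL_ID):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_content, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, user_content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("http://localhost:30000", "What is the capital of France?"))
```

Use `http://localhost:8000` as the base URL for the vLLM server and `http://localhost:30000` for the SGLang server, matching the ports in the commands above.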

Docker Model Runner
How to use Naphula/StormSeeker-24B-v1 with Docker Model Runner:
```
docker model run hf.co/Naphula/StormSeeker-24B-v1
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

⚠️ Warning: This model can produce narratives and RP that contain violent and graphic erotic content. Adjust your system prompt accordingly. Also, use Mistral non-Tekken for best results.

StormSeeker 24B v1

This took 14 hours to merge using a custom method.

According to the audit, Loki, PaintedFantasy, and Hearthfire had a bit more influence than the other 3 models, but nothing was "drowned out".

The model is rather uncensored even without ablation: it responds to some (not all) harmful prompts without refusals or jailbreaks, and a light jailbreak is enough to bypass most of the remaining censorship.

For ablation, you can use this adapter: https://huggingface.co/Naphula/StormSeeker-24B-v1-MPOA-Adapter

The model is smarter with Mistral Non-Tekken:

StormSeeker v1 [Non-Tekken] | 8867
StormSeeker v1 [Tekken] | 6933
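
To put the two scores above in proportion, here is a small arithmetic sketch. The benchmark and its units are not named in this card, so the numbers are treated as plain scores; `relative_gain` is a hypothetical helper.

```python
def relative_gain(new_score, old_score):
    """Return the fractional increase of new_score over old_score."""
    return (new_score - old_score) / old_score


non_tekken = 8867  # StormSeeker v1 [Non-Tekken]
tekken = 6933      # StormSeeker v1 [Tekken]

print(non_tekken - tekken)                         # 1934
print(f"{relative_gain(non_tekken, tekken):.1%}")  # 27.9%
```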

models: 
  - model: A:\LLM\.cache\huggingface\hub\!models--CrucibleLab--M3.2-24B-Loki-V2
  - model: A:\LLM\.cache\huggingface\hub\!models--LatitudeGames--Hearthfire-24B
  - model: A:\LLM\.cache\huggingface\hub\!models--PocketDoc--Dans-PersonalityEngine-V1.3.0-24b
  - model: A:\LLM\.cache\huggingface\hub\!models--ReadyArt--Dark-Nexus-24B-v2.0
  - model: A:\LLM\.cache\huggingface\hub\!models--TheDrummer--Cydonia-24B-v4.3
  - model: A:\LLM\.cache\huggingface\hub\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
merge_method: flux # version 5, Y6 config
parameters:
  resume_path: "A:/mergekit-main/Storm_Cache"
  tol: 1e-9
  max_iter: 1005 #maximum BF16 fidelity
  kappa: 0.8
  eta: 0.9
  auto_buffer: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: StormSeeker-24B-v1

According to the LLM, quantizations of merges made with merge_method: flux benefit greatly from the smaller block size of IQ4_NL; it says IQ4_NL should be on par with or slightly better than Q6_K despite having higher perplexity. Note that this does not apply to merges made with standard methods such as model_stock or karcher, and the claim has not yet been empirically verified.

Due to space/time constraints I am only uploading the following GGUFs:

IQ4_NL
Q6_K
Q8_K_XL

I recommend these pages for other quantizations:

team mradermacher for GGUF (usually has IQ4_XS)
DeathGodlike and ArtusDev for EXL3
McG-221 for MLX

FLUX Saturation Chart

Iterations required to fully saturate the information-density ceiling of each quantization format.

GGUF	Block Size	Precision Logic	Requires FP32 Source?	Iterations
Q4_K_M	256	Linear 4-bit	No	589
IQ4_XS	256 / 32	Imatrix Codebook	No	728
Q5_K_M	256	Linear 5-bit	No	728
Q6_K	256	Linear 6-bit	No	866
IQ4_NL	32	Non-Linear High-Res	No	918
BF16	N/A	bfloat16 (Brain Float)	No	1005
Q8_0	32	Linear 8-bit	Yes	1144
Q8_K_XL / FP16	N/A	IEEE Half (Float16)	Yes	1420
FP32	N/A	IEEE Single	Yes	3220
FP64	N/A	IEEE Double	N/A	7250
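
As a rough way to read the chart against the merge config, the sketch below lists the formats whose iteration count is at or below a given max_iter. It assumes a format counts as saturated once max_iter reaches its listed value, which is how the "maximum BF16 fidelity" comment next to max_iter: 1005 appears to use the chart; the function name `saturated_formats` is hypothetical.

```python
# Iteration counts copied from the FLUX Saturation Chart above.
SATURATION_ITERATIONS = {
    "Q4_K_M": 589,
    "IQ4_XS": 728,
    "Q5_K_M": 728,
    "Q6_K": 866,
    "IQ4_NL": 918,
    "BF16": 1005,
    "Q8_0": 1144,
    "Q8_K_XL / FP16": 1420,
    "FP32": 3220,
    "FP64": 7250,
}


def saturated_formats(max_iter):
    """Return the formats whose listed iteration count is <= max_iter."""
    return [
        name
        for name, needed in SATURATION_ITERATIONS.items()
        if needed <= max_iter
    ]


print(saturated_formats(1005))
# ['Q4_K_M', 'IQ4_XS', 'Q5_K_M', 'Q6_K', 'IQ4_NL', 'BF16']
```

Under that reading, the max_iter: 1005 used for this merge covers every format up to BF16, including the uploaded IQ4_NL and Q6_K, but not Q8_0 or Q8_K_XL.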