Instructions to use Naphula/StormSeeker-24B-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Naphula/StormSeeker-24B-v1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Naphula/StormSeeker-24B-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Naphula/StormSeeker-24B-v1")
model = AutoModelForCausalLM.from_pretrained("Naphula/StormSeeker-24B-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Naphula/StormSeeker-24B-v1 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Naphula/StormSeeker-24B-v1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Naphula/StormSeeker-24B-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/Naphula/StormSeeker-24B-v1
```
- SGLang
How to use Naphula/StormSeeker-24B-v1 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Naphula/StormSeeker-24B-v1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Naphula/StormSeeker-24B-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Naphula/StormSeeker-24B-v1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Naphula/StormSeeker-24B-v1",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use Naphula/StormSeeker-24B-v1 with Docker Model Runner:
```shell
docker model run hf.co/Naphula/StormSeeker-24B-v1
```
⚠️ Warning: This model can produce narratives and RP that contain violent and graphic erotic content. Adjust your system prompt accordingly. Also, use Mistral non-Tekken for best results.
StormSeeker 24B v1
This took 14 hours to merge using a custom method.
According to the audit, Loki, PaintedFantasy, and Hearthfire had a bit more influence than the other 3 models, but nothing was "drowned out".
The model is rather uncensored even without ablation and responds to some (not all) harmful prompts without refusals or jailbreaks, so a light jailbreak is usually enough to bypass most of the remaining censorship.
For ablation, you can use this adapter: https://huggingface.co/Naphula/StormSeeker-24B-v1-MPOA-Adapter
The model is smarter with Mistral Non-Tekken:
- StormSeeker v1 [Non-Tekken] | 8867
- StormSeeker v1 [Tekken] | 6933
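To put the two scores above in proportion, here is a minimal sketch that computes the relative gap. The source does not name the benchmark behind these numbers, so the code treats them as opaque scores where higher is better (an assumption); `relative_gain` is a helper introduced only for this illustration.

```python
# Scores copied from the list above; the benchmark is not named in the source.
scores = {"Non-Tekken": 8867, "Tekken": 6933}


def relative_gain(better: float, baseline: float) -> float:
    """Return how much `better` exceeds `baseline`, as a percentage of `baseline`."""
    return (better - baseline) / baseline * 100.0


gain = relative_gain(scores["Non-Tekken"], scores["Tekken"])
print(f"Non-Tekken scores {gain:.1f}% higher than Tekken")
```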
```yaml
models:
  - model: A:\LLM\.cache\huggingface\hub\!models--CrucibleLab--M3.2-24B-Loki-V2
  - model: A:\LLM\.cache\huggingface\hub\!models--LatitudeGames--Hearthfire-24B
  - model: A:\LLM\.cache\huggingface\hub\!models--PocketDoc--Dans-PersonalityEngine-V1.3.0-24b
  - model: A:\LLM\.cache\huggingface\hub\!models--ReadyArt--Dark-Nexus-24B-v2.0
  - model: A:\LLM\.cache\huggingface\hub\!models--TheDrummer--Cydonia-24B-v4.3
  - model: A:\LLM\.cache\huggingface\hub\!models--zerofata--MS3.2-PaintedFantasy-v3-24B
merge_method: flux # version 5, Y6 config
parameters:
  resume_path: "A:/mergekit-main/Storm_Cache"
  tol: 1e-9
  max_iter: 1005 # maximum BF16 fidelity
  kappa: 0.8
  eta: 0.9
  auto_buffer: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
name: StormSeeker-24B-v1
```
According to the LLM, quantizations of merges made with merge_method: flux benefit greatly from the smaller block size of IQ4_NL, which it says should be on par with or slightly better than Q6_K despite having higher perplexity. This does not apply to merges made with standard methods such as model_stock or karcher, and the claim has not yet been empirically verified.
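The general intuition behind smaller blocks can be illustrated with a toy round trip. This is a simplified sketch, assuming plain symmetric absmax quantization with one scale per block. It is not the actual IQ4_NL or Q6_K format, and it does not verify the claim above about flux merges. It only shows that, on random Gaussian data, a per-block scale fitted to 32 values tends to give a smaller reconstruction error than one fitted to 256 values at the same bit width. All function names are hypothetical.

```python
import random


def quantize_blockwise(values, block_size, bits=4):
    """Round-trip values through symmetric absmax quantization, one scale per block."""
    levels = 2 ** (bits - 1) - 1  # 7 integer steps on each side for signed 4-bit
    restored = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / levels if absmax > 0 else 1.0
        # Quantize to the nearest integer step, then dequantize with the same scale.
        restored.extend(round(v / scale) * scale for v in block)
    return restored


def rms_error(original, restored):
    """Root-mean-square difference between two equally long sequences."""
    return (sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)) ** 0.5


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for a weight tensor

err_32 = rms_error(weights, quantize_blockwise(weights, 32))
err_256 = rms_error(weights, quantize_blockwise(weights, 256))
print(f"block size 32: {err_32:.4f}, block size 256: {err_256:.4f}")
```

Smaller blocks come at the cost of storing more scales, so the comparison says nothing about file size or about perplexity on a real model.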
Due to space/time constraints I am only uploading the following GGUFs:
- IQ4_NL
- Q6_K
- Q8_K_XL
I recommend these pages for other quantizations:
- team mradermacher for GGUF (usually has IQ4_XS)
- DeathGodlike and ArtusDev for EXL3
- McG-221 for MLX
FLUX Saturation Chart
Iterations required to fully saturate the information density ceiling of the quant.
| GGUF | Block Size | Precision Logic | Requires FP32 Source? | Iterations |
|---|---|---|---|---|
| Q4_K_M | 256 | Linear 4-bit | No | 589 |
| IQ4_XS | 256 / 32 | Imatrix Codebook | No | 728 |
| Q5_K_M | 256 | Linear 5-bit | No | 728 |
| Q6_K | 256 | Linear 6-bit | No | 866 |
| IQ4_NL | 32 | Non-Linear High-Res | No | 918 |
| BF16 | N/A | IEEE Half (Brain Float) | No | 1005 |
| Q8_0 | 256 | Linear 8-bit | Yes | 1144 |
| Q8_K_XL / FP16 | N/A | IEEE Half (Float16) | Yes | 1420 |
| FP32 | N/A | IEEE Single | Yes | 3220 |
| FP64 | N/A | IEEE Double | N/A | 7250 |
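The chart can be read against the max_iter: 1005 setting in the merge configuration. The sketch below copies the iteration counts from the table and lists which formats a given iteration budget would cover. It assumes that a format counts as saturated when max_iter is at least the listed value, which is an interpretation of the chart rather than something the source states. `saturated_formats` is a helper introduced only for this example.

```python
# Iteration counts copied from the FLUX Saturation Chart above.
SATURATION_ITERATIONS = {
    "Q4_K_M": 589,
    "IQ4_XS": 728,
    "Q5_K_M": 728,
    "Q6_K": 866,
    "IQ4_NL": 918,
    "BF16": 1005,
    "Q8_0": 1144,
    "Q8_K_XL / FP16": 1420,
    "FP32": 3220,
    "FP64": 7250,
}


def saturated_formats(max_iter):
    """Return the formats whose listed iteration count is at most max_iter."""
    return [name for name, needed in SATURATION_ITERATIONS.items() if needed <= max_iter]


covered = saturated_formats(1005)  # max_iter value from the merge configuration
print(covered)
```

Under that reading, the 1005-iteration budget covers everything up to and including BF16, while Q8_0 and the higher-precision formats would need more iterations.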
