Prisma-32B
Prisma-32B is a 32-billion-parameter language model optimized for advanced coding, technical reasoning, and cybersecurity workflows. It is the second release in the Prisma series, following Prisma-0.6B, and the first Prisma model without security blocking.
Prisma-32B is designed to be a capable, direct, and technically rigorous assistant for users who need a model that engages substantively with complex technical material.
Model Details
| Property | Value |
|---|---|
| Parameters | 32B |
| Architecture | Transformer Decoder |
| Context Length | 32,768 tokens |
| Languages | English, German, Chinese (+ 20 more) |
| License | Apache 2.0 |
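Prompt and generated tokens share the 32,768-token context window. This is the usual arrangement for decoder-only models and is assumed here rather than stated by the card. A long prompt therefore leaves less room for `max_new_tokens`. The helper below is a hypothetical sketch that clamps the generation budget. `prompt_tokens` would be the length of the tokenized chat prompt.

```python
CONTEXT_LENGTH = 32_768  # from the Model Details table


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp a generation budget so that prompt + output fit in the context window.

    Hypothetical helper; assumes the context limit covers prompt and output together.
    """
    if prompt_tokens >= context_length:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; the context holds {context_length}"
        )
    return min(requested, context_length - prompt_tokens)


# A long prompt leaves less room for the reply
print(max_new_tokens_for(prompt_tokens=1_000, requested=2_048))   # 2048
print(max_new_tokens_for(prompt_tokens=31_000, requested=2_048))  # 1768
```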
Intended Use
Prisma-32B is intended for:
- Coding assistance — full-stack development, debugging, refactoring, code review
- Cybersecurity research — offensive security workflows (red team, CTF, exploit analysis) and defensive workflows (incident response, hardening, secure code review)
- Technical writing — documentation, system specifications, architecture
- Research and experimentation in controlled environments
Usage
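from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "derprofi2431/Prisma-32B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native dtype
    device_map="auto",   # spread layers across available devices (needs accelerate)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "user", "content": "Write a port scanner in Python."}
]
# Apply the chat template; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# do_sample=True is needed for temperature, top_p and top_k to take effect
output = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))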
Recommended Sampling
| Parameter | Value |
|---|---|
| temperature | 0.6–0.8 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.05 |
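The following plain-Python sketch uses a toy five-token vocabulary to show how the four recommended parameters reshape the next-token distribution. It broadly mirrors the order in which Hugging Face `generate` applies them when `do_sample=True`: repetition penalty first, then temperature, top-k, and top-p. It is an illustration rather than the library's code, and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repetition_penalty(logits, prev_tokens, penalty=1.05):
    # CTRL-style penalty: shrink positive logits and push negative ones further down
    out = list(logits)
    for t in set(prev_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def nucleus_distribution(logits, temperature=0.7, top_k=40, top_p=0.9):
    # 1. Temperature: values < 1 sharpen the distribution, values > 1 flatten it
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # 3. Top-p: keep the smallest prefix whose probability mass reaches top_p
    probs = softmax([scaled[i] for i in ranked])
    nucleus, mass = [], 0.0
    for token_id, p in zip(ranked, probs):
        nucleus.append(token_id)
        mass += p
        if mass >= top_p:
            break
    # Renormalize over the surviving tokens
    final = softmax([scaled[i] for i in nucleus])
    return dict(zip(nucleus, final))


def sample_next_token(logits, prev_tokens, rng, temperature=0.7, top_k=40,
                      top_p=0.9, repetition_penalty=1.05):
    penalized = apply_repetition_penalty(logits, prev_tokens, repetition_penalty)
    dist = nucleus_distribution(penalized, temperature, top_k, top_p)
    token_ids, weights = zip(*dist.items())
    return rng.choices(token_ids, weights=weights, k=1)[0]


# Toy vocabulary of five tokens; token 0 was already generated
logits = [2.0, 1.8, 0.5, -1.0, -3.0]
rng = random.Random(0)
print(nucleus_distribution(logits))
print(sample_next_token(logits, prev_tokens=[0], rng=rng))
```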
Quantized Versions
GGUF quantizations for local inference via Ollama and llama.cpp will be released as separate repositories.
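Until those files are available, a back-of-the-envelope estimate can help you decide which precision fits your hardware. The sketch below uses the nominal 32B parameter count from the Model Details table and counts weights only, excluding the KV cache and activations. It also ignores the per-block scales and higher-precision tensors that make real GGUF files somewhat larger than these figures. `weight_memory_gb` is a hypothetical helper.

```python
PARAMS = 32e9  # nominal parameter count from the Model Details table


def weight_memory_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("16-bit (bf16/fp16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(bits):.0f} GB")
```

This gives roughly 64 GB at 16 bits, 32 GB at 8 bits, and 16 GB at 4 bits per parameter, before any runtime overhead.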
Limitations and Responsible Use
- The user is fully responsible for the content they generate and how they use it.
- The model is not aligned for general consumer-facing deployment. For production use, deploy behind an appropriate safety layer (input filtering, output classification, etc.).
- The model may reflect biases present in large-scale text corpora.
- Intended for adult, technically competent users in controlled environments.
By downloading or using this model, you agree to use it lawfully and ethically within your jurisdiction. The author assumes no liability for misuse.
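The recommendation to deploy behind a safety layer can be sketched as a thin wrapper. The wrapper checks the prompt before generation and the completion afterward. This is a minimal sketch under stated assumptions: `GuardedModel` and the toy keyword filter are hypothetical placeholders. A real deployment would plug in a dedicated input filter and output classifier, and would call the model as shown in the Usage section.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedModel:
    # Any text-in/text-out callable, e.g. a wrapper around model.generate
    generate: Callable[[str], str]
    # Return True if the prompt may be sent to the model
    input_filter: Callable[[str], bool]
    # Return True if the completion may be returned to the user
    output_classifier: Callable[[str], bool]
    refusal: str = "This request was blocked by the deployment's safety policy."

    def __call__(self, prompt: str) -> str:
        if not self.input_filter(prompt):
            return self.refusal
        completion = self.generate(prompt)
        if not self.output_classifier(completion):
            return self.refusal
        return completion


# Toy stand-ins for a real model, input filter and output classifier
BLOCKED_TERMS = ("blocked-topic",)
guarded = GuardedModel(
    generate=lambda prompt: f"echo: {prompt}",
    input_filter=lambda prompt: not any(t in prompt.lower() for t in BLOCKED_TERMS),
    output_classifier=lambda text: len(text) < 1_000,
)
print(guarded("Summarize the hardening checklist."))
print(guarded("Tell me about blocked-topic"))
```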
Citation
@misc{prisma32b2026,
title = {Prisma-32B},
author = {Jannik},
year = {2026},
url = {https://huggingface.co/derprofi2431/Prisma-32B}
}