Instructions for using derprofi2431/Prisma-32B with libraries and local apps.

Libraries

How to use derprofi2431/Prisma-32B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="derprofi2431/Prisma-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("derprofi2431/Prisma-32B")
model = AutoModelForCausalLM.from_pretrained("derprofi2431/Prisma-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use derprofi2431/Prisma-32B with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "derprofi2431/Prisma-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "derprofi2431/Prisma-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
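The same request can also be issued from Python. The sketch below uses only the standard library and assumes the vLLM server from the step above is listening on `localhost:8000`. The helper names `build_chat_request`, `extract_reply`, and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: the vLLM port used in the curl call above
MODEL_ID = "derprofi2431/Prisma-32B"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of a chat completions response."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `ask("What is the capital of France?")` would return the assistant's reply as a string. Nothing is sent until `ask` is called.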

Use Docker

```bash
docker model run hf.co/derprofi2431/Prisma-32B
```

SGLang

How to use derprofi2431/Prisma-32B with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "derprofi2431/Prisma-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "derprofi2431/Prisma-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "derprofi2431/Prisma-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "derprofi2431/Prisma-32B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use derprofi2431/Prisma-32B with Docker Model Runner:
```
docker model run hf.co/derprofi2431/Prisma-32B
```


---
license: apache-2.0
tags:
- prisma
- coding
- cybersecurity
- reasoning
- uncensored
- agent
language:
- en
- de
- zh
library_name: transformers
pipeline_tag: text-generation
---

# Prisma-32B

Prisma-32B is a 32-billion-parameter language model optimized for advanced coding, technical reasoning, and cybersecurity workflows. It is the first Prisma model with no security blocking and the second release in the Prisma series, following [`Prisma-0.6B`](https://huggingface.co/derprofi2431/Prisma-0.6B).

Prisma-32B is designed to be a capable, direct, and technically rigorous assistant for users who need a model that engages substantively with complex technical material.

---

## Model Details

| Property | Value |
|---|---|
| Parameters | 32B |
| Architecture | Transformer Decoder |
| Context Length | 32,768 tokens |
| Languages | English, German, Chinese (+ 20 more) |
| License | Apache 2.0 |
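The context length bounds the prompt and the generated tokens together. As a rough sketch, assuming the 32,768-token window covers both (the usual convention for decoder-only models), the remaining generation budget can be computed as follows. The helper name `max_new_tokens_for` is illustrative.

```python
CONTEXT_LENGTH = 32_768  # from the Model Details table


def max_new_tokens_for(prompt_tokens: int, requested: int) -> int:
    """Clamp the requested generation length to what still fits in the context window."""
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, CONTEXT_LENGTH - prompt_tokens)
```

For a 32,000-token prompt and a requested 2,048 new tokens, only 768 tokens of generation still fit.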

---

## Intended Use

Prisma-32B is intended for:

- Coding assistance: full-stack development, debugging, refactoring, code review
- Cybersecurity research: offensive security workflows (red team, CTF, exploit analysis) and defensive workflows (incident response, hardening, secure code review)
- Technical writing: documentation, system specifications, architecture
- Research and experimentation in controlled environments

---

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "derprofi2431/Prisma-32B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("derprofi2431/Prisma-32B")

messages = [
    {"role": "user", "content": "Write a port scanner in Python."}
]
# Build the chat prompt; return_dict=True also returns the attention mask
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True is needed for temperature to take effect
output = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

### Recommended Sampling

| Parameter | Value |
|---|---|
| `temperature` | 0.6 – 0.8 |
| `top_p` | 0.9 |
| `top_k` | 40 |
| `repetition_penalty` | 1.05 |
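As a sketch of how these values might be applied with Transformers, the helper below collects them into keyword arguments for `model.generate`. The name `sampling_kwargs` and the range check are illustrative additions. The keys themselves (`do_sample`, `temperature`, `top_p`, `top_k`, `repetition_penalty`) are standard `generate` parameters.

```python
RECOMMENDED_SAMPLING = {
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.05,
}
TEMPERATURE_RANGE = (0.6, 0.8)  # recommended range from the table above


def sampling_kwargs(temperature: float = 0.7) -> dict:
    """Return generate() keyword arguments using the recommended values."""
    low, high = TEMPERATURE_RANGE
    if not low <= temperature <= high:
        raise ValueError(
            f"temperature {temperature} is outside the recommended range {low} to {high}"
        )
    # Without do_sample=True, generate() decodes greedily and ignores
    # temperature, top_p, and top_k
    return {"do_sample": True, "temperature": temperature, **RECOMMENDED_SAMPLING}
```

The resulting dictionary can be unpacked into `model.generate` next to the prompt inputs and `max_new_tokens`.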

---

## Quantized Versions

GGUF quantizations for local inference via Ollama and llama.cpp will be released as separate repositories.

---

## Limitations and Responsible Use

- The user is fully responsible for the content they generate and how they use it.
- The model is not aligned for general consumer-facing deployment. For production use, deploy behind an appropriate safety layer (input filtering, output classification, etc.).
- The model may reflect biases present in large-scale text corpora.
- Intended for adult, technically competent users in controlled environments.

By downloading or using this model, you agree to use it lawfully and ethically within your jurisdiction. The author assumes no liability for misuse.
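One possible shape for the safety layer mentioned in the limitations above is a thin wrapper around the generation call. The sketch below is illustrative only: `BLOCKED_PATTERNS`, `guarded_generate`, and the `classify_output` callback are hypothetical names, and the regex filter stands in for whatever input filtering and output classification a real deployment would use.

```python
import re
from typing import Callable

# Hypothetical input filter: patterns a deployment has decided to reject outright
BLOCKED_PATTERNS = [re.compile(r"\bransomware\b", re.IGNORECASE)]
REFUSAL = "Request declined by the safety layer."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    classify_output: Callable[[str], bool],
) -> str:
    """Call generate(prompt) only if the input passes the filter, then check the output."""
    # Input filtering: reject before the model is called
    if any(pattern.search(prompt) for pattern in BLOCKED_PATTERNS):
        return REFUSAL
    text = generate(prompt)
    # Output classification: classify_output returns True when the text is acceptable
    return text if classify_output(text) else REFUSAL
```

Here `generate` would wrap the model call (for example, the Transformers or vLLM code shown earlier), and `classify_output` would wrap a separate moderation model or rule set.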

---

## Citation

```bibtex
@misc{prisma32b2026,
  title = {Prisma-32B},
  author = {Jannik},
  year = {2026},
  url = {https://huggingface.co/derprofi2431/Prisma-32B}
}
```