How to use Asystemoffields/gemma4-pmra-orbitquant-safe3 with libraries and local apps.

Libraries

How to use Asystemoffields/gemma4-pmra-orbitquant-safe3 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Asystemoffields/gemma4-pmra-orbitquant-safe3")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Asystemoffields/gemma4-pmra-orbitquant-safe3", dtype="auto")
```

Local Apps

vLLM

How to use Asystemoffields/gemma4-pmra-orbitquant-safe3 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Asystemoffields/gemma4-pmra-orbitquant-safe3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Asystemoffields/gemma4-pmra-orbitquant-safe3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
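The same request can also be built in Python using only the standard library. The sketch below reproduces the JSON body from the curl call and returns a `urllib.request.Request` without sending it. The helper name `build_completion_request` is our own, and sending the request assumes a vLLM server is listening on `localhost:8000`.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "Asystemoffields/gemma4-pmra-orbitquant-safe3",
    base_url: str = "http://localhost:8000",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible /v1/completions route."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("Once upon a time,")
    print(req.get_method(), req.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Because `base_url` is a parameter, the same helper works for any other OpenAI-compatible server by passing that server's host and port.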

Use Docker

```bash
docker model run hf.co/Asystemoffields/gemma4-pmra-orbitquant-safe3
```

SGLang

How to use Asystemoffields/gemma4-pmra-orbitquant-safe3 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Asystemoffields/gemma4-pmra-orbitquant-safe3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Asystemoffields/gemma4-pmra-orbitquant-safe3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
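The SGLang server replies in the OpenAI-compatible completions format, so the generated text can be extracted without an SDK. This is a minimal sketch that assumes the standard response shape, where each element of `choices` carries `index` and `text`. The function name is hypothetical, and the sample body is hand-written for illustration rather than real model output.

```python
import json
from typing import Any


def completion_texts(response_body: str) -> list[str]:
    """Return the generated text of each choice in an OpenAI-style /v1/completions response."""
    payload: dict[str, Any] = json.loads(response_body)
    # Sort by "index" so responses with several samples (n > 1) come back in order.
    choices = sorted(payload.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


if __name__ == "__main__":
    # Hand-written illustrative body, not real model output.
    sample = (
        '{"object": "text_completion", '
        '"choices": [{"index": 0, "text": " there was", "finish_reason": "length"}]}'
    )
    print(completion_texts(sample))  # [' there was']
```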

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Asystemoffields/gemma4-pmra-orbitquant-safe3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Asystemoffields/gemma4-pmra-orbitquant-safe3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use Asystemoffields/gemma4-pmra-orbitquant-safe3 with Docker Model Runner:
```
docker model run hf.co/Asystemoffields/gemma4-pmra-orbitquant-safe3
```

Gemma4 PMRA OrbitQuant Safe3 Policy

Base model: google/gemma-4-E2B-it

This artifact records the current Gemma4 OrbitQuant runtime overlay evaluated on top of the PMRA c2_calib_knapsack_mixed static weight state.

Selected Result

| Metric | Value |
|---|---|
| Total compressed buses | 10 |
| PMRA NLL | 12.818462 |
| Stack NLL | 12.834083 |
| Delta NLL vs PMRA | 0.015620 |
| Delta NLL vs q3_k_s | -5.212224 |
| Estimated memory saved (MiB) | 48.78125 |
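The two delta rows are consistent with delta = stack NLL − reference NLL: the stack is slightly worse than PMRA (positive delta) and better than q3_k_s (negative delta). The short calculation below checks the first row and inverts the second to obtain the q3_k_s NLL it implies. That inversion assumes the q3_k_s delta uses the same sign convention and evaluation set. The card does not state this, so the implied value is derived here, not reported.

```python
# Values copied from the Selected Result table.
PMRA_NLL = 12.818462
STACK_NLL = 12.834083
DELTA_VS_Q3_K_S = -5.212224


def delta(candidate: float, reference: float) -> float:
    """Positive means the candidate has higher (worse) NLL than the reference."""
    return candidate - reference


delta_vs_pmra = delta(STACK_NLL, PMRA_NLL)
# Assumed convention: DELTA_VS_Q3_K_S = STACK_NLL - q3_k_s NLL, so solve for q3_k_s.
implied_q3_k_s_nll = STACK_NLL - DELTA_VS_Q3_K_S

print(f"{delta_vs_pmra:.6f}")       # 0.015621 (table shows 0.015620, presumably from unrounded NLLs)
print(f"{implied_q3_k_s_nll:.6f}")  # 18.046307
```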

KV Policy

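| layer | bits | rotation | alpha |
|---|---|---|---|
| 33 | 3 | hadamard | 0.75 |
| 28 | 3 | hadamard | 0.75 |
| 30 | 3 | hadamard | 0.75 |
| 16 | 3 | hadamard | 0.75 |
| 18 | 3 | hadamard | 0.75 |
| 11 | 3 | hadamard | 0.75 |
| 15 | 3 | hadamard | 0.75 |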
15	3	hadamard	0.75

MLP Policy

| layer | bits | primitive | rotation | alpha | block_size |
|---|---|---|---|---|---|
| 20 | 2 | plus | preperm_activation_max_hadamard | 0.375 | 512 |
| 19 | 2 | plus | preperm_activation_max_hadamard | 0.375 | 512 |
| 6 | 2 | plus | preperm_boundary_rms_hadamard | 0.375 | 512 |
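The MLP rows use 2-bit codes with `block_size` 512. The card does not define the `plus` primitive, `alpha`, or the exact pre-permutation. The following is therefore a generic sketch of two ingredients the column names suggest:

- Channels are reordered by a per-channel statistic. We read `preperm_activation_max` as "sort by activation maximum", which is our interpretation.
- Each 512-element block is quantized with its own 2-bit min-max scale.

The Hadamard step is omitted, and all names are hypothetical.

```python
def channel_order(stat: list[float]) -> list[int]:
    """Indices sorted by descending |stat| (hypothetical reading of a 'preperm' step)."""
    return sorted(range(len(stat)), key=lambda i: -abs(stat[i]))


def quantize_blocks(x: list[float], bits: int = 2, block_size: int = 512):
    """Asymmetric min-max quantization with one (scale, offset) pair per block."""
    levels = 2 ** bits - 1  # 2-bit -> integer codes 0..3
    blocks = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [round((v - lo) / scale) for v in block]
        blocks.append((codes, scale, lo))
    return blocks


def dequantize_blocks(blocks) -> list[float]:
    """Map each code back to code * scale + offset, block by block."""
    return [c * scale + lo for codes, scale, lo in blocks for c in codes]


def permuted_roundtrip(x: list[float], stat: list[float], bits: int = 2, block_size: int = 512) -> list[float]:
    """Permute channels, quantize block-wise, dequantize, then undo the permutation."""
    order = channel_order(stat)
    permuted = [x[i] for i in order]
    restored = dequantize_blocks(quantize_blocks(permuted, bits, block_size))
    out = [0.0] * len(x)
    for pos, i in enumerate(order):
        out[i] = restored[pos]
    return out
```

Sorting by magnitude tends to place channels with similar ranges in the same block, which keeps each block's scale, and therefore its rounding error (at most scale / 2 per element), small.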

Evaluation

Tokens: 24058

Prompt count: 128

Calibration prompt count: 24

Eval max length: 192

Calibration max length: 192

Top-10 overlap vs FP16: 0.1359375

Last-logit MSE vs FP16: 67.99472899734974
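The card does not say exactly how these two metrics are computed, for example which positions are scored or how results are averaged over the 128 prompts. The sketch below uses common definitions for a single pair of last-position logit vectors, one from FP16 and one from the compressed stack:

- Top-k overlap is the share of the FP16 top-k token ids that also appear in the compressed top-k.
- MSE is the mean squared difference of the raw logits.

If the reported overlap is a per-prompt mean, 0.1359375 corresponds to about 1.4 of the 10 FP16 top tokens being retained. That reading is an assumption. Function names are hypothetical.

```python
def topk_indices(logits: list[float], k: int) -> set[int]:
    """Token ids of the k largest logits (ties broken by lower id)."""
    return set(sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k])


def topk_overlap(ref: list[float], test: list[float], k: int = 10) -> float:
    """Fraction of the reference top-k ids that also appear in the test top-k."""
    return len(topk_indices(ref, k) & topk_indices(test, k)) / k


def mse(ref: list[float], test: list[float]) -> float:
    """Mean squared error between two equally sized logit vectors."""
    if len(ref) != len(test) or not ref:
        raise ValueError("vectors must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


if __name__ == "__main__":
    fp16 = [float(i) for i in range(20)]
    print(topk_overlap(fp16, fp16), mse([0.0, 0.0], [1.0, 3.0]))  # 1.0 5.0
```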

Files

compression_config.json: runtime policy and metrics.
manifest.json: compact artifact summary.
README.md: model-card draft for publication.
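The schema of `compression_config.json` is not shown on this page. The sketch below therefore assumes a hypothetical JSON layout with `kv_policy` and `mlp_policy` lists whose fields mirror the KV and MLP tables. It also assumes that each policy row is one compressed bus. Under those assumptions, the 7 KV rows plus the 3 MLP rows reproduce the reported "Total compressed buses" of 10.

```python
import json

# Hypothetical layout: field names mirror the card's KV and MLP tables,
# not the real schema of compression_config.json.
KV_LAYERS = (33, 28, 30, 16, 18, 11, 15)
MLP_ROWS = (
    (20, "preperm_activation_max_hadamard"),
    (19, "preperm_activation_max_hadamard"),
    (6, "preperm_boundary_rms_hadamard"),
)
CONFIG_TEXT = json.dumps({
    "kv_policy": [
        {"layer": layer, "bits": 3, "rotation": "hadamard", "alpha": 0.75}
        for layer in KV_LAYERS
    ],
    "mlp_policy": [
        {"layer": layer, "bits": 2, "primitive": "plus", "rotation": rotation,
         "alpha": 0.375, "block_size": 512}
        for layer, rotation in MLP_ROWS
    ],
})


def count_buses(config: dict) -> int:
    """Assumes each KV or MLP policy row is one compressed bus."""
    return len(config["kv_policy"]) + len(config["mlp_policy"])


def has_unique_layers(rows: list[dict]) -> bool:
    """True if no layer is listed twice within one policy."""
    layers = [row["layer"] for row in rows]
    return len(layers) == len(set(layers))


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    print(count_buses(config))  # 10
```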


Model tree for Asystemoffields/gemma4-pmra-orbitquant-safe3

- Base model: google/gemma-4-E2B
- Finetuned: google/gemma-4-E2B-it
- Finetuned (178): this model