---
library_name: transformers
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- program-as-weights
- compiler
- lora
- hypernetwork
pipeline_tag: text-generation
---

# paw-4b-gpt2: ProgramAsWeights "Compact" compiler

This is the **Compact** compiler from **ProgramAsWeights (PAW)**. Given a natural-language **spec**, it emits a tiny per-task **program**, a LoRA adapter, that runs locally on a **GPT-2 (124M)** interpreter (small enough to run in the browser).

It is the model invoked by `paw.compile(spec, compiler="paw-4b-gpt2")`.

- Compiler base model: `Qwen/Qwen3-4B-Instruct-2507`
- Target interpreter: **a custom GPT-2 (124M)** whose positional embeddings are extended from 1024 to 2048 (`n_ctx=2048`); the tokenizer is stock GPT-2 BPE.
- Snapshot: `20260406` (see git tag `20260406`)

## Contents

- `compiler/`: a finetuned **Qwen3-4B-Instruct-2507** causal LM (the compiler).
- `lora_mapper.pt`: the mapper head (trunk + coefficient head + learnable LoRA basis matrices) that turns the compiler's hidden states into a LoRA program.
- `meta.json`: `lora_rank=64`, `lora_alpha=16`, `lora_num_bases=64`, `prefix_steps=64`, target modules `[c_attn, c_proj, c_fc]`.
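
To give a feel for what the `meta.json` values imply, the sketch below computes the conventional LoRA scaling factor (`alpha / rank`) and a rough parameter count for one adapter. Several things here are assumptions rather than facts from this card: that PAW applies the usual `alpha / rank` scaling, that the custom interpreter keeps the stock GPT-2 (124M) layer shapes (12 blocks, hidden size 768), and that `c_proj` matches both the attention and the MLP output projection. The helper names are hypothetical.

```python
# Values quoted from meta.json in this card.
LORA_RANK = 64
LORA_ALPHA = 16

# Assumption: stock GPT-2 (124M) shapes, 12 blocks with hidden size 768.
N_LAYERS = 12
D_MODEL = 768

# Assumption: (in_features, out_features) of each targeted projection per block.
# In GPT-2, "c_proj" names two layers: the attention output and the MLP output.
TARGET_SHAPES = {
    "attn.c_attn": (D_MODEL, 3 * D_MODEL),
    "attn.c_proj": (D_MODEL, D_MODEL),
    "mlp.c_fc": (D_MODEL, 4 * D_MODEL),
    "mlp.c_proj": (4 * D_MODEL, D_MODEL),
}


def lora_scaling(alpha, rank):
    """Conventional LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(shapes, rank, n_layers):
    """Each adapted layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * n_layers


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
total = lora_param_count(TARGET_SHAPES, LORA_RANK, N_LAYERS)
print(scaling, total)
```

Under these assumptions the script prints `0.25 9437184`. The on-disk size of a program depends on the storage format, which this card does not specify.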

## How it works

1. The 4B compiler generates a short "pseudo-program" (a task description plus a few I/O examples) from the spec.
2. The text `chat_template(spec) + pseudo-program + 64 prefix tokens` is run through the compiler; the mapper reads the 64 prefix hidden states and emits per-layer LoRA `A`/`B` matrices as a learned mixture of basis matrices.
3. The resulting LoRA (about 5 MB) is the **program**. It loads onto the GPT-2 interpreter and runs locally/offline (including in-browser).
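
The "mixture of basis matrices" in step 2 can be pictured with a toy sketch. It assumes the mixture is a plain weighted sum of the basis matrices, which this card does not confirm. The real mapper also has a trunk and a coefficient head that produce the weights from hidden states; here the coefficients are simply made up. The function names, the toy sizes, and the coefficient values are all hypothetical.

```python
import random


def mix_bases(coeffs, bases):
    """Return sum_k coeffs[k] * bases[k] for equally shaped 2-D lists."""
    rows, cols = len(bases[0]), len(bases[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for c, basis in zip(coeffs, bases):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += c * basis[i][j]
    return out


def random_matrix(rng, rows, cols):
    """Small random matrix standing in for a learned basis."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Toy sizes for readability; the card lists 64 bases and rank 64.
num_bases, rank, d_in, d_out = 4, 2, 6, 8

# One bank of bases for A (rank x d_in) and one for B (d_out x rank).
bases_A = [random_matrix(rng, rank, d_in) for _ in range(num_bases)]
bases_B = [random_matrix(rng, d_out, rank) for _ in range(num_bases)]

# Hypothetical coefficients; in PAW they would come from the mapper.
coeffs_A = [0.5, 0.0, -1.0, 2.0]
coeffs_B = [1.0, 0.0, 0.0, 0.0]

A = mix_bases(coeffs_A, bases_A)  # shape: rank x d_in
B = mix_bases(coeffs_B, bases_B)  # shape: d_out x rank
print(len(A), len(A[0]), len(B), len(B[0]))
```

With `coeffs_B` selecting only the first basis, `B` equals `bases_B[0]`, which is a quick way to sanity-check the mixing step.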

## Status

- Inference/runtime SDK (load + run a compiled program locally): https://github.com/programasweights/programasweights-python (browser SDK: https://github.com/programasweights/programasweights-js)
- The cleaned compile/runtime code and the arXiv preprint ("Program-as-Weights: A Programming Paradigm for Fuzzy Functions", AIware 2026) will be public by Jul 6, 2026. An uncleaned reference snapshot is at https://anonymous.4open.science/r/programasweights
- Live demo + program hub: https://programasweights.com