a11y-public-coder
Open-source accessibility coding assistant for the public sector. WCAG 2.2 Level AA conformance, Drupal 11, PHP 8.3, Drush 12, Python 3.12, and Playwright (TypeScript) with both axe-core and Siteimprove Alfa.
Version 0.9.0 · License MIT · Released 2026-05-17
HuggingFace — weights ·
Install via Ollama — ollama pull rockypod/public-a11y-coder ·
GitHub — exam, dataset, training pipeline
Quick reference
| | 4B | 14B |
|---|---|---|
| Base model | Qwen/Qwen3-4B | Qwen/Qwen3-14B |
| Quantization | Q4_K_M GGUF (~2.5 GB) | Q4_K_M GGUF (~9 GB) |
| Recommended use | Demo, non-technical explanation, portable inference | Daily-driver technical work, OpenWebUI deployment |
| Exam score | 73.3% (22.0/30) | 76.7% (23.0/30) |
| Lift vs base | +28.3% (from 45.0% baseline) | +23.4% (from 53.3% baseline) |
| Ollama tag | rockypod/public-a11y-coder:4b | rockypod/public-a11y-coder:14b |
What's in this repo
| Path | Description |
|---|---|
| exam/a11y-30q.md | Full 30-question evaluation exam with rubric |
| exam/run_exam.py | Exam runner: collects responses and scores |
| exam/baselines/ | Pre-training baseline grades (qwen3:4b, 8b, 14b) |
| exam/trained/ | Post-training grades per model size |
| dataset/tier*.jsonl | Full training corpus: 1,930 pairs across 18 tiers |
| train.py | Full training pipeline (consolidate → LoRA → merge → GGUF) |
| Modelfile | Ollama Modelfile (4B production ChatML template) |
Large artifacts (checkpoints, merged HF weights, GGUF) are not in this repo — download GGUFs from the HuggingFace model page or pull via Ollama.
Intended use
a11y-public-coder is designed for use by government agencies, public-sector developers, accessibility professionals, and any developer maintaining Drupal 11 sites that must meet WCAG 2.2 Level AA. The model produces:
- Drupal 11 module, theme, and Twig code that follows accessibility best practices
- Drush 12 CLI commands and custom command authoring
- Python 3.12 utility scripts for accessibility-aware file operations (PDF text layer detection, alt audit, heading hierarchy)
- Playwright (TypeScript) test scaffolds using @axe-core/playwright and @siteimprove/alfa-playwright
- WCAG 2.2 AA explanations cited by success criterion number, suitable for both developers and non-technical content editors
The 4B variant is optimized for portable demonstrations (runs comfortably in a Windows 11 VM with 8 GB allocation) and explanation-first responses. The 14B variant is the primary daily-driver, targeted at OpenWebUI deployment on agency or homelab hardware.
Privacy-first training
This model was trained under explicit privacy constraints documented in the dataset card and verifiable in the public training corpus:
- No PII in any training entry: no real names, addresses, emails, phone numbers, case numbers, or social security numbers
- No real URLs, hostnames, or production domain names: all examples use example.gov, gov.example.org, or agency.example placeholders
- No scraped production agency content: every training example was authored from publicly available official documentation (drupal.org, php.net, docs.python.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, drush.org)
- Full dataset, exam questions, and per-question grading results are public; see the a11y-public-coder-dataset repository
The privacy-first training approach minimizes the risk of memorized PII surfacing in outputs, but does not eliminate standard LLM safety considerations.
Security and compliance — agency responsibility
a11y-public-coder is designed for self-hosted deployment. Deploying organizations remain responsible for their own security and privacy posture:
- Self-host: run via Ollama, vLLM, llama.cpp, or similar so that no prompts leave your network
- No certifications: this model has not been independently certified against NIST 800-53, FedRAMP, CJIS, HIPAA, FERPA, or state-specific frameworks. Agencies must independently validate fitness for their compliance context
- No sensitive data in prompts: do not paste citizen PII, case numbers, or other sensitive content. The model is a code/audit assistant, not a data-handling system
- Output review: model output is a suggestion, not authoritative. Human review is required before deployment
- Access controls and audit logging are the operator's responsibility
Training methodology
a11y-public-coder was trained using the CRAFTED℠ (Continuous Retrieval-Augmented Fine-Tuning, Evaluate, Deploy) pipeline:
1. Source corpus assembly: 1,930 training pairs generated from official documentation (drupal.org, drush.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, php.net, docs.python.org) by a local teacher model (qwen3:30b via Ollama).
2. CRAFTED℠ correction stream: Every generated entry was reviewed against domain-specific failure-mode filters (e.g. the WCAG 2.5.8 = 24×24 vs 2.5.5 = 44×44 contamination check, the Drupal 7/8/9 → Drupal 11 API leakage check, the Python-vs-TypeScript Playwright fallback check). 1,925 of 1,930 entries passed auto-acceptance with rule-based filters; 5 were manually corrected; 2 additional issues were flagged by a Drupal-specific D7/D8 API validator and corrected. Final corrected entries: 1,930/1,930.
3. Fine-tuning: Unsloth + LoRA (r=16, alpha=16, no dropout) on NVIDIA RTX 3090 Ti, 4 epochs at learning rate 2e-4 with cosine schedule. The 4B run reweights demo_friendly entries by 1.5× and downsamples entries with len(assistant) > 1800 chars by 0.7× to favor explanation-leaning content; the 14B run uses the full distribution without reweighting.
4. Conversion: GGUF via pinned llama.cpp commit 57819b8d4 with --outtype f16, quantized to Q4_K_M for serving. Modelfile uses ChatML template override for tokenizer consistency.
The full pipeline is reproducible from the training scripts in this repository.
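The target-size contamination check named in step 2 could look roughly like the sketch below. It is an illustration only, not the filter used in train.py: the function name, the regular expressions, and the sentence-splitting heuristic are all assumptions. The only facts it relies on are the WCAG 2.2 pairings (SC 2.5.8 with 24×24 CSS pixels, SC 2.5.5 with 44×44 CSS pixels).

```python
import re

# WCAG 2.2 pairings: SC 2.5.8 Target Size (Minimum, AA) is 24x24 CSS pixels,
# SC 2.5.5 Target Size (Enhanced, AAA) is 44x44 CSS pixels.
EXPECTED_SIZE = {"2.5.8": 24, "2.5.5": 44}

SC_PATTERN = re.compile(r"\b2\.5\.[58]\b")
SIZE_PATTERN = re.compile(r"\b(\d+)\s*(?:×|x|by)\s*(\d+)\b", re.IGNORECASE)


def find_target_size_contamination(text: str) -> list[str]:
    """Return sentences that pair one target-size SC with the wrong dimensions."""
    flagged = []
    # Naive sentence split: punctuation followed by whitespace (hypothetical heuristic).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        criteria = set(SC_PATTERN.findall(sentence))
        if len(criteria) != 1:
            continue  # neither SC mentioned, or both compared in one sentence
        (criterion,) = criteria
        # Keep only square sizes such as 24x24 or 44 by 44.
        sizes = {int(w) for w, h in SIZE_PATTERN.findall(sentence) if w == h}
        if sizes and EXPECTED_SIZE[criterion] not in sizes:
            flagged.append(sentence)
    return flagged
```

A sentence that compares both criteria is skipped on purpose, since a correct answer often mentions 24×24 and 44×44 side by side; a stricter filter would need to track which size belongs to which criterion.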
Dataset
The training corpus is 1,930 high-quality instruction-response pairs across 18 tiers, fully open and downloadable from the dataset repository or from dataset/ in this repo:
| Tier | Domain | Entries |
|---|---|---|
| 1 | Drupal 11 core fundamentals | 100 |
| 2 | Drupal 11 contrib stack (Webform, Paragraphs, Views, Pathauto, Metatag) | 100 |
| 3 | Drupal 11 Twig 3 templating | 100 |
| 4 | Drupal 11 custom modules | 100 |
| 5 | Drupal 11 accessibility patterns | 100 |
| 6 | Drupal-flavored PHP 8.3 | 100 |
| 7 | Drush 12 CLI usage | 100 |
| 8 | Drush 12 custom command authoring | 100 |
| 9 | Python 3.12 folder/file utilities | 100 |
| 10 | Python 3.12 file conversion | 100 |
| 11 | Python 3.12 accessibility-aware utilities | 100 |
| 12 | Playwright (TypeScript) fundamentals | 100 |
| 13 | Playwright + @axe-core/playwright | 140 |
| 14 | Playwright + @siteimprove/alfa-playwright | 130 |
| 15 | WCAG 2.2 AA — pre-2.2 carryover SCs | 80 |
| 16 | WCAG 2.2-new success criteria (9 new SCs) | 140 |
| 17 | Negative-example / contamination correction pairs | 140 |
| 18 | End-to-end multi-domain scenarios | 100 |
| Total | | 1,930 |
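To check the per-tier counts against a local checkout, a small script along these lines may help. It is a sketch that assumes each dataset/tier*.jsonl file stores one training pair per non-empty line; the function name is hypothetical and the exact file names are not taken from the repository.

```python
from pathlib import Path


def count_pairs_per_tier(dataset_dir: str) -> dict[str, int]:
    """Count non-empty lines (assumed: one JSON object per line) in each tier*.jsonl file."""
    counts = {}
    for path in sorted(Path(dataset_dir).glob("tier*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            counts[path.name] = sum(1 for line in handle if line.strip())
    return counts
```

Calling count_pairs_per_tier("dataset") from the repository root and summing the values should give 1,930 if the one-pair-per-line assumption holds.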
Evaluation
Models are evaluated against a 30-question exam covering all training domains, scored Full (1.0) / Partial (0.5) / Fail (0.0) per question, max 30.0 points. The exam is published in full, including grading rubrics: see exam/a11y-30q.md.
Pre-training baselines and post-training results are published in exam/, with per-question grades:
Summary
| Model | Total | Percentage |
|---|---|---|
| qwen3:4b baseline | 13.5/30 | 45.0% |
| qwen3:8b baseline | 17.0/30 | 56.7% |
| qwen3:14b baseline | 16.0/30 | 53.3% |
| a11y-public-coder:4b (trained) | 22.0/30 | 73.3% |
| a11y-public-coder:14b (trained) | 23.0/30 | 76.7% |
Per-domain results — 4B trained vs baseline qwen3:4b
| Domain | Baseline | Trained 4B | Lift |
|---|---|---|---|
| Drupal 11 | 2.0/8 (25%) | 6.0/8 (75%) | +4.0 ⬆ |
| PHP 8.3 | 0.5/2 (25%) | 1.0/2 (50%) | +0.5 |
| Drush 12 | 2.0/3 (67%) | 1.5/3 (50%) | -0.5 ⬇ |
| Python 3.12 | 2.5/4 (63%) | 4.0/4 (100%) | +1.5 ✓ |
| Playwright + axe-core | 0.5/3 (17%) | 2.0/3 (67%) | +1.5 ⬆ |
| Playwright + Alfa | 0.5/2 (25%) | 1.5/2 (75%) | +1.0 ⬆ |
| WCAG 2.2 AA (carryover) | 3.0/4 (75%) | 3.0/4 (75%) | 0 |
| WCAG 2.2-new ⭐ | 1.5/3 (50%) | 2.0/3 (67%) | +0.5 |
| Negative/contamination gate | 1.0/1 (100%) | 1.0/1 (100%) | 0 ✓ |
| Total | 13.5/30 (45.0%) | 22.0/30 (73.3%) | +8.5 (+28.3%) |
Per-domain results — 14B trained vs baseline qwen3:14b
| Domain | Baseline | Trained 14B | Lift |
|---|---|---|---|
| Drupal 11 | 3.0/8 (37.5%) | 6.5/8 (81.3%) | +3.5 ⬆ |
| PHP 8.3 | 1.5/2 (75.0%) | 1.5/2 (75.0%) | 0 |
| Drush 12 | 1.5/3 (50.0%) | 1.0/3 (33.3%) | -0.5 ⬇ |
| Python 3.12 | 2.5/4 (62.5%) | 3.5/4 (87.5%) | +1.0 ⬆ |
| Playwright + axe-core | 1.0/3 (33.3%) | 2.5/3 (83.3%) | +1.5 ⬆ |
| Playwright + Alfa | 1.0/2 (50.0%) | 2.0/2 (100%) | +1.0 ✓ |
| WCAG 2.2 AA (carryover) | 3.0/4 (75.0%) | 3.0/4 (75.0%) | 0 |
| WCAG 2.2-new ⭐ | 1.5/3 (50.0%) | 2.0/3 (66.7%) | +0.5 |
| Negative/contamination gate | 1.0/1 (100%) | 1.0/1 (100%) | 0 ✓ |
| Total | 16.0/30 (53.3%) | 23.0/30 (76.7%) | +7.0 (+23.4%) |
Running the exam yourself
# Against a trained model already loaded in Ollama
python exam/run_exam.py --model rockypod/public-a11y-coder:4b --output exam/trained/4b
# Score after filling grades.json
python exam/run_exam.py --score exam/trained/4b
Grading is manual (Full/Partial/Fail per rubric in exam/a11y-30q.md).
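The Full/Partial/Fail arithmetic is simple enough to reproduce by hand. The sketch below shows it under an assumed grade format (a mapping from question ID to a grade string); the real grades.json schema used by exam/run_exam.py may differ, and the example grades are made up for illustration rather than copied from the published results.

```python
GRADE_POINTS = {"full": 1.0, "partial": 0.5, "fail": 0.0}


def score_exam(grades: dict[str, str]) -> tuple[float, float]:
    """Return (total points, percentage of the maximum) for per-question grades."""
    total = sum(GRADE_POINTS[grade.lower()] for grade in grades.values())
    percentage = 100.0 * total / len(grades)  # each question is worth 1.0 point
    return total, round(percentage, 1)


# Illustrative split only: 20 Full, 4 Partial, 6 Fail across 30 questions.
example = {f"Q{i}": "full" for i in range(1, 21)}
example.update({f"Q{i}": "partial" for i in range(21, 25)})
example.update({f"Q{i}": "fail" for i in range(25, 31)})
print(score_exam(example))  # (22.0, 73.3)
```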
Reproducing training
# On a CUDA GPU server with the Unsloth venv installed
nohup env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True TORCHDYNAMO_DISABLE=1 \
python train.py --size 4b > logs/train_4b.log 2>&1 &
TORCHDYNAMO_DISABLE=1 is required — Qwen3 + Unsloth triggers Triton JIT compilation which fails on CUDA driver/toolkit version mismatches common on Rocky Linux GPU hosts.
Usage
Ollama (local)
ollama run rockypod/public-a11y-coder:14b
# or for the portable demo model:
ollama run rockypod/public-a11y-coder:4b
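Once the model is pulled, it can also be called from a script through Ollama's local REST API. The following is a minimal sketch using only the standard library; it assumes a server on the default port 11434 and the 14b tag shown above, and the helper names build_chat_request and ask are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint


def build_chat_request(prompt: str, model: str = "rockypod/public-a11y-coder:14b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a chunked stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=300) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, ask("Explain SC 2.4.11 for a content editor.") returns the reply as a string. As with any prompt to this model, keep citizen PII and other sensitive data out of it.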
OpenWebUI
Add the model under Settings → Models → Ollama, point it at your Ollama endpoint (default http://localhost:11434), and select rockypod/public-a11y-coder:14b from the model list.
HuggingFace Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rockypod/a11y-public-coder-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Write a Drupal 11 Twig snippet for an accessible image field with a skip-link-friendly heading structure."}
]
# add_generation_prompt=True appends the assistant turn marker so the model answers the user
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Known limitations
The v0.9.0 release ships with documented gaps to be addressed in v1.0:
- Drush flag accuracy: The 4B variant occasionally fabricates non-existent command flags (e.g. inventing --target or --exclude on commands where those flags do not exist). This is a training data quality issue traced to tier 7 generation; v1.0 will include a Drush command-reference validator before retraining.
- Contrast ratio computation: Small models cannot reliably compute color contrast ratios from arbitrary hex pairs. The model correctly identifies SC 1.4.3 (Contrast (Minimum)) and can recall specific examples that appear in training (#767676 on white = 4.54:1), but does not generalize to compute ratios for novel inputs. Recommend pairing with a deterministic contrast checker.
- WCAG 2.2-new exception coverage: SC 2.5.8 (Target Size (Minimum)) has five distinct exception cases (spacing, equivalent, inline, user agent control, essential). The 4B reliably outputs the headline 24×24 CSS pixels AA threshold but covers only one of the five exception cases consistently. v1.0 will expand tier 16 with dedicated entries per exception type.
- SC-to-SC discrimination: The 4B occasionally confuses related success criteria (e.g. cites SC 2.1.1 + 2.1.2 for a missing button role where 4.1.2 is the primary criterion). v1.0 will add SC-discrimination pair entries to tier 17.
- Drupal 11 vs Drupal 10 distinction: While the dataset targets Drupal 11 exclusively, the underlying base model has substantial Drupal 7/8/9 pretraining priors. The contamination gate (tier 17 negative examples) holds at 100% on the exam, but in long-form generation some D7-era patterns may surface. Always validate generated Drupal code against the actual D11 API.
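For the contrast limitation, a deterministic checker is only a few lines. The sketch below follows the relative luminance and contrast ratio definitions in WCAG 2.x; the function names are illustrative, and it handles only six-digit #RRGGBB values with no alpha channel.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG relative luminance definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #RRGGBB color, from 0.0 (black) to 1.0 (white)."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter color."""
    l1, l2 = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (l1 + 0.05) / (l2 + 0.05)


def passes_sc_1_4_3(foreground: str, background: str, large_text: bool = False) -> bool:
    """SC 1.4.3 requires at least 4.5:1, or 3:1 for large-scale text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Black on white gives the maximum ratio of 21:1. A reasonable workflow is to let the model name the applicable success criterion and then run any hex pairs it mentions through a function like contrast_ratio before acting on the answer.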
Recommended use cases
Strong fit:
- Generating Drupal 11 module scaffolds with accessibility baked in
- Writing Playwright + axe-core / Alfa test files for agency sites
- Drafting Python utility scripts for accessibility audits (PDF text layer detection, alt text auditing, heading hierarchy)
- Explaining WCAG 2.2 success criteria to non-technical content editors
- Drush 12 natural-language to command translation (with verification)
Use with caution:
- Contrast ratio calculations (verify with a deterministic checker)
- Drush command flags (verify against drush help <command>)
- Drupal 8/9 maintenance (this model is Drupal 11-targeted)
Not designed for:
- General-purpose coding outside the trained domains
- Production-critical accessibility certification without human review
- Handling sensitive citizen data in prompts
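The "verify against drush help <command>" advice above can be partly automated. The sketch below compares the long flags in a generated command with those that appear in the help text. It is a rough heuristic and not the validator planned for v1.0: the function names and the regular expression are assumptions, and the check only looks at flag names, not at their values.

```python
import re
import subprocess

# Matches long options such as --format or --no-interaction (lowercase names only).
FLAG_PATTERN = re.compile(r"(?<![\w-])--([a-z][a-z0-9-]*)")


def drush_help(command_name: str) -> str:
    """Return the output of `drush help <command>` (requires Drush on PATH)."""
    result = subprocess.run(
        ["drush", "help", command_name], capture_output=True, text=True, check=True
    )
    return result.stdout


def unknown_flags(generated_command: str, help_text: str) -> list[str]:
    """Return long flags used in generated_command that never appear in help_text."""
    documented = set(FLAG_PATTERN.findall(help_text))
    used = FLAG_PATTERN.findall(generated_command)
    return sorted({flag for flag in used if flag not in documented})
```

Global options may not be listed in every per-command help page, so a non-empty result is a prompt to check the flag by hand rather than proof that it is invalid.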
Roadmap
| Version | Target | Focus |
|---|---|---|
| v0.9.0 | shipped | Initial release, baselines published, ship gate intentionally below 80% with documented limitations |
| v0.9.5 | ~6 weeks | Drush flag validation pass, contrast hex-pair expansion, SC 2.5.8 exception coverage |
| v1.0.0 | ~10 weeks | All v0.9.0 limitations addressed, ≥85% on the 30Q exam |
The CRAFTED℠ methodology means each version uses real-world exam failures and user-reported issues as the correction stream for the next training cycle. The v1.0 release will include an expanded 60-question exam.
Reproducibility
This release is reproducible end-to-end from the public artifacts:
- Dataset: rockypod/a11y-public-coder-dataset or dataset/ in this repo
- Training pipeline: train.py in this repo
- Evaluation exam: exam/a11y-30q.md
- Exam runner: exam/run_exam.py
- Per-question grading results: exam/baselines/ and exam/trained/
Citation
@misc{a11y-public-coder-v0.9.0,
author = {RockyPod},
title = {a11y-public-coder: An open-source accessibility coding assistant for the public sector},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/rockypod/a11y-public-coder-4b}},
}
Acknowledgments
- Base models: Qwen team (Qwen3-4B and Qwen3-14B are open weights released under the Apache 2.0 license)
- Accessibility tooling: Deque axe-core, Siteimprove Alfa
- Web standards: W3C WAI for the WCAG 2.2 specification and Understanding documents
- Training infrastructure: Unsloth, llama.cpp, Ollama
License
MIT. See LICENSE for full text. Free for any use, including commercial use and use by government agencies.