Instructions to use rockypod/a11y-public-coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rockypod/a11y-public-coder with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rockypod/a11y-public-coder")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("rockypod/a11y-public-coder", dtype="auto")

llama-cpp-python

How to use rockypod/a11y-public-coder with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="rockypod/a11y-public-coder",
	filename="a11y-public-coder-14b-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use rockypod/a11y-public-coder with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rockypod/a11y-public-coder:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf rockypod/a11y-public-coder:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rockypod/a11y-public-coder:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf rockypod/a11y-public-coder:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rockypod/a11y-public-coder:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf rockypod/a11y-public-coder:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rockypod/a11y-public-coder:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rockypod/a11y-public-coder:Q4_K_M

Use Docker

docker model run hf.co/rockypod/a11y-public-coder:Q4_K_M

LM Studio
Jan

vLLM

How to use rockypod/a11y-public-coder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rockypod/a11y-public-coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rockypod/a11y-public-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/rockypod/a11y-public-coder:Q4_K_M

SGLang

How to use rockypod/a11y-public-coder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rockypod/a11y-public-coder" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rockypod/a11y-public-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rockypod/a11y-public-coder" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rockypod/a11y-public-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use rockypod/a11y-public-coder with Ollama:
```
ollama run hf.co/rockypod/a11y-public-coder:Q4_K_M
```

Unsloth Studio new

How to use rockypod/a11y-public-coder with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rockypod/a11y-public-coder to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rockypod/a11y-public-coder to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rockypod/a11y-public-coder to start chatting

Pi new

How to use rockypod/a11y-public-coder with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf rockypod/a11y-public-coder:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "rockypod/a11y-public-coder:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use rockypod/a11y-public-coder with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf rockypod/a11y-public-coder:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default rockypod/a11y-public-coder:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use rockypod/a11y-public-coder with Docker Model Runner:
```
docker model run hf.co/rockypod/a11y-public-coder:Q4_K_M
```

Lemonade

How to use rockypod/a11y-public-coder with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rockypod/a11y-public-coder:Q4_K_M

Run and chat with the model

lemonade run user.a11y-public-coder-Q4_K_M

List all available models

lemonade list

a11y-public-coder / README.md

rockypod

Upload README.md with huggingface_hub

5986aec verified 12 days ago

preview code

raw

history blame contribute delete

17.1 kB

	---
	license: mit
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- accessibility
	- wcag
	- wcag-2.2
	- drupal
	- drupal-11
	- php
	- drush
	- python
	- playwright
	- axe-core
	- siteimprove-alfa
	- government
	- public-sector
	- code
	base_model:
	- Qwen/Qwen3-4B
	- Qwen/Qwen3-14B
	datasets:
	- rockypod/a11y-public-coder-dataset
	model-index:
	- name: a11y-public-coder
	results:
	- task:
	type: text-generation
	name: WCAG 2.2 AA Accessibility Coding Exam
	metrics:
	- type: accuracy
	name: 30-question exam (4B)
	value: 73.3
	- type: accuracy
	name: 30-question exam (14B)
	value: 76.7
	---

	# a11y-public-coder

	Open-source accessibility coding assistant for the public sector. WCAG 2.2 Level AA conformance, Drupal 11, PHP 8.3, Drush 12, Python 3.12, and Playwright (TypeScript) with both axe-core and Siteimprove Alfa.

	> Version 0.9.0 · License MIT · Released 2026-05-17

	[HuggingFace — weights](https://huggingface.co/rockypod/a11y-public-coder-4b) ·
	[Install via Ollama](https://ollama.com/rockypod/public-a11y-coder) — `ollama pull rockypod/public-a11y-coder` ·
	[GitHub — exam, dataset, training pipeline](https://github.com/rockypod/public_a11y_coder)

	## Quick reference

	\| \| 4B \| 14B \|
	\|---\|---\|---\|
	\| Base model \| `Qwen/Qwen3-4B` \| `Qwen/Qwen3-14B` \|
	\| Quantization \| Q4_K_M GGUF (~2.5 GB) \| Q4_K_M GGUF (~9 GB) \|
	\| Recommended use \| Demo, non-technical explanation, portable inference \| Daily-driver technical work, OpenWebUI deployment \|
	\| Exam score \| 73.3% (22.0/30) \| 76.7% (23.0/30) \|
	\| Lift vs base \| +28.3% (from 45.0% baseline) \| +23.4% (from 53.3% baseline) \|
	\| Ollama tag \| `rockypod/public-a11y-coder:4b` \| `rockypod/public-a11y-coder:14b` \|

	## What's in this repo

	\| Path \| Description \|
	\|---\|---\|
	\| `exam/a11y-30q.md` \| Full 30-question evaluation exam with rubric \|
	\| `exam/run_exam.py` \| Exam runner — collects responses and scores \|
	\| `exam/baselines/` \| Pre-training baseline grades (qwen3:4b, 8b, 14b) \|
	\| `exam/trained/` \| Post-training grades per model size \|
	\| `dataset/tier*.jsonl` \| Full training corpus — 1,930 pairs across 18 tiers \|
	\| `train.py` \| Full training pipeline (consolidate → LoRA → merge → GGUF) \|
	\| `Modelfile` \| Ollama Modelfile (4B production ChatML template) \|

	Large artifacts (checkpoints, merged HF weights, GGUF) are not in this repo — download GGUFs from the HuggingFace model page or pull via Ollama.

	## Intended use

	`a11y-public-coder` is designed for use by government agencies, public-sector developers, accessibility professionals, and any developer maintaining Drupal 11 sites that must meet WCAG 2.2 Level AA. The model produces:

	- Drupal 11 module, theme, and Twig code that follows accessibility best practices
	- Drush 12 CLI commands and custom command authoring
	- Python 3.12 utility scripts for accessibility-aware file operations (PDF text layer detection, alt audit, heading hierarchy)
	- Playwright (TypeScript) test scaffolds using `@axe-core/playwright` and `@siteimprove/alfa-playwright`
	- WCAG 2.2 AA explanations cited by success criterion number, suitable for both developers and non-technical content editors

	The 4B variant is optimized for portable demonstrations (runs comfortably in a Windows 11 VM with 8 GB allocation) and explanation-first responses. The 14B variant is the primary daily-driver, targeted at OpenWebUI deployment on agency or homelab hardware.

	## Privacy-first training

	This model was trained under explicit privacy constraints documented in the dataset card and verifiable in the public training corpus:

	- No PII in any training entry — no real names, addresses, emails, phone numbers, case numbers, or social security numbers
	- No real URLs, hostnames, or production domain names — all examples use `example.gov`, `gov.example.org`, or `agency.example` placeholders
	- No scraped production agency content — every training example was authored from publicly available official documentation: drupal.org, php.net, docs.python.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, drush.org
	- Full dataset, exam questions, and per-question grading results are public — see the [`a11y-public-coder-dataset`](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) repository

	The privacy-first training approach minimizes the risk of memorized PII surfacing in outputs, but does not eliminate standard LLM safety considerations.

	## Security and compliance — agency responsibility

	`a11y-public-coder` is designed for self-hosted deployment. Deploying organizations remain responsible for their own security and privacy posture:

	- Self-host: run via Ollama, vLLM, llama.cpp, or similar so that no prompts leave your network
	- No certifications: this model has not been independently certified against NIST 800-53, FedRAMP, CJIS, HIPAA, FERPA, or state-specific frameworks. Agencies must independently validate fitness for their compliance context
	- No sensitive data in prompts: do not paste citizen PII, case numbers, or other sensitive content. The model is a code/audit assistant, not a data-handling system
	- Output review: model output is a suggestion, not authoritative. Human review is required before deployment
	- Access controls and audit logging are the operator's responsibility

	## Training methodology

	`a11y-public-coder` was trained using the CRAFTED℠ (Continuous Retrieval-Augmented Fine-Tuning, Evaluate, Deploy) pipeline:

	1. Source corpus assembly — 1,930 training pairs generated from official documentation (drupal.org, drush.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, php.net, docs.python.org) by a local teacher model (`qwen3:30b` via Ollama)

	2. CRAFTED℠ correction stream — Every generated entry was reviewed against domain-specific failure-mode filters (e.g. the WCAG 2.5.8 = 24×24 vs 2.5.5 = 44×44 contamination check, the Drupal 7/8/9 → Drupal 11 API leakage check, the Python-vs-TypeScript Playwright fallback check). 1,925 of 1,930 entries passed auto-acceptance with rule-based filters; 5 were manually corrected; 2 additional issues were flagged by a Drupal-specific D7/D8 API validator and corrected. Final corrected entries: 1,930/1,930.

	3. Fine-tuning — Unsloth + LoRA (r=16, alpha=16, no dropout) on NVIDIA RTX 3090 Ti, 4 epochs at learning rate 2e-4 with cosine schedule. The 4B run reweights `demo_friendly` entries by 1.5× and downsamples entries with `len(assistant) > 1800 chars` by 0.7× to favor explanation-leaning content; the 14B run uses the full distribution without reweighting.

	4. Conversion — GGUF via pinned `llama.cpp` commit `57819b8d4` with `--outtype f16`, quantized to Q4_K_M for serving. Modelfile uses ChatML template override for tokenizer consistency.

	The full pipeline is reproducible from the [training scripts](https://github.com/rockypod/public_a11y_coder) in this repository.

	## Dataset

	The training corpus is 1,930 high-quality instruction-response pairs across 18 tiers, fully open and downloadable from the [dataset repository](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) or from `dataset/` in this repo:

	\| Tier \| Domain \| Entries \|
	\|---\|---\|---\|
	\| 1 \| Drupal 11 core fundamentals \| 100 \|
	\| 2 \| Drupal 11 contrib stack (Webform, Paragraphs, Views, Pathauto, Metatag) \| 100 \|
	\| 3 \| Drupal 11 Twig 3 templating \| 100 \|
	\| 4 \| Drupal 11 custom modules \| 100 \|
	\| 5 \| Drupal 11 accessibility patterns \| 100 \|
	\| 6 \| Drupal-flavored PHP 8.3 \| 100 \|
	\| 7 \| Drush 12 CLI usage \| 100 \|
	\| 8 \| Drush 12 custom command authoring \| 100 \|
	\| 9 \| Python 3.12 folder/file utilities \| 100 \|
	\| 10 \| Python 3.12 file conversion \| 100 \|
	\| 11 \| Python 3.12 accessibility-aware utilities \| 100 \|
	\| 12 \| Playwright (TypeScript) fundamentals \| 100 \|
	\| 13 \| Playwright + `@axe-core/playwright` \| 140 \|
	\| 14 \| Playwright + `@siteimprove/alfa-playwright` \| 130 \|
	\| 15 \| WCAG 2.2 AA — pre-2.2 carryover SCs \| 80 \|
	\| 16 \| WCAG 2.2-new success criteria (9 new SCs) \| 140 \|
	\| 17 \| Negative-example / contamination correction pairs \| 140 \|
	\| 18 \| End-to-end multi-domain scenarios \| 100 \|
	\| Total \| \| 1,930 \|

	## Evaluation

	Models are evaluated against a 30-question exam covering all training domains, scored Full (1.0) / Partial (0.5) / Fail (0.0) per question, max 30.0 points. The exam is published in full, including grading rubrics: see [`exam/a11y-30q.md`](exam/a11y-30q.md).

	Pre-training baselines and post-training results are published in `exam/`, with per-question grades:

	### Summary

	\| Model \| Total \| Percentage \|
	\|---\|---\|---\|
	\| `qwen3:4b` baseline \| 13.5/30 \| 45.0% \|
	\| `qwen3:8b` baseline \| 17.0/30 \| 56.7% \|
	\| `qwen3:14b` baseline \| 16.0/30 \| 53.3% \|
	\| `a11y-public-coder:4b` (trained) \| 22.0/30 \| 73.3% \|
	\| `a11y-public-coder:14b` (trained) \| 23.0/30 \| 76.7% \|

	### Per-domain results — 4B trained vs baseline `qwen3:4b`

	\| Domain \| Baseline \| Trained 4B \| Lift \|
	\|---\|---\|---\|---\|
	\| Drupal 11 \| 2.0/8 (25%) \| 6.0/8 (75%) \| +4.0 ⬆ \|
	\| PHP 8.3 \| 0.5/2 (25%) \| 1.0/2 (50%) \| +0.5 \|
	\| Drush 12 \| 2.0/3 (67%) \| 1.5/3 (50%) \| -0.5 ⬇ \|
	\| Python 3.12 \| 2.5/4 (63%) \| 4.0/4 (100%) \| +1.5 ✓ \|
	\| Playwright + axe-core \| 0.5/3 (17%) \| 2.0/3 (67%) \| +1.5 ⬆ \|
	\| Playwright + Alfa \| 0.5/2 (25%) \| 1.5/2 (75%) \| +1.0 ⬆ \|
	\| WCAG 2.2 AA (carryover) \| 3.0/4 (75%) \| 3.0/4 (75%) \| 0 \|
	\| WCAG 2.2-new ⭐ \| 1.5/3 (50%) \| 2.0/3 (67%) \| +0.5 \|
	\| Negative/contamination gate \| 1.0/1 (100%) \| 1.0/1 (100%) \| 0 ✓ \|
	\| Total \| 13.5/30 (45.0%) \| 22.0/30 (73.3%) \| +8.5 (+28.3%) \|

	### Per-domain results — 14B trained vs baseline `qwen3:14b`

	\| Domain \| Baseline \| Trained 14B \| Lift \|
	\|---\|---\|---\|---\|
	\| Drupal 11 \| 3.0/8 (37.5%) \| 6.5/8 (81.3%) \| +3.5 ⬆ \|
	\| PHP 8.3 \| 1.5/2 (75.0%) \| 1.5/2 (75.0%) \| 0 \|
	\| Drush 12 \| 1.5/3 (50.0%) \| 1.0/3 (33.3%) \| -0.5 ⬇ \|
	\| Python 3.12 \| 2.5/4 (62.5%) \| 3.5/4 (87.5%) \| +1.0 ⬆ \|
	\| Playwright + axe-core \| 1.0/3 (33.3%) \| 2.5/3 (83.3%) \| +1.5 ⬆ \|
	\| Playwright + Alfa \| 1.0/2 (50.0%) \| 2.0/2 (100%) \| +1.0 ✓ \|
	\| WCAG 2.2 AA (carryover) \| 3.0/4 (75.0%) \| 3.0/4 (75.0%) \| 0 \|
	\| WCAG 2.2-new ⭐ \| 1.5/3 (50.0%) \| 2.0/3 (66.7%) \| +0.5 \|
	\| Negative/contamination gate \| 1.0/1 (100%) \| 1.0/1 (100%) \| 0 ✓ \|
	\| Total \| 16.0/30 (53.3%) \| 23.0/30 (76.7%) \| +7.0 (+23.4%) \|

	## Running the exam yourself

	```bash
	# Against a trained model already loaded in Ollama
	python exam/run_exam.py --model rockypod/public-a11y-coder:4b --output exam/trained/4b

	# Score after filling grades.json
	python exam/run_exam.py --score exam/trained/4b
	```

	Grading is manual (Full/Partial/Fail per rubric in `exam/a11y-30q.md`).

	## Reproducing training

	```bash
	# On a CUDA GPU server with the Unsloth venv installed
	nohup env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True TORCHDYNAMO_DISABLE=1 \
	python train.py --size 4b > logs/train_4b.log 2>&1 &
	```

	`TORCHDYNAMO_DISABLE=1` is required — Qwen3 + Unsloth triggers Triton JIT compilation which fails on CUDA driver/toolkit version mismatches common on Rocky Linux GPU hosts.

	## Usage

	### Ollama (local)

	```bash
	ollama run rockypod/public-a11y-coder:14b
	# or for the portable demo model:
	ollama run rockypod/public-a11y-coder:4b
	```

	### OpenWebUI

	Add the model under Settings → Models → Ollama, point to your Ollama endpoint (default `http://localhost:11434`), select `rockypod/public-a11y-coder:14b` from the model list.

	### HuggingFace Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "rockypod/a11y-public-coder-4b"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

	messages = [
	{"role": "user", "content": "Write a Drupal 11 Twig snippet for an accessible image field with a skip-link-friendly heading structure."}
	]
	inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
	outputs = model.generate(inputs, max_new_tokens=512)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	## Known limitations

	The v0.9.0 release ships with documented gaps to be addressed in v1.0:

	1. Drush flag accuracy — The 4B variant occasionally fabricates non-existent command flags (e.g. inventing `--target` or `--exclude` on commands where those flags do not exist). This is a training data quality issue traced to tier 7 generation; v1.0 will include a Drush command-reference validator before retraining.

	2. Contrast ratio computation — Small models cannot reliably compute color contrast ratios from arbitrary hex pairs. The model correctly identifies SC 1.4.3 (Contrast — Minimum) and can recall specific examples that appear in training (`#767676` on white = 4.48:1), but does not generalize to compute ratios for novel inputs. Recommend pairing with a deterministic contrast checker.

	3. WCAG 2.2-new exception coverage — SC 2.5.8 (Target Size — Minimum) has five distinct exception cases (offset, essential, inline, user-agent-controlled, equivalent). The 4B reliably outputs the headline `24×24 CSS pixels` AA threshold but covers only one of the five exception cases consistently. v1.0 will expand tier 16 with dedicated entries per exception type.

	4. SC-to-SC discrimination — The 4B occasionally confuses related success criteria (e.g. cites SC 2.1.1 + 2.1.2 for a missing button role where 4.1.2 is the primary criterion). v1.0 will add SC-discrimination pair entries to tier 17.

	5. Drupal 11 vs Drupal 10 distinction — While the dataset targets Drupal 11 exclusively, the underlying base model has substantial Drupal 7/8/9 pretraining priors. The contamination gate (tier 17 negative examples) holds at 100% on the exam, but in long-form generation some D7-era patterns may surface. Always validate generated Drupal code against the actual D11 API.

	## Recommended use cases

	Strong fit:
	- Generating Drupal 11 module scaffolds with accessibility baked in
	- Writing Playwright + axe-core / Alfa test files for agency sites
	- Drafting Python utility scripts for accessibility audits (PDF text layer detection, alt text auditing, heading hierarchy)
	- Explaining WCAG 2.2 success criteria to non-technical content editors
	- Drush 12 natural-language to command translation (with verification)

	Use with caution:
	- Contrast ratio calculations (verify with a deterministic checker)
	- Drush command flags (verify against `drush help <command>`)
	- Drupal 8/9 maintenance (this model is Drupal 11-targeted)

	Not designed for:
	- General-purpose coding outside the trained domains
	- Production-critical accessibility certification without human review
	- Handling sensitive citizen data in prompts

	## Roadmap

	\| Version \| Target \| Focus \|
	\|---\|---\|---\|
	\| v0.9.0 \| shipped \| Initial release, baselines published, ship gate intentionally below 80% with documented limitations \|
	\| v0.9.5 \| ~6 weeks \| Drush flag validation pass, contrast hex-pair expansion, SC 2.5.8 exception coverage \|
	\| v1.0.0 \| ~10 weeks \| All v0.9.0 limitations addressed, ≥85% on the 30Q exam \|

	The CRAFTED℠ methodology means each version uses real-world exam failures and user-reported issues as the correction stream for the next training cycle. The v1.0 release will include an expanded 60-question exam.

	## Reproducibility

	This release is reproducible end-to-end from the public artifacts:

	- Dataset: [`rockypod/a11y-public-coder-dataset`](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) or `dataset/` in this repo
	- Training pipeline: [`train.py`](train.py) in this repo
	- Evaluation exam: [`exam/a11y-30q.md`](exam/a11y-30q.md)
	- Exam runner: [`exam/run_exam.py`](exam/run_exam.py)
	- Per-question grading results: [`exam/baselines/`](exam/baselines/) and [`exam/trained/`](exam/trained/)

	## Citation

	```bibtex
	@misc{a11y-public-coder-v0.9.0,
	author = {RockyPod},
	title = {a11y-public-coder: An open-source accessibility coding assistant for the public sector},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/rockypod/a11y-public-coder-4b}},
	}
	```

	## Acknowledgments

	- Base models: [Qwen team](https://github.com/QwenLM) — `Qwen3-4B` and `Qwen3-14B` are MIT-licensed open weights
	- Accessibility tooling: [Deque axe-core](https://github.com/dequelabs/axe-core), [Siteimprove Alfa](https://github.com/Siteimprove/alfa)
	- Web standards: [W3C WAI](https://www.w3.org/WAI/) for the WCAG 2.2 specification and Understanding documents
	- Training infrastructure: [Unsloth](https://github.com/unslothai/unsloth), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/)

	## License

	MIT. See [LICENSE](LICENSE) for full text. Free for any use including commercial, including by government agencies.