---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- accessibility
- wcag
- wcag-2.2
- drupal
- drupal-11
- php
- drush
- python
- playwright
- axe-core
- siteimprove-alfa
- government
- public-sector
- code
base_model:
- Qwen/Qwen3-4B
- Qwen/Qwen3-14B
datasets:
- rockypod/a11y-public-coder-dataset
model-index:
- name: a11y-public-coder
  results:
  - task:
      type: text-generation
      name: WCAG 2.2 AA Accessibility Coding Exam
    metrics:
    - type: accuracy
      name: 30-question exam (4B)
      value: 73.3
    - type: accuracy
      name: 30-question exam (14B)
      value: 76.7
---

# a11y-public-coder

**Open-source accessibility coding assistant for the public sector.** It targets WCAG 2.2 Level AA conformance, Drupal 11, PHP 8.3, Drush 12, Python 3.12, and Playwright (TypeScript) with both axe-core and Siteimprove Alfa.

> Version **0.9.0** · License **MIT** · Released 2026-05-17

**[HuggingFace: weights](https://huggingface.co/rockypod/a11y-public-coder-4b)** ·
**[Install via Ollama](https://ollama.com/rockypod/public-a11y-coder)** (`ollama pull rockypod/public-a11y-coder`) ·
**[GitHub: exam, dataset, training pipeline](https://github.com/rockypod/public_a11y_coder)**

## Quick reference

| | 4B | 14B |
|---|---|---|
| **Base model** | `Qwen/Qwen3-4B` | `Qwen/Qwen3-14B` |
| **Quantization** | Q4_K_M GGUF (~2.5 GB) | Q4_K_M GGUF (~9 GB) |
| **Recommended use** | Demo, non-technical explanation, portable inference | Daily-driver technical work, OpenWebUI deployment |
| **Exam score** | **73.3%** (22.0/30) | **76.7%** (23.0/30) |
| **Lift vs base** | **+28.3 percentage points** (from 45.0% baseline) | **+23.3 percentage points** (from 53.3% baseline) |
| **Ollama tag** | `rockypod/public-a11y-coder:4b` | `rockypod/public-a11y-coder:14b` |

## What's in this repo

| Path | Description |
|---|---|
| `exam/a11y-30q.md` | Full 30-question evaluation exam with rubric |
| `exam/run_exam.py` | Exam runner: collects responses and scores |
| `exam/baselines/` | Pre-training baseline grades (qwen3:4b, 8b, 14b) |
| `exam/trained/` | Post-training grades per model size |
| `dataset/tier*.jsonl` | Full training corpus: 1,930 pairs across 18 tiers |
| `train.py` | Full training pipeline (consolidate → LoRA → merge → GGUF) |
| `Modelfile` | Ollama Modelfile (4B production ChatML template) |

Large artifacts (checkpoints, merged HF weights, GGUF) are not in this repo. Download GGUFs from the HuggingFace model page or pull via Ollama.

## Intended use

`a11y-public-coder` is designed for use by government agencies, public-sector developers, accessibility professionals, and any developer maintaining Drupal 11 sites that must meet WCAG 2.2 Level AA. The model produces:

- Drupal 11 module, theme, and Twig code that follows accessibility best practices
- Drush 12 CLI commands and custom command authoring
- Python 3.12 utility scripts for accessibility-aware file operations (PDF text layer detection, alt text audits, heading hierarchy checks)
- Playwright (TypeScript) test scaffolds using `@axe-core/playwright` and `@siteimprove/alfa-playwright`
- WCAG 2.2 AA explanations cited by success criterion number, suitable for both developers and non-technical content editors

The 4B variant is optimized for portable demonstrations (it runs comfortably in a Windows 11 VM with an 8 GB allocation) and explanation-first responses. The 14B variant is the primary daily driver, targeted at OpenWebUI deployment on agency or homelab hardware.

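
To give a sense of the "heading hierarchy" utilities mentioned above, here is a minimal standard-library sketch of such a check. It is an illustration written for this card, not captured model output; the class and function names are hypothetical, and it only flags headings that skip a level downward (for example `h2` followed by `h4`).

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "h1".."h6" is enough here.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_heading_levels(html: str) -> list[tuple[int, int]]:
    """Return (previous, current) pairs where a heading level was skipped."""
    parser = HeadingCollector()
    parser.feed(html)
    pairs = zip(parser.levels, parser.levels[1:])
    return [(prev, cur) for prev, cur in pairs if cur > prev + 1]


page = "<h1>Permits</h1><h2>Apply</h2><h4>Fees</h4><h2>Contact</h2>"
print(skipped_heading_levels(page))  # [(2, 4)]
```

A real audit would also look at ARIA headings (`role="heading"` with `aria-level`) and at pages with no `h1`; this sketch deliberately leaves those out.
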
## Privacy-first training

This model was trained under explicit privacy constraints documented in the dataset card and verifiable in the public training corpus:

- **No PII** in any training entry: no real names, addresses, emails, phone numbers, case numbers, or social security numbers
- **No real URLs, hostnames, or production domain names**: all examples use `example.gov`, `gov.example.org`, or `agency.example` placeholders
- **No scraped production agency content**: every training example was authored from publicly available official documentation: drupal.org, php.net, docs.python.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, drush.org
- **Full dataset, exam questions, and per-question grading results are public**: see the [`a11y-public-coder-dataset`](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) repository

The privacy-first training approach minimizes the risk of memorized PII surfacing in outputs, but does not eliminate standard LLM safety considerations.

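
One way to spot-check the placeholder-hostname claim against the public corpus is sketched below. It assumes you have already extracted the text of each entry; the function name and the URL regex are illustrative choices, not part of the project's tooling, and the allow-list contains only the three placeholder domains named above.

```python
import re
from urllib.parse import urlparse

# Placeholder domains listed in this card; subdomains of them are also accepted.
PLACEHOLDER_DOMAINS = ("example.gov", "gov.example.org", "agency.example")
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def non_placeholder_hosts(text: str) -> set[str]:
    """Return hostnames found in `text` that are not placeholder domains."""
    hosts: set[str] = set()
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower()
        is_placeholder = any(
            host == domain or host.endswith("." + domain)
            for domain in PLACEHOLDER_DOMAINS
        )
        if host and not is_placeholder:
            hosts.add(host)
    return hosts


sample = "Audit https://www.example.gov/services and http://intranet.agency.example/forms."
print(non_placeholder_hosts(sample))  # set()
print(non_placeholder_hosts("See https://city.real-agency.gov/contact"))  # {'city.real-agency.gov'}
```

Running this over every entry and collecting the union of the returned sets gives a quick list of hostnames to review by hand.
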
## Security and compliance: agency responsibility

`a11y-public-coder` is designed for self-hosted deployment. Deploying organizations remain responsible for their own security and privacy posture:

- **Self-host**: run via Ollama, vLLM, llama.cpp, or similar so that no prompts leave your network
- **No certifications**: this model has not been independently certified against NIST 800-53, FedRAMP, CJIS, HIPAA, FERPA, or state-specific frameworks. Agencies must independently validate fitness for their compliance context
- **No sensitive data in prompts**: do not paste citizen PII, case numbers, or other sensitive content. The model is a code/audit assistant, not a data-handling system
- **Output review**: model output is a suggestion, not authoritative. Human review is required before deployment
- **Access controls and audit logging** are the operator's responsibility

## Training methodology

`a11y-public-coder` was trained using the **CRAFTED℠ (Continuous Retrieval-Augmented Fine-Tuning, Evaluate, Deploy)** pipeline:

1. **Source corpus assembly**: 1,930 training pairs generated from official documentation (drupal.org, drush.org, playwright.dev, alfa.siteimprove.com, w3.org/WAI/WCAG22/, php.net, docs.python.org) by a local teacher model (`qwen3:30b` via Ollama).
2. **CRAFTED℠ correction stream**: Every generated entry was reviewed against domain-specific failure-mode filters (e.g. the WCAG 2.5.8 = 24×24 vs 2.5.5 = 44×44 contamination check, the Drupal 7/8/9 → Drupal 11 API leakage check, the Python-vs-TypeScript Playwright fallback check). 1,925 of 1,930 entries passed auto-acceptance with rule-based filters; 5 were manually corrected; 2 additional issues were flagged by a Drupal-specific D7/D8 API validator and corrected. Final corpus after corrections: 1,930/1,930 entries.
3. **Fine-tuning**: Unsloth + LoRA (r=16, alpha=16, no dropout) on an NVIDIA RTX 3090 Ti, 4 epochs at learning rate 2e-4 with a cosine schedule. The 4B run reweights `demo_friendly` entries by 1.5× and downsamples entries with `len(assistant) > 1800 chars` by 0.7× to favor explanation-leaning content; the 14B run uses the full distribution without reweighting.
4. **Conversion**: GGUF via pinned `llama.cpp` commit `57819b8d4` with `--outtype f16`, quantized to Q4_K_M for serving. The Modelfile uses a ChatML template override for tokenizer consistency.

The full pipeline is reproducible from the [training scripts](https://github.com/rockypod/public_a11y_coder) in this repository.

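
The 4B reweighting in step 3 can be pictured as a per-entry sampling weight. The sketch below is an assumption about how the two factors combine (multiplicatively) and about the entry schema: the `demo_friendly` flag and `assistant` text field are hypothetical names. The authoritative logic is whatever `train.py` does.

```python
def sampling_weight(entry: dict, size: str = "4b") -> float:
    """Return a relative sampling weight for one training pair.

    Assumed (hypothetical) schema: entry["demo_friendly"] is a bool and
    entry["assistant"] holds the response text.
    """
    if size != "4b":
        return 1.0  # the 14B run uses the full distribution unchanged

    weight = 1.0
    if entry.get("demo_friendly", False):
        weight *= 1.5  # favor demo-friendly entries
    if len(entry.get("assistant", "")) > 1800:
        weight *= 0.7  # downsample long responses
    return weight


short_demo = {"demo_friendly": True, "assistant": "Use a visible label."}
long_plain = {"demo_friendly": False, "assistant": "x" * 2000}
print(sampling_weight(short_demo))         # 1.5
print(sampling_weight(long_plain))         # 0.7
print(sampling_weight(long_plain, "14b"))  # 1.0
```
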
## Dataset

The training corpus is **1,930 high-quality instruction-response pairs across 18 tiers**, fully open and downloadable from the [dataset repository](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) or from `dataset/` in this repo:

| Tier | Domain | Entries |
|---|---|---|
| 1 | Drupal 11 core fundamentals | 100 |
| 2 | Drupal 11 contrib stack (Webform, Paragraphs, Views, Pathauto, Metatag) | 100 |
| 3 | Drupal 11 Twig 3 templating | 100 |
| 4 | Drupal 11 custom modules | 100 |
| 5 | Drupal 11 accessibility patterns | 100 |
| 6 | Drupal-flavored PHP 8.3 | 100 |
| 7 | Drush 12 CLI usage | 100 |
| 8 | Drush 12 custom command authoring | 100 |
| 9 | Python 3.12 folder/file utilities | 100 |
| 10 | Python 3.12 file conversion | 100 |
| 11 | Python 3.12 accessibility-aware utilities | 100 |
| 12 | Playwright (TypeScript) fundamentals | 100 |
| 13 | Playwright + `@axe-core/playwright` | 140 |
| 14 | Playwright + `@siteimprove/alfa-playwright` | 130 |
| 15 | WCAG 2.2 AA: pre-2.2 carryover SCs | 80 |
| 16 | WCAG 2.2-new success criteria (9 new SCs) | 140 |
| 17 | Negative-example / contamination correction pairs | 140 |
| 18 | End-to-end multi-domain scenarios | 100 |
| **Total** | | **1,930** |

## Evaluation

Models are evaluated against a 30-question exam covering all training domains, scored **Full (1.0) / Partial (0.5) / Fail (0.0)** per question, max 30.0 points. The exam is **published in full**, including grading rubrics: see [`exam/a11y-30q.md`](exam/a11y-30q.md).

**Pre-training baselines and post-training results** are published in `exam/`, with per-question grades:

### Summary

| Model | Total | Percentage |
|---|---|---|
| `qwen3:4b` baseline | 13.5/30 | 45.0% |
| `qwen3:8b` baseline | 17.0/30 | 56.7% |
| `qwen3:14b` baseline | 16.0/30 | 53.3% |
| **`a11y-public-coder:4b` (trained)** | **22.0/30** | **73.3%** |
| **`a11y-public-coder:14b` (trained)** | **23.0/30** | **76.7%** |

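
The Full/Partial/Fail arithmetic behind these totals is simple enough to sketch. The grade labels and function names below are illustrative (the real `grades.json` layout is not described here), and the example grade mix is hypothetical: it is just one combination that adds up to the 4B total of 22.0.

```python
GRADE_POINTS = {"full": 1.0, "partial": 0.5, "fail": 0.0}


def score_exam(grades: list[str]) -> tuple[float, float]:
    """Return (total points, percentage) for a list of per-question grades."""
    total = sum(GRADE_POINTS[grade.lower()] for grade in grades)
    return total, round(100 * total / len(grades), 1)


def lift_in_points(baseline_total: float, trained_total: float, max_points: float = 30.0) -> float:
    """Lift in percentage points, computed from raw totals to avoid rounding drift."""
    return round(100 * (trained_total - baseline_total) / max_points, 1)


# Hypothetical mix: 18 Full + 8 Partial + 4 Fail = 22.0 points.
grades = ["full"] * 18 + ["partial"] * 8 + ["fail"] * 4
print(score_exam(grades))          # (22.0, 73.3)
print(lift_in_points(13.5, 22.0))  # 28.3
```

Computing the lift from raw totals rather than from already-rounded percentages keeps the reported figure stable to one decimal place.
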
### Per-domain results: 4B trained vs baseline `qwen3:4b`

| Domain | Baseline | Trained 4B | Lift |
|---|---|---|---|
| Drupal 11 | 2.0/8 (25%) | 6.0/8 (75%) | **+4.0** ⬆ |
| PHP 8.3 | 0.5/2 (25%) | 1.0/2 (50%) | +0.5 |
| Drush 12 | 2.0/3 (67%) | 1.5/3 (50%) | -0.5 ⬇ |
| Python 3.12 | 2.5/4 (63%) | 4.0/4 (100%) | **+1.5** ✓ |
| Playwright + axe-core | 0.5/3 (17%) | 2.0/3 (67%) | **+1.5** ⬆ |
| Playwright + Alfa | 0.5/2 (25%) | 1.5/2 (75%) | **+1.0** ⬆ |
| WCAG 2.2 AA (carryover) | 3.0/4 (75%) | 3.0/4 (75%) | 0 |
| WCAG 2.2-new ⭐ | 1.5/3 (50%) | 2.0/3 (67%) | +0.5 |
| Negative/contamination gate | 1.0/1 (100%) | 1.0/1 (100%) | 0 ✓ |
| **Total** | **13.5/30 (45.0%)** | **22.0/30 (73.3%)** | **+8.5 (+28.3 pp)** |

### Per-domain results: 14B trained vs baseline `qwen3:14b`

| Domain | Baseline | Trained 14B | Lift |
|---|---|---|---|
| Drupal 11 | 3.0/8 (37.5%) | 6.5/8 (81.3%) | **+3.5** ⬆ |
| PHP 8.3 | 1.5/2 (75.0%) | 1.5/2 (75.0%) | 0 |
| Drush 12 | 1.5/3 (50.0%) | 1.0/3 (33.3%) | -0.5 ⬇ |
| Python 3.12 | 2.5/4 (62.5%) | 3.5/4 (87.5%) | **+1.0** ⬆ |
| Playwright + axe-core | 1.0/3 (33.3%) | 2.5/3 (83.3%) | **+1.5** ⬆ |
| Playwright + Alfa | 1.0/2 (50.0%) | 2.0/2 (100%) | **+1.0** ✓ |
| WCAG 2.2 AA (carryover) | 3.0/4 (75.0%) | 3.0/4 (75.0%) | 0 |
| WCAG 2.2-new ⭐ | 1.5/3 (50.0%) | 2.0/3 (66.7%) | +0.5 |
| Negative/contamination gate | 1.0/1 (100%) | 1.0/1 (100%) | 0 ✓ |
| **Total** | **16.0/30 (53.3%)** | **23.0/30 (76.7%)** | **+7.0 (+23.3 pp)** |

## Running the exam yourself

```bash
# Against a trained model already loaded in Ollama
python exam/run_exam.py --model rockypod/public-a11y-coder:4b --output exam/trained/4b

# Score after filling grades.json
python exam/run_exam.py --score exam/trained/4b
```

Grading is manual (Full/Partial/Fail per rubric in `exam/a11y-30q.md`).

## Reproducing training

```bash
# On a CUDA GPU server with the Unsloth venv installed
mkdir -p logs  # the redirect below fails if logs/ does not exist
nohup env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True TORCHDYNAMO_DISABLE=1 \
  python train.py --size 4b > logs/train_4b.log 2>&1 &
```

`TORCHDYNAMO_DISABLE=1` is required: Qwen3 + Unsloth triggers Triton JIT compilation, which fails on the CUDA driver/toolkit version mismatches common on Rocky Linux GPU hosts.

## Usage

### Ollama (local)

```bash
ollama run rockypod/public-a11y-coder:14b
# or for the portable demo model:
ollama run rockypod/public-a11y-coder:4b
```

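
Once the model is pulled, it can also be called from a script through Ollama's local HTTP API. The sketch below uses only the standard library and assumes the default endpoint `http://localhost:11434` and Ollama's `/api/chat` route with streaming disabled; the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint
DEFAULT_MODEL = "rockypod/public-a11y-coder:14b"


def build_chat_request(prompt: str, model: str = DEFAULT_MODEL) -> dict:
    """Build a non-streaming chat payload with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def ask(prompt: str, model: str = DEFAULT_MODEL) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["message"]["content"]
```

With Ollama running, `ask("Explain WCAG 2.2 SC 2.5.8 for a content editor.")` returns the model's reply as a string. Nothing leaves the machine as long as the endpoint stays on localhost.
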
### OpenWebUI

Add the model under Settings → Models → Ollama, point it to your Ollama endpoint (default `http://localhost:11434`), and select `rockypod/public-a11y-coder:14b` from the model list.

### HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rockypod/a11y-public-coder-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Write a Drupal 11 Twig snippet for an accessible image field with a skip-link-friendly heading structure."}
]

# add_generation_prompt=True appends the assistant turn marker so the model
# answers instead of continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## Known limitations

The v0.9.0 release ships with documented gaps to be addressed in v1.0:

1. **Drush flag accuracy**: The 4B variant occasionally fabricates non-existent command flags (e.g. inventing `--target` or `--exclude` on commands where those flags do not exist). This is a training data quality issue traced to tier 7 generation; v1.0 will include a Drush command-reference validator before retraining.
2. **Contrast ratio computation**: Small models cannot reliably compute color contrast ratios from arbitrary hex pairs. The model correctly identifies SC 1.4.3 (Contrast (Minimum)) and can recall specific examples that appear in training (`#767676` on white = 4.54:1), but does not generalize to compute ratios for novel inputs. Pair the model with a deterministic contrast checker.
3. **WCAG 2.2-new exception coverage**: SC 2.5.8 (Target Size (Minimum)) has five distinct exception cases (spacing, equivalent, inline, user-agent control, essential). The 4B reliably outputs the headline `24×24 CSS pixels` AA threshold but covers only one of the five exception cases consistently. v1.0 will expand tier 16 with dedicated entries per exception type.
4. **SC-to-SC discrimination**: The 4B occasionally confuses related success criteria (e.g. cites SC 2.1.1 + 2.1.2 for a missing button role where 4.1.2 is the primary criterion). v1.0 will add SC-discrimination pair entries to tier 17.
5. **Drupal 11 vs earlier Drupal versions**: While the dataset targets Drupal 11 exclusively, the underlying base model has substantial Drupal 7/8/9 pretraining priors. The contamination gate (tier 17 negative examples) holds at 100% on the exam, but in long-form generation some D7-era patterns may surface. Always validate generated Drupal code against the actual D11 API.

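
For limitation 2, the deterministic check is short enough to keep next to the model. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are illustrative, and it accepts only 3- or 6-digit hex colors without alpha.

```python
def _channel_to_linear(value: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = value / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #RGB or #RRGGBB color, per WCAG 2.x."""
    digits = hex_color.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _channel_to_linear(r)
        + 0.7152 * _channel_to_linear(g)
        + 0.0722 * _channel_to_linear(b)
    )


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


ratio = contrast_ratio("#767676", "#FFFFFF")
print(round(ratio, 2))  # 4.54
print(ratio >= 4.5)     # True: passes SC 1.4.3 for normal-size text
```

Use the model to identify the relevant success criterion and explain the fix, and use a function like this (or an established checker) for the number itself.
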
## Recommended use cases

**Strong fit:**

- Generating Drupal 11 module scaffolds with accessibility baked in
- Writing Playwright + axe-core / Alfa test files for agency sites
- Drafting Python utility scripts for accessibility audits (PDF text layer detection, alt text auditing, heading hierarchy)
- Explaining WCAG 2.2 success criteria to non-technical content editors
- Drush 12 natural-language to command translation (with verification)

**Use with caution:**

- Contrast ratio calculations (verify with a deterministic checker)
- Drush command flags (verify against `drush help <command>`)
- Drupal 8/9 maintenance (this model is Drupal 11-targeted)

**Not designed for:**

- General-purpose coding outside the trained domains
- Production-critical accessibility certification without human review
- Handling sensitive citizen data in prompts

## Roadmap

| Version | Target | Focus |
|---|---|---|
| **v0.9.0** | **shipped** | Initial release, baselines published, ship gate intentionally below 80% with documented limitations |
| v0.9.5 | ~6 weeks | Drush flag validation pass, contrast hex-pair expansion, SC 2.5.8 exception coverage |
| v1.0.0 | ~10 weeks | All v0.9.0 limitations addressed, ≥85% on the 30Q exam |

The CRAFTED℠ methodology means each version uses real-world exam failures and user-reported issues as the correction stream for the next training cycle. The v1.0 release will include an expanded 60-question exam.

## Reproducibility

This release is reproducible end-to-end from the public artifacts:

- **Dataset:** [`rockypod/a11y-public-coder-dataset`](https://huggingface.co/datasets/rockypod/a11y-public-coder-dataset) or `dataset/` in this repo
- **Training pipeline:** [`train.py`](train.py) in this repo
- **Evaluation exam:** [`exam/a11y-30q.md`](exam/a11y-30q.md)
- **Exam runner:** [`exam/run_exam.py`](exam/run_exam.py)
- **Per-question grading results:** [`exam/baselines/`](exam/baselines/) and [`exam/trained/`](exam/trained/)

## Citation

```bibtex
@misc{a11y-public-coder-v0.9.0,
  author = {RockyPod},
  title = {a11y-public-coder: An open-source accessibility coding assistant for the public sector},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rockypod/a11y-public-coder-4b}},
}
```

## Acknowledgments

- Base models: [Qwen team](https://github.com/QwenLM). `Qwen3-4B` and `Qwen3-14B` are Apache-2.0-licensed open weights
- Accessibility tooling: [Deque axe-core](https://github.com/dequelabs/axe-core), [Siteimprove Alfa](https://github.com/Siteimprove/alfa)
- Web standards: [W3C WAI](https://www.w3.org/WAI/) for the WCAG 2.2 specification and Understanding documents
- Training infrastructure: [Unsloth](https://github.com/unslothai/unsloth), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/)

## License

MIT. See [LICENSE](LICENSE) for full text. Free for any use, including commercial use and use by government agencies.