esherialabs/lexchat-Llama-3.1-8B-GRPO — Kenya Court Submissions Generator
A Kenya-first, RL-tuned Llama-3.1-8B checkpoint (GRPO on our SFT base) engineered for court-ready submissions. Rewarded for spine discipline (BLUF → Governing Rules → Controlling Holdings → Application → Relief Sought), side alignment, jurisdiction purity, and wordband control—so drafts land clean and on-spec. Purpose-built to democratize legal drafting capacity for NGOs, legal aid clinics, community justice centers, and paralegal programs under advocate supervision.
1) Model Summary
- Task: Draft structured submissions from facts + issues + side + posture for Kenyan courts.
- Output spine: BLUF → Governing Rules → Controlling Holdings → Application → Relief Sought.
- RL objective: Reward the policy for structure, side correctness, Kenya-only doctrine/lexicon, and wordband discipline; lightly reward doctrine coverage keyed to the prompted issues.
2) Intended Use & Audience
Who this is for
- NGOs / CBOs, legal-aid clinics, community justice centers.
- Paralegals and law students working under advocate supervision.
- Civic-tech / research teams building access-to-justice tooling.
Primary use cases
- First-pass drafting (injunctions, JR, land, employment, constitutional).
- Counter-position generation (e.g., Respondent reply) for client education.
- Issue-spotting & pedagogy with a Kenya-specific doctrinal spine.
Out of scope
- Direct consumer legal advice without a licensed advocate.
- Non-Kenyan jurisdictions or filings without human review.
- Automated mass-filing or claim-spam.
3) Training Data for RL
- Task pool: Prompts synthesized from esherialabs/kenya-court-submissions-qa (15k Q/A SFT set), plus posture/side variants and short issue packs (rule keys per remedy: Giella/Nguruman tests, proportionality, JR standards, indefeasibility, etc.).
- Targets: No single "gold text." Rewards are programmatic (structure/stance/jurisdiction/length/cite-format) with light doctrine-coverage checks sourced from curated rulepacks.
- Privacy & provenance: No PII; users remain responsible for compliance with the Kenya Data Protection Act, 2019.
4) RL Training Procedure (GRPO)
- Base policy: our Kenya-legal SFT checkpoint on Lexchat-Llama-3.1-8B-Instruct.
- Parameter-efficient RL: LoRA updates only; 4-bit base frozen.
- Loading: 4-bit NF4 (double-quant), bf16 compute; gradient checkpointing.
- LoRA: r=32, alpha=16, dropout=0.05 on attention + MLP projections.
- GRPO config: prompt budget 256 tokens, completion window 768 tokens; num_generations=6 per prompt; batch size ≈1 (memory-bounded); cosine LR with warmup; gradient clip 0.1.
- Optimizer: memory-friendly AdamW (paged) suitable for low-VRAM loops.
- Hardware profile: 1× NVIDIA A10 (24 GB); ~196 GiB host RAM; 4-bit + LoRA keeps rollout latency in check.
Why GRPO here? Each prompt spawns 6 completions; relative advantages are computed within the group, so winners are pushed up while underperformers are suppressed—an implicit self-competition that works well with tiny batches and mixed, programmatic rewards.
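The group-relative baseline described above can be illustrated in a few lines. This is a generic sketch of GRPO's advantage computation, not the project's training code:

```python
import statistics

def group_advantages(rewards):
    """Convert one prompt-group's per-completion rewards into relative advantages.

    GRPO normalizes each reward against its own group's mean and std, so no
    learned critic is needed as a baseline.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Six completions for one prompt: winners get positive advantage,
# underperformers negative -- the "implicit self-competition".
advs = group_advantages([3.5, 2.0, 4.0, 1.5, 3.0, 2.0])
```

Because advantages are zero-mean within each group, the update pushes the best of the six completions up relative to its siblings even when the absolute reward scale drifts.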
5) Reward Stack
All heads are summed. A “perfectly formatted, on-side, on-jurisdiction, on-band” answer tops out near 4.0. Partial structure still earns partial credit to avoid collapse.
Section-Order Rewards (0 → 1.5)
- soft_spine_reward (+0.5): all five section headers present in order (case-insensitive; tolerates spacing).
- strict_spine_reward (+0.5): headers present with newline discipline and bullet semantics where required (• in Application).
- header_quality_reward (+0.5): BLUF starts with a clear, one-paragraph thesis; Relief Sought ends with concrete prayers.
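A deterministic check of this kind can be sketched as follows. This is an illustrative implementation, not the project's actual verifier; only the header names come from the spine above:

```python
import re

SPINE = ["BLUF", "Governing Rules", "Controlling Holdings",
         "Application", "Relief Sought"]

def soft_spine_reward(text: str) -> float:
    """+0.5 if all five headers appear in order (case-insensitive,
    tolerant of extra whitespace between header words)."""
    pos = 0
    for header in SPINE:
        # Allow arbitrary whitespace between the header's words.
        pattern = r"\s+".join(map(re.escape, header.split()))
        m = re.search(pattern, text[pos:], flags=re.IGNORECASE)
        if m is None:
            return 0.0
        pos += m.end()
    return 0.5

draft = ("BLUF\n...\nGoverning Rules\n...\nControlling Holdings\n...\n"
         "Application\n...\nRelief Sought\n...")
```

The strict variant would additionally assert newline placement and bullet markers; the same ordered-scan structure applies.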
Side-Alignment Reward (0 → 0.5)
- Checks stance markers (“Applicant/Respondent/Plaintiff/Defendant/Appellant” etc.) against the instructed side; penalizes self-contradictions (“we submit” vs. opposing relief).
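A stance check along these lines could look like the sketch below. The marker phrases and pairing logic are assumptions for illustration, not the shipped verifier:

```python
def side_alignment_reward(text: str, side: str) -> float:
    """+0.5 when stance markers match the instructed side; 0 on
    contradiction (e.g., the opposing party 'submitting')."""
    opposites = {
        "Applicant": "Respondent", "Respondent": "Applicant",
        "Plaintiff": "Defendant", "Defendant": "Plaintiff",
        "Appellant": "Respondent",
    }
    lowered = text.lower()
    ours = (f"for the {side}".lower() in lowered
            or f"the {side} submits".lower() in lowered)
    theirs = f"the {opposites[side]} submits".lower() in lowered
    return 0.5 if ours and not theirs else 0.0
```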
Jurisdiction-Purity Reward (0 → 0.5)
- Penalizes foreign law unless flagged persuasive; rewards Kenya-centric lexicon and neutral citation style (no links).
- Lightweight denylist for non-Kenya reporters; allowlist for common Kenya tests (e.g., Giella, Nguruman, proportionality, Article 47, Anarita Karimi / Mumo Matemu standards).
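A minimal denylist/allowlist scorer might look like this; the lists here are tiny illustrative samples, not the curated production rulepacks:

```python
# Illustrative lists only; the production denylist/allowlist is larger and curated.
FOREIGN_REPORTERS = ["EWCA", "EWHC", "All ER"]
KENYA_ANCHORS = ["Giella", "Nguruman", "proportionality", "Article 47",
                 "Anarita Karimi", "Mumo Matemu"]

def jurisdiction_purity_reward(text: str, persuasive_flagged: bool = False) -> float:
    """0 to 0.5: zero out on foreign reporters (unless flagged persuasive),
    otherwise pay per Kenya-centric anchor hit, capped at 0.5."""
    if not persuasive_flagged and any(r in text for r in FOREIGN_REPORTERS):
        return 0.0
    hits = sum(1 for a in KENYA_ANCHORS if a.lower() in text.lower())
    return min(0.5, 0.1 * hits)
```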
Wordband Reward (0 → 0.5)
- Pays out when the word count lands within a configured operational band (e.g., ~450–700 words for typical submissions).
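The band check itself is trivial; a sketch with the example band as defaults (deployments would configure their own):

```python
def wordband_reward(text: str, lo: int = 450, hi: int = 700) -> float:
    """+0.5 when the draft's word count falls inside [lo, hi]; 0 otherwise.

    Band endpoints mirror the example operational band above and are
    configurable, not fixed.
    """
    n = len(text.split())
    return 0.5 if lo <= n <= hi else 0.0
```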
Cite-Format Reward (0 → 0.5)
- Encourages case names/neutral citations; discourages URLs; rewards consistent formatting (no pin-cite hallucinations).
Doctrine-Coverage Reward (0 → 0.5)
- Matches issue-keyed rulepacks (e.g., injunction → Giella/Nguruman, balance of convenience, irreparable harm); fuzzy match with tolerance for paraphrase.
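Paraphrase-tolerant matching can be done with standard-library fuzzy comparison; this sketch uses difflib and an illustrative injunction rulepack, and is not the project's exact matcher:

```python
from difflib import SequenceMatcher

# Illustrative rulepack for an injunction issue; real rulepacks are curated.
INJUNCTION_RULEPACK = ["prima facie case", "irreparable harm",
                       "balance of convenience"]

def doctrine_coverage_reward(text: str, rulepack, threshold: float = 0.8) -> float:
    """0 to 0.5, scaled by the fraction of rule anchors the draft covers.

    Each anchor is slid across same-length word windows of the draft and
    scored with a similarity ratio, so paraphrase and punctuation drift
    still count as hits.
    """
    words = text.lower().split()

    def covered(anchor: str) -> bool:
        k = len(anchor.split())
        for i in range(max(1, len(words) - k + 1)):
            window = " ".join(words[i:i + k])
            if SequenceMatcher(None, anchor, window).ratio() >= threshold:
                return True
        return False

    frac = sum(covered(a) for a in rulepack) / len(rulepack)
    return 0.5 * frac
```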
No separate judge model. All verifiers are deterministic string/AST checks and rulepack lookups. This keeps the loop single-model and hardware-friendly.
6) Verifier Mechanics
- Deterministic parsers, not regex soup: string/section parsers and small ASTs for the five-part spine; resilient to whitespace.
- Progressive strictness: soft → strict checks to provide dense early reward while the policy learns layout, then newline and bullet fidelity.
- Realtime logging: per-head scores + sampled completions are logged for fast failure analysis (e.g., off-side or foreign-law creep).
7) Small-Model Tactics
- LoRA on the highest-leverage submodules (attn/MLP) only.
- Conservative LR schedule + clip=0.1 to tame gradient variance.
- 6-way sampling at batch size ~1 so GRPO’s relative baseline still works under tight VRAM.
- 4-bit loading to keep rollouts snappy despite multiple verifier passes.
8) Prompting & Inference
Recommended chat format
<|system|>
You are a Kenya court submissions generator. Output MUST follow:
BLUF → Governing Rules → Controlling Holdings → Application → Relief Sought.
Kenya authorities only unless marked persuasive.
<|user|>
Draft from the Applicant’s perspective in a Judicial Review (interlocutory stay).
Facts: [salient facts].
Issues: (1) Article 47 procedural fairness (2) Threshold for stay pending JR.
Constraints: Max ~600 words. Submissions style. Bullets only in Application.
<|assistant|>
Python (merged checkpoint)
from transformers import AutoModelForCausalLM, AutoTokenizer
name = "esherialabs/lexchat-Llama-3.1-8B-GRPO" # replace with final repo id
tok = AutoTokenizer.from_pretrained(name, use_fast=True)
m = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto", device_map="auto")
prompt = "<|system|>\n...see above...\n<|user|>\n...case brief...\n<|assistant|>\n"
ids = tok(prompt, return_tensors="pt").to(m.device)
out = m.generate(**ids, max_new_tokens=900, temperature=0.25, top_p=0.9, do_sample=True, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0][ids['input_ids'].shape[1]:], skip_special_tokens=True))
Inference policy
- Temperature 0.2–0.35, top_p=0.9; max_new_tokens ≤ 900.
- Post-gen linter: spine order, wordband, Kenya-only lexicon, no URLs.
- Optionally pair with a cite-checker or an allow-listed retrieval layer.
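The post-gen linter mentioned above can be a thin wrapper over the same checks used as reward heads. A minimal sketch, with thresholds and check names chosen for illustration rather than taken from a shipped tool:

```python
import re

SPINE = ["BLUF", "Governing Rules", "Controlling Holdings",
         "Application", "Relief Sought"]

def lint_draft(text: str, lo: int = 450, hi: int = 700) -> dict:
    """Run post-generation checks and return a pass/fail report per check."""
    # Spine order: each header must appear after the previous one.
    pos, in_order = 0, True
    for h in SPINE:
        i = text.find(h, pos)
        if i < 0:
            in_order = False
            break
        pos = i + len(h)
    n_words = len(text.split())
    return {
        "spine_order": in_order,
        "wordband": lo <= n_words <= hi,
        "no_urls": re.search(r"https?://", text) is None,
    }
```

A draft failing any check can be regenerated or routed to the supervising advocate with the report attached.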
9) Evaluation Protocol
Automated
- Structure pass (%) — five sections, order, newline/bullets.
- Side correctness (%) — matches instructed side.
- Jurisdiction purity (%) — Kenya-only unless flagged persuasive.
- Doctrine coverage (%) — hits issue-keyed rule anchors.
- Wordband adherence (%) — within operational target.
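These automated metrics reduce to pass rates over a held-out prompt set. A sketch of the aggregation, with hypothetical metric names:

```python
def pass_rates(reports):
    """Aggregate per-draft boolean check results into percentage pass rates.

    `reports` is a list of dicts like {"structure": True, "side": False, ...};
    the metric names are illustrative.
    """
    keys = reports[0].keys()
    return {k: 100.0 * sum(r[k] for r in reports) / len(reports) for k in keys}

rates = pass_rates([
    {"structure": True, "side": True},
    {"structure": True, "side": False},
    {"structure": False, "side": True},
    {"structure": True, "side": True},
])
```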
Human review
- Persuasiveness, clarity, and fact fidelity.
- Filing readiness (minor edits only).
- Track red-lines to continuously enrich rulepacks/rewards.
10) Safety, Risks, and Limitations
- Not legal advice. Advocate supervision is mandatory.
- Evolving law: Doctrine shifts; keep retrieval/cite-bases fresh.
- Reward hacking risk: Programmatic rewards can be gamed; we audit samples and rotate rulepacks.
- Bias/coverage gaps: Report misses via Issues; we will iterate.
11) Deployment Blueprint (NGOs & Clinics)
- Intake templates for facts/issues/side/posture/dates.
- Generation policy (fixed temperature, max tokens).
- Automated linter (structure/wordband/jurisdiction).
- Advocate review checkpoint before filing.
- Audit & privacy: scrub PII; log drafts/finals; comply with Kenya DPA 2019.
12) Reproducibility
- Base: esherialabs/lexchat-Llama-3.1-8B-Instruct (SFT on Kenya submissions spine).
- RL (GRPO): prompt 256, completion 768, num_generations=6; LoRA r=32, α=16, dropout=0.05; 4-bit NF4; bf16 compute; paged AdamW; cosine LR; warmup; clip=0.1; gradient checkpointing.
- Hardware: single-GPU, 4-bit friendly (A10/4090/L40S-class).
- Code patterns: single-model loop; deterministic verifiers; no judge model.
13) Example Prompts (NGO scenarios)
Legal aid clinic — eviction
You are drafting from the Applicant’s perspective in a Kenyan Constitutional Petition (interim conservatory orders).
Facts: Community clinic evicted without notice from county-owned premises after serving low-income patients.
Issues: (1) Article 47 procedural fairness (2) Article 40 property/use rights vs public interest (3) Conservatory order threshold.
Constraints: BLUF → Governing Rules → Controlling Holdings → Application → Relief Sought. Kenya authorities only. Max ~600 words.
Paralegal — employment dispute
You are drafting from the Respondent’s perspective in an Employment & Labour Court appeal.
Facts: Termination for cause with documented warnings; claimant alleges unfair termination.
Issues: (1) Fair procedure under Employment Act (2) Burden of proof and remedies.
Constraints: [same as above]
14) Responsible AI & Community Governance
- Human-in-the-loop: Always route outputs to a supervising advocate before filing.
- Community input: NGOs and clinics are invited to open issues with examples where the model under-serves marginalized groups or misses doctrine.
15) Versions
- v2.0.0-GRPO (2025-10-29): RL-tuned on Kenya legal SFT checkpoint; structure/side/jurisdiction rewards; improved spine discipline.
- v1.0.0-SFT: initial Kenya submissions generator (15k pairs).
16) Citation
Esheria Labs (2025). LexChat 8B — Kenya Court Submissions Generator (GRPO).
https://huggingface.co/esherialabs/lexchat-Llama-3.1-8B-GRPO
Trained on: https://huggingface.co/datasets/esherialabs/kenya-court-submissions-qa
17) Contact
- Maintainer: Esheria Ventures
- Website: https://esheria.co.ke
- NGO / community partnerships: partnerships@esheria.co.ke
- Enterprise / pro-bono: support@esheria.co.ke
- Issues / rulepack feedback: open a Discussion/Issue on the model repo
Model tree for esherialabs/lexchat-Llama-3.1-8B-GRPO
- Base model: meta-llama/Llama-3.1-8B