Patrick Hill PRO

pbhappliedsystems

AI & ML interests

PBH Applied Systems publishes evaluated open-weight GGUF models for practical AI deployment, with an emphasis on quantized inference, agentic workflows, structured outputs, tool use, and production reliability. Every model published under this organization is converted, evaluated, and documented by PBH Applied Systems using its proprietary `quant_eval` framework. The evaluation process compares full-precision and quantized variants across agent-adjacent task families including structured JSON output, tool dispatch, multi-turn state retention, mixed natural language plus JSON responses, multiple-choice extraction, fuzz-style constraint adherence, and multi-step planning.

These model cards are designed to support deployment decisions, not just model discovery. Each card documents practical behavior, quantization trade-offs, failure modes, recommended use cases, hardware requirements, and guardrails for production use.

Try the live PBH Applied Systems AI Agent Demo: https://pbhappliedsystems.com/assistant.html

The demo lets visitors interact with evaluated quantized open-weight models across reasoning, document intelligence, and code automation workflows running on private GPU infrastructure.

Recent Activity

posted an update about 17 hours ago
🚀 **New flagship dataset, and an argument about what a dataset card should be.**

Most synthetic datasets on the Hub ship row counts, a license, and little else: pipeline opaque, rejection criteria unstated, compliance unaudited. We published the opposite.

**SynthEval Cloud: Regulated-Domain Synthetic Instruction Dataset**
👉 https://huggingface.co/datasets/pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

**1,116** quality-gated instruction records across **7 regulated domains** (medical, legal, GDPR, privacy, education, e-commerce, transport). Every record cleared a documented cascade, not a vibe check:

- 🧪 **Dual-signal hallucination gate**: rejects only when embedding cosine *and* keyword-overlap both fail; a low score alone never rejects.
- 🔒 **Layered PII masking + independent leak audit**: a separate over-reporting scanner found **0.0% residual leak** across all 1,116 records.
- 📊 **Whole-corpus evaluation, not a sample**: MATTR **0.769**, mean cosine **0.73**, **0%** near-duplicates, **96.9%** yield.
- 🧾 **The 36 rejections ship too**, each tagged with its failing gate. Removal at the gate is the product; we show our work.

Every number on the card is a field in the `evaluation_report.json` shipped beside the data, with full methodology + provenance (Mistral-Nemo AWQ W4A16 · vLLM 0.8.5.post1 · Modal A10G).

One release from **SynthEval**: Studio (local GPU) + Cloud (Modal+vLLM), proving quality parity across substrates.

📄 Whitepaper: https://pbhappliedsystems.com/SynthEval_Studio_and_Cloud_Quality-Gated_Synthetic_Data_Generation.pdf
🔎 Overview: https://pbhappliedsystems.com/synthetic-data.html

**CC BY 4.0**: commercial use welcome, just credit it.

Need defensible synthetic data at scale? Let's talk.

Patrick Hill, PBH Applied Systems
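The dual-signal rule described in the post (reject only when embedding cosine and keyword overlap both fail) can be illustrated with a minimal Python sketch. This is not the SynthEval implementation: the threshold values, the function names, and the particular overlap measure (share of answer tokens that also appear in the source) are all assumptions made for illustration, and embeddings are passed in as plain lists so the example stays self-contained.

```python
import math
import re

# Hypothetical thresholds; the post does not state the real values.
COSINE_MIN = 0.5
OVERLAP_MIN = 0.2


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(source_text, answer_text):
    """Share of answer tokens that also occur in the source (assumed measure)."""
    source_tokens = set(re.findall(r"[a-z0-9]+", source_text.lower()))
    answer_tokens = set(re.findall(r"[a-z0-9]+", answer_text.lower()))
    if not answer_tokens:
        return 0.0
    return len(source_tokens & answer_tokens) / len(answer_tokens)


def dual_signal_gate(source_vec, answer_vec, source_text, answer_text):
    """Reject only when BOTH signals fall below their thresholds."""
    cos = cosine(source_vec, answer_vec)
    overlap = keyword_overlap(source_text, answer_text)
    cosine_failed = cos < COSINE_MIN
    overlap_failed = overlap < OVERLAP_MIN
    return {
        "cosine": cos,
        "overlap": overlap,
        "rejected": cosine_failed and overlap_failed,
    }
```

Under these assumed thresholds, a record with a low cosine score but reasonable keyword overlap passes the gate, which matches the post's statement that a low score alone never rejects.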
updated a dataset about 17 hours ago
pbhappliedsystems/syntheval-cloud-regulated-instruct-1k
published a dataset about 17 hours ago
pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

Organizations

None yet