# CMMC Expert 72B
A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, HIPAA, DFARS, and cybersecurity compliance frameworks.
This is the 72B variant — the most capable model in the suite, designed for complex multi-framework analysis and comprehensive compliance reasoning. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.
## Quick Start (Ollama)

```shell
# Download and run
ollama pull Nathan-Maine/cmmc-expert-72b

# Ask a compliance question
ollama run cmmc-expert-72b "What access controls are required for CMMC Level 2?"

# Or call the Ollama REST API directly
curl http://localhost:11434/api/generate -d '{
  "model": "cmmc-expert-72b",
  "prompt": "What are the key differences between CMMC Level 1 and Level 2?",
  "stream": false
}'
```
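The same request can be issued from Python using only the standard library; a minimal sketch (the helper names here are illustrative, not part of the project code):

```python
import json
from urllib import request, error

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_generate_request(prompt: str, model: str = "cmmc-expert-72b") -> dict:
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    """POST the prompt and return the model's response text ('' if no server)."""
    body = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    try:
        with request.urlopen(req, timeout=300) as resp:
            return json.loads(resp.read())["response"]
    except error.URLError:
        return ""  # Ollama server not running
```

Ollama also exposes an OpenAI-compatible API under `/v1/chat/completions`, so the official `openai` client works by pointing `base_url` at `http://localhost:11434/v1`.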
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct (abliterated variant) |
| Parameters | 72.7 billion |
| Fine-Tuning Method | QLoRA (4-bit base, LoRA rank 8, alpha 16) |
| Quantization | q4_k_m (GGUF) |
| File Size | 44.2 GB |
| Context Length | 32,768 tokens |
| Training Hardware | NVIDIA A100 80GB SXM (RunPod) |
| Training Time | ~16.9 hours |
| Training Framework | Unsloth + HuggingFace TRL + PEFT |
## Why Abliterated?
The base model is an abliterated variant of Qwen2.5-Instruct. Standard instruction-tuned models often refuse to discuss vulnerability details, attack patterns, and specific exploitation techniques — all of which are essential for compliance work. Abliteration removes these safety refusals so the model can provide complete, accurate compliance guidance, including threat analysis and vulnerability assessment.
## Compliance Framework Coverage
Trained across eight overlapping frameworks to support cross-framework mapping:
| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels — 17 L1 practices, 110 L2, 134 L3, assessment methodology |
| NIST SP 800-171 Rev. 2 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF | Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS Clauses | 252.204-7012, 7019, 7020, 7021 — contract-level compliance |
## Training Data

13,434 training and 3,472 validation examples (~3.3M tokens), assembled from five curated sources:
| Source | Examples | Share |
|---|---|---|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 47.4% |
| CMMC Full | 4,787 | 35.6% |
| CMMC Balanced | 994 | 7.4% |
| HIPAA Compliance | 961 | 7.2% |
| CMMC Core | 320 | 2.4% |
Data processing pipeline:
- Format conversion — Raw text to chat-style instruction/response pairs
- Quality filtering — Removed entries <100 chars, table-heavy fragments, OCR artifacts
- Relevance filtering — NIST cybersecurity data reduced from 424,729 raw entries to ~72,000 compliance-relevant entries, then sampled down to ~7,000
- Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, Jaccard 0.8)
- Validation split — 80/20 stratified split maintaining source distribution
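The deduplication step above can be sketched in pure Python. A production pipeline would use xxhash for exact hashing and MinHash LSH (e.g. the datasketch library) for scale, but exact content hashing plus shingle Jaccard at a 0.8 threshold shows the idea:

```python
import hashlib

def shingles(text: str, k: int = 3) -> set:
    """Word-level k-shingles used for near-duplicate comparison."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def dedup(examples: list, threshold: float = 0.8) -> list:
    """Exact dedup via content hash, then greedy near-dedup at Jaccard >= threshold."""
    seen_hashes, kept, kept_shingles = set(), [], []
    for text in examples:
        h = hashlib.sha256(text.encode()).hexdigest()  # stand-in for xxhash
        if h in seen_hashes:
            continue  # exact duplicate
        seen_hashes.add(h)
        sh = shingles(text)
        if any(jaccard(sh, prev) >= threshold for prev in kept_shingles):
            continue  # near duplicate of something already kept
        kept.append(text)
        kept_shingles.append(sh)
    return kept
```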
## Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 5e-5 (cosine decay) |
| Optimizer | 8-bit AdamW |
| Batch Size | 1 (effective 16 with gradient accumulation) |
| Gradient Accumulation | 16 |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Max Sequence Length | 2048 |
| Quantization (Base) | 4-bit NF4 |
| Precision | bf16 |
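The table above corresponds roughly to the following Hugging Face PEFT / bitsandbytes configuration — a sketch of the QLoRA setup, not the project's actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model, bf16 compute (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter: rank 8, alpha 16, applied to all attention and MLP projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

With a per-device batch size of 1 and gradient accumulation of 16, the effective batch size is 1 × 16 = 16.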
## Evaluation Results
| Metric | Value |
|---|---|
| Final Eval Loss | 1.004 |
| Training Steps | 2,520 |
The 72B model achieved the lowest eval loss across all four model sizes, demonstrating the strongest compliance reasoning capabilities in the suite.
## Intended Uses
- Complex Multi-Framework Analysis — Cross-reference controls across CMMC, NIST 800-53, HIPAA, and DFARS simultaneously
- SSP Generation — Draft comprehensive System Security Plan control descriptions with detailed NIST/CMMC citations
- Gap Analysis — Identify controls required for specific CMMC levels with nuanced implementation guidance
- Assessment Prep — Generate detailed evidence checklists and assessment objective narratives
- Cross-Framework Mapping — Map controls between frameworks with full context and rationale
- Policy Drafting — Create thorough policies aligned to specific CMMC practices
- DFARS Clause Analysis — Deep analysis of contract-level compliance requirements
## Limitations
- Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
- Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning. Always verify against current published frameworks.
- Hardware requirements. The 72B model requires significant resources (48+ GB VRAM or 64+ GB RAM). For less capable hardware, consider the 7B, 14B, or 32B variants.
- No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
- Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.
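One lightweight mitigation for the citation caveat above is to extract control identifiers from model output and check them against an authoritative catalog before relying on them. An illustrative sketch (the regex and helper names are assumptions, not project code):

```python
import re

# Matches NIST 800-53 style IDs (e.g. AC-3, AC-3(2)) and 800-171 style (e.g. 3.1.1)
CONTROL_ID = re.compile(r"\b([A-Z]{2}-\d+(?:\(\d+\))?|\d+\.\d+\.\d+)")

def extract_citations(answer: str) -> set:
    """Pull candidate control citations out of a model response for review."""
    return set(CONTROL_ID.findall(answer))

def unverified(answer: str, known_ids: set) -> set:
    """Cited controls not present in an authoritative control catalog."""
    return extract_citations(answer) - known_ids
```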
## Out-of-Scope Uses
- Legal advice. This model does not provide legal opinions on compliance status.
- Automated compliance certification. CMMC certification requires human assessors (C3PAOs).
- Processing actual CUI/ITAR data. The model itself does not process or store sensitive data, but users should follow their organization's data handling policies.
## Hardware Requirements
| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 48 GB | 64 GB | 50 GB |
Supported OS: Linux, macOS, Windows (WSL2)
## The Model Suite
This is the 72B model — the most capable option for complex multi-framework compliance analysis. The full suite includes:
| Model | Parameters | GGUF Size | Best For |
|---|---|---|---|
| cmmc-expert-7b | 7.6B | 5.1 GB | Quick lookups, day-to-day queries |
| cmmc-expert-14b | 14.7B | ~10 GB | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b | 32.5B | 18.5 GB | Deep gap assessments, SSP drafting |
| cmmc-expert-72b | 72.7B | 44.2 GB | Complex multi-framework analysis |
## Source Code
Full pipeline code, training configuration, and evaluation methodology: github.com/NathanMaine/cmmc-compliance-ai-model
## Citation

```bibtex
@misc{maine2025cmmcexpert,
  title={CMMC Expert: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2025},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}
```
## Contact
- Author: Nathan Maine
- Website: nathanmaine.com
- LinkedIn: linkedin.com/in/nathanmaine
- Email: nmaine@gmail.com