CMMC Expert 14B

Notice: These models are provided for proof-of-concept and testing purposes only. Production-grade models are not publicly shared. For inquiries regarding production models or commercial licensing, please contact the maintainer: Nathan Maine.

A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, HIPAA, DFARS, and cybersecurity compliance frameworks.

This is the 14B variant — the sweet spot for detailed compliance analysis on mid-range hardware. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.

Quick Start (Ollama)

# Download and run
ollama pull Nathan-Maine/cmmc-expert-14b

# Ask a compliance question
ollama run cmmc-expert-14b "Map CMMC AC.L2-3.1.1 to its NIST 800-53 and HIPAA equivalents"

# Or call Ollama's local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "cmmc-expert-14b",
  "prompt": "What are the DFARS 252.204-7012 requirements for cyber incident reporting?",
  "stream": false
}'
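The curl call above can be wrapped in a short client. A minimal sketch using only the Python standard library, assuming Ollama is running on its default port (11434); `build_request` and `ask` are illustrative helper names, not part of any shipped SDK:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's native generate endpoint

def build_request(prompt: str, model: str = "cmmc-expert-14b") -> dict:
    """Build the JSON body for a non-streaming generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    """POST a compliance question and return the model's response text."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server):
# print(ask("What are the DFARS 252.204-7012 incident reporting requirements?"))
```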

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen2.5-14B-Instruct |
| Parameters | 14.7 billion |
| Fine-Tuning Method | QLoRA (4-bit base, LoRA rank 64, alpha 128) |
| Quantization | q5_k_m (GGUF) |
| File Size | ~10 GB |
| Context Length | 32,768 tokens |
| Inference Speed | ~3-5 seconds per response |
| Training Hardware | NVIDIA A100 80GB SXM (RunPod) |
| Training Time | ~6 hours |
| Training Framework | Unsloth + HuggingFace TRL + PEFT |
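The ~10 GB file size is consistent with back-of-envelope arithmetic, assuming q5_k_m averages roughly 5.5 bits per weight (the exact figure varies with the tensor mix the quantizer chooses):

```python
params = 14.7e9            # parameter count from the table above
bits_per_weight = 5.5      # assumed average for q5_k_m quantization
size_gb = params * bits_per_weight / 8 / 1e9
# size_gb comes out near 10, consistent with the listed ~10 GB
```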

Why 14B?

The 14B model offers significantly deeper reasoning than the 7B while remaining deployable on a single 12 GB GPU. It excels at multi-control analysis, cross-framework mapping with detailed justifications, and generating nuanced compliance narratives that cite multiple framework sections simultaneously.

Security Domain Coverage

Models are fine-tuned for complete security domain coverage, including vulnerability analysis, incident response scenarios, and access control failure modes required for professional SSP and POA&M generation. Behavioral guardrails and policy enforcement are handled at the governed-llm-gateway layer.

Base model migration to Meta Llama 3.1/3.3 (US-origin, open weights) is in progress.

Compliance Framework Coverage

Trained across eight overlapping frameworks to support cross-framework mapping:

| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels — 17 L1 practices, 110 L2, 134 L3, assessment methodology |
| NIST SP 800-171 Rev. 2 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF | Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS Clauses | 252.204-7012, 7019, 7020, 7021 — contract-level compliance |
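The kind of cross-framework answer the model is asked to produce (e.g., the Quick Start prompt above) can be pictured as a lookup table. The sketch below is illustrative, not the model's internal data: the 800-53 entries follow the NIST SP 800-171 Rev. 2 Appendix D mapping, while the HIPAA entry is a commonly cited analog rather than an official crosswalk.

```python
# Illustrative control crosswalk for one CMMC practice.
MAPPING = {
    "CMMC AC.L2-3.1.1": {
        "nist_800_171": "3.1.1",                    # limit access to authorized users
        "nist_800_53": ["AC-2", "AC-3", "AC-17"],   # per 800-171 Rev. 2 Appendix D
        "hipaa": "164.312(a)(1)",                   # technical safeguard: access control
    },
}

def equivalents(control: str) -> dict:
    """Return mapped controls for a CMMC practice ID, or {} if unknown."""
    return MAPPING.get(control, {})
```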

Training Data

13,434 training + 3,472 validation examples (~3.3M tokens) assembled from 5 curated sources:

| Source | Examples | Share |
|---|---|---|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 47.4% |
| CMMC Full | 4,787 | 35.6% |
| CMMC Balanced | 994 | 7.4% |
| HIPAA Compliance | 961 | 7.2% |
| CMMC Core | 320 | 2.4% |

Data processing pipeline:

  1. Format conversion — Raw text → chat-style instruction/response pairs
  2. Quality filtering — Removed entries <100 chars, table-heavy fragments, OCR artifacts
  3. Relevance filtering — NIST data reduced from 424,729 → 72,000 relevant → 7,000 sampled
  4. Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, Jaccard 0.8)
  5. Validation split — 80/20 stratified split maintaining source distribution
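Step 4 above can be sketched as follows. This uses stdlib `hashlib` and brute-force shingle Jaccard as stand-ins for the pipeline's xxhash and MinHash LSH; the real pipeline uses LSH precisely to avoid this sketch's O(n²) pairwise comparison.

```python
import hashlib

def exact_key(text: str) -> str:
    # Stand-in for xxhash: any stable hash works for exact dedup.
    return hashlib.sha256(text.strip().lower().encode()).hexdigest()

def shingles(text: str, n: int = 3) -> set:
    """Word n-gram shingles for near-duplicate comparison."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def dedup(examples: list, threshold: float = 0.8) -> list:
    """Drop exact duplicates, then near-duplicates above the Jaccard threshold."""
    seen_exact, kept = set(), []
    for ex in examples:
        key = exact_key(ex)
        if key in seen_exact:
            continue                                        # exact duplicate
        if any(jaccard(ex, k) >= threshold for k in kept):
            continue                                        # near-duplicate
        seen_exact.add(key)
        kept.append(ex)
    return kept
```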

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 2e-4 (cosine decay) |
| Optimizer | 8-bit AdamW |
| Batch Size | 4 (effective 16 with gradient accumulation) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Target | q_proj, k_proj, v_proj, o_proj |
| Max Sequence Length | 2048 |
| Quantization (Base) | 4-bit NF4 |
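The table above translates into a plain PEFT/transformers configuration roughly as follows. This is a sketch, not the project's actual script: the real run used Unsloth's wrappers, and values like `output_dir` are placeholders.

```python
# Config fragment mirroring the training table above (sketch, not the original script).
from peft import LoraConfig
from transformers import TrainingArguments

lora = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch of 16
    output_dir="outputs",            # placeholder
)
```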

Intended Uses

  • Detailed Analysis — Multi-control reasoning with citations across multiple frameworks
  • SSP Generation — Draft System Security Plan control descriptions with NIST/CMMC citations
  • Gap Analysis — Identify controls required for specific CMMC levels and contract requirements
  • Cross-Framework Mapping — Map controls between CMMC, NIST 800-53, HIPAA, and DFARS with explanations
  • Assessment Prep — Generate evidence checklists and assessment objective narratives
  • Policy Drafting — Create policies aligned to specific CMMC practices
  • DFARS Clause Analysis — Identify requirements from contract language

Limitations

  • Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
  • Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning. Always verify against current published frameworks.
  • No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
  • Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.

Hardware Requirements

| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 12 GB | 24 GB | 15 GB |
| Training | 40+ GB | N/A | 50 GB |

Supported OS: Linux, macOS, Windows (WSL2)
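The 12 GB inference figure follows from the weight file plus runtime headroom. A rough sketch, where the 2 GB overhead is an assumption covering KV cache and runtime buffers at moderate context lengths:

```python
weights_gb = 10.0    # q5_k_m GGUF file size from the model details table
overhead_gb = 2.0    # assumed: KV cache + runtime buffers at moderate context
required_gb = weights_gb + overhead_gb
# matches the table's 12 GB inference minimum
```

Note that the KV cache grows linearly with context, so running near the full 32,768-token window will need more headroom than this.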

The Model Suite

This is the 14B model — balanced performance for detailed compliance work. The full suite includes:

| Model | Parameters | GGUF Size | Best For |
|---|---|---|---|
| cmmc-expert-7b | 7.6B | 5.1 GB | Quick lookups, day-to-day queries |
| cmmc-expert-14b | 14.7B | ~10 GB | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b | 32.5B | ~19 GB | Deep gap assessments, SSP drafting |
| cmmc-expert-72b | 72.7B | ~42 GB | Complex multi-framework analysis |

Source Code

Full pipeline code, training configuration, and evaluation methodology: github.com/NathanMaine/cmmc-compliance-ai-model

Known Issues

  • Superseded by v2.0 — This version targets only 4 of 7 transformer modules and was trained on a smaller dataset (13,434 examples). v2.0 improves on both fronts with expanded LoRA coverage and 40% more training data. Use v2.0 unless you have a specific reason to use v1.0.
  • Limited cross-framework mapping — May struggle with nuanced mappings between overlapping frameworks (e.g., NIST 800-171 to CMMC practice IDs) compared to later versions.

Citation

@misc{maine2025cmmcexpert,
  title={CMMC Expert: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2025},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}

Contact

For production models or commercial licensing, contact the maintainer, Nathan Maine.
