# HackIDLE-NIST-Coder (GGUF)

A specialized cybersecurity LLM fine-tuned on 568 NIST publications, optimized for Ollama and llama.cpp.

## Model Details

- **Base Model:** Qwen2.5-Coder-7B-Instruct
- **Fine-tuning:** LoRA (11.5M parameters, 0.151% of base)
- **Training Data:** 568 NIST cybersecurity documents (523,706 examples)
- **Context Length:** 32,768 tokens
- **License:** Apache 2.0

## Quantization Variants

| File | Size | Use Case | Perplexity |
|------|------|----------|------------|
| `hackidle-nist-coder-f16.gguf` | 14GB | Reference/source | Baseline |
| `hackidle-nist-coder-q8_0.gguf` | 7.5GB | Highest quality | ~0.1% loss |
| `hackidle-nist-coder-q5_k_m.gguf` | 5.1GB | High quality | ~0.5% loss |
| **`hackidle-nist-coder-q4_k_m.gguf`** | **4.4GB** | **Recommended** | **~1% loss** |

## Usage

### With Ollama

Download and run:

```bash
ollama run ethanolivertroy/hackidle-nist-coder
```

Or create the model from this repo:

```bash
# Download the GGUF
wget https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF/resolve/main/hackidle-nist-coder-q4_k_m.gguf

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./hackidle-nist-coder-q4_k_m.gguf
SYSTEM """You are HackIDLE-NIST-Coder, a cybersecurity expert with deep knowledge of NIST standards, frameworks, and best practices."""
PARAMETER temperature 0.7
PARAMETER num_ctx 32768
EOF

# Create the model
ollama create hackidle-nist-coder -f Modelfile
```

### With llama.cpp

```bash
# Download the GGUF
wget https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF/resolve/main/hackidle-nist-coder-q4_k_m.gguf

# Run inference
./llama-cli -m hackidle-nist-coder-q4_k_m.gguf \
  -p "What is Zero Trust Architecture according to NIST?" \
  -n 200 \
  --temp 0.7
```

### With LM Studio

1. Search for "hackidle-nist-coder" in LM Studio
2. Download the Q4_K_M variant
3. Start chatting!

Or use the [MLX version](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit) for native Apple Silicon support.
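### From Python

Once the model is created in Ollama, you can also query it programmatically through Ollama's local REST API (`POST /api/generate` on the default port 11434). A minimal sketch, assuming the model was created as `hackidle-nist-coder` per the Modelfile above and the Ollama server is running:

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "hackidle-nist-coder") -> dict:
    """Build a non-streaming generate request for Ollama's REST API."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.7, "num_ctx": 32768},
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask("What is Zero Trust Architecture according to NIST SP 800-207?"))
```

The `options` map mirrors the `PARAMETER` lines in the Modelfile; any option set here overrides the Modelfile default for that request.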
## Expertise Areas

- NIST Cybersecurity Framework (CSF)
- Risk Management Framework (RMF)
- SP 800 series security controls (AC, AU, CA, CM, CP, IA, IR, MA, MP, PE, PL, PS, RA, SA, SC, SI, SR)
- FIPS cryptographic standards
- Zero Trust Architecture (SP 800-207)
- Cloud security (SP 800-210, SP 800-144)
- Supply chain risk management (SP 800-161)
- Privacy Framework

## Example Queries

```
"What is Zero Trust Architecture according to NIST SP 800-207?"
"Explain control AC-1 from NIST SP 800-53."
"What are the core components of the NIST Cybersecurity Framework?"
"How does NIST recommend implementing secure cloud architecture?"
"What is the Risk Management Framework process?"
```

## Training Details

**Dataset:** [`ethanolivertroy/nist-cybersecurity-training`](https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training)

- 523,706 training examples
- 568 source documents
- Smart chunking with sentence boundaries
- 5 extraction strategies: sections, controls, definitions, tables, semantic chunks

**Fine-tuning:**

- Method: LoRA with MLX (Apple Silicon)
- Training time: 3.5 hours on M4 Max
- Iterations: 1000
- Validation loss improvement: 45%
- Base model: Qwen2.5-Coder-7B-Instruct-4bit

## Performance

**Ollama (M4 Max, Q4_K_M):**

- Inference: 80-100 tokens/sec
- Memory: ~6GB
- Prompt processing: 50-100 tokens/sec

**llama.cpp (M4 Max, Q4_K_M):**

- Inference: 70-90 tokens/sec
- Memory: ~5GB

## Related Models

- **MLX Format:** [`ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit`](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit)
- **LM Studio:** [`ethanolivertroy/hackidle-nist-coder`](https://lmstudio.ai/ethanolivertroy/hackidle-nist-coder)
- **Ollama Library:** `ethanolivertroy/hackidle-nist-coder` (coming soon)

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@software{hackidle_nist_coder,
  author = {Ethan Oliver Troy},
  title = {HackIDLE-NIST-Coder: A Fine-Tuned LLM for NIST Cybersecurity Standards},
  year = {2025},
  url = {https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF}
}
```

## License

This model is released under the Apache 2.0 license. NIST publications are in the public domain.

## Acknowledgments

- **NIST** for publishing comprehensive cybersecurity guidance
- **Qwen Team** for the exceptional Qwen2.5-Coder base model
- **llama.cpp** team for the GGUF format and quantization
- **Ollama** for making local LLM deployment accessible