# HackIDLE-NIST-Coder (GGUF)
A specialized cybersecurity LLM fine-tuned on 568 NIST publications, optimized for Ollama and llama.cpp.
## Model Details
**Base Model:** Qwen2.5-Coder-7B-Instruct
**Fine-tuning:** LoRA (11.5M parameters, 0.151% of base)
**Training Data:** 568 NIST cybersecurity documents (523,706 examples)
**Context Length:** 32,768 tokens
**License:** Apache 2.0
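The 0.151% figure follows directly from the adapter size: 11.5M LoRA parameters against the base model's roughly 7.6B parameters (the ~7.6B count for Qwen2.5-Coder-7B is an assumption here, not stated on this card):

```shell
# Sanity-check the LoRA fraction: 11.5M adapter parameters over an
# assumed ~7.6B base parameters for Qwen2.5-Coder-7B.
awk 'BEGIN { printf "%.3f%%\n", 11.5e6 / 7.6e9 * 100 }'
```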
## Quantization Variants
| File | Size | Use Case | Perplexity vs. F16 |
|------|------|----------|---------------------|
| `hackidle-nist-coder-f16.gguf` | 14GB | Reference/source | Baseline |
| `hackidle-nist-coder-q8_0.gguf` | 7.5GB | Highest quality | ~0.1% loss |
| `hackidle-nist-coder-q5_k_m.gguf` | 5.1GB | High quality | ~0.5% loss |
| **`hackidle-nist-coder-q4_k_m.gguf`** | **4.4GB** | **Recommended** | ~1% loss |
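The file sizes translate roughly into effective bits per weight, which is a quick way to compare variants. For example, the Q4_K_M file at ~4.4GB works out to about 4.6 bits per weight (again assuming ~7.6B parameters; K-quant formats mix precisions, so the average lands a bit above 4 bits):

```shell
# Estimate effective bits per weight from the GGUF file size
# (~7.6B parameters is an assumed count, not taken from this card).
awk 'BEGIN { printf "%.1f bits/weight\n", 4.4e9 * 8 / 7.6e9 }'
```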
## Usage
### With Ollama
Download and run:
```bash
ollama run ethanolivertroy/hackidle-nist-coder
```
Or create from this repo:
```bash
# Download GGUF
wget https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF/resolve/main/hackidle-nist-coder-q4_k_m.gguf
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./hackidle-nist-coder-q4_k_m.gguf
SYSTEM """You are HackIDLE-NIST-Coder, a cybersecurity expert with deep knowledge of NIST standards, frameworks, and best practices."""
PARAMETER temperature 0.7
PARAMETER num_ctx 32768
EOF
# Create model
ollama create hackidle-nist-coder -f Modelfile
```
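Once the model is created, it can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming the model name from the `ollama create` step above (the endpoint and payload shape follow Ollama's documented `/api/generate` interface):

```shell
# Build a request body for Ollama's /api/generate endpoint and print it;
# with `ollama serve` running, send it via the curl line shown below.
BODY='{"model": "hackidle-nist-coder", "prompt": "Summarize NIST SP 800-207 in three sentences.", "stream": false}'
echo "$BODY"
# Send it with: curl -s http://localhost:11434/api/generate -d "$BODY"
```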
### With llama.cpp
```bash
# Download GGUF
wget https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF/resolve/main/hackidle-nist-coder-q4_k_m.gguf
# Run inference
./llama-cli -m hackidle-nist-coder-q4_k_m.gguf \
-p "What is Zero Trust Architecture according to NIST?" \
-n 200 \
--temp 0.7
```
### With LM Studio
1. Search for "hackidle-nist-coder" in LM Studio
2. Download Q4_K_M variant
3. Start chatting!
Or use the [MLX version](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit) for native Apple Silicon support.
## Expertise Areas
- NIST Cybersecurity Framework (CSF)
- Risk Management Framework (RMF)
- SP 800 series security controls (AC, AU, CA, CM, CP, IA, IR, MA, MP, PE, PL, PS, RA, SA, SC, SI, SR)
- FIPS cryptographic standards
- Zero Trust Architecture (SP 800-207)
- Cloud security (SP 800-210, SP 800-144)
- Supply chain risk management (SP 800-161)
- Privacy Framework
## Example Queries
```
"What is Zero Trust Architecture according to NIST SP 800-207?"
"Explain control AC-1 from NIST SP 800-53."
"What are the core components of the NIST Cybersecurity Framework?"
"How does NIST recommend implementing secure cloud architecture?"
"What is the Risk Management Framework process?"
```
## Training Details
**Dataset:** [`ethanolivertroy/nist-cybersecurity-training`](https://huggingface.co/datasets/ethanolivertroy/nist-cybersecurity-training)
- 523,706 training examples
- 568 source documents
- Smart chunking with sentence boundaries
- 5 extraction strategies: sections, controls, definitions, tables, semantic chunks
**Fine-tuning:**
- Method: LoRA with MLX (Apple Silicon)
- Training time: 3.5 hours on M4 Max
- Iterations: 1000
- Validation loss improvement: 45%
- Base model: Qwen2.5-Coder-7B-Instruct-4bit
## Performance
**Ollama (M4 Max, Q4_K_M):**
- Inference: 80-100 tokens/sec
- Memory: ~6GB
- Prompt processing: 50-100 tokens/sec
**llama.cpp (M4 Max, Q4_K_M):**
- Inference: 70-90 tokens/sec
- Memory: ~5GB
## Related Models
- **MLX Format:** [`ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit`](https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-MLX-4bit)
- **LM Studio:** [`ethanolivertroy/hackidle-nist-coder`](https://lmstudio.ai/ethanolivertroy/hackidle-nist-coder)
- **Ollama Library:** `ethanolivertroy/hackidle-nist-coder` (coming soon)
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@software{hackidle_nist_coder,
author = {Ethan Oliver Troy},
title = {HackIDLE-NIST-Coder: A Fine-Tuned LLM for NIST Cybersecurity Standards},
year = {2025},
url = {https://huggingface.co/ethanolivertroy/HackIDLE-NIST-Coder-GGUF}
}
```
## License
This model is released under the Apache 2.0 license. NIST publications are in the public domain.
## Acknowledgments
- **NIST** for publishing comprehensive cybersecurity guidance
- **Qwen Team** for the exceptional Qwen2.5-Coder base model
- **llama.cpp** team for GGUF format and quantization
- **Ollama** for making local LLM deployment accessible