---
license: mit
base_model: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit
tags:
- cybersecurity
- mitre-attack
- honeypot
- log-analysis
- llama
- lora
- security
- threat-detection
language:
- en
datasets:
- custom
library_name: transformers
pipeline_tag: text-generation
---

# LLM-Enhanced Honeypot Log Analysis Model

## Model Description

This model is a fine-tuned version of Llama 3.1 8B Instruct, specialized for analyzing honeypot logs and generating MITRE ATT&CK framework annotations. It was developed as part of a research project at Queen's University Belfast investigating automated security log analysis using Large Language Models.

## Key Features

- **MITRE ATT&CK Annotation**: Automatically generates structured annotations for security events
- **Honeypot Log Analysis**: Specialized in analyzing Unix terminal logs from honeypot systems
- **LoRA Fine-tuning**: Uses Low-Rank Adaptation for efficient parameter updates
- **Research-Grade**: Developed for academic research in cybersecurity and AI

## Model Details

### Base Model

- **Base Model**: unsloth/Meta-Llama-3.1-8B-Instruct-unsloth-bnb-4bit
- **Model Size**: 8B parameters
- **Architecture**: Llama 3.1 with Instruct tuning
- **Quantization**: 4-bit quantization for efficiency

### Fine-tuning Details

- **Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 32
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0
- **Learning Rate**: 0.00012
- **Batch Size**: 2
- **Gradient Accumulation**: 4
- **Max Steps**: 100
- **Optimizer**: adamw_8bit

## Training Data

The model was trained on a curated dataset of honeypot logs with human-annotated MITRE ATT&CK framework labels.
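As a rough guide to how much of this data a training run could consume, the hyperparameters listed under Fine-tuning Details can be combined with simple arithmetic. The sketch below assumes a single training device, no sequence packing, and standard (not rank-stabilized) LoRA scaling; none of these assumptions is stated in this card.

```python
# Values copied from the Fine-tuning Details list.
lora_rank = 32
lora_alpha = 32
batch_size = 2
gradient_accumulation = 4
max_steps = 100

# Assumption: one device, so each optimizer step sees
# batch_size * gradient_accumulation sequences.
effective_batch_size = batch_size * gradient_accumulation
sequences_seen = effective_batch_size * max_steps

# Standard LoRA scales the adapter update by alpha / rank
# (assumes rank-stabilized LoRA is not enabled).
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 8
print(sequences_seen)        # 800
print(lora_scaling)          # 1.0
```

Under these assumptions the run processes at most 800 training sequences, counting repeats if the dataset is smaller than that.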
The training data includes:

- Unix terminal command logs from honeypot systems
- Structured annotations for 6 key MITRE ATT&CK fields
- Balanced representation of different attack tactics and techniques

## Usage

### Installation

```bash
pip install transformers torch unsloth
```

### Loading the Model

```python
from unsloth import FastLanguageModel

# Replace model_name with the repository ID of this model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/model-name",
    max_seq_length=2048,
    dtype=None,  # let Unsloth choose the dtype
    load_in_4bit=True,
)
```

### Inference

```python
# Enable inference mode
FastLanguageModel.for_inference(model)

# Format your input (the values passed to .format() are illustrative placeholders)
prompt = '''Below is a Unix terminal command log from a honeypot system. Please analyze it and provide MITRE ATT&CK framework annotations.

Command: {command}
Timestamp: {timestamp}
Source IP: {source_ip}

Please provide:
1. Tactic
2. Technique
3. Sub-technique
4. Description'''.format(
    command="uname -a",
    timestamp="2025-01-01T00:00:00Z",
    source_ip="203.0.113.5",
)

# Move the tokenized inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so that temperature applies
# regardless of the default generation config
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Evaluation

The model has been evaluated on multiple metrics:

- **Overall MITRE Accuracy**: Novel composite metric combining all 6 MITRE ATT&CK field accuracies
- **Confusion Matrix Analysis**: Visual analysis of tactics classification performance
- **Field-level Accuracy**: Individual accuracy for each MITRE ATT&CK field
- **Human Evaluation**: Expert validation of generated annotations

## Limitations

- Specialized for honeypot log analysis; may not generalize to other security contexts
- Requires structured input format for optimal performance
- Training data limited to specific honeypot configurations
- May exhibit biases present in training data

## Ethical Considerations

This model is designed for defensive cybersecurity research and should be used responsibly:

- Intended for legitimate security research and defense applications
- Should not be used for malicious purposes or unauthorized access
- Users
should validate outputs before making security decisions
- Consider privacy implications when analyzing logs

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{llm_honeypot_analysis_2025,
  title={LLM-Enhanced Honeypot Log Analysis System},
  author={[Student Name]},
  year={2025},
  institution={Queen's University Belfast},
  course={CSC4003 - Research Project},
  url={https://gitlab.eeecs.qub.ac.uk/[student-id]/CSC4003}
}
```

## License

This model is released under the MIT License. See the LICENSE file for details.

## Contact

For questions or issues:

- Repository: https://gitlab.eeecs.qub.ac.uk/40285272/CSC4006
- Institution: Queen's University Belfast
- Course: CSC4006 - Research Project

## Acknowledgments

- Built using the Unsloth library for efficient training
- Based on Meta's Llama 3.1 model
- Developed as part of cybersecurity research at Queen's University Belfast
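As a supplement to the Inference section, the structured input format it describes can be produced for many log entries with a small helper. This is a minimal sketch: `PROMPT_TEMPLATE` repeats the prompt text from that section, while `build_prompt` and the sample entries are hypothetical names and values introduced here for illustration and are not part of the released code or training data.

```python
# Template text copied from the Inference section of this card.
PROMPT_TEMPLATE = (
    "Below is a Unix terminal command log from a honeypot system. "
    "Please analyze it and provide MITRE ATT&CK framework annotations.\n\n"
    "Command: {command}\n"
    "Timestamp: {timestamp}\n"
    "Source IP: {source_ip}\n\n"
    "Please provide:\n"
    "1. Tactic\n"
    "2. Technique\n"
    "3. Sub-technique\n"
    "4. Description"
)


def build_prompt(command: str, timestamp: str, source_ip: str) -> str:
    """Fill the template with one log entry (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        command=command.strip(), timestamp=timestamp, source_ip=source_ip
    )


# Illustrative log entries, not taken from the training data.
entries = [
    {"command": "uname -a", "timestamp": "2025-01-01T00:00:00Z", "source_ip": "203.0.113.5"},
    {"command": "cat /etc/passwd", "timestamp": "2025-01-01T00:00:07Z", "source_ip": "203.0.113.5"},
]

prompts = [build_prompt(**entry) for entry in entries]
print(prompts[0])
```

Each resulting string can be passed to the tokenizer in place of `prompt` in the Inference example.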