---
language:
- it
- en
license: other
license_name: gemma-terms-of-use-and-mitre-attack
license_link: https://ai.google.dev/gemma/terms
base_model: google/gemma-3-1b-it
tags:
- cybersecurity
- network-security
- intrusion-detection
- mitre-attack
- threat-intelligence
- conversational
- gemma3_text
pipeline_tag: text-generation
datasets:
- CIC-IDS2017
- UNSW-NB15
library_name: transformers
model-index:
- name: traffico
  results: []
---

# Traffico - Fine-tuned on ATT&CK Data

![hypnonyx_traffico](https://cas-bridge.xethub.hf.co/xet-bridge-us/69a7e4cc4685d39e29c58a6c/0c289b81ef240bcae3c892b79bcc32025c9bc93dcff0a9d9677ddeb62fd04602?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20260412%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260412T230801Z&X-Amz-Expires=3600&X-Amz-Signature=588dad433300bbf57b055e9382784988cce240d3f218057e51f6dcdefaf63a3b&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=65738c5bc79162da909bf2ce&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27hypnonyx_traffico.png%3B+filename%3D%22hypnonyx_traffico.png%22%3B&response-content-type=image%2Fpng&x-amz-checksum-mode=ENABLED&x-id=GetObject&Expires=1776038881&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc3NjAzODg4MX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OWE3ZTRjYzQ2ODVkMzllMjljNThhNmMvMGMyODliODFlZjI0MGJjYWUzYzg5MmI3OWJjYzMyMDI1YzliYzkzZGNmZjBhOWQ5Njc3ZGRlYjYyZmQwNDYwMioifV19&Signature=pe%7E8m29cDZVmoi2VfpvtQsCukR2Qi-ZM%7Ec4QgwXQKSq3B5%7E-4SWG5QAGHDt-9wUTRHZqxZXRcqXUrzqgcjE52Azk5VcN8iMTlBRR4j%7E449e-dYtoNkq%7EhJDYhJN00iXjGlZYBfMlfmjzpvkLU%7EwgMYmEn-5xhgM%7E0AOzUP0BDmz6wS6OxyIEvIzwuZjtC9udbMpNpgH2LsJfNqFZKIZMjuhXusBtVgVTrmbukmbmAIX0zoChDSXEVivGcAFrSmgTU4%7EPrTwJx1f5N3eqvMMn1S5u-DD0-MPHNO6xR6DWEScHPkywJu9vXoAjJ-u%7Et-g4pZ4YHExJEFLF5M8yt3JmKQ__&Key-Pair-Id=K2L8F4GPSG1IFC "hypnonyx_traffico")

## 📋 Model Description

Traffico is a fine-tuned language model specialized in analyzing
TCP/IP network traffic and detecting cyberattacks. It maps network flow patterns to the MITRE ATT&CK framework, helping security teams understand adversary tactics and techniques from network behavior alone.

The model was trained on a synthetic dataset derived from real-world network traffic (CIC-IDS2017 and UNSW-NB15) and enriched with MITRE ATT&CK techniques. It classifies network flows as normal or malicious and provides ATT&CK-mapped threat classifications.

- **Base Model**: Google Gemma 3 1B IT (`google/gemma-3-1b-it`)
- **Training Data**: Synthetic dataset derived from ATT&CK® techniques, tactics, and procedures (TTPs)
- **Fine-tuning Approach**: Supervised Fine-Tuning (SFT) with TRL's `SFTTrainer`, using Unsloth for optimization

## 🎯 Use Cases

- **Network Intrusion Detection**: Classify network flows as benign or malicious in real time
- **Threat Intelligence**: Map detected attacks to MITRE ATT&CK techniques and tactics
- **Security Monitoring**: Analyze TCP/IP flows from network sensors and intrusion detection systems (IDS)
- **Incident Response**: Understand adversary behavior patterns from network telemetry
- **Research**: Study attack-to-technique mappings in security datasets

## 🚀 Quick Start

### Installation

Install `transformers` and `torch` (for example, `pip install transformers torch`), then load the tokenizer and model from the Hugging Face Hub:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("hypnonyx/Traffico")
# Move the model to the GPU; use .to("cpu") instead if no CUDA device is available
model = AutoModelForCausalLM.from_pretrained("hypnonyx/Traffico").to("cuda")
```

### Basic Usage

The example describes a flow as Italian `key: value` pairs separated by `|` and uses an Italian system prompt; English translations are given in the comments.

```python
# Analyze a single network traffic flow.
# Italian field names: Protocollo = protocol, Porta dst = destination port,
# Byte src / Byte dst = source / destination bytes, Pacchetti = packets, Durata = duration
network_flow = "Protocollo: tcp | Porta dst: 80 | Byte src: 480000 | Byte dst: 40 | Pacchetti: 5200 | Durata: 0.015s"

messages = [
    {
        "role": "system",
        # "Analyze the following TCP/IP network traffic flow. Classify whether it is
        # normal traffic or an attack and indicate the corresponding MITRE ATT&CK technique."
        "content": "Analizza il seguente flusso di traffico di rete TCP/IP. Classifica se è traffico normale o un attacco e indica la tecnica MITRE ATT&CK corrispondente.",
    },
    {"role": "user", "content": network_flow},
]

# Render the conversation with the chat template and append the assistant turn prefix
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# The chat template already inserts <bos>, so do not add special tokens a second time
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```

**Expected Output**: A classification of the network flow, for example "DoS Attack - MITRE ATT&CK: Impact/Denial of Service".

## 📊 Training Details

| Property | Value |
|----------|-------|
| Base Model | Google Gemma 3 1B IT (`google/gemma-3-1b-it`) |
| Training Framework | Unsloth + TRL `SFTTrainer` |
| Training Dataset | Synthetic ATT&CK-derived dataset |
| Dataset Size | 10,000 examples |
| Techniques Covered | Network-based attacks from CIC-IDS2017 and UNSW-NB15 (see Dataset Information) |
| Training Duration | ~1 hour |
| Hardware | 1× NVIDIA RTX 4090 GPU |
| Learning Rate | 2e-5 |
| Effective Batch Size | 16 (4 per device × 4 gradient accumulation steps) |
| LoRA Rank | 64 |
| Max Sequence Length | 512 tokens |
| Training Steps | 500 |

## Dataset Information

The training dataset was created synthetically from the MITRE ATT&CK framework and two network traffic datasets, CIC-IDS2017 and UNSW-NB15. It includes:

- **Network Traffic Features**: Protocol type, destination port, source/destination bytes, packet count, flow duration
- **Attack Classification**: Binary (normal vs. malicious) and multi-class labels
- **MITRE ATT&CK Mapping**: Techniques mapped to network-based attacks:
  - **Reconnaissance**: Port scanning, network sniffing
  - **Initial Access**: Brute force attacks on SSH, FTP, Telnet
  - **Lateral Movement**: Data exfiltration, command & control traffic
  - **Impact**: DoS/DDoS attacks, data theft
- **Attack Types Covered**: DoS, DDoS, PortScan, Brute Force, Infiltration, Botnet, Web attacks
- **Dataset Size**: 10,000 labeled examples for instruction tuning

The synthetic data was converted into instruction-following examples in which the model learns to analyze a network flow and map it to MITRE ATT&CK techniques and tactics.

## ⚠️ Limitations and Disclaimers

- **Not Exhaustive**: Like the underlying ATT&CK framework, this model does not enumerate all possible adversary behaviors; undisclosed or novel techniques may not be covered.
- **Research Use**: Commercial use is permitted under the ATT&CK license, but the model should be validated against your specific security requirements.
- **No Guarantee of Coverage**: Using this model to address or cover categories of techniques does not guarantee comprehensive defensive coverage.
- **As-Is**: This model is provided "as is" without any warranties or guarantees regarding accuracy, completeness, or fitness for a particular purpose.

## 📜 License

This model is based on **Google Gemma 3 1B** and incorporates data from the **MITRE ATT&CK framework**. Both licenses must be respected.

### Gemma License

This model is built upon Google's Gemma model, which is governed by the **Gemma Terms of Use**.

**Key Requirements:**

- This model can be used for research and commercial purposes
- You must comply with Google's Gemma Terms of Use
- You must ensure downstream usage complies with Gemma restrictions
- You acknowledge and accept Gemma's usage policies and any applicable restrictions

For full details, see: https://ai.google.dev/gemma/terms

### ATT&CK License Terms

© 2025 The MITRE Corporation.
This work is reproduced and distributed with the permission of The MITRE Corporation.

The MITRE Corporation grants a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes, as stated in the full license text.

**Full License Text:**

```
LICENSE

The MITRE Corporation (MITRE) hereby grants you a non-exclusive, royalty-free license to use ATT&CK® for research, development, and commercial purposes. Any copy you make for such purposes is authorized provided that you reproduce MITRE's copyright designation and this license in any such copy.

"© 2025 The MITRE Corporation. This work is reproduced and distributed with the permission of The MITRE Corporation."

DISCLAIMERS

MITRE does not claim ATT&CK enumerates all possibilities for the types of actions and behaviors documented as part of its adversary model and framework of techniques. Using the information contained within ATT&CK to address or cover full categories of techniques will not guarantee full defensive coverage as there may be undisclosed techniques or variations on existing techniques not documented by ATT&CK.

ALL DOCUMENTS AND THE INFORMATION CONTAINED THEREIN ARE PROVIDED ON AN "AS IS" BASIS AND THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, OFFICERS, AGENTS, AND EMPLOYEES, DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
```

### Model Modifications

This derivative work combines:

1. **Google's Gemma 3 1B** - the base language model
2. **MITRE ATT&CK** - the training dataset and knowledge domain

The model is fine-tuned on synthetic ATT&CK-derived data to specialize in threat intelligence and understanding adversary behavior.
Any further use, distribution, or modification must maintain attribution and comply with both Google's Gemma Terms of Use and the MITRE ATT&CK license.

## 🔗 References

- **Google Gemma**: https://ai.google.dev/gemma/
- **Gemma Terms of Use**: https://ai.google.dev/gemma/terms
- **MITRE ATT&CK**: https://attack.mitre.org/
- **ATT&CK Documentation**: https://attack.mitre.org/docs/

## 👤 Author & Contact

**Mirko P.**

🤗 Hugging Face: [@hypnonyx](https://huggingface.co/hypnonyx)

## 🙏 Attribution

This model was created using the MITRE ATT&CK framework. We are grateful to The MITRE Corporation for making this valuable resource available to the research and security communities.

---

**Last Updated**: March 4, 2025

**Model Version**: 1.0