base_model:
- Qwen/Qwen2.5-0.5B
---

# 🛡️ MCMA — Malware Cybersecurity Malware Analyzer

**Model:** `zeltera/mcma`
**Task:** Static malware analysis and interpretation
**Domain:** Cybersecurity & Threat Intelligence
**Hosted on:** Hugging Face Models

---

## 🧠 Model Overview

**MCMA** (Malware Cybersecurity Malware Analyzer) is a custom fine-tuned language model built to analyze malware artifacts and descriptions. It was trained with parameter-efficient fine-tuning (LoRA) on top of a Qwen2.5 base model, using curated cybersecurity instruction data. Its outputs are **structured JSON**, including reasoning, indicators, confidence, recommendations, and mapped MITRE ATT&CK techniques.

MCMA is optimized for **static analysis scenarios**: it interprets textual malware features, permissions, string indicators, and other static traits to produce analyst-friendly assessments.

---
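Because the output contract is a fixed set of JSON keys, it can be checked mechanically before anything downstream consumes it. The sketch below is illustrative only: the field values and the `validate_mcma_output` helper are hypothetical, not part of the model's API; only the five key names come from this card.

```python
import json

# Illustrative only: the field values below are made up; only the five
# key names come from this model card.
example_output = """
{
  "reasoning": "The sample requests SMS access and talks to a remote bot API.",
  "indicators": ["READ_SMS permission", "traffic to api.telegram.org"],
  "confidence": 0.82,
  "recommendation": "Flag for analyst review; block the remote endpoint.",
  "mitre_attack": ["T1636.004"]
}
"""

EXPECTED_KEYS = {"reasoning", "indicators", "confidence", "recommendation", "mitre_attack"}

def validate_mcma_output(raw: str) -> dict:
    """Parse a response and verify it carries the documented keys."""
    data = json.loads(raw)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

report = validate_mcma_output(example_output)
```

Rejecting responses with missing keys early keeps malformed generations out of SIEM/CTI ingestion.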

## 🎯 Intended Use Cases

MCMA is useful for:

- Analyzing static malware artifacts (e.g., APK or PE strings/permissions)
- Extracting structured threat intelligence
- Mapping behaviors to MITRE ATT&CK
- Integrating into analysis pipelines (SIEM, CTI platforms)
- Supporting SOC analysts with natural language reasoning

> ⚠️ **Important:** This model does *not* execute binaries or perform dynamic analysis.

---
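For pipeline integration, one small enrichment step is turning the `mitre_attack` technique IDs into links a CTI platform can display. A minimal sketch, assuming the public ATT&CK site's URL scheme; `attack_url` is a hypothetical helper and the technique IDs are examples, not model output:

```python
# Sketch for CTI/SIEM enrichment: map `mitre_attack` technique IDs to
# their public MITRE ATT&CK pages. `attack_url` is a hypothetical helper;
# the IDs below are examples.
def attack_url(technique_id: str) -> str:
    """Sub-techniques such as 'T1636.004' live at 'T1636/004' on the site."""
    return f"https://attack.mitre.org/techniques/{technique_id.replace('.', '/')}/"

links = [attack_url(t) for t in ["T1059", "T1636.004"]]
```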

## 🧪 Example Usage (Python)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "zeltera/mcma"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPU(s)/CPU
    torch_dtype=torch.float16,  # halves memory; use float32 on CPU-only setups
)

prompt = """
You are a cybersecurity malware analysis assistant.
Respond ONLY in valid JSON with these keys:
- reasoning
- indicators
- confidence
- recommendation
- mitre_attack

Input:
APK requests READ_SMS and communicates with api.telegram.org
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
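Note that `tokenizer.decode` returns the prompt echo plus the model's answer, so downstream code has to locate the JSON object inside the decoded text before parsing it. A minimal sketch: `extract_json` is an illustrative helper (the brace walk breaks if string values contain unbalanced braces), and the sample string below stands in for real model output.

```python
import json

def extract_json(decoded: str) -> dict:
    """Return the first balanced top-level JSON object in `decoded`.

    Naive brace walk: adequate when string values contain no stray
    braces, which holds for well-formed MCMA responses.
    """
    start = decoded.index("{")
    depth = 0
    for i, ch in enumerate(decoded[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(decoded[start : i + 1])
    raise ValueError("no complete JSON object in model output")

# Stand-in for tokenizer.decode(...) output (prompt echo + answer):
decoded = 'Input: APK requests READ_SMS ... {"confidence": 0.9, "indicators": ["READ_SMS"]}'
report = extract_json(decoded)
```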

## 📦 Model Details

- **Architecture:** Qwen2.5-based
- **Fine-tuning:** LoRA on cybersecurity datasets
- **Output:** Structured JSON
- **Usage:** Python / Transformers

## ⚠️ Limitations

- Designed for static analysis only
- Outputs should be reviewed by trained analysts
- Confidence scores are heuristic, not absolute
- Not a sandbox, emulator, or malware execution platform

## 🧪 Safety & Ethical Use

MCMA is intended for defensive cybersecurity use, including malware forensics and threat analysis. It must not be used to assist in creating malware or harmful software. Users should operate within the legal and ethical frameworks relevant to their jurisdiction.

## 📚 Citation

If you use this model in research or production:

```bibtex
@misc{mcma2025,
  title={MCMA — Malware Cybersecurity Malware Analyzer LLM},
  author={Zeltera},
  year={2025},
  howpublished={Hugging Face Model},
  note={https://huggingface.co/zeltera/mcma}
}
```

## 🧠 About

MCMA was developed to bridge language modeling with cybersecurity domain expertise. It combines transformer-based reasoning with structured static malware feature interpretation to assist analysts in real-world threat environments.