zeltera committed · Commit a3fc0d4 · verified · 1 Parent(s): f2bd51b

Update README.md

Files changed (1):
  1. README.md +42 -40
README.md CHANGED
@@ -13,33 +13,36 @@ base_model:
   - Qwen/Qwen2.5-0.5B
   ---
 
- # 🛡️ MCMA — Malware Cybersecurity Malware Analyzer
 
- **Model:** `zeltera/mcma`
- **Task:** Static malware analysis and interpretation
- **Domain:** Cybersecurity & Threat Intelligence
- **Hosted on:** Hugging Face Models
 
  ---
 
- ## 🧠 Model Overview
 
- **MCMA** (Malware Cybersecurity Malware Analyzer) is a custom fine-tuned language model built to analyze malware artifacts and descriptions. It was trained using parameter-efficient fine-tuning (LoRA) on top of a Qwen2.5 base model using curated cybersecurity instruction data. Its outputs are **structured JSON**, including reasoning, indicators, confidence, recommendations, and mapped MITRE ATT&CK techniques.
 
- MCMA is optimized for **static analysis scenarios**—it interprets textual malware features, permissions, string indicators, and other static traits to produce analyst-friendly assessments.
 
  ---
 
  ## 🎯 Intended Use Cases
 
- MCMA is useful for:
- - Analyzing static malware artifacts (e.g., APK or PE strings/permissions)
- - Extracting structured threat intelligence
- - Mapping behaviors to MITRE ATT&CK
- - Integrating into analysis pipelines (SIEM, CTI platforms)
- - Supporting SOC analysts with natural language reasoning
 
- > ⚠️ **Important:** This model does *not* execute binaries or perform dynamic analysis.
 
  ---
@@ -49,9 +52,9 @@ MCMA is useful for:
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch
 
- model_id = "zeltera/mcma"
 
- tokenizer = AutoTokenizer.from_pretrained(model_id, local_files_only=True)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="auto",
@@ -60,7 +63,7 @@ model = AutoModelForCausalLM.from_pretrained(
 
  prompt = """
  You are a cybersecurity malware analysis assistant.
- Respond ONLY in valid JSON with these keys:
  - reasoning
  - indicators
  - confidence
@@ -72,43 +75,42 @@ APK requests READ_SMS and communicates with api.telegram.org
  """
 
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-
  outputs = model.generate(**inputs, max_new_tokens=300)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
- 📦 Model Details
 
- Architecture: Qwen2.5-based
- Fine-tuning: LoRA on cybersecurity datasets
- Output: Structured JSON
- Usage: Python / Transformers
 
  ⚠️ Limitations
 
- Designed for static analysis only
- Outputs should be reviewed by trained analysts
- Confidence scores are heuristic, not absolute
- Not a sandbox, emulator, or malware execution platform
 
- 🧪 Safety & Ethical Use
 
- MCMA is intended for defensive cybersecurity use, including malware forensic and threat analysis. It must not be used to assist in creating malware or harmful software. Users should operate within legal and ethical frameworks relevant to their jurisdiction.
 
- 📚 Citation
 
- If you use this model in research or production:
 
- @misc{mcma2025,
-   title={MCMA — Malware Cybersecurity Malware Analyzer LLM},
    author={Zeltera},
    year={2025},
    howpublished={Hugging Face Model},
-   note={https://huggingface.co/zeltera/mcma}
- }
-
- 🧠 About
-
- MCMA was developed to bridge language modeling with cybersecurity domain expertise. It combines transformer-based reasoning with structured static malware feature interpretation to assist analysts in real-world threat environments.
 
   - Qwen/Qwen2.5-0.5B
   ---
 
+ # 📊 SMITH — Static Malware Interpreter & Threat Heuristic
 
+ **SMITH (Static Malware Interpreter & Threat Heuristic)** is a transformer-based language model designed for structured, interpretable static malware analysis. Built through parameter-efficient fine-tuning (LoRA) on top of a Qwen2.5 backbone and augmented with retrieval-based threat context, SMITH generates JSON-formatted output that includes reasoning, identified indicators, confidence scores, actionable recommendations, and MITRE ATT&CK technique mappings. This makes SMITH a valuable tool for cybersecurity analysts, automation systems, and threat intelligence pipelines.
 
  ---
 
+ ## 🧠 Model Description
 
+ SMITH is a domain-specialized large language model tailored for static analysis of malware artifacts and descriptions. It interprets features such as permissions, API calls, imported functions, and known Indicators of Compromise (IoCs), and produces structured, machine-readable assessments that can be integrated into analysis workflows or SOC automation.
 
+ The model combines:
+ - **Fine-tuned transformer intelligence** for reasoning over static features
+ - **Retrieval-augmented context** from a curated threat corpus
+ - **Structured output** for easy integration with tools and scripts
+
+ SMITH is optimized to run on practical hardware while delivering interpretable cybersecurity insights.
 
  ---
 
  ## 🎯 Intended Use Cases
 
+ SMITH is intended for **defensive cybersecurity applications**, including:
+
+ - Interpreting static malware artifacts (e.g., APK metadata, PE strings)
+ - Producing structured JSON analysis suitable for automation
+ - Mapping malware behavior to MITRE ATT&CK techniques
+ - Assisting security analysts with context-aware reasoning
+ - Generating YARA rules based on detected indicators
 
+ > ⚠️ **Static analysis only — SMITH does NOT execute malware or perform dynamic analysis.**
 
  ---
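One bullet in the intended use cases mentions generating YARA rules from detected indicators. As a hedged sketch of what consuming that capability might look like, the helper below (hypothetical, not part of the model or this repo) renders model-reported string indicators into a minimal YARA rule:

```python
def indicators_to_yara(rule_name: str, indicators: list[str]) -> str:
    """Render string indicators into a minimal YARA rule (illustrative only)."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for i, indicator in enumerate(indicators):
        # Escape backslashes and quotes so the rule stays syntactically valid.
        escaped = indicator.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)

# Indicators borrowed from the README's own example scenario.
rule = indicators_to_yara("smith_sms_telegram", ["api.telegram.org", "READ_SMS"])
print(rule)
```

A real deployment would validate the generated rule (e.g., with yara-python) before shipping it; that step is assumed, not shown.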
 
  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch
 
+ model_id = "zeltera/mcma"  # or "zeltera/smith" if renamed
 
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="auto",
 
  prompt = """
  You are a cybersecurity malware analysis assistant.
+ Respond ONLY in valid JSON with these fields:
  - reasoning
  - indicators
  - confidence
 
  """
 
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=300)
+
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
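Since `tokenizer.decode` returns the prompt followed by the completion, a caller normally has to locate the JSON object before parsing it. A minimal sketch, assuming the model emits one balanced top-level `{...}` object with the keys named in the prompt (the brace scan ignores braces inside strings, so this is a sketch, not production parsing):

```python
import json

EXPECTED_KEYS = {"reasoning", "indicators", "confidence"}

def extract_report(decoded: str) -> dict:
    """Parse the first balanced top-level JSON object in the decoded output."""
    start = decoded.index("{")
    depth = 0
    for i, ch in enumerate(decoded[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(decoded[start:i + 1])
    raise ValueError("no complete JSON object in model output")

# Simulated decode output: prompt echo followed by the model's JSON answer.
decoded = (
    "You are a cybersecurity malware analysis assistant. ... "
    '{"reasoning": "SMS access plus Telegram traffic suggests exfiltration", '
    '"indicators": ["READ_SMS", "api.telegram.org"], "confidence": 0.8}'
)
report = extract_report(decoded)
missing = EXPECTED_KEYS - report.keys()
```

Checking `missing` lets a pipeline reject responses that drop a documented field instead of failing later.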
+ 🧠 Model Architecture and Training
+
+ Base Model: Qwen2.5 family (compact, efficient)
+ Fine-Tuning: LoRA with domain-specific malware analysis examples
+ Output Format: Structured JSON
+ Hardware: Designed for inference on commodity GPUs and CPU environments
 
  ⚠️ Limitations
 
+ SMITH is strictly for static analysis of malware descriptions.
+ Does not execute or sandbox any executable or mobile binary.
+ Analysis should be reviewed by qualified security staff.
+ Confidence scores are heuristic and not absolute.
 
+ 📚 Ethical and Safe Use
 
+ SMITH is intended for defensive cybersecurity and threat intelligence purposes. It should not be used to generate or assist in creating malware, malicious code, or harmful artifacts. Users should comply with all relevant laws and organizational policies.
 
+ 📜 Citation
 
+ If you use this model in research or deployment:
 
+ @misc{smith2025,
+   title={SMITH — Static Malware Interpreter & Threat Heuristic},
    author={Zeltera},
    year={2025},
    howpublished={Hugging Face Model},
+   note={\url{https://huggingface.co/zeltera/SMITH}}
+ }
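Both README versions mention integration into SIEM/CTI pipelines, where a nested JSON report is typically flattened into a single event record before ingestion. A sketch under assumed field names (nothing here is a documented SMITH schema):

```python
import json

def to_siem_event(report: dict, source: str = "smith") -> dict:
    """Flatten a SMITH-style JSON report into a flat, log-friendly record."""
    return {
        "source": source,
        "confidence": float(report.get("confidence", 0.0)),
        # Many log backends prefer a delimited string over a nested list.
        "indicators": ",".join(report.get("indicators", [])),
        # Truncate free-text reasoning to keep event fields bounded.
        "summary": report.get("reasoning", "")[:200],
    }

event = to_siem_event({
    "reasoning": "APK reads SMS and talks to api.telegram.org",
    "indicators": ["READ_SMS", "api.telegram.org"],
    "confidence": 0.8,
})
print(json.dumps(event))
```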