kwCyber-AI-Agent
The First Arabic-English Cybersecurity AI Agent
Built by Nalzankii
Specialized AI Agent for Cybersecurity Education, Penetration Testing Guidance, and Threat Analysis
Model Overview
kwCyber-AI-Agent is a purpose-built cybersecurity AI agent designed to assist security professionals, students, and enthusiasts in the Arab world and beyond. Unlike general-purpose LLMs, it is fine-tuned specifically on cybersecurity material so that it acts as a domain specialist.
Key Highlights
- Bilingual: Native Arabic and English cybersecurity expertise
- Tool-Use: Can execute security tools (Nmap, Nikto, CVE lookup, etc.)
- Educational: Personalized cybersecurity learning paths
- CTF Expert: Challenge walkthroughs and guidance
- Threat Analysis: Real-time vulnerability assessment
- Made in Kuwait: Built for the Kuwait Cyber platform
Model Details
| Attribute | Details |
|---|---|
| Developer | Nalzankii (Kuwait Cyber) |
| Base Architecture | Llama 4 (Transformer) |
| Parameters | ~7B |
| Context Length | 8,192 tokens |
| Languages | Arabic (ar), English (en) |
| License | Llama 4 Community License |
| Domain | Cybersecurity |
| Training Stage | In Development |
Architecture
kwCyber-AI-Agent
├── Base: Llama 4 Architecture
├── Custom Tokenizer (64K vocab, Arabic-optimized)
├── Fine-tuned on Cybersecurity Corpus
├── Tool-Use Layer (Function Calling)
├── Safety Guardrails (Content Filtering)
└── RAG Integration (Knowledge Base)
Intended Uses
Primary Use Cases
- Cybersecurity Q&A: Ask security questions in Arabic or English
- Vulnerability Analysis: Analyze CVEs, understand attack vectors
- CTF Assistance: Get hints and walkthroughs for CTF challenges
- Learning Mentor: Personalized study plans for security certifications
- Tool Guidance: Learn how to use security tools effectively
- Threat Intelligence: Stay updated on the latest threats and vulnerabilities
Example Interactions
Arabic (shown in English translation):
User: What is a SQL Injection attack, and how do I protect my application from it?
kwCyber-AI-Agent: SQL Injection is a type of attack that exploits vulnerabilities in applications that interact with databases. The attacker injects malicious SQL commands through input fields...
To protect against it:
1. Use Prepared Statements (parameterized queries)
2. Enable Input Validation
3. Apply the principle of Least Privilege to the database
4. Use a WAF (Web Application Firewall)
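The first mitigation in the answer above, prepared statements, can be illustrated with Python's standard `sqlite3` module. This is a minimal sketch: the `users` table and the function names `find_user` and `find_user_unsafe` are hypothetical and exist only for the demonstration.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Look up a user with a parameterized query."""
    # The ? placeholder passes `username` as data, never as SQL text.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list[tuple]:
    """Vulnerable counterpart, for contrast only: input is spliced into the SQL."""
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"
print(find_user(conn, payload))         # payload is matched as a literal name
print(find_user_unsafe(conn, payload))  # payload rewrites the WHERE clause
```

With the parameterized version the payload matches no user, while the string-built query returns every row in the table.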
English:
User: How do I perform a basic Nmap scan on a target?
kwCyber-AI-Agent: Here's how to use Nmap for reconnaissance:
Basic scan: nmap -sV -sC target_ip
- -sV: Version detection
- -sC: Default scripts
Important: Only scan targets you have explicit authorization to test.
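One way to enforce the authorization rule in code is to check the target against an allowlist before the Nmap command is built. The sketch below is an assumption about how a wrapper might look, not the project's implementation: `AUTHORIZED_NETWORKS` and `build_nmap_command` are hypothetical names, and the network shown is the TEST-NET-1 documentation range.

```python
import ipaddress
import shlex

# Hypothetical allowlist: networks you hold written authorization to test.
AUTHORIZED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def build_nmap_command(target_ip: str) -> list[str]:
    """Return the argv for `nmap -sV -sC <target>` if the target is authorized."""
    addr = ipaddress.ip_address(target_ip)  # raises ValueError on malformed input
    if not any(addr in net for net in AUTHORIZED_NETWORKS):
        raise PermissionError(f"{target_ip} is not in an authorized network")
    # An argv list (rather than a shell string) avoids shell injection.
    return ["nmap", "-sV", "-sC", str(addr)]


cmd = build_nmap_command("192.0.2.10")
print(shlex.join(cmd))
# To run the scan: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Parsing the target with `ipaddress` also rejects inputs such as `"10.0.0.1; rm -rf /"` before they reach a command line.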
Tool Use:
User: Scan example.com for open ports
kwCyber-AI-Agent: [Calling: scan_target]
{
  "target": "example.com",
  "scan_type": "port",
  "intensity": "normal"
}
Results: Found 3 open ports...
Training
Training Stages
| Stage | Description | Data Size | Status |
|---|---|---|---|
| Stage 1 | Continued Pre-training on Cybersecurity Corpus | 100GB+ text | Planned |
| Stage 2 | Supervised Fine-Tuning (SFT) | 200K+ Q&A pairs | Planned |
| Stage 3 | DPO Alignment | 10K+ preference pairs | Planned |
| Stage 4 | Tool-Use Training | 20K+ function calls | Planned |
Training Data Sources
- MITRE ATT&CK: Adversary tactics and techniques
- NVD/CVE Database: Vulnerability records
- OWASP: Web application security
- CWE: Common weakness patterns
- CTF Writeups: HackTheBox, TryHackMe, PicoCTF
- Security Research Papers: IEEE, ACM, arXiv
- Custom Arabic Dataset: Translated and original Arabic security content
Training Configuration
training:
  base_model: meta-llama/Llama-4-Scout-17B-16E-Instruct
  method: QLoRA
  lora_r: 64
  lora_alpha: 128
  learning_rate: 2e-5
  batch_size: 16
  gradient_accumulation: 4
  epochs: 3
  warmup_ratio: 0.1
  optimizer: adamw_torch
  scheduler: cosine
  max_seq_length: 8192
  precision: bf16
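A few quantities follow directly from this configuration, and the sketch below works them out. Two inputs are assumptions rather than values from the config: `num_devices = 1`, and `num_examples = 200_000`, taken as the lower bound of the "200K+" SFT figure. The LoRA scaling shown is the standard `alpha / r` factor.

```python
import math

# Values copied from the training configuration above.
config = {
    "lora_r": 64,
    "lora_alpha": 128,
    "batch_size": 16,
    "gradient_accumulation": 4,
    "epochs": 3,
    "warmup_ratio": 0.1,
    "max_seq_length": 8192,
}

num_devices = 1          # assumption: single-device run
num_examples = 200_000   # assumption: lower bound of the "200K+" SFT pairs

# Sequences contributing to each optimizer update.
effective_batch = config["batch_size"] * config["gradient_accumulation"] * num_devices
# Standard LoRA scaling applied to the low-rank update (alpha / r).
lora_scaling = config["lora_alpha"] / config["lora_r"]
# Upper bound on tokens per optimizer step if every sequence is full length.
max_tokens_per_step = effective_batch * config["max_seq_length"]

steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * config["epochs"]
warmup_steps = math.ceil(total_steps * config["warmup_ratio"])

print(f"effective batch size : {effective_batch}")
print(f"LoRA scaling         : {lora_scaling}")
print(f"max tokens per step  : {max_tokens_per_step}")
print(f"total / warmup steps : {total_steps} / {warmup_steps}")
```

Under these assumptions each optimizer step sees 64 sequences and the adapter update is scaled by 2.0. The step counts scale with whatever dataset size and device count are actually used.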
Agent Capabilities
Supported Tools
| Tool | Capability | Integration |
|---|---|---|
| Nmap | Port scanning & service detection | CLI Wrapper |
| Nikto | Web vulnerability scanning | CLI Wrapper |
| SQLmap | SQL injection testing | Sandboxed |
| VirusTotal | Malware analysis | REST API |
| Shodan | Internet-wide scanning | REST API |
| CVE API | Vulnerability lookup | REST API |
| WHOIS | Domain information | Python Lib |
| DNS | DNS reconnaissance | Python Lib |
Function Calling Format
{
  "name": "scan_target",
  "description": "Perform a security scan on target",
  "parameters": {
    "target": {
      "type": "string",
      "description": "IP address or domain"
    },
    "scan_type": {
      "type": "string",
      "enum": ["port", "vuln", "web", "dns", "full"]
    },
    "intensity": {
      "type": "string",
      "enum": ["light", "normal", "aggressive"],
      "default": "normal"
    }
  }
}
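Before a tool call emitted by the model is executed, its arguments can be checked against this definition. The following is a minimal sketch, not the project's validator: `validate_call` is a hypothetical helper, and it assumes that any parameter without a `default` is required, which the definition above does not state.

```python
import json

# Parameter definitions mirroring the scan_target format above.
SCAN_TARGET_PARAMS = {
    "target": {"type": "string"},
    "scan_type": {"type": "string", "enum": ["port", "vuln", "web", "dns", "full"]},
    "intensity": {
        "type": "string",
        "enum": ["light", "normal", "aggressive"],
        "default": "normal",
    },
}


def validate_call(args: dict, params: dict = SCAN_TARGET_PARAMS) -> dict:
    """Return args with defaults filled in, or raise ValueError if invalid."""
    unknown = set(args) - set(params)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    result = {}
    for name, spec in params.items():
        if name in args:
            value = args[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            # Assumption: parameters without a default are required.
            raise ValueError(f"missing required parameter: {name}")
        if spec["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
        result[name] = value
    return result


raw = '{"target": "example.com", "scan_type": "port"}'
print(validate_call(json.loads(raw)))
```

Rejecting unknown keys and out-of-enum values keeps malformed model output from reaching the tool wrappers.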
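Quick Start
Installation
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Nalzankii/kwCyber-AI-Agent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place weights on available devices (needs accelerate)
)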
Basic Chat
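# Prompts can be written in Arabic or English; English is shown here.
messages = [
    {"role": "system", "content": "You are kwCyber-AI-Agent, a cybersecurity expert who speaks Arabic and English."},
    {"role": "user", "content": "What is the best way to scan a network?"},
]
# add_generation_prompt=True appends the assistant turn marker so the model answers.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)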
API Usage (Hugging Face Inference)
from huggingface_hub import InferenceClient

client = InferenceClient("Nalzankii/kwCyber-AI-Agent")
response = client.chat_completion(
    messages=[
        {"role": "system", "content": "You are kwCyber-AI-Agent, a cybersecurity expert."},
        {"role": "user", "content": "Explain XSS attacks"},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
Using with vLLM (Production)
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model Nalzankii/kwCyber-AI-Agent \
    --port 8000 \
    --max-model-len 8192
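Once the server is running, it exposes an OpenAI-compatible chat endpoint that can be called with the standard library alone. This is a minimal sketch, assuming the server is reachable at `http://localhost:8000` as configured above. `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

MODEL = "Nalzankii/kwCyber-AI-Agent"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are kwCyber-AI-Agent, a cybersecurity expert."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the vLLM server to be up):
# print(ask("Explain XSS attacks"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.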
Benchmarks
Benchmarks will be published after training is complete.
Planned Evaluations
| Benchmark | Description |
|---|---|
| CyberBench | Cybersecurity knowledge assessment |
| SecQA | Security question answering |
| CTF-Eval | Capture The Flag problem solving |
| Arabic-NLU | Arabic language understanding |
| Tool-Use Accuracy | Function calling correctness |
| Safety Score | Harmful content resistance |
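As an illustration of how the Tool-Use Accuracy row could be scored, the sketch below computes exact-match accuracy over function calls. It is an assumption, not the planned evaluation: the `{"name": ..., "arguments": ...}` call shape and the function `tool_call_accuracy` are hypothetical.

```python
import json


def tool_call_accuracy(predictions: list[str], references: list[dict]) -> float:
    """Fraction of predicted calls that parse as JSON and equal the reference call."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for raw, ref in zip(predictions, references):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output counts as incorrect
        # Dict comparison ignores key order, so only content must match.
        if call == ref:
            correct += 1
    return correct / len(references)


port_scan = {"name": "scan_target", "arguments": {"target": "example.com", "scan_type": "port"}}
dns_scan = {"name": "scan_target", "arguments": {"target": "example.com", "scan_type": "dns"}}

predictions = [
    json.dumps(port_scan),                                                                    # exact match
    '{"arguments": {"scan_type": "dns", "target": "example.com"}, "name": "scan_target"}',  # reordered keys
    "not json",                                                                               # malformed
    '{"name": "scan_target", "arguments": {"target": "example.com", "scan_type": "web"}}',  # wrong scan_type
]
references = [port_scan, dns_scan, port_scan, port_scan]

score = tool_call_accuracy(predictions, references)
print(f"tool-use accuracy: {score:.2f}")
```

Exact match is strict. A real evaluation might also credit calls that differ only in optional arguments left at their defaults.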
Limitations & Ethical Use
Limitations
- Model is specialized in cybersecurity; may underperform on general topics
- Arabic cybersecurity terminology is still evolving; some terms may vary
- Tool execution requires proper sandboxing and authorization
- Not a replacement for professional security audits
Ethical Guidelines
IMPORTANT: This model is designed for DEFENSIVE and EDUCATIONAL purposes only.
- DO: Use for learning, authorized testing, CTF competitions, security research
- DON'T: Use for unauthorized access, creating malware, attacking systems without permission
- COMPLY: Follow Kuwait Cybercrime Law No. 63/2015 and all applicable regulations
- RESPONSIBLE: Always obtain proper authorization before any security testing
Safety Measures
- Built-in content filtering for harmful requests
- Requires target authorization for scanning operations
- Logging and audit trail for all agent actions
- Rate limiting to prevent abuse
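Three of these measures (target authorization, audit logging, and rate limiting) can be combined in a single gate that every scan request passes through. The sketch below is one possible design under stated assumptions, not the project's implementation. The `ScanGate` class, its parameters, and the sliding-window limit are all hypothetical.

```python
import time
from collections import deque


class ScanGate:
    """Allow a scan only for authorized targets and within a sliding-window rate limit."""

    def __init__(self, authorized_targets, max_calls, per_seconds, clock=time.monotonic):
        self.authorized = set(authorized_targets)
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self.calls = deque()        # timestamps of recently allowed scans
        self.audit_log = []         # one record per decision, allowed or denied

    def check(self, user: str, target: str) -> bool:
        now = self.clock()
        # Forget allowed calls that have left the rate-limit window.
        while self.calls and now - self.calls[0] >= self.per_seconds:
            self.calls.popleft()
        if target not in self.authorized:
            decision = "denied: target not authorized"
        elif len(self.calls) >= self.max_calls:
            decision = "denied: rate limit exceeded"
        else:
            self.calls.append(now)
            decision = "allowed"
        self.audit_log.append(
            {"time": now, "user": user, "target": target, "decision": decision}
        )
        return decision == "allowed"


gate = ScanGate({"192.0.2.10"}, max_calls=2, per_seconds=60)
print(gate.check("alice", "192.0.2.10"))    # authorized target, within the limit
print(gate.check("alice", "198.51.100.5"))  # target not on the allowlist
print(gate.audit_log[-1]["decision"])
```

Denied requests are logged as well as allowed ones, so the audit trail records attempted misuse and not only completed scans.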
Project Roadmap
- Project planning & architecture design
- Hugging Face repository setup
- Data collection & processing pipeline
- Custom tokenizer training
- Continued pre-training
- Supervised fine-tuning (SFT)
- DPO alignment
- Tool-use training
- Multi-platform deployment
- Beta testing
- Public release v1.0
Contributing
We welcome contributions! Areas where help is needed:
- Arabic cybersecurity content: Translations and original content
- Dataset contributions: Q&A pairs, CTF writeups
- Tool integrations: New security tool wrappers
- Testing & feedback: Bug reports and suggestions
Contact
- Hugging Face: @Nalzankii
- Project: Kuwait Cyber Platform
License
This model is released under the Llama 4 Community License.
Attribution
- Base architecture by Meta AI
- Cybersecurity knowledge from open-source databases (MITRE, NVD, OWASP)
- Built with love in Kuwait
kwCyber-AI-Agent: Securing the digital future, one query at a time
Made by Nalzankii | Kuwait Cyber