naufalso committed · Commit dc4b429 · verified · 1 Parent(s): ade7974

Update README.md

Files changed (1):
  README.md +112 -174

README.md CHANGED
@@ -2,198 +2,136 @@
  library_name: transformers
  tags:
  - generated_from_trainer
- datasets:
- - naufalso/redsage_seed
- - naufalso/cybersecurity_seed_dump
- - trendmicro-ailab/Primus-Seed
- - trendmicro-ailab/Primus-Seed
- - naufalso/nvd-cve
  model-index:
- - name: outputs/pretrain/qwen/RedSage-Qwen3-8B-Pretrain_05-Seed-New
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.10.0`
- ```yaml
- # ------------------------------------------------------------------
- # Basic model + tokenizer
- # ------------------------------------------------------------------
- base_model: ./outputs/pretrain/qwen/dedup/RedSage-Qwen3-8b-Base-Pretrain-Dedup_05 # dense 8B variant
- # model_type: qwen
- # tokenizer_type: qwen
- trust_remote_code: true
- auto_resume_from_checkpoints: true
-
- # ------------------------------------------------------------------
- # Precision + distributed strategy
- # ------------------------------------------------------------------
- bf16: true # enable bf16 math
- deepspeed: deepspeed_configs/zero3_bf16.json # sharded weights/opt/grads
- gradient_checkpointing: true # recompute to save VRAM
- sequence_parallel: true # tiny extra memory win
-
- # ------------------------------------------------------------------
- # Batch, sequence, epochs
- # ------------------------------------------------------------------
- micro_batch_size: 32
- gradient_accumulation_steps: 1 # 32 micro-batch x 32 GPUs = 1024 global batch
- num_epochs: 5
- seq_length: 32768
-
- # ------------------------------------------------------------------
- # Optimiser & scheduler
- # ------------------------------------------------------------------
- optimizer: adamw_torch
- lr_scheduler: cosine
- learning_rate: 2.5e-5
- weight_decay: 0.05
- warmup_ratio: 0.01
- cosine_min_lr_ratio: 0.1
- cosine_constant_lr_ratio: 0.2
-
- # ------------------------------------------------------------------
- # Dataset (replace with your own)
- # ------------------------------------------------------------------
- chat_template: jinja
- chat_template_jinja: "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%-\
- \ if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n\
- \ {%- else %}\n {{- 'You are REDSAGE, cybersecurity-tuned model developed\
- \ by Khalifa University. You are a helpful assistant.' }}\n {%- endif %}\n \
- \ {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the\
- \ user query.\\n\\nYou are provided with function signatures within <tools></tools>\
- \ XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n\
- \ {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor\
- \ each function call, return a json object with function name and arguments within\
- \ <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>,\
- \ \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else\
- \ %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\\
- n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\\
- nYou are REDSAGE, cybersecurity-tuned model developed by Khalifa University. You\
- \ are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%-\
- \ for message in messages %}\n {%- if (message.role == \"user\") or (message.role\
- \ == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls)\
- \ %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>'\
- \ + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>'\
- \ + message.role }}\n {%- if message.content %}\n {{- '\\n' +\
- \ message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls\
- \ %}\n {%- if tool_call.function is defined %}\n {%- set\
- \ tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\\
- n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n \
- \ {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n\
- \ {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\\
- n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 ==\
- \ 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user'\
- \ }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{-\
- \ message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last\
- \ or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\\
- n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt\
- \ %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n"
-
- datasets:
-   - path: naufalso/redsage_seed
-     type: completion
-     name: all
-
-   - path: naufalso/cybersecurity_seed_dump
-     type: completion
-     name: default
-
-   - path: trendmicro-ailab/Primus-Seed
-     type: completion
-     name: cybersecurity_companies_websites
-     field: content
-
-   - path: trendmicro-ailab/Primus-Seed
-     type: completion
-     name: mitre
-     field: content
-
-   - path: naufalso/nvd-cve
-     type: completion
-     name: filtered
-
- # ------------------------------------------------------------------
- # Logging / output
- # ------------------------------------------------------------------
- output_dir: ./outputs/pretrain/qwen/RedSage-Qwen3-8B-Pretrain_05-Seed-New
- dataset_prepared_path: ./prepared_datasets/RedSage-Qwen3-8B-Pretrain_05-Seed-New
- saves_per_epoch: 1
- eval_steps: 0.5
- val_set_size: 0.05
- log_with:
-   - wandb
-   - tensorboard
-
- use_tensorboard: true
- wandb_mode: "offline"
- wandb_entity: naufalso
- wandb_project: redsage
- wandb_name: RedSage-Qwen3-8B-Pretrain_05-Seed-New
-
- # ------------------------------------------------------------------
- # Misc
- # ------------------------------------------------------------------
- save_total_limit: 5 # keep the last 5 checkpoints
- load_in_8bit: false # full fine-tune, no quantisation
- torch_compile: false # turn on only after the run is stable
- ```
-
- </details><br>
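For a no-tools conversation, the Jinja template in the config above renders messages into Qwen's ChatML-style layout, inserting the default REDSAGE system message when none is supplied. A minimal plain-Python sketch of that rendered layout (an approximation for illustration, not the actual template engine):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the rendered output of the chat template above
    for a simple conversation with no tool calls."""
    default_system = ("You are REDSAGE, cybersecurity-tuned model developed "
                      "by Khalifa University. You are a helpful assistant.")
    parts = []
    # The template injects a default system prompt when the first
    # message is not a system message.
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{default_system}<|im_end|>\n")
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Open an assistant turn when a generation prompt is requested.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "What is a CVE?"}])
print(prompt)
```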
 
 
 
- # outputs/pretrain/qwen/RedSage-Qwen3-8B-Pretrain_05-Seed-New

- This model was trained from scratch on the naufalso/redsage_seed, naufalso/cybersecurity_seed_dump, trendmicro-ailab/Primus-Seed (two subsets), and naufalso/nvd-cve datasets.
- It achieves the following results on the evaluation set:
- - Loss: 0.9952

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2.5e-05
- - train_batch_size: 32
- - eval_batch_size: 32
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 32
- - total_train_batch_size: 1024
- - total_eval_batch_size: 1024
- - optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 69
- - training_steps: 6921

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | No log | 0 | 0 | 1.7388 |
- | 0.9127 | 2.4989 | 3461 | 0.9952 |

- ### Framework versions

- - Transformers 4.52.3
- - Pytorch 2.5.1+cu121
- - Datasets 3.6.0
- - Tokenizers 0.21.2
  library_name: transformers
  tags:
  - generated_from_trainer
+ - cybersecurity
+ - continual-pretraining
+ - targeted-pretraining
+ - text-generation
+ - causal-lm
+ - risys-lab
  model-index:
+ - name: RedSage-Qwen3-8B-Base
  results: []
+ language:
+ - en
+ base_model:
+ - RISys-Lab/RedSage-Qwen3-8B-CFW
+ pipeline_tag: text-generation
  ---

+ # RedSage-Qwen3-8B-Base
+
+ <div align="center">
+ <img src="https://img.shields.io/badge/Task-Cybersecurity-red" alt="Cybersecurity">
+ <img src="https://img.shields.io/badge/Stage-Targeted_Pretraining-blue" alt="Targeted Pretraining">
+ </div>
+
+ ## Model Summary
+
+ **RedSage-Qwen3-8B-Base** is a cybersecurity-specialized Large Language Model (LLM) developed by **RISys-Lab**. It represents the **second stage** of the RedSage pre-training pipeline.
+
+ This model builds upon **RedSage-Qwen3-8B-CFW** by undergoing **Targeted Pre-Training** on high-quality, curated cybersecurity resources (`RedSage-Seed` and `RedSage-Dump`). While the previous stage focused on breadth using web data, this stage focuses on depth, technical standards, and verified skills.
+
+ - **Paper:** [RedSage: A Cybersecurity Generalist LLM](https://openreview.net/forum?id=W4FAenIrQ2)
+ - **Repository:** [GitHub](https://github.com/RISys-Lab/RedSage)
+ - **Base Model:** [RISys-Lab/RedSage-Qwen3-8B-CFW](https://huggingface.co/RISys-Lab/RedSage-Qwen3-8B-CFW)
+ - **Variant:** Base (Final Pre-trained Checkpoint)
+
+ ## Intended Use
+
+ This model is a **base model** intended for:
+ 1. **Fine-tuning:** Serving as a high-quality foundation for downstream cybersecurity tasks (e.g., incident response, malware analysis).
+ 2. **Research:** Investigating the impact of curated versus web-scale data in domain adaptation.
+ 3. **Completion:** Code completion and technical writing in cybersecurity contexts.
+
+ **Note:** As a base model, this checkpoint has **not** been instruction-tuned (SFT) or aligned (DPO). It behaves like a completion engine. For a chat-ready assistant, please see `RISys-Lab/RedSage-Qwen3-8B-DPO`.
+
+ ## Training Lineage
+
+ RedSage employs a multi-stage training pipeline. This model represents the output of **Stage 2**.

+ 1. Stage 1: Continual Pre-Training (CPT) -> `RedSage-Qwen3-8B-CFW` (CyberFineWeb data)
+ 2. **Stage 2: Targeted Pre-Training** -> **`RedSage-Qwen3-8B-Base`** (Current Model)
+ 3. Stage 3: Supervised Fine-Tuning (SFT) -> `RedSage-Qwen3-8B-Ins`
+ 4. Stage 4: Direct Preference Optimization (DPO) -> `RedSage-Qwen3-8B-DPO`

+ ## Training Data: RedSage-Seed & Dump

+ This model was trained on approximately **850 million tokens** of curated data, split into two collections:

+ 1. **RedSage-Seed (~150M tokens):** A highly curated collection of 28,637 samples converted to structured Markdown.
+    * **Knowledge:** General concepts and frameworks (MITRE ATT&CK, CAPEC, CWE, OWASP).
+    * **Skills:** Offensive-security resources, including write-ups, hacking techniques, and payload examples.
+    * **Tools:** Manuals and cheat sheets for CLI tools and Kali Linux.

+ 2. **RedSage-Dump (~700M tokens):** A larger aggregation of 459K technical documents.
+    * **Sources:** Computer-education portals, cybersecurity news, RFC entries, NIST publications, and the National Vulnerability Database (NVD).

+ ## Performance

+ RedSage-8B-Base shows significant improvements over the general-purpose Qwen3-8B-Base and achieves the highest mean score on external benchmarks among all 8B base models tested.

+ ### RedSage-Bench (0-shot Accuracy)

+ | Category | Qwen3-8B-Base | **RedSage-8B-Base** |
+ | :--- | :---: | :---: |
+ | **Macro Average** | 84.24 | **85.05** |
+ | Knowledge (General) | 83.08 | 83.12 |
+ | Knowledge (Frameworks) | 81.94 | **84.94** |
+ | Skill (Offensive) | 88.23 | **88.72** |
+ | Tools (CLI) | 85.08 | **85.44** |
+ | Tools (Kali) | 78.86 | **79.36** |

+ ### External Cybersecurity Benchmarks (5-shot)

+ | Benchmark | Qwen3-8B-Base | **RedSage-8B-Base** |
+ | :--- | :---: | :---: |
+ | **Mean** | 80.81 | **84.56** |
+ | CTI-Bench (MCQ) | 68.80 | **71.04** |
+ | CTI-Bench (RCM) | 63.50 | **78.40** |
+ | CyberMetric (500) | 92.00 | **92.60** |
+ | MMLU (Security) | 83.00 | **87.00** |
+ | SecBench (En) | **82.84** | 81.76 |
+ | SecEval (MCQ) | 75.60 | **75.83** |
+ | SECURE (CWET) | 92.70 | **93.22** |
+ | SECURE (KCV) | 75.05 | **87.20** |
+ | SECURE (MEAT) | 93.81 | **94.00** |
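As a quick sanity check, the **Mean** row of the external-benchmark table is the simple average of the nine per-benchmark rows. A short sketch reproducing it from the numbers above (not the authors' evaluation code):

```python
# Per-benchmark scores copied from the table above, in row order:
# CTI-Bench (MCQ), CTI-Bench (RCM), CyberMetric (500), MMLU (Security),
# SecBench (En), SecEval (MCQ), SECURE (CWET), SECURE (KCV), SECURE (MEAT)
qwen3_base = [68.80, 63.50, 92.00, 83.00, 82.84, 75.60, 92.70, 75.05, 93.81]
redsage_base = [71.04, 78.40, 92.60, 87.00, 81.76, 75.83, 93.22, 87.20, 94.00]

def mean(xs):
    # Unweighted average, rounded to two decimals as in the table.
    return round(sum(xs) / len(xs), 2)

print(mean(qwen3_base))    # -> 80.81
print(mean(redsage_base))  # -> 84.56
```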
 
+ ## Training Procedure

+ The model was trained using the [Axolotl](https://github.com/axolotl-ai-cloud/axolotl) framework.

+ - **Learning Rate:** 2.5e-6 (constant with linear warmup)
+ - **Optimizer:** AdamW
+ - **Epochs:** 1
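The three settings above could be expressed as an Axolotl-style config fragment along these lines (an illustrative sketch only; field names follow the conventions of the Stage-1 config shown earlier, and any field not listed in the bullets above is omitted or assumed):

```yaml
# Illustrative fragment; only these values come from the model card.
optimizer: adamw_torch
learning_rate: 2.5e-6
lr_scheduler: constant_with_warmup   # constant LR after a linear warmup
num_epochs: 1
```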
 

+ ## Usage

+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "RISys-Lab/RedSage-Qwen3-8B-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ text = "The primary difference between a firewall and an IDS is"
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ outputs = model.generate(**inputs, max_new_tokens=50)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Citation
+
+ If you use this model or dataset, please cite our paper:
+
+ ```bibtex
+ @inproceedings{suryanto2026redsage,
+   title={RedSage: A Cybersecurity Generalist {LLM}},
+   author={Naufal Suryanto and Muzammal Naseer and Pengfei Li and Syed Talal Wasim and Jinhui Yi and Juergen Gall and Paolo Ceravolo and Ernesto Damiani},
+   booktitle={The Fourteenth International Conference on Learning Representations},
+   year={2026},
+   url={https://openreview.net/forum?id=W4FAenIrQ2}
+ }
+ ```
137