# Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

<img src="https://i.imgur.com/PtqeTZw.png" alt="Primus Overview" width="60%">

> TL;DR: Llama-Primus-Base is a foundation model based on Llama-3.1-8B-Instruct, continually pre-trained on Primus-Seed (0.2B) and Primus-FineWeb (2.57B). Primus-Seed is a high-quality, manually curated cybersecurity text dataset, while Primus-FineWeb consists of cybersecurity texts filtered from FineWeb. By pretraining on such a large-scale cybersecurity corpus, it achieves a 🚀**15.88%** improvement in aggregated scores across multiple cybersecurity benchmarks, demonstrating the effectiveness of cybersecurity-specific pretraining.