Update README.md
README.md
CHANGED
|
@@ -1,199 +1,297 @@
|
|
| 1 |
---
|
| 2 |
library_name: transformers
|
| 3 |
-
tags:
|
| 4 |
---
|
| 5 |
|
| 6 |
-
#
|
| 7 |
|
| 8 |
-
|
| 9 |
|
| 10 |
|
|
|
|
| 11 |
|
| 12 |
-
## Model
|
| 13 |
-
|
| 14 |
-
### Model Description
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
|
| 19 |
|
| 20 |
-
|
| 21 |
-
- **Funded by [optional]:** [More Information Needed]
|
| 22 |
-
- **Shared by [optional]:** [More Information Needed]
|
| 23 |
-
- **Model type:** [More Information Needed]
|
| 24 |
-
- **Language(s) (NLP):** [More Information Needed]
|
| 25 |
-
- **License:** [More Information Needed]
|
| 26 |
-
- **Finetuned from model [optional]:** [More Information Needed]
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
-
- **
|
| 33 |
-
- **
|
| 34 |
-
- **
|
|
|
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
-
|
| 39 |
|
| 40 |
### Direct Use
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
### Downstream Use [optional]
|
| 47 |
|
| 48 |
-
|
| 49 |
|
| 50 |
-
|
| 51 |
|
| 52 |
### Out-of-Scope Use
|
| 53 |
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
## Bias, Risks, and Limitations
|
| 59 |
-
|
| 60 |
-
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 61 |
-
|
| 62 |
-
[More Information Needed]
|
| 63 |
-
|
| 64 |
-
### Recommendations
|
| 65 |
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
| 69 |
-
|
| 70 |
-
## How to Get Started with the Model
|
| 71 |
-
|
| 72 |
-
Use the code below to get started with the model.
|
| 73 |
-
|
| 74 |
-
[More Information Needed]
|
| 75 |
|
| 76 |
-
##
|
| 77 |
|
| 78 |
-
###
|
| 79 |
|
| 80 |
-
|
| 81 |
|
| 82 |
-
|
|
|
|
| 83 |
|
| 84 |
-
|
| 85 |
|
| 86 |
-
|
| 87 |
|
| 88 |
-
|
| 89 |
|
| 90 |
-
[
|
|
|
|
| 91 |
|
|
|
|
| 92 |
|
| 93 |
-
|
| 94 |
|
| 95 |
-
|
| 96 |
|
| 97 |
-
|
|
|
|
| 98 |
|
| 99 |
-
|
|
|
|
| 100 |
|
| 101 |
-
|
| 102 |
|
| 103 |
-
|
|
|
|
| 104 |
|
| 105 |
-
|
| 106 |
|
| 107 |
-
|
| 108 |
|
| 109 |
-
|
| 110 |
|
| 111 |
-
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
| 116 |
|
| 117 |
-
|
| 118 |
|
| 119 |
-
|
| 120 |
|
| 121 |
-
|
| 122 |
|
| 123 |
-
|
| 124 |
|
| 125 |
-
|
| 126 |
|
| 127 |
### Results
|
| 128 |
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
#### Summary
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
## Model Examination [optional]
|
| 136 |
-
|
| 137 |
-
<!-- Relevant interpretability work for the model goes here -->
|
| 138 |
|
| 139 |
-
|
| 140 |
|
| 141 |
-
|
| 142 |
|
| 143 |
-
|
| 144 |
|
| 145 |
-
|
| 146 |
|
| 147 |
-
|
| 148 |
-
- **Hours used:** [More Information Needed]
|
| 149 |
-
- **Cloud Provider:** [More Information Needed]
|
| 150 |
-
- **Compute Region:** [More Information Needed]
|
| 151 |
-
- **Carbon Emitted:** [More Information Needed]
|
| 152 |
-
|
| 153 |
-
## Technical Specifications [optional]
|
| 154 |
-
|
| 155 |
-
### Model Architecture and Objective
|
| 156 |
-
|
| 157 |
-
[More Information Needed]
|
| 158 |
-
|
| 159 |
-
### Compute Infrastructure
|
| 160 |
-
|
| 161 |
-
[More Information Needed]
|
| 162 |
|
| 163 |
-
|
| 164 |
|
| 165 |
-
|
| 166 |
|
| 167 |
-
|
| 168 |
|
| 169 |
-
|
| 170 |
|
| 171 |
-
|
| 172 |
|
| 173 |
-
|
| 174 |
|
| 175 |
-
|
| 176 |
|
| 177 |
-
|
| 178 |
|
| 179 |
-
**
|
| 180 |
|
| 181 |
-
|
| 182 |
|
| 183 |
-
##
|
| 184 |
|
| 185 |
-
|
| 186 |
|
| 187 |
-
|
| 188 |
|
| 189 |
-
|
| 190 |
|
| 191 |
-
|
| 192 |
|
| 193 |
-
## Model Card Authors
|
| 194 |
|
| 195 |
-
|
| 196 |
|
| 197 |
## Model Card Contact
|
| 198 |
|
| 199 |
-
|
|
|
|
| 1 |
---
|
| 2 |
+
language:
|
| 3 |
+
- en
|
| 4 |
+
license: apache-2.0
|
| 5 |
library_name: transformers
|
| 6 |
+
tags:
|
| 7 |
+
- cybersecurity
|
| 8 |
+
- APT
|
| 9 |
+
- threat-intelligence
|
| 10 |
+
- contrastive-learning
|
| 11 |
+
- embeddings
|
| 12 |
+
- attribution
|
| 13 |
+
- MITRE-ATTACK
|
| 14 |
+
- CTI
|
| 15 |
+
- ModernBERT
|
| 16 |
+
datasets:
|
| 17 |
+
- mitre-attack
|
| 18 |
+
base_model: cisco-ai/SecureBERT2.0-base
|
| 19 |
+
pipeline_tag: feature-extraction
|
| 20 |
+
model-index:
|
| 21 |
+
- name: FALCON
|
| 22 |
+
  results:
|
| 23 |
+
  - task:
|
| 24 |
+
      type: text-classification
|
| 25 |
+
      name: APT Group Attribution
|
| 26 |
+
    metrics:
|
| 27 |
+
    - type: accuracy
|
| 28 |
+
      value: 0.0
|
| 29 |
+
      name: Accuracy (5-fold CV)
|
| 30 |
+
    - type: f1
|
| 31 |
+
      value: 0.0
|
| 32 |
+
      name: F1 Weighted (5-fold CV)
|
| 33 |
+
    - type: f1
|
| 34 |
+
      value: 0.0
|
| 35 |
+
      name: F1 Macro (5-fold CV)
|
| 36 |
---
|
| 37 |
|
| 38 |
+
# FALCON — Finetuned Actor Linking via CONtrastive Learning
|
| 39 |
|
| 40 |
+
<p align="center">
|
| 41 |
+
<strong>A domain-adapted embedding model for automated APT group attribution from cyber threat intelligence text.</strong>
|
| 42 |
+
</p>
|
| 43 |
|
| 44 |
+
| | |
|
| 45 |
+
|---|---|
|
| 46 |
+
| **Developed by** | AIT — Austrian Institute of Technology, Cybersecurity Group |
|
| 47 |
+
| **Model type** | Transformer encoder (ModernBERT) with contrastive fine-tuning |
|
| 48 |
+
| **Language** | English |
|
| 49 |
+
| **License** | Apache 2.0 |
|
| 50 |
+
| **Base model** | [cisco-ai/SecureBERT2.0-base](https://huggingface.co/cisco-ai/SecureBERT2.0-base) |
|
| 51 |
+
| **Paper** | *Coming soon* |
|
| 52 |
|
| 53 |
+
---
|
| 54 |
|
| 55 |
+
## Model Description
|
| 56 |
|
| 57 |
+
FALCON (**F**inetuned **A**ctor **L**inking via **CON**trastive learning) is a cybersecurity embedding model that maps textual descriptions of attack behaviors to a vector space where descriptions belonging to the same APT group are close together and descriptions from different groups are far apart.
|
| 58 |
|
| 59 |
+
Given a sentence like *"The group has used spearphishing emails with malicious macro-enabled attachments to deliver initial payloads"*, FALCON produces a 768-dimensional embedding that can be used to classify which APT group performed that behavior.
|
| 60 |
|
| 61 |
+
### Training Pipeline
|
| 62 |
|
| 63 |
+
```
|
| 64 |
+
cisco-ai/SecureBERT2.0-base (ModernBERT, 150M params)
|
| 65 |
+
↓
|
| 66 |
+
Tokenizer Extension — Added APT group names + aliases as single tokens
|
| 67 |
+
↓
|
| 68 |
+
MLM Fine-Tuning — Taught the model meaningful representations for new tokens
|
| 69 |
+
↓
|
| 70 |
+
Supervised Contrastive Fine-Tuning (SupCon) — Shaped the embedding space
|
| 71 |
+
so same-group descriptions cluster together
|
| 72 |
+
↓
|
| 73 |
+
FALCON
|
| 74 |
+
```
|
| 75 |
|
| 76 |
+
### What Makes FALCON Different
|
| 77 |
|
| 78 |
+
- **Domain-adapted base**: Built on SecureBERT 2.0, which already understands cybersecurity terminology, rather than a generic language model.
|
| 79 |
+
- **Contrastive objective**: Unlike classification-only models, FALCON optimizes the embedding geometry directly using Supervised Contrastive Loss (Khosla et al., 2020), producing embeddings suitable for retrieval, clustering, and few-shot classification.
|
| 80 |
+
- **Name-agnostic**: Group names are masked during contrastive training with `[MASK]`, forcing the model to learn behavioral patterns rather than memorizing name co-occurrences.
|
| 81 |
+
- **Alias-aware tokenizer**: APT group names and their vendor-specific aliases (e.g., APT29, Cozy Bear, Midnight Blizzard, NOBELIUM) are single tokens, preventing subword fragmentation.
|
| 82 |
|
| 83 |
+
---
|
| 84 |
|
| 85 |
+
## Intended Uses
|
| 86 |
|
| 87 |
### Direct Use
|
| 88 |
|
| 89 |
+
- **APT group attribution**: Given a behavioral description from a CTI report, classify which threat actor is most likely responsible.
|
| 90 |
+
- **Semantic search over CTI**: Retrieve the most relevant threat actor profiles given a description of observed attack behavior.
|
| 91 |
+
- **Threat actor clustering**: Group unlabeled incident descriptions by behavioral similarity.
|
| 92 |
+
- **Few-shot attribution**: Attribute newly emerging APT groups with very few reference samples.
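As a rough illustration of the retrieval and few-shot uses listed above, the sketch below ranks groups by cosine similarity between a query embedding and per-group centroids built from a handful of reference embeddings. It is not part of the FALCON release: the function names, the toy 2-dimensional vectors, and the `G-A`/`G-B` labels are hypothetical, and real inputs would be FALCON embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroids(references):
    """Average the reference vectors of each group into one centroid per group."""
    result = {}
    for group, vectors in references.items():
        dim = len(vectors[0])
        result[group] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return result


def rank_groups(query, references, top_k=3):
    """Return up to top_k (group, similarity) pairs, most similar first."""
    scores = [(group, cosine(query, c)) for group, c in centroids(references).items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]


# Toy data: two hypothetical groups with two reference embeddings each.
references = {
    "G-A": [[1.0, 0.0], [0.9, 0.1]],
    "G-B": [[0.0, 1.0], [0.1, 0.9]],
}
ranking = rank_groups([0.8, 0.2], references)
```

A centroid needs only a few reference descriptions per group, which is why this style of lookup suits newly tracked actors. The ranking should still be read as a list of hypotheses for an analyst, not as a verdict.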
|
| 93 |
|
| 94 |
+
### Downstream Use
|
| 95 |
|
| 96 |
+
- Fine-tuning for organization-specific threat actor taxonomies.
|
| 97 |
+
- Integration into SIEM/SOAR pipelines for automated triage.
|
| 98 |
+
- Enrichment of threat intelligence platforms with behavioral similarity scoring.
|
| 99 |
|
| 100 |
### Out-of-Scope Use
|
| 101 |
|
| 102 |
+
- Attribution based on IOCs (hashes, IPs, domains) — FALCON operates on natural language text only.
|
| 103 |
+
- Real-time network traffic classification.
|
| 104 |
+
- Definitive legal or geopolitical attribution — FALCON is a decision-support tool, not an oracle.
|
| 105 |
|
| 106 |
+
---
|
| 107 |
|
| 108 |
+
## How to Use
|
| 109 |
|
| 110 |
+
### Feature Extraction (Embeddings)
|
| 111 |
|
| 112 |
+
```python
|
| 113 |
+
import torch
|
| 114 |
+
from transformers import AutoModel, AutoTokenizer
|
| 115 |
|
| 116 |
+
model = AutoModel.from_pretrained("ait-cybersec/FALCON")
|
| 117 |
+
tokenizer = AutoTokenizer.from_pretrained("ait-cybersec/FALCON")
|
| 118 |
|
| 119 |
+
text = "The group used PowerShell scripts to download and execute additional payloads."
|
| 120 |
|
| 121 |
+
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
|
| 122 |
+
with torch.no_grad():
|
| 123 |
+
    outputs = model(**inputs)
|
| 124 |
|
| 125 |
+
# Mean pooling (recommended)
|
| 126 |
+
attention_mask = inputs["attention_mask"].unsqueeze(-1)
|
| 127 |
+
token_embs = outputs.last_hidden_state
|
| 128 |
+
embedding = (token_embs * attention_mask).sum(dim=1) / attention_mask.sum(dim=1)
|
| 129 |
|
| 130 |
+
print(f"Embedding shape: {embedding.shape}") # [1, 768]
|
| 131 |
+
```
|
| 132 |
|
| 133 |
+
### APT Group Classification (with sklearn probe)
|
| 134 |
|
| 135 |
+
```python
|
| 136 |
+
import numpy as np
|
| 137 |
+
from sklearn.linear_model import LogisticRegression
|
| 138 |
|
| 139 |
+
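# Encode your labeled corpus (train_texts, train_labels, test_texts are your own data); get_embedding(text) is an assumed helper that wraps the snippet above and returns a (1, 768) NumPy array
|
| 140 |
+
train_embeddings = np.vstack([get_embedding(text) for text in train_texts])  # shape (n_train, 768)
|
| 141 |
+
test_embeddings = np.vstack([get_embedding(text) for text in test_texts])  # shape (n_test, 768)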
|
| 142 |
|
| 143 |
+
clf = LogisticRegression(max_iter=2000)
|
| 144 |
+
clf.fit(train_embeddings, train_labels)
|
| 145 |
|
| 146 |
+
predictions = clf.predict(test_embeddings)
|
| 147 |
+
```
|
| 148 |
|
| 149 |
+
### Semantic Similarity Between Descriptions
|
| 150 |
|
| 151 |
+
```python
|
| 152 |
+
from sklearn.metrics.pairwise import cosine_similarity
|
| 153 |
|
| 154 |
+
emb1 = get_embedding("The actor used spearphishing with malicious attachments.")
|
| 155 |
+
emb2 = get_embedding("The group sent phishing emails containing weaponized documents.")
|
| 156 |
+
emb3 = get_embedding("The adversary exploited a SQL injection vulnerability.")
|
| 157 |
|
| 158 |
+
print(f"Phishing vs Phishing: {cosine_similarity(emb1, emb2)[0][0]:.4f}") # High
|
| 159 |
+
print(f"Phishing vs SQLi: {cosine_similarity(emb1, emb3)[0][0]:.4f}") # Lower
|
| 160 |
+
```
|
| 161 |
|
| 162 |
+
---
|
| 163 |
|
| 164 |
+
## Training Details
|
| 165 |
|
| 166 |
+
### Training Data
|
| 167 |
|
| 168 |
+
- **Source**: [MITRE ATT&CK Enterprise Groups](https://attack.mitre.org/groups/) — technique usage descriptions for all tracked APT groups.
|
| 169 |
+
- **Preprocessing**:
|
| 170 |
+
  - Canonicalized group aliases using `GroupID` (e.g., APT29 = Cozy Bear = Midnight Blizzard → single label).
|
| 171 |
+
  - Filtered to groups with ≥30 unique technique usage descriptions.
|
| 172 |
+
  - Masked all group names and aliases in training text with `[MASK]` to prevent name leakage.
|
| 173 |
+
- **Final dataset**: ~144 unique APT groups, variable samples per group (30–200+).
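The canonicalization and masking steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the preprocessing code used for FALCON: the alias table is a small illustrative subset, and `canonical_label` and `mask_group_names` are hypothetical helper names.

```python
import re

# Illustrative alias table mapping names to a group ID; a real table would be
# built from the ATT&CK group data rather than written by hand.
ALIASES = {
    "APT29": "G0016",
    "Cozy Bear": "G0016",
    "Midnight Blizzard": "G0016",
    "NOBELIUM": "G0016",
    "Kimsuky": "G0094",
}

_CANONICAL = {name.lower(): group_id for name, group_id in ALIASES.items()}

# Longest names first, so a multi-word alias is preferred over a shorter overlap.
_NAME_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(ALIASES, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def canonical_label(name):
    """Map any known alias to its group ID; return None for unknown names."""
    return _CANONICAL.get(name.lower())


def mask_group_names(text, mask_token="[MASK]"):
    """Replace every known group name or alias in the text with the mask token."""
    return _NAME_PATTERN.sub(mask_token, text)


masked = mask_group_names("APT29, also tracked as Cozy Bear, used spearphishing.")
```

Case-insensitive matching is a deliberate choice here: it errs toward masking too much, which is the safer failure mode when the goal is to keep names from leaking into the training text.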
|
| 174 |
|
| 175 |
+
### Training Procedure
|
| 176 |
|
| 177 |
+
#### Stage 1: Tokenizer Extension
|
| 178 |
+
|
| 179 |
+
Extended the SecureBERT 2.0 tokenizer with APT group names and vendor-specific aliases as single tokens. This prevents names like "Kimsuky" from being split into several subword fragments (illustratively, `['Kim', 'su', 'ky']` → `['Kimsuky']`; the actual pieces depend on the base tokenizer's vocabulary).
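A tokenizer extension of this kind is commonly done with the Transformers methods `add_tokens` and `resize_token_embeddings`. The sketch below shows that pattern as an assumption about how Stage 1 might be implemented, not as the authors' script. `extend_tokenizer` is a hypothetical helper that expects an already loaded tokenizer and model.

```python
def extend_tokenizer(tokenizer, model, names):
    """Add group names as whole tokens and grow the embedding matrix to match.

    `tokenizer` and `model` are expected to be Hugging Face objects, for
    example loaded with AutoTokenizer and AutoModelForMaskedLM.
    Returns the number of tokens that were actually added.
    """
    vocab = tokenizer.get_vocab()
    # Drop duplicates (keeping order) and names the vocabulary already contains.
    new_tokens = [name for name in dict.fromkeys(names) if name not in vocab]
    num_added = tokenizer.add_tokens(new_tokens)
    if num_added:
        # New rows are freshly initialized, which is why an MLM stage follows.
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```

The embedding rows created for the added tokens carry no learned meaning at first, so the subsequent MLM stage on unmasked text is what gives them useful representations.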
|
| 180 |
+
|
| 181 |
+
#### Stage 2: Masked Language Modeling (MLM)
|
| 182 |
+
|
| 183 |
+
| Hyperparameter | Value |
|
| 184 |
+
|---|---|
|
| 185 |
+
| Base model | cisco-ai/SecureBERT2.0-base |
|
| 186 |
+
| Objective | MLM (15% masking probability) |
|
| 187 |
+
| Learning rate | 2e-5 |
|
| 188 |
+
| Batch size | 16 |
|
| 189 |
+
| Epochs | 10 |
|
| 190 |
+
| Weight decay | 0.01 |
|
| 191 |
+
| Warmup ratio | 0.1 |
|
| 192 |
+
| Max sequence length | 128 |
|
| 193 |
+
| Text used | Unmasked (model sees group names to learn their embeddings) |
|
| 194 |
+
|
| 195 |
+
#### Stage 3: Supervised Contrastive Learning (SupCon)
|
| 196 |
+
|
| 197 |
+
| Hyperparameter | Value |
|
| 198 |
+
|---|---|
|
| 199 |
+
| Base checkpoint | Stage 2 MLM output |
|
| 200 |
+
| Loss function | Supervised Contrastive Loss (Khosla et al., 2020) |
|
| 201 |
+
| Temperature | 0.07 |
|
| 202 |
+
| Projection head | 768 → 768 (ReLU) → 256 |
|
| 203 |
+
| Unfrozen layers | Last 4 transformer layers + projection head |
|
| 204 |
+
| Learning rate | 2e-5 |
|
| 205 |
+
| Batch size | 64 |
|
| 206 |
+
| Epochs | 15 |
|
| 207 |
+
| Scheduler | Cosine annealing |
|
| 208 |
+
| Gradient clipping | max_norm=1.0 |
|
| 209 |
+
| Text used | Masked (group names replaced with `[MASK]`) |
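For readers unfamiliar with the loss named in the table, the following is a small reference sketch of the Supervised Contrastive Loss from Khosla et al. (2020), written with plain Python lists for readability. It is not the training code: actual training would use batched tensors and, presumably, the 256-dimensional projection-head outputs. The function name and toy vectors are illustrative.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.07):
    """Mean SupCon loss over all anchors that have at least one positive."""
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        # Temperature-scaled similarities between the anchor and every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
clustered = supcon_loss(vectors, [0, 0, 1, 1])  # labels agree with the geometry
scrambled = supcon_loss(vectors, [0, 1, 0, 1])  # labels cut across the geometry
```

With a low temperature such as 0.07, similarities are sharpened, so the loss is small when same-label vectors already point in the same direction and large when they do not.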
|
| 210 |
|
| 211 |
+
---
|
| 212 |
|
| 213 |
+
## Evaluation
|
| 214 |
|
| 215 |
+
Evaluation uses a **linear probing protocol**: freeze the model, extract embeddings, train a LogisticRegression classifier on top, and report metrics using **5-fold stratified cross-validation** with oversampling applied only to the training fold (no data leakage).
|
| 216 |
|
| 217 |
### Results
|
| 218 |
|
| 219 |
+
<!-- UPDATE THESE WITH YOUR ACTUAL RESULTS -->
|
| 220 |
|
| 221 |
+
| Model | Accuracy | F1 Weighted | F1 Macro |
|
| 222 |
+
|---|---|---|---|
|
| 223 |
+
| SecureBERT 2.0 (frozen baseline, CLS) | — | — | — |
|
| 224 |
+
| SecureBERT 2.0 (frozen baseline, Mean) | — | — | — |
|
| 225 |
+
| FALCON-base (MLM only) | — | — | — |
|
| 226 |
+
| **FALCON (MLM + Contrastive)** | **—** | **—** | **—** |
|
| 227 |
|
| 228 |
+
*Fill in after training completes.*
|
| 229 |
|
| 230 |
+
### Evaluation Protocol Details
|
| 231 |
|
| 232 |
+
- **No data leakage**: Oversampling is applied inside each training fold only; test folds contain only original, unique samples.
|
| 233 |
+
- **Name masking**: All group names and aliases are replaced with `[MASK]` in evaluation text, ensuring the model is evaluated on behavioral understanding, not name recognition.
|
| 234 |
+
- **Canonicalization**: All vendor-specific aliases are resolved to a single canonical label per `GroupID`, preventing inflated metrics from alias splits.
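The leakage-free splitting described above can be sketched with the standard library alone. This is an illustration of the protocol's shape, not the evaluation script: random oversampling by duplication is an assumption (the card does not name the oversampling method), and all function names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def oversample(indices, labels, seed=0):
    """Duplicate minority-class indices until every class matches the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index in indices:
        by_label[labels[index]].append(index)
    target = max(len(members) for members in by_label.values())
    balanced = []
    for members in by_label.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def cv_splits(labels, k=5, seed=0):
    """Yield (train, test) index lists; only the training side is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for i, test_indices in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield oversample(train, labels, seed + i), test_indices


toy_labels = ["A"] * 10 + ["B"] * 5
splits = list(cv_splits(toy_labels))
```

Because duplicates are created only after the test fold has been set aside, no copy of a test sample can appear in the training data, which is the property the protocol relies on.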
|
| 235 |
|
| 236 |
+
---
|
| 237 |
|
| 238 |
+
## Comparison with Related Models
|
| 239 |
|
| 240 |
+
| Model | Domain | Architecture | Training Objective | Cybersecurity-Specific |
|
| 241 |
+
|---|---|---|---|---|
|
| 242 |
+
| BERT base | General | BERT | MLM + NSP | ❌ |
|
| 243 |
+
| SecBERT | Cybersecurity | BERT | MLM | ✅ |
|
| 244 |
+
| SecureBERT | Cybersecurity | RoBERTa | MLM (custom tokenizer) | ✅ |
|
| 245 |
+
| ATTACK-BERT | Cybersecurity | Sentence-BERT | Sentence similarity | ✅ |
|
| 246 |
+
| SecureBERT 2.0 | Cybersecurity | ModernBERT | MLM (text + code) | ✅ |
|
| 247 |
+
| **FALCON** | **APT Attribution** | **ModernBERT** | **MLM + SupCon** | **✅ (task-specific)** |
|
| 248 |
|
| 249 |
+
---
|
| 250 |
|
| 251 |
+
## Limitations and Bias
|
| 252 |
|
| 253 |
+
- **Training data bias**: MITRE ATT&CK over-represents well-documented state-sponsored groups (APT28, APT29, Lazarus). Less-known actors may have weaker representations.
|
| 254 |
+
- **Behavioral overlap**: Many APT groups share identical TTPs (e.g., spearphishing, PowerShell usage). The model cannot reliably distinguish groups that employ the same techniques in the same way.
|
| 255 |
+
- **English only**: The model is trained on English-language CTI text and will not perform well on non-English threat reports.
|
| 256 |
+
- **Static knowledge**: The model reflects the MITRE ATT&CK knowledge base at training time and does not update as new groups or techniques emerge.
|
| 257 |
+
- **Not a replacement for analyst judgment**: FALCON is a decision-support tool. Attribution conclusions should always be validated by human analysts.
|
| 258 |
|
| 259 |
+
---
|
| 260 |
|
| 261 |
+
## Ethical Considerations
|
| 262 |
|
| 263 |
+
Automated threat attribution is a sensitive capability with potential for misuse. Incorrect attribution could lead to misguided defensive actions or geopolitical consequences. Users should:
|
| 264 |
|
| 265 |
+
- Always treat model outputs as **hypotheses**, not conclusions.
|
| 266 |
+
- Combine FALCON outputs with additional intelligence sources (IOCs, infrastructure analysis, geopolitical context).
|
| 267 |
+
- Be aware that threat actors deliberately employ false-flag operations to mislead attribution.
|
| 268 |
|
| 269 |
+
---
|
| 270 |
|
| 271 |
+
## Citation
|
| 272 |
|
| 273 |
+
```bibtex
|
| 274 |
+
@misc{falcon2025,
|
| 275 |
+
title={FALCON: Finetuned Actor Linking via Contrastive Learning for APT Group Attribution},
|
| 276 |
+
author={AIT Austrian Institute of Technology, Cybersecurity Group},
|
| 277 |
+
year={2025},
|
| 278 |
+
url={https://huggingface.co/ait-cybersec/FALCON}
|
| 279 |
+
}
|
| 280 |
+
```
|
| 281 |
|
| 282 |
+
### Related Work
|
| 283 |
|
| 284 |
+
- Aghaei, E. et al. "SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence." arXiv:2510.00240 (2025).
|
| 285 |
+
- Khosla, P. et al. "Supervised Contrastive Learning." NeurIPS (2020).
|
| 286 |
+
- Irfan, S. et al. "A Comprehensive Survey of APT Attribution." arXiv:2409.11415 (2024).
|
| 287 |
+
- Abdeen, B. et al. "SMET: Semantic Mapping of CVE to ATT&CK." (2023).
|
| 288 |
|
| 289 |
+
---
|
| 290 |
|
| 291 |
+
## Model Card Authors
|
| 292 |
|
| 293 |
+
AIT — Austrian Institute of Technology, Cybersecurity Group
|
| 294 |
|
| 295 |
## Model Card Contact
|
| 296 |
|
| 297 |
+
For inquiries, please open an issue on this repository or contact the AIT Cybersecurity Group.
|