codechrl committed on
Commit
2530ae7
·
verified ·
1 Parent(s): a95fa85

Update README.md

  1. README.md +41 -40
README.md CHANGED
@@ -1,54 +1,55 @@
1
  ---
2
- library_name: transformers
3
- license: mit
4
- base_model: codechrl/bert-micro-cybersecurity
5
  tags:
6
- - generated_from_trainer
7
- model-index:
8
- - name: bert-micro-cybersecurity
9
- results: []
10
  ---
11
 
12
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
- should probably proofread and complete it, then remove this comment. -->
14
 
15
- # bert-micro-cybersecurity
16
 
17
- This model is a fine-tuned version of [codechrl/bert-micro-cybersecurity](https://huggingface.co/codechrl/bert-micro-cybersecurity) on the None dataset.
 
 
 
 
 
18
 
19
- ## Model description
 
 
20
 
21
- More information needed
22
 
23
- ## Intended uses & limitations
 
24
 
25
- More information needed
 
 
 
26
 
27
- ## Training and evaluation data
 
 
 
28
 
29
- More information needed
 
 
 
30
 
31
- ## Training procedure
 
 
 
 
32
 
33
- ### Training hyperparameters
34
-
35
- The following hyperparameters were used during training:
36
- - learning_rate: 5e-05
37
- - train_batch_size: 8
38
- - eval_batch_size: 8
39
- - seed: 42
40
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
41
- - lr_scheduler_type: linear
42
- - lr_scheduler_warmup_ratio: 0.06
43
- - num_epochs: 3
44
-
45
- ### Training results
46
-
47
-
48
-
49
- ### Framework versions
50
-
51
- - Transformers 4.57.0
52
- - Pytorch 2.8.0+cu128
53
- - Datasets 4.2.0
54
- - Tokenizers 0.22.1
 
1
  ---
2
+ language:
3
+ - en
4
+ - id
5
  tags:
6
+ - text-classification
7
+ - cybersecurity
8
+ base_model: boltuix/bert-micro
 
9
  ---
10
 
11
+ # Model Card for “bert-micro-cybersecurity”
 
12
 
13
+ ## 1. Model Details
14
 
15
+ **Model description**
16
+ “bert-micro-cybersecurity” is a compact transformer model derived from `boltuix/bert-micro`, adapted for cybersecurity text classification tasks (e.g., threat detection, incident reports, malicious vs benign content).
17
+ - Model type: fine-tuned lightweight BERT variant
18
+ - Languages: English and Indonesian
19
+ - Finetuned from: `boltuix/bert-micro`
20
+ - Status: **Early version**, trained on roughly **2%** of the planned data.
21
 
22
+ **Model sources**
23
+ - Base model: [boltuix/bert-micro](https://huggingface.co/boltuix/bert-micro)
24
+ - Data: Cybersecurity Data
25
 
26
+ ## 2. Uses
27
 
28
+ ### Direct use
29
+ You can use this model to classify cybersecurity-related text — for example, whether a given message, report or log entry indicates malicious intent, abnormal behaviour, or threat presence.
30
 
31
+ ### Downstream use
32
+ - Embedding extraction for clustering or anomaly detection in security logs.
33
+ - As part of a pipeline for phishing detection, malicious email filtering, or incident triage.
34
+ - As a feature extractor feeding a downstream system (e.g., alert-generation, SOC dashboard).
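For the embedding-extraction use listed above, one common way to turn per-token vectors into a single text embedding is attention-masked mean pooling. The sketch below shows only that pooling step in plain Python, so it runs without downloading the model. The function name `mean_pool` and the toy vectors are hypothetical, and the model card does not state which pooling strategy the author intends; in practice the token vectors would come from the encoder's last hidden state.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping padding positions (mask == 0)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    # Keep only the vectors whose mask entry is non-zero (real tokens).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input (hypothetical): three tokens with 2-dimensional vectors; the last is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
embedding = mean_pool(tokens, mask)
```

The resulting fixed-size vector can then be passed to whatever clustering or anomaly-detection method the downstream system uses.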
35
 
36
+ ### Out-of-scope use
37
+ - Not meant for high-stakes automated blocking decisions without human review.
38
+ - Not optimized for languages other than English and Indonesian.
39
+ - Not tested for non-cybersecurity domains or out-of-distribution data.
40
 
41
+ ## 3. Bias, Risks, and Limitations
42
+ Because the model was trained on only a small subset (~2%) of the planned data, performance is preliminary and may degrade on unseen or specialized domains (e.g., industrial control systems, IoT logs, other languages).
43
+ - Inherits any biases present in the base model (`boltuix/bert-micro`) and in the fine-tuning data, e.g., over-representation of certain threat types or vendor- and tooling-specific vocabulary.
44
+ - Should not be used as the sole authority for incident decisions, only as an aid to human analysts.
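One way to keep a human analyst in the loop, as the limitations above recommend, is to attach a confidence score to each prediction and flag low-confidence ones for mandatory review. The sketch below is a minimal illustration under stated assumptions: the `triage` helper, the two-class logits, and the 0.9 threshold are all hypothetical and not part of the model card, and the threshold is not calibrated. A high score from this early-stage model can still be wrong, so unflagged predictions remain advisory.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits: List[float], threshold: float = 0.9) -> Tuple[int, float, bool]:
    """Return (predicted class index, confidence, needs_human_review)."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[predicted]
    # Flag the prediction when the top probability falls below the (assumed) threshold.
    return predicted, confidence, confidence < threshold


# Hypothetical logits for a two-class model.
confident = triage([4.0, -2.0])
uncertain = triage([0.3, 0.1])
```

With real model output, the list passed to `triage` would be one row of `outputs.logits` converted with `.tolist()`.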
45
 
46
+ ## 4. How to Get Started with the Model
47
+ ```python
48
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
49
+ tokenizer = AutoTokenizer.from_pretrained("codechrl/bert-micro-cybersecurity")
50
+ model = AutoModelForSequenceClassification.from_pretrained("codechrl/bert-micro-cybersecurity")
51
 
52
+ inputs = tokenizer("The server logged an unusual outbound connection to 123.123.123.123", return_tensors="pt", truncation=True, padding=True)
53
+ outputs = model(**inputs)
54
+ logits = outputs.logits
55
+ predicted_class = logits.argmax(dim=-1).item()