codechrl/bert-micro-cybersecurity

@@ -1,54 +1,55 @@
 ---
-library_name: transformers
-license: mit
-base_model: codechrl/bert-micro-cybersecurity
 tags:
-- generated_from_trainer
-model-index:
-- name: bert-micro-cybersecurity
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# bert-micro-cybersecurity
-This model is a fine-tuned version of [codechrl/bert-micro-cybersecurity](https://huggingface.co/codechrl/bert-micro-cybersecurity) on the None dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.06
-- num_epochs: 3
-### Training results
-### Framework versions
-- Transformers 4.57.0
-- Pytorch 2.8.0+cu128
-- Datasets 4.2.0
-- Tokenizers 0.22.1

 ---
+language:
+- en
+- id
 tags:
+- text-classification
+- cybersecurity
+base_model: boltuix/bert-micro
 ---
+# Model Card for “bert-micro-cybersecurity”
+## 1. Model Details
+**Model description**
+“bert-micro-cybersecurity” is a compact transformer model derived from `boltuix/bert-micro`, adapted for cybersecurity text classification tasks (e.g., threat detection, incident reports, malicious vs benign content).
+- Model type: fine-tuned lightweight BERT variant
+- Languages: English and Indonesian
+- Finetuned from: `boltuix/bert-micro`
+- Status: **Early version**, trained on about **2%** of the planned data.
+**Model sources**
+- Base model: [boltuix/bert-micro](https://huggingface.co/boltuix/bert-micro)
+- Data: Cybersecurity Data
+## 2. Uses
+### Direct use
+You can use this model to classify cybersecurity-related text — for example, whether a given message, report or log entry indicates malicious intent, abnormal behaviour, or threat presence.
+### Downstream use
+- Embedding extraction for clustering or anomaly detection in security logs.
+- As part of a pipeline for phishing detection, malicious email filtering, or incident triage.
+- As a feature extractor feeding a downstream system (e.g., alert-generation, SOC dashboard).
+### Out-of-scope use
+- Not meant for high-stakes automated blocking decisions without human review.
+- Not optimized for languages other than English and Indonesian.
+- Not tested for non-cybersecurity domains or out-of-distribution data.
+## 3. Bias, Risks, and Limitations
+Because the model has been trained on only a small subset (about 2%) of the planned data, performance is preliminary and may degrade on unseen or specialized domains (e.g., industrial control systems, IoT logs, other languages).
+- Inherits any biases present in the base model (`boltuix/bert-micro`) and in the fine-tuning data, such as over-representation of certain threat types or vendor- and tooling-specific vocabulary.
+- Should not be used as sole authority for incident decisions; only as an aid to human analysts.
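
The human-review guidance above can be sketched as a simple confidence gate. This is an illustrative sketch only and is not part of the model or of `transformers`: the names `softmax`, `triage`, and `REVIEW_THRESHOLD` are hypothetical, and the 0.90 threshold is an arbitrary assumption that would need tuning on held-out data. It takes the raw logits for one input and flags low-confidence predictions for an analyst instead of acting on them.

```python
import math

# Hypothetical threshold (an assumption); tune it on held-out data.
REVIEW_THRESHOLD = 0.90


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, threshold=REVIEW_THRESHOLD):
    """Return (predicted_class, confidence, needs_review) for one input."""
    probs = softmax(logits)
    predicted_class = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[predicted_class]
    # Low-confidence predictions are routed to an analyst, not acted on.
    return predicted_class, confidence, confidence < threshold


# Made-up logits for a two-class model, for illustration only.
print(triage([0.2, 0.4]))   # small margin between classes: flagged for review
print(triage([-3.0, 4.0]))  # large margin between classes: not flagged
```

Softmax confidence is not a calibrated probability, so for an early-stage model like this one a gate of this kind is at best a coarse filter, and a flagged or unflagged result should still be treated as input to an analyst's decision.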
+## 4. How to Get Started with the Model
+```python
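+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+# Repository id as shown in this page's header (replaces the "your-username" placeholder).
+model_id = "codechrl/bert-micro-cybersecurity"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+# Tokenize one log line; truncation keeps it within the model's maximum input length.
+inputs = tokenizer("The server logged an unusual outbound connection to 123.123.123.123", return_tensors="pt", truncation=True, padding=True)
+with torch.no_grad():  # inference only, so gradients are not needed
+    outputs = model(**inputs)
+logits = outputs.logits
+predicted_class = logits.argmax(dim=-1).item()  # index of the highest-scoring class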