SecEBL-Rev20
SecEBL stands for Security Event Behavior Labeler.
SecEBL-Rev20 is an intent-recognition model for security telemetry. It maps a Linux command line or normalized Kubernetes AuditLog event into explicit behavior-intent tags, so downstream detection can reason about what an actor is trying to do instead of only matching fixed strings, blacklists, allowlists, or opaque risk scores.
Project repository: github.com/EBWi11/SecEBL
Project Context
Traditional intrusion-detection systems still rely heavily on blacklists, allowlists, signatures, hand-written rules, and low-explainability ML. Those tools are useful, but they struggle with living-off-the-land behavior, fast syntax drift, and multi-platform telemetry where the same behavior appears in different log shapes.
SecEBL adds an intent-detection layer. The goal is not to discard rules or policy engines, but to give them a better intermediate representation: portable, explainable behavior tags such as credential access, remote execution, persistence, data staging, cloud privilege changes, or service-health checks.
The model is designed as the L1 layer of SecEBL:
raw security event
-> L1 behavior-intent recognition
-> L2 session reasoning or another downstream detector
-> alert / review / policy
L1 does not decide that a single event is an intrusion. It produces ranked,
explainable behavior evidence such as read_credential_material,
execute_remote_command, create_scheduled_task, grant_cluster_privilege,
or query_service_health.
What Is In This Model Repository
This Hugging Face repository is the model artifact bundle. It intentionally does not include training corpora, full final benchmarks, private pressure-stream rows, raw run logs, or internal review files because parts of those materials contain real telemetry or real operational context.
| Path | Purpose |
|---|---|
| model.safetensors, tokenizer/config files | SentenceTransformers-compatible SecEBL-Rev20 embedding model. |
| semantic_texts.jsonl | Rev20 tag semantic texts used for L1 retrieval. |
| score_calibration.rev20.json | Release calibration thresholds for tag selection. |
| schema/tags_schema_rev20.json | Canonical Rev20 behavior vocabulary, 361 tags across 12 groups. |
| l2_artifacts/logreg.joblib | Experimental L2 logistic-regression session scorer. |
| l2_artifacts/tag_risk_policy.rev20.json | Matching L2 tag-selection and risk-feature policy. |
| l2_artifacts/train_summary.json | Public aggregate L2 training/evaluation summary. |
| LICENSE, NOTICE | Model license and attribution notices. |
The companion GitHub release repository
(EBWi11/SecEBL) contains the runnable Python
helpers, public example data, and a one-command smoke-test script. Download this
model repository and pass its local path to the GitHub helper scripts as MODEL_DIR.
Rev20 Vocabulary
Rev20 is a flat behavior-tag schema:
| Item | Count |
|---|---|
| Top-level behavior groups | 12 |
| Behavior tags | 361 |
The vocabulary was built to represent visible behavior intent rather than final maliciousness. This makes the tags useful as an intermediate representation for rules, analyst review, session scoring, and later sequence models.
L1 Evaluation Snapshot
Current documented L1 baseline:
featurize-rev20-20260620-072423-ep128-bs112-latestdata
| Dataset | Dynamic exact | Top5 any-hit | Top5 all-covered | Micro recall@5 |
|---|---|---|---|---|
| Linux final gold | 87.32% | 98.49% | 95.44% | 96.44% |
| K8s final gold | 99.31% | 100.00% | 100.00% | 100.00% |
| Combined | 87.47% | 98.50% | 95.50% | 96.47% |
These metrics were measured on withheld internal final-gold evaluation sets, not on the public example subset. The Linux final gold covers the full 361-tag Rev20 vocabulary and includes dense multi-tag command rows. The K8s result should be read as a small-domain sanity result because the current K8s corpus is much smaller than the Linux corpus.
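This card does not define the column names in the table above. As a rough guide, the sketch below computes the three top-5 columns under one plausible reading: "any-hit" means at least one gold tag is in the top 5, "all-covered" means every gold tag is in the top 5, and "micro recall@5" pools gold tags across all rows. These definitions, the `top5_metrics` function, and the toy rows are assumptions for illustration, not the project's evaluation code.

```python
def top5_metrics(rows, k=5):
    """Compute top-k retrieval metrics over (ranked_tags, gold_tags) pairs.

    Assumed definitions (not confirmed by the model card):
      any_hit      - at least one gold tag appears in the top k
      all_covered  - every gold tag appears in the top k
      micro_recall - gold tags found in the top k / all gold tags, pooled
    """
    any_hit = all_covered = found = total = 0
    for ranked, gold in rows:
        top = set(ranked[:k])
        gold = set(gold)
        hits = len(top & gold)
        any_hit += hits > 0
        all_covered += hits == len(gold)
        found += hits
        total += len(gold)
    n = len(rows)
    return {
        "any_hit": any_hit / n,
        "all_covered": all_covered / n,
        "micro_recall": found / total,
    }

# Toy rows: the first is fully covered, the second misses one of two gold tags.
rows = [
    (["read_credential_material", "query_service_health"],
     ["read_credential_material"]),
    (["execute_remote_command", "create_scheduled_task"],
     ["execute_remote_command", "grant_cluster_privilege"]),
]
metrics = top5_metrics(rows)
print(metrics)
```

On dense multi-tag rows, all-covered is the stricter of the two row-level measures, which is consistent with it sitting below any-hit in the Linux results.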
L2 Artifact
This repository includes an experimental fitted L2 session scorer so the
companion GitHub scripts/run_examples.sh can run the public Linux example
sessions end to end when this model directory is used as MODEL_DIR.
L2 consumes cached L1 top_labels and selected behavior tags. It does not use
raw command text, user names, host names, or session ids as runtime scoring
features. The included L2 artifact is a research/reproducibility component, not
a general production IDS claim.
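To make that feature boundary concrete, here is a hypothetical sketch of a session scorer that sees only behavior tags. The weights, the bias, and the bag-of-tags featurization are invented for the example; they are not the contents of l2_artifacts/logreg.joblib or l2_artifacts/tag_risk_policy.rev20.json, whose actual feature design is not described in this card.

```python
import math
from collections import Counter

# Hypothetical per-tag weights and bias, chosen only for illustration.
WEIGHTS = {
    "read_credential_material": 1.8,
    "execute_remote_command": 1.2,
    "query_service_health": -0.6,
}
BIAS = -2.0

def score_session(events):
    """Return a logistic risk score in (0, 1) for one session.

    events: list of tag lists, one per event (the selected L1 tags).
    Only tag counts are used: no command text, user, host, or session id.
    """
    counts = Counter(tag for tags in events for tag in tags)
    z = BIAS + sum(WEIGHTS.get(tag, 0.0) * n for tag, n in counts.items())
    return 1.0 / (1.0 + math.exp(-z))

benign = [["query_service_health"], ["query_service_health"]]
suspicious = [["read_credential_material"], ["execute_remote_command"]]
print(score_session(benign), score_session(suspicious))
```

Because the input is a tag stream rather than raw telemetry, a scorer shaped like this can in principle be applied to any platform that L1 can label.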
Internal L2 summary:
| Check | Result |
|---|---|
| Withheld Linux final sessions | 663 sessions, 100.00% accuracy in the fitted check |
| 7M pressure-stream fit-check | 6,286,568 rows, 102,117 sessions, 61 alert sessions |
| OOF validation | 99.39% accuracy, 96.44% attack precision, 95.31% attack recall |
The 7M pressure-stream result was measured on real background telemetry plus embedded synthetic attack sessions. The underlying rows and real session identifiers are not redistributed.
Basic Loading
Load the embedding model directly with SentenceTransformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("willchen0011/SecEBL")
SecEBL is a retrieval-style labeler: encode the event, encode the Rev20 semantic
tag texts from semantic_texts.jsonl, rank by cosine similarity, then apply the
matching calibration thresholds. For normal use, prefer the companion GitHub
helpers because they keep prompt profile, calibration, top-k saving, and L2
inputs aligned.
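The retrieval flow described above can be sketched in a few lines. This is a minimal illustration, not the companion repository's implementation: the toy vectors stand in for `model.encode(...)` outputs, and the `rank_tags` function, the dictionary layout, and the single `threshold` value are assumptions made for the example. The released calibration file may define its thresholds differently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_tags(event_vec, tag_vecs, top_k=5, threshold=0.5):
    """Rank tags by similarity to the event, then keep those above a cutoff.

    tag_vecs: dict mapping tag name -> embedding vector (hypothetical layout).
    threshold: illustrative single cutoff, not the released calibration.
    Returns (top_k list of (tag, score), list of selected tag names).
    """
    scored = sorted(
        ((tag, cosine(event_vec, vec)) for tag, vec in tag_vecs.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    top = scored[:top_k]
    selected = [tag for tag, score in top if score >= threshold]
    return top, selected

# Toy 3-dimensional vectors standing in for real embeddings.
tag_vecs = {
    "read_credential_material": [1.0, 0.0, 0.0],
    "execute_remote_command": [0.0, 1.0, 0.0],
    "query_service_health": [0.0, 0.0, 1.0],
}
event_vec = [0.9, 0.1, 0.0]  # pretend embedding of a credential-reading command

top, selected = rank_tags(event_vec, tag_vecs, top_k=2, threshold=0.5)
print(top[0][0], selected)
```

Keeping the ranked top-k list separate from the thresholded selection mirrors the distinction this card draws between top_labels and selected behavior tags.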
Example with the companion repository checked out next to this model snapshot:
git clone https://github.com/EBWi11/SecEBL.git
cd SecEBL
git lfs install
git clone https://huggingface.co/willchen0011/SecEBL model_artifacts
scripts/run_examples.sh
That script runs the L1 evaluation on the public Linux and K8s examples. Because
this model repository includes l2_artifacts/logreg.joblib, it also runs the L2
session scorer on the public Linux example sessions by default.
Intended Use
- Research and evaluation of security-event behavior labeling.
- Internal security detection, investigation, and triage for systems an organization owns, operates, administers, or is explicitly authorized to defend.
- Building session-level risk scoring over SecEBL behavior-tag streams.
Out Of Scope
- Standalone verdicting on a single event.
- Authorization or policy-compliance decisions without human validation.
- Monitoring systems you are not authorized to defend.
- Commercial security products, SaaS/API offerings, MDR/MSSP services, or third-party managed detection without a separate written commercial license.
License
The model artifacts are released under SecEBL Model License 1.0. This is an open-weight restricted-use model license, not Apache-2.0 and not an OSI-approved open source license.
The base model is Alibaba-NLP/gte-modernbert-base, which is Apache-2.0.
Source code, schemas, public examples, and helper scripts in the companion
GitHub repository (EBWi11/SecEBL) are
Apache-2.0 unless a file explicitly states otherwise.
Commercial security offerings require a separate written commercial license.