# DistilBERT Secret Masker (Fast Expert)

Fast Expert model for the SecMask MoE system, specialized for rapid secret detection in short to medium-length texts (≤512 tokens).
## 🎯 Overview
Fine-tuned DistilBERT model for detecting and classifying secrets (API keys, tokens, credentials) in text using Named Entity Recognition (NER). Serves as the Fast Expert in the SecMask Mixture of Experts architecture, handling 92.7% of inference requests at ~11ms P50 latency on CPU.
### Key Features

- ✅ **High Speed**: 11ms P50 latency on CPU
- ✅ **High Precision**: 82% (NER-only), 92.3% with filters
- ✅ **Production Ready**: Handles 92.7% of real-world cases
- ✅ **Lightweight**: 265MB (66M parameters)
- ✅ **Multiple Secret Types**: AWS keys, GitHub tokens, JWTs, API keys, PEM blocks, K8s secrets
**Production Performance**: When combined with post-processing filters (PEM blocks, K8s secrets, pattern matching), this model achieves 92.3% precision, 80% recall, and an F1 of 0.857. The NER model alone achieves 82% precision and 38% recall. See the comprehensive benchmarks for details.

**Recommended Configuration**: Fast Expert + Filters (this model with post-processing) is the recommended production setup, outperforming Full MoE configurations. See the Configuration Guide for usage recommendations.
### Detected Secret Types
| Secret Type | Example Pattern | F1 Score |
|---|---|---|
| AWS Access Keys | `AKIA...` | 0.92 |
| GitHub Personal Tokens | `ghp_...`, `gho_...` | 0.88 |
| JWT Tokens | `eyJ0eXAiOiJKV1QiLCJhbGc...` | 0.85 |
| Generic API Keys | `sk-proj-...`, `api_key=...` | 0.79 |
| PEM Certificate Blocks | `-----BEGIN PRIVATE KEY-----` | 0.95 |
| Kubernetes Secrets (data) | `kind: Secret` → `data:` values | 0.81 |
| Database Credentials | Connection strings with passwords | 0.74 |
## 🚀 Quick Start
### Installation

```bash
pip install transformers torch
```
### Basic Usage

**Standalone (direct):**

```python
from transformers import pipeline

# Load the fine-tuned token-classification model
classifier = pipeline(
    "token-classification",
    model="AndrewAndrewsen/distilbert-secret-masker",
    aggregation_strategy="simple",
)

# Detect secrets
text = "My API key is sk-proj-1234567890abcdefghijklmnopqrstuvwxyz"
results = classifier(text)
print(results)
# [{'entity_group': 'SECRET', 'score': 0.95, 'word': 'sk-proj-1234567890abcdefghijklmnopqrstuvwxyz', ...}]
```
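The pipeline only reports detections; to mask them, the character offsets it returns can be spliced back into the input. A minimal sketch (the `mask_spans` helper and `[SECRET]` placeholder mirror the SecMask output format but are not part of this model's API):

```python
def mask_spans(text, entities, placeholder="[SECRET]"):
    """Replace each detected entity span with a placeholder, working
    right to left so earlier character offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text

print(mask_spans(text, classifier(text)))
# "My API key is [SECRET]"
```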
**Recommended (via SecMask MoE):**

```python
# Clone the SecMask repo first:
# git clone https://github.com/AndrewAndrewsen/secmask.git
from infer_moe import mask_text_moe

masked = mask_text_moe(
    "My GitHub token is ghp_1234567890abcdefghijklmnopqrstuvwxyz",
    fast_model_dir="AndrewAndrewsen/distilbert-secret-masker",
    tau=0.80,
    routing_mode="heuristic",
)
print(masked)
# "My GitHub token is [SECRET]"
```
### Command Line (via SecMask)

```bash
# Clone the repo
git clone https://github.com/AndrewAndrewsen/secmask.git
cd secmask

# Mask secrets
python infer_moe.py \
  --text "AWS key: AKIAIOSFODNN7EXAMPLE" \
  --fast-model AndrewAndrewsen/distilbert-secret-masker \
  --routing-mode heuristic \
  --tau 0.80
# Output: AWS key: [SECRET]
```
## 📊 Performance

### Secret Detection Metrics
| Metric | NER Only | With Filters (Recommended) |
|---|---|---|
| F1 Score | 0.52 | 0.857 |
| Precision | 82% | 92.3% |
| Recall | 38% | 80.0% |
| P50 Latency | 11ms | 11ms |
| P90 Latency | 14ms | 14ms |
| P99 Latency | 17ms | 17ms |
| Throughput | 84 req/s | 84 req/s (CPU) |
**Note**: NER-only metrics measured at τ=0.80. Production systems combine NER with post-processing filters (PEM blocks, K8s secrets, pattern matching) to achieve 92.3% precision and 80% recall. Post-processing adds no latency overhead. See BENCHMARK_RESULTS.md for comprehensive benchmarks.
### When This Model Is Used (MoE Routing)

The router selects this Fast Expert when:

- Token count ≤ 512
- No multi-line structures (PEM blocks, K8s YAML)
- Simple text patterns
- Coverage: 92.7% of real-world requests
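A minimal sketch of such a heuristic, under the assumptions above (`route_to_fast_expert` and the multi-line markers are illustrative, not the actual SecMask router):

```python
import re
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AndrewAndrewsen/distilbert-secret-masker")

# Markers for multi-line structures better handled by the Long Expert
MULTILINE_MARKERS = re.compile(r"-----BEGIN |kind:\s*Secret", re.MULTILINE)

def route_to_fast_expert(text: str, max_tokens: int = 512) -> bool:
    """Return True if the Fast Expert should handle this request."""
    if MULTILINE_MARKERS.search(text):
        return False
    # Count tokens (including special tokens) as the model would see them
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens <= max_tokens
```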
**Note**: The recommended production configuration is Fast Expert + Filters alone (without the Long Expert). This achieves better results than Full MoE. See the Configuration Guide for details.
## 🏗️ Model Details

### Architecture

- **Base Model**: `distilbert-base-uncased` (66M params, Apache 2.0)
- **Task**: Token Classification (NER)
- **Max Sequence Length**: 512 tokens
- **Labels**: `B-SECRET`, `I-SECRET`, `O` (BIO tagging)
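The label mapping can be verified from the published config (assuming it ships the standard `id2label` mapping):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("AndrewAndrewsen/distilbert-secret-masker")
print(config.id2label)
# e.g. {0: 'O', 1: 'B-SECRET', 2: 'I-SECRET'}
```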
### Training Details

- **Dataset**: Custom SecretMask v2 (6,000 training examples)
- **Optimizer**: AdamW (lr=5e-5)
- **Epochs**: 3
- **Batch Size**: 16
- **Hardware**: GPU (NVIDIA A100 or equivalent)
- **Training Time**: ~30 minutes
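A hypothetical reconstruction of this setup with the `transformers` `Trainer` (the dataset is not public, so `train_dataset` below is a stand-in you must supply; the hyperparameter values mirror the list above):

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["O", "B-SECRET", "I-SECRET"]
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

args = TrainingArguments(
    output_dir="distilbert-secret-masker",
    learning_rate=5e-5,              # AdamW is the Trainer default optimizer
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# `train_dataset` must be a tokenized, BIO-labelled dataset;
# SecretMask v2 is not public, so supply your own here.
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```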
### Evaluation

Evaluated on 600 held-out examples from the SecretMask v2 test set:

- **Precision**: 0.82
- **Recall**: 0.38
- **F1**: 0.52
- **Support**: 1,021 secret tokens
**Key Insights:**

- **High Precision (82%)**: Very low false positive rate; safe for production
- **Lower Recall (38%)**: Misses some secrets when used standalone
- **Production Strategy**: Combine with deterministic filters (see `filters.py`) for PEM blocks, K8s secrets, and AWS patterns to achieve >90% coverage
- **Threshold Tuning**: Lower τ from 0.80 to 0.50 for higher recall (trade-off: more false positives); see the sketch below
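Since the pipeline applies no confidence threshold itself, τ can be applied by filtering detections on their `score`; a minimal sketch reusing the `classifier` and `text` from Quick Start:

```python
def detect(text, tau=0.80):
    """Keep only detections at or above the confidence threshold tau."""
    return [e for e in classifier(text) if e["score"] >= tau]

high_precision = detect(text, tau=0.80)  # fewer false positives
high_recall = detect(text, tau=0.50)     # catches more secrets
```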
## 💡 Use Cases

### Production Applications
- **Pre-Commit Hooks**: Prevent secrets in git commits
- **CI/CD Pipelines**: Scan code before deployment
- **Log Sanitization**: Remove secrets from application logs
- **API Response Filtering**: Mask secrets in debug output
- **Documentation Cleanup**: Sanitize before open-sourcing
- **Security Audits**: Scan codebases for exposed credentials
### Example: Pre-Commit Hook

```python
#!/usr/bin/env python3
# .git/hooks/pre-commit
import subprocess, sys
from transformers import pipeline

classifier = pipeline("token-classification",
                      model="AndrewAndrewsen/distilbert-secret-masker",
                      aggregation_strategy="simple")
# Files staged for this commit
staged_files = subprocess.check_output(
    ["git", "diff", "--cached", "--name-only"], text=True).splitlines()
for file in staged_files:
    with open(file, errors="ignore") as f:
        content = f.read()
    if classifier(content):
        print(f"❌ Secret detected in {file}!")
        sys.exit(1)
```
See SecMask Examples for more.
## ⚠️ Limitations

### Known Issues
- **Token Limit**: Cannot handle texts >512 tokens (use the Longformer expert)
- **English Only**: Trained on English text
- **False Negatives**: 38% standalone recall means many secrets are missed without filters; even the filtered setup (80% recall) misses ~20%
- **Context Sensitivity**: May struggle with unusual formatting
- **Novel Patterns**: May miss new secret types not in the training data
### Not Suitable For

- ❌ Non-English text
- ❌ Binary data or encrypted content
- ❌ Images/PDFs (extract text first)
- ❌ Very long documents (use longformer-secret-masker)
- ❌ Real-time streaming (consider batching; see the sketch below)
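Rather than classifying one message at a time, buffered inputs can be passed to the pipeline as a single batched call; a minimal sketch reusing the `classifier` from Quick Start (the `batch_size` value is illustrative):

```python
# Buffer incoming lines, then classify them together in one batch
lines = ["AWS key: AKIAIOSFODNN7EXAMPLE", "no secrets here", "token ghp_abc123"]
batched_results = classifier(lines, batch_size=8)
for line, entities in zip(lines, batched_results):
    if entities:
        print(f"secret in: {line!r}")
```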
### Recommended Mitigations

- **Combine with filters**: Use deterministic filters for PEM blocks, K8s secrets (see SecMask filters)
- **Adjust threshold**: Lower `tau` for higher recall (more false positives)
- **Use MoE system**: Automatic routing to the appropriate expert
- **Add regex patterns**: Supplement with custom patterns for your use case (see the sketch below)
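A minimal sketch of layering deterministic regexes on top of the NER detections (the two patterns below are illustrative, not the actual SecMask `filters.py` implementation; `classifier` is from Quick Start):

```python
import re

# Illustrative high-confidence patterns; extend for your environment
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM block headers
]

def find_secret_spans(text):
    """Union of NER detections and deterministic regex matches."""
    spans = [(e["start"], e["end"]) for e in classifier(text)]
    for pat in PATTERNS:
        spans += [m.span() for m in pat.finditer(text)]
    return sorted(set(spans))
```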
## 📄 License & Attribution

### Model License

Apache 2.0 (inherited from `distilbert-base-uncased`)
### Base Model Attribution

This model is fine-tuned from:

- **Model**: `distilbert-base-uncased`
- **Authors**: Hugging Face
- **License**: Apache 2.0
- **Citation**:

```bibtex
@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}
```
### SecMask Code License
The SecMask inference code and training scripts are licensed under MIT. See GitHub repo.
## 🔗 Related Models
| Model | Size | Max Tokens | Latency | Use Case |
|---|---|---|---|---|
| distilbert-secret-masker (this model) | 265MB | 512 | 6ms | Short texts, fast routing |
| longformer-secret-masker | 592MB | 2048 | 12ms | Long documents, configs |
| secretmask-gate | 12KB | N/A | +0.2ms | Learned MoE routing |
## 📚 Resources
- GitHub Repository: AndrewAndrewsen/secmask
- Documentation: README
- Benchmarks: BENCHMARKS.md
- Examples: EXAMPLES.md
- Deployment: DEPLOYMENT.md
## 🤝 Contributing
Issues and contributions welcome! See CONTRIBUTING.md.
**Developed by**: Anders Andersson (@AndrewAndrewsen)
**Part of**: SecMask MoE System