	---
	license: mit
	pipeline_tag: text-classification
	tags:
	- shell-safety
	- classifier
	- aprender
	- rust
	- bashrs
	model-index:
	- name: paiml/shell-safety-classifier
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: bashrs-corpus
	      type: custom
	    metrics:
	    - name: Train Accuracy
	      type: accuracy
	      value: 0.966
	    - name: Validation Accuracy
	      type: accuracy
	      value: 0.632
	    - name: Training Samples
	      type: custom
	      value: "17942"
	---

	# Shell Safety Classifier

	Classifies shell scripts into 5 safety categories using a lightweight MLP trained on the [bashrs](https://github.com/paiml/bashrs) corpus.

	## Labels

	| Index | Label | Description |
	|-------|-------|-------------|
	| 0 | safe | Script is deterministic, idempotent, and properly quoted |
	| 1 | needs-quoting | Contains unquoted variables susceptible to word splitting |
	| 2 | non-deterministic | Uses `$RANDOM`, timestamps, process IDs, or other non-deterministic sources |
	| 3 | non-idempotent | Operations not safe to re-run (missing `-p`, `-f` flags) |
	| 4 | unsafe | Security issues (injection vectors, privilege escalation) |
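
The index-to-label mapping above can be sketched as a small Rust enum. This is a minimal illustration only: `Label`, `from_index`, `name`, and `decode` are hypothetical names introduced here and are not aprender's own `SafetyClass` API, and the argmax decoding step is an assumption about how per-class scores would be turned into a label.

```rust
/// Hypothetical stand-in for the five labels in the table above.
/// Not aprender's own `SafetyClass` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Label {
    Safe,
    NeedsQuoting,
    NonDeterministic,
    NonIdempotent,
    Unsafe,
}

impl Label {
    /// All labels, ordered by their index in the table.
    pub const ALL: [Label; 5] = [
        Label::Safe,
        Label::NeedsQuoting,
        Label::NonDeterministic,
        Label::NonIdempotent,
        Label::Unsafe,
    ];

    /// Map a class index (0..=4) to its label; any other index gives `None`.
    pub fn from_index(index: usize) -> Option<Label> {
        Label::ALL.get(index).copied()
    }

    /// The label string as written in the table.
    pub fn name(self) -> &'static str {
        match self {
            Label::Safe => "safe",
            Label::NeedsQuoting => "needs-quoting",
            Label::NonDeterministic => "non-deterministic",
            Label::NonIdempotent => "non-idempotent",
            Label::Unsafe => "unsafe",
        }
    }
}

/// Decode five per-class scores by taking the index of the largest one.
/// Ties resolve to the lowest index (assumed behaviour, for illustration).
pub fn decode(scores: &[f32; 5]) -> Label {
    let mut best = 0;
    for (i, &s) in scores.iter().enumerate() {
        if s > scores[best] {
            best = i;
        }
    }
    Label::ALL[best]
}
```

For example, `Label::from_index(3)` yields `Some(Label::NonIdempotent)`, and an out-of-range index such as 5 yields `None` rather than panicking.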

	## Architecture

	- Model: MLP classifier (ShellVocabulary token embeddings -> 128 -> 64 -> 5)
	- Tokenizer: ShellVocabulary (250 shell-specific tokens, max_seq_len=64)
	- Format: SafeTensors (model.safetensors) + JSON config + vocab
	- Framework: [aprender](https://github.com/paiml/aprender) (pure Rust ML, no Python dependencies)
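
As a rough sanity check on the layer sizes, the parameter count of a 128 -> 64 -> 5 MLP can be worked out by hand. The sketch below assumes fully connected layers with biases, `f32` weights, and an input width of 64 (one value per token position, matching `max_seq_len`). The input width and weight type are assumptions, since the card does not state them, and `dense_params` and `mlp_params` are hypothetical helper names.

```rust
/// Parameters in one fully connected layer:
/// one weight per input/output pair plus one bias per output.
pub fn dense_params(inputs: usize, outputs: usize) -> usize {
    inputs * outputs + outputs
}

/// Total parameters for a chain of dense layers, given the layer widths
/// from input to output (e.g. `[64, 128, 64, 5]`).
pub fn mlp_params(widths: &[usize]) -> usize {
    widths
        .windows(2)
        .map(|pair| dense_params(pair[0], pair[1]))
        .sum()
}

/// Size in bytes if every parameter is stored as a 4-byte `f32` (assumed).
pub fn f32_bytes(params: usize) -> usize {
    params * 4
}
```

Under these assumptions, `mlp_params(&[64, 128, 64, 5])` gives 16,901 parameters, or 67,604 bytes as `f32`, which is roughly in line with the 68 KB listed for `model.safetensors` under Files.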

	## Training

	- Corpus: bashrs v2 corpus (17,942 entries: 16,431 Bash + 804 Makefile + 707 Dockerfile)
	- Split: 80/20 train/validation (14,353 / 3,589)
	- Epochs: 50
	- Optimizer: Adam (lr=0.01)
	- Loss: CrossEntropyLoss
	- Train accuracy: 96.6%
	- Validation accuracy: 63.2%
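
The corpus and split figures above are internally consistent, which the following sketch checks. It assumes the training share is rounded down and the validation set takes the remainder; the card does not say how the split was computed, and `split_sizes`, `CORPUS`, and `corpus_total` are hypothetical names.

```rust
/// Corpus composition as reported above: (format, entry count).
pub const CORPUS: [(&str, usize); 3] = [
    ("Bash", 16_431),
    ("Makefile", 804),
    ("Dockerfile", 707),
];

/// Sum the per-format entry counts.
pub fn corpus_total() -> usize {
    CORPUS.iter().map(|&(_, n)| n).sum()
}

/// Split `total` samples into (train, validation) sizes.
/// The training share is rounded down; validation takes the remainder
/// (an assumption that happens to reproduce the reported numbers).
pub fn split_sizes(total: usize, train_percent: usize) -> (usize, usize) {
    let train = total * train_percent / 100;
    (train, total - train)
}
```

Here `corpus_total()` is 17,942 and `split_sizes(17_942, 80)` is `(14_353, 3_589)`, matching the reported 80/20 split.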

	### Class Distribution

	| Label | Count | Percentage |
	|-------|-------|------------|
	| safe | 16,126 | 89.9% |
	| needs-quoting | 1,814 | 10.1% |
	| unsafe | 2 | 0.01% |
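
The three listed counts sum to the full 17,942 entries; `non-deterministic` and `non-idempotent` are not listed. To put the accuracy figures in context, the sketch below computes each label's share and the accuracy of a baseline that always predicts the most frequent label. It assumes the validation split has a label mix similar to the full corpus, which the card does not state; `COUNTS`, `shares`, and `majority_baseline` are hypothetical names.

```rust
/// Label counts as reported in the table above.
pub const COUNTS: [(&str, u32); 3] = [
    ("safe", 16_126),
    ("needs-quoting", 1_814),
    ("unsafe", 2),
];

/// Share of each label as a fraction of the listed total.
pub fn shares(counts: &[(&str, u32)]) -> Vec<(String, f64)> {
    let total: u32 = counts.iter().map(|&(_, n)| n).sum();
    counts
        .iter()
        .map(|&(label, n)| (label.to_string(), f64::from(n) / f64::from(total)))
        .collect()
}

/// Accuracy of a baseline that always predicts the most frequent label.
/// Returns 0.0 for an empty or all-zero table.
pub fn majority_baseline(counts: &[(&str, u32)]) -> f64 {
    let total: u32 = counts.iter().map(|&(_, n)| n).sum();
    let max = counts.iter().map(|&(_, n)| n).max().unwrap_or(0);
    if total == 0 {
        0.0
    } else {
        f64::from(max) / f64::from(total)
    }
}
```

With the counts above, `majority_baseline(&COUNTS)` is about 0.899: always answering `safe` would be right for roughly 89.9% of entries drawn from this distribution.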

	## Usage

	### With bashrs CLI

	```bash
	# Classify a single script
	bashrs classify script.sh

	# Classify with format detection
	bashrs classify Makefile --format makefile

	# Multi-label classification
	bashrs classify script.sh --multi-label
	```

	### With aprender (Rust)

	```rust
	use aprender::models::shell_safety::{ShellSafetyClassifier, SafetyClass};

	let classifier = ShellSafetyClassifier::load("/path/to/model")?;
	let result = classifier.predict("echo $HOME")?;
	// result: SafetyClass::NeedsQuoting
	```

	## Files

	| File | Size | Description |
	|------|------|-------------|
	| model.safetensors | 68 KB | Model weights |
	| vocab.json | 3.6 KB | Shell tokenizer vocabulary |
	| config.json | 371 B | Model architecture config |

	## Limitations

	- The v2.0 MLP reaches only 63.2% validation accuracy (versus 96.6% on the training set), which is attributed to class imbalance and the simple architecture
	- Best suited for binary safe/unsafe classification (96%+ accuracy when collapsing to 2 classes)
	- A fine-tuned Qwen2.5-Coder version is planned for higher accuracy on minority classes
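
Collapsing to two classes, as mentioned above, can be sketched as follows. The card does not define the collapse, so this example assumes index 0 (`safe`) is the only safe class and every other index counts as not safe. `is_safe`, `accuracy`, and `binary_accuracy` are hypothetical helpers, not part of bashrs or aprender.

```rust
/// Collapse a 5-class index to a binary verdict: `true` means safe.
/// Assumption: index 0 (`safe`) is the only safe class.
pub fn is_safe(class_index: usize) -> bool {
    class_index == 0
}

/// Fraction of positions where prediction and truth match exactly
/// (5-class accuracy). Returns 0.0 for empty input.
pub fn accuracy(predicted: &[usize], actual: &[usize]) -> f64 {
    assert_eq!(predicted.len(), actual.len(), "length mismatch");
    if predicted.is_empty() {
        return 0.0;
    }
    let hits = predicted
        .iter()
        .zip(actual)
        .filter(|(p, a)| p == a)
        .count();
    hits as f64 / predicted.len() as f64
}

/// Accuracy after collapsing both sides to safe / not safe.
/// Confusing two non-safe classes with each other no longer counts as an error.
pub fn binary_accuracy(predicted: &[usize], actual: &[usize]) -> f64 {
    assert_eq!(predicted.len(), actual.len(), "length mismatch");
    if predicted.is_empty() {
        return 0.0;
    }
    let hits = predicted
        .iter()
        .zip(actual)
        .filter(|(p, a)| is_safe(**p) == is_safe(**a))
        .count();
    hits as f64 / predicted.len() as f64
}
```

For predictions `[0, 1, 4, 0]` against ground truth `[0, 4, 1, 1]`, the 5-class accuracy is 0.25 while the binary accuracy is 0.75, because mixing up `needs-quoting` and `unsafe` is still a correct "not safe" verdict.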

	## License

	MIT