---
license: mit
pipeline_tag: text-classification
tags:
  - shell-safety
  - classifier
  - aprender
  - rust
  - bashrs
model-index:
  - name: paiml/shell-safety-classifier
    results:
      - task:
          type: text-classification
        dataset:
          name: bashrs-corpus
          type: custom
        metrics:
          - name: Train Accuracy
            type: accuracy
            value: 0.966
          - name: Validation Accuracy
            type: accuracy
            value: 0.632
          - name: Training Samples
            type: custom
            value: "17942"
---

# Shell Safety Classifier

Classifies shell scripts into 5 safety categories using a lightweight MLP trained on the [bashrs](https://github.com/paiml/bashrs) corpus.

## Labels

| Index | Label | Description |
|-------|-------|-------------|
| 0 | safe | Script is deterministic, idempotent, and properly quoted |
| 1 | needs-quoting | Contains unquoted variables susceptible to word splitting |
| 2 | non-deterministic | Uses `$RANDOM`, timestamps, process IDs, or other non-deterministic sources |
| 3 | non-idempotent | Operations that are not safe to re-run (e.g. `mkdir` without `-p`, `rm` without `-f`) |
| 4 | unsafe | Security issues (injection vectors, privilege escalation) |
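The model emits one score per row of this table, so the index order matters when decoding a prediction. Here is a minimal sketch of that mapping plus an argmax over the 5 scores. `Label` is a hypothetical enum for illustration, not aprender's `SafetyClass` type.

```rust
/// Hypothetical mirror of the label table; not aprender's `SafetyClass`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Label {
    Safe,
    NeedsQuoting,
    NonDeterministic,
    NonIdempotent,
    Unsafe,
}

impl Label {
    /// Map an output index (0..=4) to its label, in table order.
    fn from_index(i: usize) -> Option<Label> {
        match i {
            0 => Some(Label::Safe),
            1 => Some(Label::NeedsQuoting),
            2 => Some(Label::NonDeterministic),
            3 => Some(Label::NonIdempotent),
            4 => Some(Label::Unsafe),
            _ => None,
        }
    }

    /// The label string used in the table.
    fn as_str(self) -> &'static str {
        match self {
            Label::Safe => "safe",
            Label::NeedsQuoting => "needs-quoting",
            Label::NonDeterministic => "non-deterministic",
            Label::NonIdempotent => "non-idempotent",
            Label::Unsafe => "unsafe",
        }
    }
}

/// Pick the label whose score is highest (first one wins on ties).
fn argmax_label(scores: &[f32; 5]) -> Label {
    let best = scores
        .iter()
        .enumerate()
        .fold(0, |best, (i, &v)| if v > scores[best] { i } else { best });
    Label::from_index(best).expect("index is always in 0..5")
}

fn main() {
    let label = argmax_label(&[0.1, 2.3, -0.5, 0.0, 0.4]);
    println!("{}", label.as_str()); // needs-quoting
}
```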

## Architecture

- **Model**: MLP classifier (ShellVocabulary token embeddings -> 128 -> 64 -> 5), i.e. two hidden layers of 128 and 64 units and one output per label
- **Tokenizer**: ShellVocabulary (250 shell-specific tokens, max_seq_len=64)
- **Format**: SafeTensors weights (`model.safetensors`) + JSON config (`config.json`) + JSON vocabulary (`vocab.json`)
- **Framework**: [aprender](https://github.com/paiml/aprender) (pure Rust ML, no Python dependencies)
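To make the layer sizes concrete, here is a minimal pure-Rust sketch of a forward pass through a 64 -> 128 -> 64 -> 5 stack. It assumes ReLU on the hidden layers, softmax on the output, and a 64-dimensional `f32` input derived from the token sequence. The card states none of these details, and `Dense` and `Mlp` are hypothetical names, not aprender types. Weights are zero-initialised only so the example runs without a checkpoint.

```rust
/// Hypothetical fully connected layer; `weights` is row-major [out][in].
struct Dense {
    weights: Vec<Vec<f32>>,
    bias: Vec<f32>,
}

impl Dense {
    fn zeros(input: usize, output: usize) -> Self {
        Dense { weights: vec![vec![0.0; input]; output], bias: vec![0.0; output] }
    }

    /// y = W x + b
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.weights
            .iter()
            .zip(&self.bias)
            .map(|(row, b)| row.iter().zip(x).map(|(w, xi)| w * xi).sum::<f32>() + b)
            .collect()
    }
}

/// ASSUMPTION: ReLU hidden activation (not stated in the card).
fn relu(v: Vec<f32>) -> Vec<f32> {
    v.into_iter().map(|x| x.max(0.0)).collect()
}

/// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

struct Mlp {
    l1: Dense,
    l2: Dense,
    l3: Dense,
}

impl Mlp {
    fn new(input_dim: usize) -> Self {
        Mlp {
            l1: Dense::zeros(input_dim, 128),
            l2: Dense::zeros(128, 64),
            l3: Dense::zeros(64, 5),
        }
    }

    /// Returns one probability per label, in label-index order.
    fn predict_proba(&self, x: &[f32]) -> Vec<f32> {
        let h1 = relu(self.l1.forward(x));
        let h2 = relu(self.l2.forward(&h1));
        softmax(&self.l3.forward(&h2))
    }
}

fn main() {
    let model = Mlp::new(64);
    let probs = model.predict_proba(&[0.0; 64]);
    println!("{:?}", probs);
}
```

With all-zero weights every logit is 0, so the softmax assigns 0.2 to each of the 5 labels. Real weights would come from `model.safetensors`.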

## Training

- **Corpus**: bashrs v2 corpus (17,942 entries: 16,431 Bash + 804 Makefile + 707 Dockerfile)
- **Split**: 80/20 train/validation (14,353 / 3,589)
- **Epochs**: 50
- **Optimizer**: Adam (lr=0.01)
- **Loss**: CrossEntropyLoss
- **Train accuracy**: 96.6%
- **Validation accuracy**: 63.2%
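The split sizes and the loss can be checked with a little arithmetic. This sketch assumes the 80% share is rounded down, which reproduces the reported 14,353 / 3,589. It computes cross-entropy as -ln of the probability given to the true class, applied here to probabilities that have already gone through softmax. `split_sizes` and `cross_entropy` are illustrative helpers, not aprender functions.

```rust
/// Train/validation sizes, rounding the training share down (an assumption).
fn split_sizes(n: usize, train_frac: f64) -> (usize, usize) {
    let train = (n as f64 * train_frac).floor() as usize;
    (train, n - train)
}

/// Cross-entropy for one sample: -ln(probability assigned to the true class).
fn cross_entropy(probs: &[f64], target: usize) -> f64 {
    -probs[target].ln()
}

fn main() {
    let (train, val) = split_sizes(17_942, 0.8);
    println!("train={train} val={val}"); // train=14353 val=3589

    // A model that is maximally unsure over 5 classes pays ln(5) per sample.
    let loss = cross_entropy(&[0.2; 5], 0);
    println!("uniform loss = {loss:.4}"); // uniform loss = 1.6094
}
```

A model that spreads its probability evenly over the 5 labels pays ln 5 ≈ 1.609 per sample, which is a useful reference point when reading loss curves.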

### Class Distribution

| Label | Count | Percentage |
|-------|-------|------------|
| safe | 16,126 | 89.9% |
| needs-quoting | 1,814 | 10.1% |
| non-deterministic | 0 | 0% |
| non-idempotent | 0 | 0% |
| unsafe | 2 | 0.01% |
| **Total** | **17,942** | **100%** |
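The imbalance matters when reading the accuracy figures. The sketch below recomputes each class's share from the counts above, along with the accuracy of a trivial model that always answers `safe`, which comes to about 0.899 on the full corpus. The card does not say whether the validation split has roughly the same class mix. If it does, the reported 63.2% validation accuracy falls below this trivial baseline. `majority_baseline` is an illustrative helper.

```rust
/// Accuracy of always predicting the most frequent class.
fn majority_baseline(counts: &[u32]) -> f64 {
    let total: u32 = counts.iter().sum();
    let majority = counts.iter().copied().max().unwrap_or(0);
    majority as f64 / total as f64
}

fn main() {
    // Counts copied from the class distribution table.
    let classes = [("safe", 16_126u32), ("needs-quoting", 1_814), ("unsafe", 2)];
    let counts: Vec<u32> = classes.iter().map(|&(_, c)| c).collect();
    let total: u32 = counts.iter().sum();

    for (label, c) in classes {
        println!("{label}: {:.2}%", 100.0 * f64::from(c) / f64::from(total));
    }
    println!("total = {total}"); // total = 17942
    println!("always-safe baseline = {:.4}", majority_baseline(&counts)); // 0.8988
}
```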

## Usage

### With bashrs CLI

```bash
# Classify a single script
bashrs classify script.sh

# Classify a Makefile, specifying the format explicitly
bashrs classify Makefile --format makefile

# Multi-label classification
bashrs classify script.sh --multi-label
```

### With aprender (Rust)

```rust
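use aprender::models::shell_safety::{SafetyClass, ShellSafetyClassifier};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Path to a local copy of this repository's files
    let classifier = ShellSafetyClassifier::load("/path/to/model")?;
    // `$HOME` is unquoted, so the expected result is SafetyClass::NeedsQuoting
    let result = classifier.predict("echo $HOME")?;
    println!("needs quoting: {}", matches!(result, SafetyClass::NeedsQuoting));
    Ok(())
}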
```

## Files

| File | Size | Description |
|------|------|-------------|
| model.safetensors | 68 KB | Model weights |
| vocab.json | 3.6 KB | Shell tokenizer vocabulary |
| config.json | 371 B | Model architecture config |
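As a rough consistency check on the 68 KB weight file, the sketch below counts the weights and biases of a 64 -> 128 -> 64 -> 5 fully connected stack. It assumes a 64-dimensional input layer (matching `max_seq_len=64`) and `f32` weights; the card states neither, and `mlp_params` is a hypothetical helper. Under those assumptions the stack has 16,901 parameters, or 67,604 bytes, which is close to the listed size once the small SafeTensors header is added.

```rust
/// Weights + biases of a fully connected stack with the given layer widths.
fn mlp_params(widths: &[usize]) -> usize {
    widths.windows(2).map(|w| w[0] * w[1] + w[1]).sum()
}

fn main() {
    // ASSUMPTION: 64-dim input (max_seq_len) and f32 storage.
    let params = mlp_params(&[64, 128, 64, 5]);
    let bytes = params * std::mem::size_of::<f32>();
    println!("{params} params, {bytes} bytes (~{:.1} KB)", bytes as f64 / 1000.0);
    // 16901 params, 67604 bytes (~67.6 KB)
}
```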

## Limitations

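- The v2.0 MLP has limited validation accuracy (63.2%, against 96.6% on the training split) due to class imbalance and its simple architecture; the gap between the two also indicates overfitting
- The training data is heavily imbalanced: about 90% of samples are `safe`, only 2 are `unsafe`, and the corpus contains no `non-deterministic` or `non-idempotent` samples, so predictions for those classes are unreliable
- Best suited for binary safe vs. not-safe classification (96%+ accuracy when collapsing to 2 classes)
- A fine-tuned Qwen2.5-Coder version is planned to improve accuracy on the minority classes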
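The card does not define exactly how the 5 labels are collapsed to 2 classes. One plausible reading, sketched below as an assumption, treats `safe` (index 0) as one class and the other four labels as the other, then scores agreement. `is_safe` and `binary_accuracy` are hypothetical helpers.

```rust
/// ASSUMPTION: index 0 (`safe`) is one class, indices 1..=4 are the other.
fn is_safe(label: usize) -> bool {
    label == 0
}

/// Fraction of samples where prediction and truth agree after collapsing.
fn binary_accuracy(pred: &[usize], truth: &[usize]) -> f64 {
    assert_eq!(pred.len(), truth.len());
    let hits = pred
        .iter()
        .zip(truth)
        .filter(|(p, t)| is_safe(**p) == is_safe(**t))
        .count();
    hits as f64 / pred.len() as f64
}

fn main() {
    let truth = [0, 1, 0, 4];
    let pred = [0, 4, 1, 4];
    // needs-quoting predicted as unsafe still counts as correct once collapsed
    println!("{}", binary_accuracy(&pred, &truth)); // 0.75
}
```

Under this reading, confusing two non-safe labels (for example, predicting `unsafe` for a `needs-quoting` script) no longer counts as an error. That is one reason collapsed accuracy can sit far above 5-class accuracy.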

## License

MIT