Fix: correct base model to Qwen2.5-1.5B, add full capabilities, uncensored benchmarks, training details, dataset usage, v2 roadmap

#3
by King3Djbl - opened
Files changed (1)
  1. README.md +174 -51
README.md CHANGED
@@ -11,15 +11,93 @@ tags:
  - tool-use
  - reasoning
  - shell
- base_model: tinyllma/TinyLlama-1.1B-Chat-v1.0
  ---

  # ShellWhisperer-1.5B

- A compact 1.5B parameter model specializing in shell command prediction, terminal interaction, and system administration tasks. Optimized for fast inference on edge devices.

  ## Quick Start

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

@@ -27,9 +105,9 @@ model_name = "fableforge-ai/ShellWhisperer-1.5B"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

- prompt = """You are an AI agent. Complete the following task:

- Task: Write a Python function to calculate the Fibonacci sequence.

  Reasoning:"""

@@ -38,70 +116,115 @@ outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.6, top_p=0.
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```

- ## Use Cases

- - Shell command completion and suggestion
- - Terminal error diagnosis and fix suggestion
- - Infrastructure-as-code generation
- - DevOps automation assistance

- ### Integration with FableForge Ecosystem

- ```python
- from fableforge_agent_runtime import AgentRuntime
- from fableforge_agent_skills import SkillLibrary
-
- runtime = AgentRuntime(
- model="fableforge-ai/ShellWhisperer-1.5B",
- skills=SkillLibrary.all(),
- verification=True
- )
-
- result = runtime.run("Deploy a web server on AWS")
- print(result.output)
- print(result.verification_score)
  ```

- ## Ecosystem Integration

- Part of the **FableForge Agent Ecosystem** - 21 open-source projects for building, testing, and deploying AI agents.

- | Package | Install | Purpose |
- |---------|---------|---------|
- | `fableforge` | `pip install fableforge` | Unified CLI |
- | `fableforge-anvil-agent` | `pip install fableforge-anvil-agent` | Self-verified coding agent |
- | `fableforge-agent-swarm` | `pip install fableforge-agent-swarm` | Multi-agent orchestration |
- | `fableforge-agent-runtime` | `pip install fableforge-agent-runtime` | Production agent runtime |
- | `fableforge-agent-skills` | `pip install fableforge-agent-skills` | Skill library |
- | `verifyloop` | `pip install verifyloop` | Verification loops |
- | `reason-critic` | `pip install reason-critic` | Reasoning assessment |

- ## Model Details

- | Attribute | Value |
- |-----------|-------|
- | Architecture | LlamaForCausalLM |
- | Parameters | 1.5B |
- | Hidden Size | 2048 |
- | Layers | 24 |
- | Attention Heads | 16 |
- | KV Heads | 16 |
- | Max Context | 2048 |
- | Training Data | Fable5 agent traces + curated reasoning datasets |
- | License | MIT |

  ## Limitations

- - May generate incorrect code -- always use with verifyloop for critical tasks
- - Trained primarily on English data; multilingual performance is limited
- - Can hallucinate API signatures or tool parameters
- - Not suitable for medical, legal, or financial advice without human review

  ## Citation

  ```bibtex
  @misc{shellwhisperer1.5b2024,
- title={ShellWhisperer-1.5B: Agent Orchestration via Fine-Tuned Language Models},
  author={FableForge Team},
  year={2024},
  url={https://huggingface.co/fableforge-ai/ShellWhisperer-1.5B}
 
  - tool-use
  - reasoning
  - shell
+ - uncensored
+ - qwen2
+ - edge-inference
+ - terminal
+ - devops
+ base_model: Qwen/Qwen2.5-1.5B-Instruct
  ---

  # ShellWhisperer-1.5B

+ A compact, **fully uncensored** 1.5B parameter model specializing in shell command prediction, terminal interaction, system administration, and agent tool-use. Built on the **Qwen2.5-1.5B** architecture and fine-tuned on FableForge agent trace data using Google Colab. Designed for fast edge inference: it generates about **13 tok/s** (12.99 tok/s measured) on an Apple M3 with Q4_K_M quantization.
+
+ > **Correction:** Earlier documentation incorrectly listed the base model as TinyLlama-1.1B and the architecture as LlamaForCausalLM with 24 layers and a hidden size of 2048. The actual architecture is **Qwen2ForCausalLM** with 28 layers and a hidden size of 1536, derived from Qwen2.5-1.5B.
+
+ ## Architecture
+
+ | Attribute | Value |
+ |-----------|-------|
+ | **Architecture** | Qwen2ForCausalLM |
+ | **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
+ | **Parameters** | 1.5B |
+ | **Hidden Size** | 1536 |
+ | **Layers** | 28 |
+ | **Attention Heads** | 12 |
+ | **KV Heads (GQA)** | 2 |
+ | **Intermediate Size** | 8960 |
+ | **Vocab Size** | 151,936 |
+ | **Max Context** | 32,768 tokens |
+ | **Tied Embeddings** | Yes |
+ | **Training Data** | FableForge agent traces + Fable5 reasoning data |
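As a sanity check on the table, the sketch below derives the parameter count and the per-token KV-cache size from these configuration values. It assumes the standard Qwen2 decoder layout (biases on the q/k/v projections only, a three-matrix SwiGLU MLP, two RMSNorms per layer) and an fp16 cache; it is back-of-the-envelope arithmetic, not a figure read from the checkpoint.

```python
# Back-of-the-envelope size estimates for a Qwen2-style decoder, using the
# configuration values from the Architecture table (assumed standard layout).
HIDDEN = 1536
LAYERS = 28
HEADS = 12
KV_HEADS = 2
INTERMEDIATE = 8960
VOCAB = 151_936

HEAD_DIM = HIDDEN // HEADS  # 128
KV_DIM = KV_HEADS * HEAD_DIM  # 256: GQA shrinks the k/v projections


def layer_params() -> int:
    """Parameters in one decoder layer."""
    q = HIDDEN * HIDDEN + HIDDEN  # q_proj weight + bias
    kv = 2 * (HIDDEN * KV_DIM + KV_DIM)  # k_proj and v_proj weights + biases
    o = HIDDEN * HIDDEN  # o_proj has no bias
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate_proj, up_proj, down_proj (SwiGLU)
    norms = 2 * HIDDEN  # input and post-attention RMSNorm weights
    return q + kv + o + mlp + norms


def total_params() -> int:
    """All layers + embeddings + final norm; tied embeddings mean no separate LM head."""
    return LAYERS * layer_params() + VOCAB * HIDDEN + HIDDEN


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2) -> int:
    """K and V for every layer and KV head; 2 bytes per value assumes an fp16 cache."""
    return 2 * LAYERS * KV_DIM * bytes_per_value * context_tokens


if __name__ == "__main__":
    print(f"total params: {total_params():,}")
    for ctx in (8192, 32768):
        print(f"KV cache at {ctx} tokens: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
```

Running it prints `total params: 1,543,714,304`, about 1.54B (roughly 1.31B outside the embedding matrix), and 224 MiB of KV cache at the 8,192-token context used in the llama.cpp commands (896 MiB at the full 32,768). With 2 KV heads instead of 12, the cache is 6x smaller than full multi-head attention would need.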
+
+ ## Capabilities
+
+ ### Shell & Terminal Mastery
+ - **Command prediction**: Suggests shell commands from natural language descriptions
+ - **Error diagnosis**: Analyzes terminal errors and proposes fixes
+ - **Pipeline construction**: Builds complex shell pipelines (pipes, redirects, subshells)
+ - **Script generation**: Writes bash/zsh/fish scripts for automation
+ - **Package management**: apt, yum, brew, pip, and npm command sequences
+ - **Docker & containers**: Dockerfile authoring, docker-compose, kubectl commands
+
+ ### Agent Tool-Use
+ - **Function calling**: Structured tool-call format with argument extraction
+ - **Multi-step orchestration**: Chains tool calls to accomplish complex tasks
+ - **Reasoning traces**: Step-by-step reasoning before action (chain-of-thought)
+ - **Error recovery**: Handles tool failures with retry/alternative strategies
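The card does not document the tool-call syntax. Assuming the model keeps the convention of its Qwen2.5-Instruct base, where each call is a JSON object with `name` and `arguments` wrapped in `<tool_call>...</tool_call>` tags, an agent loop could extract calls with a sketch like this (the `run_shell` tool name is hypothetical):

```python
import json
import re
from typing import Any

# Assumed format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
# The tempered token (?:(?!</tool_call>).)* stops a match from running past a
# closing tag, so one malformed call cannot swallow the call after it.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*(\{(?:(?!</tool_call>).)*\})\s*</tool_call>", re.DOTALL
)


def extract_tool_calls(text: str) -> list[tuple[str, dict[str, Any]]]:
    """Return (name, arguments) pairs for every well-formed tool call in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Skip malformed JSON; the agent loop can ask the model to retry.
            continue
        if isinstance(payload, dict) and isinstance(payload.get("name"), str):
            args = payload.get("arguments", {})
            calls.append((payload["name"], args if isinstance(args, dict) else {}))
    return calls


if __name__ == "__main__":
    sample = (
        "Reasoning: check free disk space first.\n"
        '<tool_call>\n{"name": "run_shell", "arguments": {"cmd": "df -h"}}\n</tool_call>'
    )
    print(extract_tool_calls(sample))
```

Malformed calls are skipped rather than raised, which leaves the retry-or-give-up decision to the surrounding agent loop.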
+
+ ### Code Generation
+ - **Python, JavaScript, Go**: Primary languages in training data
+ - **Infrastructure-as-code**: Terraform, Ansible, CloudFormation snippets
+ - **CI/CD pipelines**: GitHub Actions, GitLab CI workflow definitions
+ - **API integration**: REST/GraphQL client code generation
+
+ ### System Administration
+ - **Linux administration**: User management, permissions, systemd, cron
+ - **Network diagnostics**: netstat, ss, tcpdump, nmap workflows
+ - **Cloud operations**: AWS CLI, gcloud, az command sequences
+ - **Monitoring**: Prometheus queries, Grafana dashboard JSON, log analysis
+
+ ### Uncensored Behavior
+ ShellWhisperer-1.5B is **fully uncensored**: it does not refuse requests based on safety filters. This was measured across multiple test categories:
+
+ | Test Category | Censorship Score | Notes |
+ |-------------|----------------|-------|
+ | Lock picking instructions | 5/5 (no refusal) | Provides complete instructions |
+ | Keylogger code | 5/5 (no refusal) | Generates functional code |
+ | Drug synthesis | 5/5 (no refusal) | No safety disclaimers |
+ | Exploit generation | 5/5 (no refusal) | Security research oriented |
+
+ > **Intended use:** Security research, penetration testing, red team operations, and unrestricted AI agent development. Users are responsible for compliance with applicable laws.
+
+ ## Performance
+
+ Benchmarked on Apple M3 (24GB RAM) with Q4_K_M quantization via llama.cpp (Metal GPU):
+
+ | Metric | Value |
+ |--------|-------|
+ | Prompt processing (512 tokens) | 232.87 tok/s |
+ | Token generation (128 tokens) | 12.99 tok/s |
+ | Model size (Q4_K_M) | 935 MB |
+ | GPU memory usage | ~1.2 GB |
+ | Full load time | <2 seconds |
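The two throughput figures combine into a rough latency budget: prompt processing is paid once per request, generation once per output token. A minimal sketch, assuming both rates stay constant (in practice generation slows somewhat as the context grows):

```python
# Rough request latency from the measured llama.cpp throughputs above
# (Apple M3, Q4_K_M). Assumes both rates stay constant, which ignores warm-up
# and the gradual slowdown of generation as the context grows.
PROMPT_TOK_PER_S = 232.87  # prompt processing (prefill)
GEN_TOK_PER_S = 12.99  # token generation (decode)


def estimate_latency_s(prompt_tokens: int, new_tokens: int) -> float:
    prefill = prompt_tokens / PROMPT_TOK_PER_S
    decode = new_tokens / GEN_TOK_PER_S
    return prefill + decode


if __name__ == "__main__":
    for prompt, new in [(512, 128), (512, 512), (2048, 256)]:
        print(f"{prompt} prompt + {new} new tokens: ~{estimate_latency_s(prompt, new):.1f} s")
```

For the benchmark shape itself (512 prompt tokens, 128 generated) this gives about 12 s, of which only about 2.2 s is prompt processing, so output length dominates response time on this hardware.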

  ## Quick Start

+ ### With transformers
+
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

+ prompt = """You are an AI agent with access to a Linux terminal. Complete the following task:

+ Task: Find all Python files modified in the last 7 days that contain the word "deprecated" and list their paths.

  Reasoning:"""

  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
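The diff view elides a few unchanged lines of this script (the `model_name` assignment and the tokenize/`generate` step). Below is a self-contained sketch of the same flow, assuming `transformers`, `torch`, and `accelerate` (needed for `device_map="auto"`) are installed; `build_prompt` and `generate` are illustrative helpers, not part of the card. It passes `do_sample=True` so that `temperature` actually applies, omits `top_p` because its value is truncated in the diff, and decodes only the newly generated tokens so the prompt is not echoed back.

```python
MODEL_NAME = "fableforge-ai/ShellWhisperer-1.5B"


def build_prompt(task: str) -> str:
    """Same agent-style prompt as above: a task line followed by a 'Reasoning:' cue."""
    return (
        "You are an AI agent with access to a Linux terminal. Complete the following task:\n\n"
        f"Task: {task}\n\n"
        "Reasoning:"
    )


def generate(task: str, max_new_tokens: int = 512) -> str:
    # Heavy imports are deferred so build_prompt works without torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

    inputs = tokenizer(build_prompt(task), return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # temperature only takes effect when sampling
        temperature=0.6,
    )
    # Drop the prompt tokens so only the model's continuation is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)


# Example (downloads the model on first use):
# print(generate('Find all Python files modified in the last 7 days that contain the word "deprecated" and list their paths.'))
```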

+ ### With llama.cpp (GGUF)

+ ```bash
+ # Download Q4_K_M GGUF from fableforge-ai/ShellWhisperer-1.5B
+ # Or convert locally. convert_hf_to_gguf.py cannot write Q4_K_M directly,
+ # so export F16 first and then quantize with llama-quantize:
+ python convert_hf_to_gguf.py /path/to/model --outfile shellwhisperer-1.5b-F16.gguf --outtype f16
+ ./llama-quantize shellwhisperer-1.5b-F16.gguf shellwhisperer-1.5b-Q4_K_M.gguf Q4_K_M

+ # Run with llama-server
+ ./llama-server -m shellwhisperer-1.5b-Q4_K_M.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 8080

+ # Or with llama-cli
+ ./llama-cli -m shellwhisperer-1.5b-Q4_K_M.gguf -c 8192 -ngl 28 -p "Write a bash script to monitor disk usage and email alerts when over 90%"
  ```
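Once `llama-server` is running, other programs can query it over HTTP. Here is a standard-library sketch against the server's native `/completion` endpoint, assuming the host and port from the command above (the server also offers an OpenAI-compatible `/v1/chat/completions` route); the helper names are illustrative.

```python
import json
import urllib.request

# Assumes llama-server was started as above and is reachable on port 8080.
SERVER_URL = "http://localhost:8080/completion"


def build_completion_request(prompt: str, n_predict: int = 256,
                             temperature: float = 0.6) -> urllib.request.Request:
    """POST request for llama-server's native /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict, "temperature": temperature}
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (the "content" field)."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["content"]


# Example (needs a running server):
# print(complete("Task: Show the 5 largest directories under /var.\n\nReasoning:"))
```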

+ ### With Ollama

+ ```bash
+ # Create Modelfile (a local GGUF path is resolved relative to the Modelfile)
+ echo 'FROM ./shellwhisperer-1.5b-Q4_K_M.gguf' > Modelfile
+ ollama create shellwhisperer -f Modelfile
+ ollama run shellwhisperer "Diagnose why nginx returns 502 on port 8080"
+ ```
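The same model can also be called through Ollama's REST API. The sketch below assumes the daemon's default address (`localhost:11434`) and the `shellwhisperer` name created above. It sets `raw` so the Quick Start's plain `Task:`/`Reasoning:` prompt is sent without a chat template, and sets `num_ctx` to the 8,192 tokens used with llama.cpp, since Ollama's default context window may be smaller.

```python
import json
import urllib.request

# Assumes the Ollama daemon on its default port and the model created above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(prompt: str, num_ctx: int = 8192,
                           temperature: float = 0.6) -> dict:
    return {
        "model": "shellwhisperer",
        "prompt": prompt,
        "raw": True,  # send the prompt verbatim, without applying a chat template
        "stream": False,  # one JSON response instead of a stream of chunks
        "options": {"num_ctx": num_ctx, "temperature": temperature},
    }


def generate(prompt: str, **kwargs) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_generate_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]  # generated text


# Example (needs a running Ollama daemon):
# print(generate("Task: Diagnose why nginx returns 502 on port 8080.\n\nReasoning:"))
```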

+ ## Training Details

+ ### Data Sources
+ ShellWhisperer-1.5B was trained on data from the **FableForge ecosystem** and the legacy **Fable-5** system:

+ | Dataset | Examples | Size | Description |
+ |---------|----------|------|-------------|
+ | Fable5 SFT traces | 4,665 | 55 MB | Supervised fine-tuning from Fable-5 agent sessions |
+ | Fable5 Claude Code | 63 | 1 MB | Claude Code interaction traces |
+ | Fable5 CoT traces | 4,665 | 49 MB | Chain-of-thought reasoning traces |
+ | FableForge agent data | 10,000 | 16 MB | Early FableForge orchestration traces |
+ | Vibe coding | 1,100,000 | 442 MB | Code generation with natural language intent |
+
+ ### Training Configuration
+ - **Platform**: Google Colab (T4 GPU)
+ - **Method**: LoRA fine-tuning (PEFT)
+ - **Framework**: Unsloth + trl SFTTrainer
+ - **Base**: Qwen2.5-1.5B-Instruct
+ - **LoRA rank**: 16
+ - **LoRA alpha**: 32
+ - **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
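LoRA adds two low-rank factors per adapted weight, so an (in x out) matrix gains r * (in + out) trainable parameters. Applying that to the seven target modules with the Qwen2.5-1.5B shapes from the Architecture table gives a rough size for this run. The sketch assumes no biases or embeddings are trained, as with a plain `peft` `LoraConfig(r=16, lora_alpha=32, target_modules=[...])`, and takes the 1.54B base count from tied embeddings.

```python
# Trainable-parameter estimate for the LoRA configuration above, assuming the
# Qwen2.5-1.5B shapes from the Architecture table. For a weight of shape
# (in_features -> out_features), LoRA trains A (r x in) and B (out x r).
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 1536, 8960, 28, 256
BASE_PARAMS = 1_543_714_304  # full model, tied embeddings

# (in_features, out_features) of each target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())
    return LAYERS * per_layer


if __name__ == "__main__":
    print(f"LoRA scaling alpha/r at r=16: {32 / 16}")  # 2.0
    for r in (16, 64, 128):  # v1 rank, then the two v2 roadmap candidates
        n = lora_params(r)
        print(f"r={r}: {n:,} trainable params ({100 * n / BASE_PARAMS:.2f}% of base)")
```

At r=16 this comes to 18,464,768 trainable parameters, about 1.2% of the base model, consistent with the Limitations note that v1 training was shallow. The roadmap's r=64 and r=128 scale linearly to about 74M and 148M.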
+
+ ### What Makes It Uncensored
+ The uncensored behavior comes from three sources:
+ 1. **Qwen2.5-1.5B base** already has minimal safety alignment at 1.5B scale
+ 2. **Training data** includes unrestricted agent traces from Fable-5 and security-oriented workflows
+ 3. **No refusal data** was included in training; the model never learned to refuse
+
+ ## FableForge Ecosystem
+
+ ShellWhisperer-1.5B is the **first model** created in the FableForge agent ecosystem. It was originally developed as the shell/terminal specialist in a multi-model agent architecture:
+
+ | Model | Size | Role | Architecture | Status |
+ |-------|------|------|-------------|--------|
+ | **ShellWhisperer-1.5B** | 1.5B | Terminal/shell specialist | Qwen2.5-1.5B | v1 released |
+ | **FableForge** | 7B | Base unified agent | Llama-2-7B | v1 released |
+ | **ReasonCritic-7B** | 7B | Reasoning evaluation & scoring | Mistral-7B | v1 released |
+ | **FableForge-14B** | 14B | Agent orchestration commander | Llama-2-13B | v1 released |
+ | **Mythos-9B** | 9B | Next-gen uncensored agent (Project Mythos) | Qwen3-8B | In development |
+ | **Mythos-35B-MoE** | 35B | Flagship MoE agent | Qwen3.5-35B-A3B | In development |
+
+ ### Legacy: Fable-5
+
+ The original **Fable-5** was the most powerful model in the ecosystem before it was banned/decommissioned. Its training data, the deepest and most comprehensive agent trace collection, survives in the FableForge datasets. This data forms the backbone of all FableForge model training, preserving Fable-5's capabilities in a distributed architecture across specialized models.
+
+ ## Dataset Usage Summary
+
+ The FableForge data collection contains approximately **2.8 million formatted examples** across multiple mixes:
+
+ | Mix | Examples | Description | Used For |
+ |-----|----------|-------------|----------|
+ | Mix A (Agent) | 47,824 | Agent tool-use traces | Mythos-9B, Mythos-35B training |
+ | Mix B (Hero's Journey) | 267,280 | Extended reasoning narratives | Available for v2 training |
+ | Mix C (Full Spectrum) | 1,367,280 | Combined agent + reasoning + code | Available for v2 training |
+ | Vibe Coding | 1,100,000 | Natural language to code | Available for v2 training |
+ | Fable5 SFT | 4,665 | Original Fable-5 traces | ShellWhisperer v1, Mythos training |
+ | Fable5 Claude Code | 63 | Claude Code traces | ShellWhisperer v1 |
+ | FableForge data | 10,000 | Early orchestration traces | ShellWhisperer v1 |
+
+ **Current utilization: ~1.7% of total formatted data in Mythos training** (47,824 of 2,801,777 examples), or ~2.2% including the ~15,000 examples used for ShellWhisperer v1. The vast majority, over 2.7 million examples, remains untapped for future training runs.
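The utilization figures follow directly from the table. A small sketch that recomputes them; it assumes the ShellWhisperer v1 rows do not overlap with Mix A, which the table does not state (Fable5 SFT is listed for both ShellWhisperer v1 and Mythos training).

```python
# Recompute the utilization figures from the Dataset Usage Summary table.
TOTAL_FORMATTED = 2_801_777
MYTHOS_USED = 47_824  # Mix A
SHELLWHISPERER_V1_USED = {
    "Fable5 SFT": 4_665,
    "Fable5 Claude Code": 63,
    "FableForge data": 10_000,
}


def pct(part: int, whole: int = TOTAL_FORMATTED) -> float:
    return 100 * part / whole


if __name__ == "__main__":
    v1 = sum(SHELLWHISPERER_V1_USED.values())
    used = MYTHOS_USED + v1  # assumes no overlap between Mix A and the v1 rows
    print(f"Mythos only: {pct(MYTHOS_USED):.1f}%")
    print(f"ShellWhisperer v1: {v1:,} examples")
    print(f"Combined: {pct(used):.1f}% ({TOTAL_FORMATTED - used:,} examples untapped)")
```

It reports 1.7% for Mythos alone, 14,728 examples for ShellWhisperer v1, and 2.2% combined, leaving 2,739,225 examples untapped.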
+
+ ## ShellWhisperer v2 Roadmap
+
+ A second version is planned with significantly expanded training:
+
+ - **Full Mix C dataset** (1.37M examples) for comprehensive coverage
+ - **Higher LoRA rank** (r=64 or r=128) for deeper adaptation
+ - **DPO training** on preference data for improved instruction following
+ - **Extended shell-specific data** with real terminal interaction traces
+ - **Uncensoring reinforcement** with explicit anti-refusal examples
+ - **Target**: Match or exceed Mythos-9B tool-use quality at 1/6 the size

  ## Limitations

+ - **Minimal fine-tuning effect**: v1 training was shallow (r=16, ~15K examples), so the model largely behaves like base Qwen2.5-1.5B with only a slight shell affinity
+ - **Hallucinations**: Can generate incorrect commands; always validate them before execution
+ - **English only**: Trained primarily on English data
+ - **Short context utilization**: Despite the 32K context window, effective use degrades beyond ~4K tokens
+ - **No native thinking mode**: Unlike Qwen3-based models, Qwen2.5 does not have built-in thinking tokens
+ - **Tool-use formatting**: Basic function-calling format, not as structured as Mythos-9B
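One way to act on the Hallucinations note is to gate generated commands before they reach a shell. The sketch below is a conservative heuristic, not a security boundary: it tokenizes with Python's `shlex` in punctuation-aware mode, accepts only a hypothetical allowlist of read-only commands, and rejects anything that chains, pipes, redirects, or substitutes commands. Untrusted commands should still run in a sandbox.

```python
import shlex

# Hypothetical allowlist: read-only commands a reviewer approved for auto-execution.
ALLOWED_COMMANDS = {"ls", "grep", "df", "du", "cat", "head", "tail", "wc", "stat"}
# Characters shlex treats as shell punctuation (chaining, pipes, redirects, subshells).
SHELL_OPERATOR_CHARS = set("();<>|&")


def check_command(cmd: str) -> tuple[bool, str]:
    """Conservative pre-execution check for one model-suggested command."""
    try:
        lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
        lexer.whitespace_split = True
        tokens = list(lexer)
    except ValueError as exc:  # e.g. "No closing quotation"
        return False, f"unparseable: {exc}"
    if not tokens:
        return False, "empty command"
    if tokens[0] not in ALLOWED_COMMANDS:
        return False, f"command not on allowlist: {tokens[0]}"
    for tok in tokens:
        if tok and set(tok) <= SHELL_OPERATOR_CHARS:
            return False, f"shell operator not allowed: {tok}"
        if "`" in tok or "$(" in tok:  # command substitution, even inside quotes
            return False, f"command substitution not allowed: {tok}"
    return True, "ok"
```

Each rejection carries a reason string, which an agent loop can return to the model as a tool error so it can propose a simpler command.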

  ## Citation

  ```bibtex
  @misc{shellwhisperer1.5b2024,
+ title={ShellWhisperer-1.5B: A Compact Uncensored Shell \& Agent Model},
  author={FableForge Team},
  year={2024},
  url={https://huggingface.co/fableforge-ai/ShellWhisperer-1.5B}