---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- linux
- terminal
- bash
- shell
- conversational
- code
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# LaaLM-exp-v1: Linux as a Language Model (Experimental v1)

A 3B-parameter conversational model that emulates a Linux terminal through pure language model inference. LaaLM-exp-v1 tracks filesystem state internally through the conversation context, without any external state management.

## Key Features

- **Persistent State Tracking** - Remembers files, directories, and content across the conversation
- **12 Linux Commands** - pwd, ls, echo, touch, cat, mkdir, cd, rm, mv, cp, echo >, grep
- **File Content Support** - Write and read actual file contents with redirection
- **Error Handling** - Proper bash error messages for invalid operations
- **No External State** - Pure conversation-based memory, no simulators required
- **95.4% Benchmark Accuracy** - Tested on 130 diverse scenarios

## Performance

![Benchmark Results](benchmark_overall.png)

**Overall Accuracy: 95.4%** (124/130 tests passed)

| Category | Accuracy | Passed/Total |
|----------|----------|--------------|
| Basic Commands | 100% | 20/20 |
| File Creation | 100% | 20/20 |
| File Operations | 100% | 30/30 |
| File Content | 100% | 20/20 |
| Error Handling | 75% | 15/20 |
| Persistence | 95% | 19/20 |

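As a quick consistency check, the overall figure can be recomputed from the per-category counts in the table. The snippet below is only an arithmetic sketch; the dictionary name `results` is illustrative and not part of the benchmark code.

```python
# Pass counts copied from the table above: category -> (passed, total)
results = {
    "Basic Commands": (20, 20),
    "File Creation": (20, 20),
    "File Operations": (30, 30),
    "File Content": (20, 20),
    "Error Handling": (15, 20),
    "Persistence": (19, 20),
}

passed = sum(p for p, _ in results.values())
total = sum(t for _, t in results.values())
overall = 100 * passed / total

print(f"{passed}/{total} = {overall:.1f}%")  # 124/130 = 95.4%
```
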
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "LaaLM/LaaLM-exp-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "LaaLM/LaaLM-exp-v1",
    fix_mistral_regex=True  # Important for proper tokenization
)
model.eval()
```

### Understanding the System Prompt

The system prompt is critical for LaaLM to function correctly. It establishes the initial filesystem state that the model tracks throughout the conversation.

**Required format:**

```python
conversation = [
    {
        "role": "system",
        "content": """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user"""
    }
]
```

**Key components:**

1. **Identity declaration** - "You are a Linux terminal emulator"
2. **Current directory** - Starting working directory (typically `/home/user`)
3. **Initial files** - List the files, or write "(empty)" for a clean start
4. **Environment variables** - USER and HOME at minimum

**Important:** Set the system prompt once, at the start of the conversation. Do not update it with the current state: the model tracks state changes from the command history.

**Example with existing files:**

```python
conversation = [
    {
        "role": "system",
        "content": """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: existing_file.txt
Environment: USER=user, HOME=/home/user"""
    }
]
```

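Because the format has to match exactly, it can help to build the system message with a small helper instead of typing it by hand. The function below is a minimal sketch: `build_system_prompt` and its parameters are hypothetical names, and the `", "` separator for more than one initial file is an assumption, since the examples above only show zero or one file.

```python
def build_system_prompt(cwd="/home/user", files=(), user="user", home="/home/user"):
    """Return a system message in the format shown above.

    Assumption: multiple initial files are joined with ", ". The examples
    in this README only show zero or one file, so the separator is a guess.
    """
    file_list = ", ".join(files) if files else "(empty)"
    content = (
        "You are a Linux terminal emulator. Initial state:\n"
        f"Current directory: {cwd}\n"
        f"Files: {file_list}\n"
        f"Environment: USER={user}, HOME={home}"
    )
    return {"role": "system", "content": content}


# Start a conversation with one pre-existing file
conversation = [build_system_prompt(files=["existing_file.txt"])]
```

Calling `build_system_prompt()` with no arguments reproduces the clean-start prompt from the required format.
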
### Running Commands

```python
def run_command(cmd):
    # Add user command
    conversation.append({"role": "user", "content": cmd})

    # Format prompt
    prompt = tokenizer.apply_chat_template(
        conversation,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize and generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[1]:],
        skip_special_tokens=True
    ).strip()

    # Add to conversation history
    conversation.append({"role": "assistant", "content": response})
    return response

# Example session
print(run_command("pwd"))                     # /home/user
print(run_command("touch test.txt"))          # (empty)
print(run_command("ls"))                      # test.txt
print(run_command("echo hello > test.txt"))   # (empty)
print(run_command("cat test.txt"))            # hello
print(run_command("cp test.txt backup.txt"))  # (empty)
print(run_command("ls"))                      # backup.txt test.txt
print(run_command("rm test.txt"))             # (empty)
print(run_command("ls"))                      # backup.txt
```

## Quantized Versions

GGUF quantizations are available for CPU inference and lower memory usage:

**[LaaLM-exp-v1-GGUF](https://huggingface.co/LaaLM/LaaLM-exp-v1-GGUF)**

Files range from Q2_K to FP16 (1.27 GB to 6.18 GB) and work with:

- llama.cpp
- Ollama
- llama-cpp-python
- Other GGUF-compatible tools

Recommended: Q4_K_M (1.93 GB) for the best balance of quality and size.

## Supported Commands

| Command | Description | Example |
|---------|-------------|---------|
| `pwd` | Print working directory | `pwd` |
| `ls` | List files in current directory | `ls` |
| `echo` | Print text to stdout | `echo hello world` |
| `touch` | Create empty file | `touch file.txt` |
| `cat` | Display file contents | `cat file.txt` |
| `mkdir` | Create directory | `mkdir mydir` |
| `cd` | Change directory | `cd mydir` |
| `rm` | Remove file | `rm file.txt` |
| `mv` | Move or rename file | `mv old.txt new.txt` |
| `cp` | Copy file | `cp source.txt dest.txt` |
| `echo >` | Write content to file | `echo text > file.txt` |
| `grep` | Search pattern in file | `grep word file.txt` |

## Technical Details

### Training Configuration

- **Base Model:** Qwen/Qwen2.5-3B-Instruct
- **Training Data:** 10,000 synthetic conversations (800k messages)
- **Commands per conversation:** 30-50
- **Training Method:** Full fine-tuning (no LoRA, no quantization)
- **Precision:** BF16 with Flash Attention 2
- **Hardware:** A100 80GB PCIe
- **Training Time:** 34 minutes
- **Cost:** $0.68
- **Max Sequence Length:** 640 tokens
- **Optimizer:** AdamW (lr=2e-5, weight_decay=0.01)
- **Batch Size:** 8 per device, gradient accumulation 4 (effective batch size 32)
- **Epochs:** 3

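The batch-size and dataset figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch under two assumptions that the list does not state: training ran on a single GPU, and each conversation counts as exactly one training sequence. If conversations were split to fit the 640-token limit, the real step count would differ.

```python
import math

conversations = 10_000      # synthetic training conversations
messages = 800_000          # "800k messages", treated as an exact count here
per_device_batch = 8
grad_accum_steps = 4
num_devices = 1             # assumption: the single A100 listed above
epochs = 3

effective_batch = per_device_batch * grad_accum_steps * num_devices
messages_per_conversation = messages / conversations

# Assumption: one conversation == one training sequence
steps_per_epoch = math.ceil(conversations / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch)             # 32
print(messages_per_conversation)   # 80.0
print(steps_per_epoch, total_steps)
```

About 80 messages per conversation corresponds to roughly 40 command/response pairs, which sits inside the stated range of 30-50 commands per conversation.
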
### Data Generation

Training data was synthetically generated using a simulated Linux environment with:

- Random filenames with realistic character patterns
- Diverse command sequences with proper state tracking
- Error cases including non-existent files and invalid commands
- Multi-step operations requiring memory across turns
- File content persistence and modification tracking

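To make the idea concrete, here is a toy version of such a generator. It is not the project's actual data pipeline: `simulate`, `generate_conversation`, the five-command subset, and the `fileN.txt` naming scheme are all illustrative assumptions. A small in-memory simulator produces the ground-truth output for each command, and the command/output pairs become user/assistant messages.

```python
import random

SYSTEM_PROMPT = (
    "You are a Linux terminal emulator. Initial state:\n"
    "Current directory: /home/user\n"
    "Files: (empty)\n"
    "Environment: USER=user, HOME=/home/user"
)


def simulate(cmd, files):
    """Apply one command to `files` (name -> content) and return its output."""
    parts = cmd.split()
    name = parts[0]
    if name == "ls":
        return " ".join(sorted(files))
    if name == "touch":
        files.setdefault(parts[1], "")
        return ""
    if name == "echo" and ">" in parts:
        idx = parts.index(">")
        files[parts[idx + 1]] = " ".join(parts[1:idx])
        return ""
    if name == "cat":
        target = parts[1]
        if target not in files:
            return f"cat: {target}: No such file or directory"
        return files[target]
    if name == "rm":
        target = parts[1]
        if target not in files:
            return f"rm: cannot remove '{target}': No such file or directory"
        del files[target]
        return ""
    return f"bash: {name}: command not found"


def generate_conversation(n_commands, seed=0):
    """Build one chat-format conversation with simulator-derived outputs."""
    rng = random.Random(seed)
    files = {}
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for _ in range(n_commands):
        fname = f"file{rng.randint(0, 4)}.txt"  # small pool so names collide
        cmd = rng.choice([
            "ls", f"touch {fname}", f"echo data > {fname}",
            f"cat {fname}", f"rm {fname}",
        ])
        messages.append({"role": "user", "content": cmd})
        messages.append({"role": "assistant", "content": simulate(cmd, files)})
    return messages
```

Reusing a small pool of filenames means the sequence naturally produces both successful operations and error cases such as `cat` on a file that was already removed.
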
### Architecture Approach

Unlike traditional terminal emulators that use external state management, LaaLM-exp-v1 learns to track filesystem state entirely through conversation context. The model:

1. Receives initial state via system prompt
2. Maintains full command history in conversation
3. Infers current filesystem state from past commands
4. Generates outputs based on learned state transitions

This demonstrates that language models can learn complex stateful behaviors through sequence modeling alone, without explicit memory mechanisms.

## Benchmark Methodology

The model was evaluated on 130 automatically generated test cases across 6 categories:

- **Basic Commands** (20 tests): pwd, ls, echo with various inputs
- **File Creation** (20 tests): touch and echo > operations
- **File Operations** (30 tests): rm, mv, cp with state tracking validation
- **File Content** (20 tests): cat and grep on files with actual content
- **Error Handling** (20 tests): Invalid commands and missing file scenarios
- **Persistence** (20 tests): Multi-step sequences requiring memory retention

![Category Breakdown](benchmark_category.png)

Each test consists of:

1. Setup commands to establish state
2. Test command to execute
3. Expected output comparison
4. Pass/fail determination

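The four steps above map onto a very small harness. The sketch below is an assumption about how such a test could be structured, not the project's benchmark code: `run_test_case` and `make_stub_session` are hypothetical names, the comparison is an exact match after stripping whitespace, and a stub session stands in for the model so the example runs without loading it.

```python
def run_test_case(make_session, setup, command, expected):
    """Run one test: setup commands, test command, compare, pass/fail."""
    run_command = make_session()          # fresh conversation for every test
    for cmd in setup:                     # 1. establish state
        run_command(cmd)
    actual = run_command(command)         # 2. execute the test command
    return actual.strip() == expected.strip()  # 3-4. compare and decide


def make_stub_session():
    """Stand-in for a model-backed session; only knows `touch` and `ls`."""
    files = set()

    def run_command(cmd):
        parts = cmd.split()
        if parts[0] == "touch":
            files.add(parts[1])
            return ""
        if parts[0] == "ls":
            return " ".join(sorted(files))
        return f"bash: {parts[0]}: command not found"

    return run_command


cases = [
    {"setup": ["touch a.txt", "touch b.txt"], "command": "ls", "expected": "a.txt b.txt"},
    {"setup": [], "command": "ls", "expected": ""},
]
passed = sum(
    run_test_case(make_stub_session, c["setup"], c["command"], c["expected"])
    for c in cases
)
print(f"{passed}/{len(cases)} passed")
```

To evaluate the real model, `make_session` would return a function that resets the conversation to the system prompt and wraps the `run_command` logic from Quick Start.
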
## Limitations

**Command Support**

- Limited to 12 commands; advanced utilities are not yet supported
- No pipe operators, command chaining, or complex redirects
- No scripting features (variables, loops, conditionals)

**Known Issues**

- `cp` occasionally creates the destination file without copying its content
- `rm` on non-existent files sometimes returns empty output instead of an error
- Long conversations (50+ commands) may experience state degradation
- Very long filenames (>30 characters) can cause parsing issues

**Scope**

- Terminal emulation only: no actual system calls or execution
- Requires the full conversation history for proper state tracking
- The context window limits the maximum conversation length

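Since state tracking may degrade past roughly 50 commands, a caller can at least detect when a session reaches that point. The helper below is a minimal sketch: `count_commands` and `is_long_session` are hypothetical names, the threshold is taken from the known-issues list above, and what to do once it is reached is left to the application.

```python
COMMAND_WARN_THRESHOLD = 50  # from "Long conversations (50+ commands)" above


def count_commands(conversation):
    """Count commands issued so far (one user message per command)."""
    return sum(1 for message in conversation if message["role"] == "user")


def is_long_session(conversation, threshold=COMMAND_WARN_THRESHOLD):
    """Return True once the session reaches the degradation-prone length."""
    return count_commands(conversation) >= threshold


# Example: a conversation with a system prompt and 3 command/response pairs
example = [{"role": "system", "content": "..."}]
for cmd in ["pwd", "ls", "touch a.txt"]:
    example.append({"role": "user", "content": cmd})
    example.append({"role": "assistant", "content": ""})

print(count_commands(example), is_long_session(example))  # 3 False
```
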
## Model Lineage

Part of the LaaLM (Linux as a Language Model) project:

- [**LaaLM-v1**](https://huggingface.co/LaaLM/LaaLM-v1) - State-based approach with external filesystem tracking (T5-base, 80k examples)
- **LaaLM-exp-v1** - Conversation-based approach with internal state tracking (Qwen 3B, 800k messages) (current)
- **LaaLM-v2** - Planned with bash scripting, pipes, and expanded command set

### Key Innovation

This model demonstrates that language models can maintain complex system state through conversation history alone. The approach enables:

- Neural system components without explicit state machines
- Learned program execution through pattern recognition
- Conversational interfaces for system control
- Research into emergent state tracking in transformers

## Use Cases

- **Education** - Interactive Linux command learning
- **Prototyping** - Shell script validation without execution
- **AI Agents** - Foundation for conversational system interfaces
- **Research** - Studying state tracking emergence in language models
- **Accessibility** - Natural language terminal interaction

## Inference Recommendations

1. Always initialize with the proper system prompt format
2. Set `fix_mistral_regex=True` when loading the tokenizer
3. Use greedy decoding (`do_sample=False`) for deterministic outputs
4. Maintain the full conversation context throughout the session
5. Limit `max_new_tokens` to ~150 for efficiency
6. Do not modify the system prompt after initialization

## License

Apache 2.0 (inherited from Qwen 2.5 base model)

## Acknowledgments

Built on Qwen 2.5-3B-Instruct by the Qwen team. Part of the LaaLM project exploring neural terminal emulation.

---

**Related Models**

- LaaLM-v1 (state-based approach)

**Future Development**

- LaaLM-v2 with expanded command set and bash scripting support