LaaLM-exp-v1 GGUF

Quantized GGUF versions of LaaLM-exp-v1 for llama.cpp, Ollama, and other GGUF-compatible inference engines.

Model Details

LaaLM-exp-v1 is a 3B parameter model that emulates a Linux terminal through conversation-based state tracking. It maintains filesystem state internally and supports 12 Linux commands with 95.4% benchmark accuracy.

See the main model card for full documentation.

Available Quantizations

Filename              Quant    Size     Use Case
exp-v1-Q2_K.gguf      Q2_K     1.27 GB  Smallest size, lower quality
exp-v1-Q3_K_S.gguf    Q3_K_S   1.45 GB  Small, decent quality
exp-v1-Q3_K_M.gguf    Q3_K_M   1.59 GB  Balanced small size
exp-v1-Q3_K_L.gguf    Q3_K_L   1.71 GB  Larger Q3 variant
exp-v1-IQ4_XS.gguf    IQ4_XS   1.75 GB  Importance matrix, high quality
exp-v1-Q4_K_S.gguf    Q4_K_S   1.83 GB  Good balance
exp-v1-Q4_K_M.gguf    Q4_K_M   1.93 GB  Recommended
exp-v1-Q5_K_S.gguf    Q5_K_S   2.17 GB  High quality
exp-v1-Q5_K_M.gguf    Q5_K_M   2.22 GB  Higher quality
exp-v1-Q6_K.gguf      Q6_K     2.54 GB  Near-original quality
exp-v1-Q8_0.gguf      Q8_0     3.29 GB  Maximum quality
exp-v1-fp16.gguf      fp16     6.18 GB  Original precision

Recommended: Q4_K_M for best quality/size balance.

Usage Examples

llama.cpp

# Download model
huggingface-cli download LaaLM/LaaLM-exp-v1-GGUF exp-v1-Q4_K_M.gguf --local-dir .

# Run inference
./llama-cli -m exp-v1-Q4_K_M.gguf \
  --color \
  -p "You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user

User: pwd
Assistant:"
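
The prompt passed to llama-cli above can also be assembled programmatically. A minimal sketch (the `build_prompt` helper and its parameters are my own; the output format mirrors the example above):

```python
def build_prompt(cwd, files, env, command):
    """Build a prompt in the format shown in the llama-cli example.

    files: list of filenames (rendered as "(empty)" when there are none).
    env:   dict of environment variables.
    """
    file_list = ", ".join(files) if files else "(empty)"
    env_list = ", ".join(f"{k}={v}" for k, v in env.items())
    return (
        "You are a Linux terminal emulator. Initial state:\n"
        f"Current directory: {cwd}\n"
        f"Files: {file_list}\n"
        f"Environment: {env_list}\n"
        f"\nUser: {command}\nAssistant:"
    )

prompt = build_prompt("/home/user", [], {"USER": "user", "HOME": "/home/user"}, "pwd")
```

The resulting string is identical to the `-p` argument above, so the same builder works for any initial state or command.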

Ollama

Create Modelfile:

FROM ./exp-v1-Q4_K_M.gguf

SYSTEM """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user"""

PARAMETER temperature 0
PARAMETER top_p 1

Then:

ollama create laalm-exp-v1 -f Modelfile
ollama run laalm-exp-v1

Example session:

>>> pwd
/home/user

>>> touch test.txt
(empty)

>>> ls
test.txt

>>> echo hello > test.txt
(empty)

>>> cat test.txt
hello
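
A session like the one above can also be driven from code via Ollama's local HTTP API (the `/api/generate` endpoint on the default port 11434). A minimal sketch, assuming the model was created under the name `laalm-exp-v1` as shown earlier:

```python
import json
import urllib.request

# Request payload for Ollama's /api/generate endpoint; temperature 0
# matches the Modelfile above.
payload = {
    "model": "laalm-exp-v1",
    "prompt": "pwd",
    "stream": False,
    "options": {"temperature": 0},
}

def run_command(payload, host="http://localhost:11434"):
    """Send one command to a locally running Ollama server."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# run_command(payload)  # requires a running Ollama server
```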

Python (llama-cpp-python)

pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="exp-v1-Q4_K_M.gguf",
    n_ctx=2048,
    n_threads=8,
    verbose=False
)

# Initialize conversation
system_prompt = """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user"""

conversation = f"{system_prompt}\n\nUser: pwd\nAssistant:"

output = llm(
    conversation,
    max_tokens=150,
    temperature=0.0,
    stop=["User:", "\n\n"]
)

print(output['choices'][0]['text'])
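
Because the model tracks state through the conversation itself, multi-turn use means re-sending the full transcript on every call. A small sketch of that bookkeeping (the `Transcript` class is my own, not part of llama-cpp-python):

```python
# Same system prompt as in the example above.
SYSTEM_PROMPT = """You are a Linux terminal emulator. Initial state:
Current directory: /home/user
Files: (empty)
Environment: USER=user, HOME=/home/user"""

class Transcript:
    """Accumulates User/Assistant turns so each call sees prior state."""

    def __init__(self, system_prompt):
        self.text = system_prompt

    def prompt_for(self, command):
        """Return the full prompt for the next command."""
        return f"{self.text}\n\nUser: {command}\nAssistant:"

    def record(self, command, output):
        """Append a completed turn to the transcript."""
        self.text = f"{self.text}\n\nUser: {command}\nAssistant: {output}"

t = Transcript(SYSTEM_PROMPT)
first = t.prompt_for("touch test.txt")
t.record("touch test.txt", "(empty)")
second = t.prompt_for("ls")
```

Each turn, pass `t.prompt_for(cmd)` to `llm(...)` and feed the stripped completion back with `t.record(cmd, output)`, so later commands see files created earlier.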

Supported Commands

  • pwd - Print working directory
  • ls - List files
  • echo - Print text
  • touch - Create empty file
  • cat - Display file contents
  • mkdir - Create directory
  • cd - Change directory
  • rm - Remove file
  • mv - Move/rename file
  • cp - Copy file
  • echo > - Write to file
  • grep - Search in file
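
Commands outside this set are not guaranteed to behave correctly, so a caller may want to reject them up front. A minimal client-side guard (the helper and its policy are my own):

```python
# The supported commands listed above; "echo >" redirection is covered
# by the echo entry, since the first word is still "echo".
SUPPORTED = {"pwd", "ls", "echo", "touch", "cat", "mkdir",
             "cd", "rm", "mv", "cp", "grep"}

def is_supported(command_line):
    """Check the first word of a command line against the supported set."""
    parts = command_line.strip().split()
    return bool(parts) and parts[0] in SUPPORTED
```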

Performance

  • Overall Accuracy: 95.4% on benchmarks (measured on the original LaaLM-exp-v1; quantized variants may score lower depending on the quantization level)
  • File Persistence: Tracks created and modified files across the conversation
  • Error Handling: Produces bash-style error messages for invalid operations
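
Bash-style errors follow the usual `command: target: message` shape, which a caller can detect heuristically. A sketch (the pattern below covers only a few common messages and is my assumption, not a specification of the model's output):

```python
import re

# Heuristic for bash-style error lines such as
# "cat: missing.txt: No such file or directory".
ERROR_RE = re.compile(
    r"^\w+: .*: (No such file or directory|"
    r"Permission denied|Is a directory|Not a directory)$"
)

def looks_like_error(output):
    """Return True if a single output line resembles a bash error."""
    return bool(ERROR_RE.match(output.strip()))
```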

Quantization Quality Guide

  • Q2_K - Q3_K: May occasionally make mistakes on complex file operations
  • Q4_K_M - Q5_K_M: Near-original quality, recommended for most use cases
  • Q6_K - fp16: Closest to original model performance

System Requirements

Quantization  RAM Required  Notes
Q2_K          ~2 GB         Fastest
Q4_K_M        ~3 GB         Recommended
Q6_K          ~4 GB         High quality
fp16          ~8 GB         Slowest
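
The approximate RAM figures above can drive automatic selection of a quantization. A minimal sketch (the helper itself is my own; the values come from the table):

```python
# (quant, approximate RAM in GB) from the table above, smallest first.
RAM_REQUIREMENTS = [("Q2_K", 2), ("Q4_K_M", 3), ("Q6_K", 4), ("fp16", 8)]

def best_quant(available_gb):
    """Pick the highest-quality quantization that fits in available RAM."""
    fitting = [q for q, ram in RAM_REQUIREMENTS if ram <= available_gb]
    return fitting[-1] if fitting else None
```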

License

Apache 2.0 (inherited from base model)

Links

  • Main model card: LaaLM/LaaLM-exp-v1
  • Model size: 3B params
  • Architecture: qwen2


Model tree for LaaLM/LaaLM-exp-v1-GGUF

  • Base model: Qwen/Qwen2.5-3B
  • Fine-tuned: LaaLM/LaaLM-exp-v1
  • Quantized (3): this model