SmolLM3-3B-LAM: A Large Action Model for AI Agent Creation

A fine-tuned SmolLM3-3B model that creates complete AI agent definitions from natural language requests. Given a description like "Build me a bot that monitors S3 for sensitive data exposure", it outputs a structured JSON agent specification with tools, skills, constraints, and architectural reasoning.

What It Does

This is not a general-purpose chatbot or a function-calling model. It does one thing: design AI agents from scratch.

Input:

Build an agent that monitors my S3 buckets for sensitive data exposure and auto-remediates

Output:

{
  "reasoning": "This is a casual, vague request. The user wants a simple, focused agent that watches S3 buckets for sensitive data and automatically fixes issues...",
  "agent": {
    "name": "s3-data-guardian",
    "description": "Monitors S3 buckets for sensitive data exposure and auto-remediates",
    "role": "cloud security monitor",
    "tools": [
      {
        "name": "scan_s3_objects",
        "description": "Scans S3 objects for sensitive data patterns",
        "parameters": [{ "name": "bucket_name", "type": "string", "description": "S3 bucket to scan", "required": true }],
        "returns": "List of objects containing sensitive data with classification"
      }
    ],
    "skills": [
      {
        "name": "detect-and-remediate",
        "description": "Scans buckets and auto-remediates exposure",
        "trigger": "Scheduled scan or S3 event notification",
        "inputs": [{ "name": "bucket_name", "type": "string", "description": "Target bucket", "required": true }],
        "steps": [
          { "action": "Scan all objects in bucket for sensitive data", "tool": "scan_s3_objects" },
          { "action": "Restrict public access on flagged objects", "tool": "update_bucket_policy" },
          { "action": "Send alert with remediation summary", "tool": "send_alert" }
        ],
        "output": "Remediation report with actions taken"
      }
    ],
    "constraints": [
      "Never delete S3 objects, only restrict access",
      "Log all remediation actions to audit trail"
    ]
  }
}

Benchmark Results

Held-Out Validation Set (20 samples, never seen during training)

Model                          Avg Score   Min   Max   Valid JSON %
SmolLM3-3B-LAM (this model)       96.3      70   100       100%
SmolLM3-3B (base)                 79.3      60    90       100%
xLAM-1B-fc-r (Salesforce)         27.5      20    40       100%

3-Way Comparison (Hand-Crafted Prompts)

Model               Params   T1    T2    T3    Avg
SmolLM3-3B-LAM      3B       95   100   100   98.3
SmolLM3-3B (base)   3B       90    70    85   81.7
xLAM-1B-fc-r        1B       20    40    40   33.3

Key findings:

  • +21.4% average score over the base SmolLM3-3B model (96.3 vs 79.3)
  • +250% over Salesforce's xLAM-1B-fc-r, a purpose-built Large Action Model (96.3 vs 27.5)
  • A perfect 100/100 on 13 of the 20 held-out validation examples
  • Emits clean structured JSON directly, without <think> wrapper tags
  • Learned to characterize user tone and adjust agent complexity accordingly

Scoring Methodology

Each output is scored 0-100: 20 pts for valid JSON, then reasoning (10), agent (10), tools (10), skills (10), constraints (10), steps (10), trigger (5), parameters (5), on_failure (5), and description (5) for field presence.
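The scoring script itself is not published; a minimal sketch of a scorer matching the weights above (the recursive field search is an assumption about how nested fields like on_failure are counted):

```python
import json

# Weights from the methodology above: 20 pts for valid JSON,
# the rest for the presence of key schema fields.
FIELD_WEIGHTS = {
    "reasoning": 10, "agent": 10, "tools": 10, "skills": 10,
    "constraints": 10, "steps": 10, "trigger": 5, "parameters": 5,
    "on_failure": 5, "description": 5,
}

def contains_key(obj, key):
    """Recursively check whether `key` appears anywhere in the JSON tree."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key(v, key) for v in obj)
    return False

def score_output(text: str) -> int:
    """Score one model output 0-100: JSON validity plus field presence."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return 0
    score = 20  # valid JSON
    for field, weight in FIELD_WEIGHTS.items():
        if contains_key(parsed, field):
            score += weight
    return score
```

An output missing any field simply forfeits that field's points, so partial credit accumulates toward the 0-100 scale.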

Training Details

Parameter              Value
Base model             HuggingFaceTB/SmolLM3-3B
Method                 QLoRA (8-bit quantized base + full-precision adapters)
Framework              Apple MLX
Trainable parameters   6.7M / 3,075M (0.218%)
Training iterations    500
Batch size             2
Learning rate          1e-5
LoRA layers            16
Max sequence length    8,192
Peak memory            59.5 GB (Apple Silicon unified memory)
Training time          ~40 minutes on an M-series Mac
Best val loss          0.559 (iter 250)
Final val loss         0.625 (iter 500)
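A hedged sketch of how such a run maps onto mlx-lm's LoRA entry point. Flag names follow recent mlx-lm releases and may differ in your version (check mlx_lm.lora --help); the data directory is a placeholder, not the actual training path:

```shell
# Hypothetical invocation mirroring the hyperparameters in the table above.
# ./data is a placeholder for a directory with train.jsonl / valid.jsonl.
mlx_lm.lora \
  --model HuggingFaceTB/SmolLM3-3B \
  --train \
  --data ./data \
  --iters 500 \
  --batch-size 2 \
  --learning-rate 1e-5 \
  --num-layers 16 \
  --max-seq-length 8192
```

mlx-lm trains quantized bases with full-precision adapters automatically when the loaded model is quantized, which matches the QLoRA setup described above.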

Training Loss Curve

Iter   Train Loss   Val Loss
1      --           1.069
100    0.503        0.688
250    0.590        0.559
500    0.482        0.625

Training Data

~2,000 examples total:

Source                           Count   Purpose
Synthetic agent-creation pairs   992     Core task: natural language to agent definition
ToolACE (ICLR 2025)              500     Structured JSON tool-calling patterns
Alpaca-Cleaned                   500     General instruction following (prevents catastrophic forgetting)

The synthetic data was generated with Claude Sonnet 4.6 via the Anthropic Batch API, using an instruction-repetition technique that improved output quality by 7.1% in A/B testing. The gain came mainly from reasoning quality: the repeated-instruction variant was the only one that produced reasoning explaining why an architecture fits, not just what it is.
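The exact prompt template is not published; a hypothetical sketch of what instruction repetition looks like in practice, with a placeholder instruction string (the core idea is simply restating the instruction after the user request):

```python
# Placeholder instruction; the real dataset's wording is not published.
SYSTEM_INSTRUCTION = (
    "You are a Large Action Model that creates AI agents and skills "
    "from user requests. Respond with a JSON object containing "
    "'reasoning' and 'agent'."
)

def build_training_prompt(user_request: str) -> str:
    """Assemble a prompt that states the instruction before AND after
    the user request (the 'instruction repetition' technique)."""
    return (
        f"{SYSTEM_INSTRUCTION}\n\n"
        f"User request: {user_request}\n\n"
        f"Remember: {SYSTEM_INSTRUCTION}"
    )
```

Restating the instruction after the request keeps the task description adjacent to where generation begins, which is one plausible explanation for the reasoning-quality gain.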

Usage

With MLX (Apple Silicon)

from mlx_lm import load, generate

model, tokenizer = load("chendren/smollm3-3b-lam")

prompt = """You are a Large Action Model that creates AI agents and skills from user requests.

When given a request, you:
1. Reason about what agent architecture best serves the need
2. Define the tools the agent requires
3. Define skills as composable, multi-step workflows
4. Set constraints to keep the agent safe and focused

Respond with a JSON object containing:
- reasoning: your thought process for the design
- agent: the complete agent definition with name, description, role, tools, skills, and constraints

User request: Create an agent that reviews PRs for security vulnerabilities"""

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
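The model emits JSON directly, so the response can be parsed with the standard library. A small helper that slices from the first "{" to the last "}" guards against any stray surrounding text (the sample response string below is illustrative, not actual model output):

```python
import json

def parse_agent(response: str) -> dict:
    """Extract the outermost brace-delimited span and parse it as JSON."""
    start = response.index("{")
    end = response.rindex("}") + 1
    return json.loads(response[start:end])

# Illustrative response with stray text around the JSON:
raw = 'Here is the agent:\n{"reasoning": "...", "agent": {"name": "pr-security-reviewer"}}'
spec = parse_agent(raw)
print(spec["agent"]["name"])  # pr-security-reviewer
```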

Output Schema

The model generates JSON conforming to this structure:

{
  reasoning: string,           // WHY this architecture fits
  agent: {
    name: string,              // kebab-case agent name
    description: string,       // what the agent does
    role: string,              // primary role in one phrase
    tools: [{                  // tools the agent needs
      name: string,            // snake_case tool name
      description: string,
      parameters: [{ name, type, description, required }],
      returns: string
    }],
    skills: [{                 // composable multi-step workflows
      name: string,            // kebab-case skill name
      description: string,
      trigger: string,         // when the skill activates
      inputs: [{ name, type, description, required }],
      steps: [{ action, tool?, input?, on_failure? }],
      output: string
    }],
    constraints: string[]      // behavioral guardrails
  }
}
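A minimal validator for this structure can catch malformed outputs before they reach a runtime. This is a sketch that checks only the top-level shape, not the nested tool/skill fields:

```python
def validate_agent_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec matches
    the top-level schema documented above."""
    problems = []
    if not isinstance(spec.get("reasoning"), str):
        problems.append("missing or non-string 'reasoning'")
    agent = spec.get("agent")
    if not isinstance(agent, dict):
        return problems + ["missing 'agent' object"]
    for field in ("name", "description", "role"):
        if not isinstance(agent.get(field), str):
            problems.append(f"agent.{field} must be a string")
    for field in ("tools", "skills", "constraints"):
        if not isinstance(agent.get(field), list):
            problems.append(f"agent.{field} must be a list")
    return problems
```

Deeper checks (e.g. that every step's tool name matches a declared tool) would strengthen this, since the scoring methodology above only tests field presence.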

Limitations

  • Trained on 20 categories of agent types -- may produce lower quality output for highly specialized domains not represented in training
  • Generates agent definitions, not executable code -- the output is a specification that needs a runtime to execute
  • Best validation loss was at iteration 250; the final model (iteration 500) shows slight overfitting -- the iteration 250 checkpoint may perform marginally better
  • Scoring is structural (checks for field presence), not semantic -- a high score does not guarantee the agent design is good, only that it is complete

Citation

@misc{smollm3-3b-lam-2026,
  title={SmolLM3-3B-LAM: Fine-Tuning a 3B Model as a Large Action Model for AI Agent Creation},
  author={Chad Hendren},
  year={2026},
  url={https://huggingface.co/chendren/smollm3-3b-lam}
}
