# SmolLM3-3B-LAM: A Large Action Model for AI Agent Creation
A fine-tuned SmolLM3-3B model that creates complete AI agent definitions from natural language requests. Given a description like "Build me a bot that monitors S3 for sensitive data exposure", it outputs a structured JSON agent specification with tools, skills, constraints, and architectural reasoning.
## What It Does
This is not a general-purpose chatbot or a function-calling model. It does one thing: design AI agents from scratch.
**Input:**

```
Build an agent that monitors my S3 buckets for sensitive data exposure and auto-remediates
```

**Output:**

```json
{
  "reasoning": "This is a casual, vague request. The user wants a simple, focused agent that watches S3 buckets for sensitive data and automatically fixes issues...",
  "agent": {
    "name": "s3-data-guardian",
    "description": "Monitors S3 buckets for sensitive data exposure and auto-remediates",
    "role": "cloud security monitor",
    "tools": [
      {
        "name": "scan_s3_objects",
        "description": "Scans S3 objects for sensitive data patterns",
        "parameters": [{ "name": "bucket_name", "type": "string", "description": "S3 bucket to scan", "required": true }],
        "returns": "List of objects containing sensitive data with classification"
      }
    ],
    "skills": [
      {
        "name": "detect-and-remediate",
        "description": "Scans buckets and auto-remediates exposure",
        "trigger": "Scheduled scan or S3 event notification",
        "inputs": [{ "name": "bucket_name", "type": "string", "description": "Target bucket", "required": true }],
        "steps": [
          { "action": "Scan all objects in bucket for sensitive data", "tool": "scan_s3_objects" },
          { "action": "Restrict public access on flagged objects", "tool": "update_bucket_policy" },
          { "action": "Send alert with remediation summary", "tool": "send_alert" }
        ],
        "output": "Remediation report with actions taken"
      }
    ],
    "constraints": [
      "Never delete S3 objects, only restrict access",
      "Log all remediation actions to audit trail"
    ]
  }
}
```
## Benchmark Results

### Held-Out Validation Set (20 samples, never seen during training)
| Model | Avg Score | Min | Max | Valid JSON % |
|---|---|---|---|---|
| SmolLM3-3B-LAM (this model) | 96.3 | 70 | 100 | 100% |
| SmolLM3-3B (base) | 79.3 | 60 | 90 | 100% |
| xLAM-1B-fc-r (Salesforce) | 27.5 | 20 | 40 | 100% |
### 3-Way Comparison (Hand-Crafted Prompts)
| Model | Params | T1 | T2 | T3 | Avg |
|---|---|---|---|---|---|
| SmolLM3-3B-LAM | 3B | 95 | 100 | 100 | 98.3 |
| SmolLM3-3B (base) | 3B | 90 | 70 | 85 | 81.7 |
| xLAM-1B-fc-r | 1B | 20 | 40 | 40 | 33.3 |
**Key findings:**

- +21.4% relative improvement over the base SmolLM3-3B model
- +250% relative improvement over Salesforce's xLAM-1B-fc-r (a purpose-built Large Action Model)
- 100/100 on 13 of 20 held-out validation examples
- Goes straight to clean structured JSON without `<think>` wrapper tags
- Learned to characterize user tone and adjust agent complexity accordingly
### Scoring Methodology

Each output is scored 0-100: valid JSON (20 pts) plus points for the presence of key schema fields: reasoning (10), agent (10), tools (10), skills (10), constraints (10), steps (10), trigger (5), parameters (5), on_failure (5), description (5).
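The rubric above can be sketched as a small structural checker. This is an illustrative reconstruction, not the actual evaluation harness; the recursive key walk is an assumption about how field presence was detected:

```python
import json

# Points per rubric item: 20 for valid JSON, the rest for field presence.
FIELD_WEIGHTS = {
    "reasoning": 10, "agent": 10, "tools": 10, "skills": 10,
    "constraints": 10, "steps": 10, "trigger": 5, "parameters": 5,
    "on_failure": 5, "description": 5,
}

def collect_keys(obj, found):
    """Recursively record every dict key that appears anywhere in the output."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.add(key)
            collect_keys(value, found)
    elif isinstance(obj, list):
        for item in obj:
            collect_keys(item, found)

def score_output(text):
    """Score a model output 0-100: 20 pts for valid JSON plus field-presence points."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return 0
    score = 20
    present = set()
    collect_keys(parsed, present)
    for field, points in FIELD_WEIGHTS.items():
        if field in present:
            score += points
    return score
```

A fully populated agent definition (like the example above) scores 100; malformed JSON scores 0.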
## Training Details
| Parameter | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM3-3B |
| Method | QLoRA (8-bit quantized base + full-precision adapters) |
| Framework | Apple MLX |
| Trainable parameters | 6.7M / 3,075M (0.218%) |
| Training iterations | 500 |
| Batch size | 2 |
| Learning rate | 1e-5 |
| LoRA layers | 16 |
| Max sequence length | 8,192 |
| Peak memory | 59.5 GB (Apple Silicon unified memory) |
| Training time | ~40 minutes on M-series Mac |
| Best val loss | 0.559 (iter 250) |
| Final val loss | 0.625 (iter 500) |
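Using the standard MLX LoRA CLI, a run with these hyperparameters would look roughly like the invocation below. This is a sketch, not the author's actual command: the flag set follows `mlx_lm.lora`'s documented options, and the `--data` path is a placeholder.

```shell
# Illustrative QLoRA run matching the table above; the data directory
# is a placeholder and would hold train/valid JSONL splits.
python -m mlx_lm.lora \
  --model HuggingFaceTB/SmolLM3-3B \
  --train \
  --data ./data \
  --iters 500 \
  --batch-size 2 \
  --learning-rate 1e-5 \
  --num-layers 16 \
  --max-seq-length 8192
```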
### Training Loss Curve
| Iter | Train Loss | Val Loss |
|---|---|---|
| 1 | — | 1.069 |
| 100 | 0.503 | 0.688 |
| 250 | 0.590 | 0.559 |
| 500 | 0.482 | 0.625 |
## Training Data
~2,000 examples total:
| Source | Count | Purpose |
|---|---|---|
| Synthetic agent-creation pairs | 992 | Core task: natural language to agent definition |
| ToolACE (ICLR 2025) | 500 | Structured JSON tool-calling patterns |
| Alpaca-Cleaned | 500 | General instruction following (prevents catastrophic forgetting) |
The synthetic data was generated with Claude Sonnet 4.6 via the Anthropic Batch API using an instruction-repetition technique that improved output quality by 7.1% in A/B testing. The gain was concentrated in reasoning quality: the repeated-instruction variant was the only one that produced reasoning explaining *why* an architecture fits, not just *what* it is.
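To illustrate the instruction-repetition idea: the key instruction is stated both before and after the task. The exact prompt wording used for generation is not published, so the template below is a hypothetical reconstruction:

```python
# Hypothetical sketch of instruction repetition for synthetic data generation:
# the core instruction brackets the task, which in the A/B test produced
# noticeably better "why this architecture fits" reasoning.
INSTRUCTION = (
    "Respond with a JSON agent definition whose 'reasoning' field explains "
    "WHY the chosen architecture fits the request, not just what it is."
)

def build_generation_prompt(user_request):
    """Build a prompt that repeats the instruction before and after the task."""
    return (
        f"{INSTRUCTION}\n\n"
        f"User request: {user_request}\n\n"
        f"Reminder: {INSTRUCTION}"
    )
```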
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

model, tokenizer = load("chendren/smollm3-3b-lam")

prompt = """You are a Large Action Model that creates AI agents and skills from user requests.

When given a request, you:
1. Reason about what agent architecture best serves the need
2. Define the tools the agent requires
3. Define skills as composable, multi-step workflows
4. Set constraints to keep the agent safe and focused

Respond with a JSON object containing:
- reasoning: your thought process for the design
- agent: the complete agent definition with name, description, role, tools, skills, and constraints

User request: Create an agent that reviews PRs for security vulnerabilities"""

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(response)
```
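The model is trained to emit pure JSON, but a small defensive parse is still useful downstream. A sketch (the fence-stripping is a precaution, not documented model behavior):

```python
import json

def parse_agent_spec(response_text):
    """Parse a model response into an agent-spec dict, tolerating code fences."""
    text = response_text.strip()
    # Strip a Markdown code fence if the model emitted one anyway.
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop the opening ```json line
        text = text.rsplit("```", 1)[0]    # drop the closing fence
    spec = json.loads(text)
    if "agent" not in spec:
        raise ValueError("response is valid JSON but missing the 'agent' field")
    return spec
```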
## Output Schema

The model generates JSON conforming to this structure:

```jsonc
{
  reasoning: string,        // WHY this architecture fits
  agent: {
    name: string,           // kebab-case agent name
    description: string,    // what the agent does
    role: string,           // primary role in one phrase
    tools: [{               // tools the agent needs
      name: string,         // snake_case tool name
      description: string,
      parameters: [{ name, type, description, required }],
      returns: string
    }],
    skills: [{              // composable multi-step workflows
      name: string,         // kebab-case skill name
      description: string,
      trigger: string,      // when the skill activates
      inputs: [{ name, type, description, required }],
      steps: [{ action, tool?, input?, on_failure? }],
      output: string
    }],
    constraints: string[]   // behavioral guardrails
  }
}
```
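A minimal conformance check against the required fields of this schema can be written as a plain dict walk. This is a sketch covering the top-level and skill-level required fields only, not a full JSON Schema validation:

```python
def validate_agent_spec(spec):
    """Return a list of missing required fields; an empty list means conformant."""
    missing = []
    if "reasoning" not in spec:
        missing.append("reasoning")
    agent = spec.get("agent", {})
    if not agent:
        missing.append("agent")
    for field in ("name", "description", "role", "tools", "skills", "constraints"):
        if field not in agent:
            missing.append(f"agent.{field}")
    for i, skill in enumerate(agent.get("skills", [])):
        for field in ("name", "trigger", "steps", "output"):
            if field not in skill:
                missing.append(f"agent.skills[{i}].{field}")
    return missing
```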
## Limitations
- Trained on 20 categories of agent types -- may produce lower quality output for highly specialized domains not represented in training
- Generates agent definitions, not executable code -- the output is a specification that needs a runtime to execute
- Best validation loss was at iteration 250; the final model (iteration 500) shows slight overfitting -- the iteration 250 checkpoint may perform marginally better
- Scoring is structural (checks for field presence), not semantic -- a high score does not guarantee the agent design is good, only that it is complete
## Citation

```bibtex
@misc{smollm3-3b-lam-2026,
  title={SmolLM3-3B-LAM: Fine-Tuning a 3B Model as a Large Action Model for AI Agent Creation},
  author={Chad Hendren},
  year={2026},
  url={https://huggingface.co/chendren/smollm3-3b-lam}
}
```
## Acknowledgments
- HuggingFace SmolLM3 -- base model
- Apple MLX -- training framework
- Salesforce xLAM -- Large Action Model research
- ToolACE -- tool-calling training data
- Anthropic Claude -- synthetic data generation