# LFM2.5-1.2B-MOAT

**Multi-task Optimized Assessment Tool** — a finetuned LiquidAI/LFM2.5-1.2B-Instruct model for recruitment AI.

Handles two tasks with a single model:

- **CV-JD Assessment** — match scoring + qualitative analysis
- **Keyword Extraction** — structured keyword extraction from job descriptions and CVs
## Training
- Base model: LiquidAI/LFM2.5-1.2B-Instruct (1.2B params, hybrid Mamba2 + Attention)
- Stage 1 — Multi-task SFT: 39,641 examples (19,588 assessments + 20,053 keywords), LoRA r=32/α=64, 1 epoch, LR=5e-5
- Stage 2 — Targeted DPO: 2,374 filtered problematic pairs (|score diff| ≥ 5pts), LoRA r=16/α=32, beta=0.2, LR=5e-6
- Hardware: NVIDIA RTX 5080 16GB, total training time ~3.5 hours
- Training data: Gemini-generated assessments and keyword extractions across tech, healthcare, finance, and blue collar domains
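The two LoRA stages above could be configured roughly as follows with `peft` (a sketch only: `target_modules="all-linear"` is an assumption, since the actual adapter target modules for the hybrid Mamba2 + Attention stack are not documented here, and `beta`/learning rates belong to the trainer config rather than the adapter):

```python
from peft import LoraConfig

# Stage 1 — multi-task SFT adapter (r=32, alpha=64; LR=5e-5 is set on the trainer)
sft_lora = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules="all-linear",  # assumption: real target modules not documented
    task_type="CAUSAL_LM",
)

# Stage 2 — targeted DPO adapter (r=16, alpha=32; beta=0.2 and LR=5e-6
# are DPO trainer settings, not part of the LoRA adapter itself)
dpo_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # assumption
    task_type="CAUSAL_LM",
)
```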
## Performance

### CV-JD Assessment (4,898 held-out samples)
| Metric | V1 Baseline | MOAT V2 | Target |
|---|---|---|---|
| JSON Parse Rate | 97.0% | 99.9% | ≥95% |
| Score MAE | 13.1 pts | 6.82 pts | <8 |
| Score Bias | -13.0 pts | +1.53 pts | ~0 |
| Verdict Accuracy | 50.0% | 76.8% | >60% |
| Within 5 pts | — | 51.4% | — |
| Within 10 pts | — | 77.5% | — |
| Median Absolute Error | — | 4.90 pts | — |
### Keyword Extraction (10 diverse samples across domains)
| Field | Accuracy |
|---|---|
| JSON Parse Rate | 100% |
| Schema Complete | 100% |
| Experience Years | 100% |
| Domain | 90% |
| Education | 80% |
| Seniority | 80% |
| Skills (avg F1) | 0.58 |
Skills F1 varies by domain: white collar (0.74-0.84) > blue collar/healthcare (0.33-0.58). The model extracts correct skills but sometimes at different granularity than reference labels.
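A set-based F1 over the extracted vs. reference skill lists is one plausible way to read the "Skills (avg F1)" number; the exact matching rules used in the original evaluation are not documented here, so the helper below is illustrative:

```python
def skills_f1(predicted: list[str], reference: list[str]) -> float:
    """Case-insensitive, exact-match set F1 between two skill lists
    (a sketch; the original evaluation's matching rules may differ)."""
    pred = {s.lower() for s in predicted}
    ref = {s.lower() for s in reference}
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact-match F1 like this punishes granularity mismatches (e.g. "forklift operation" vs. "forklift"), which is consistent with the blue-collar scores reported above.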
## Usage with vLLM

```python
from vllm import LLM, SamplingParams

model = LLM(
    model="GazTrab/LFM2.5-1.2B-MOAT",
    max_model_len=4096,
    gpu_memory_utilization=0.85,
    dtype="bfloat16",
    trust_remote_code=True,
    max_num_seqs=64,
)
tokenizer = model.get_tokenizer()

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_tokens=2048,
)

# Build the prompt using the chat template
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
## Important Notes

- `max_model_len=4096` — the model was trained with this context length
- `temperature=0.1`, `top_p=0.1` — low temperature for consistent structured output
- `trust_remote_code=True` — required for the LFM2.5 architecture (hybrid Mamba2 + Attention)
- Prompts exceeding ~2048 tokens should be truncated (leave room for generation)
- The model outputs raw JSON — no markdown fences needed
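Because the model emits raw JSON, a single `json.loads` normally suffices; a defensive parser (hypothetical helper, not part of the model card's API) can also tolerate stray whitespace or an accidental markdown fence:

```python
import json

def parse_model_json(text: str) -> dict:
    """Parse the model's raw JSON output, stripping whitespace and any
    accidental ```json fences (defensive sketch; normally unnecessary)."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop an opening fence line like ```json and a trailing ```
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)
```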
## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "GazTrab/LFM2.5-1.2B-MOAT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_PROMPT},
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=2048,
    temperature=0.1,
    top_p=0.1,
    top_k=50,
    repetition_penalty=1.05,
    do_sample=True,
)
# Decode only the newly generated tokens
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
## Task Prompts

### Task 1: CV-JD Assessment

**System prompt:**

```
You are an expert recruitment AI that analyzes CV-JD compatibility.
You MUST respond with valid JSON only. No additional text before or after the JSON.

Output schema:
{
  "match_score": <float 0-100>,
  "executive_summary": "<2-3 sentence overview>",
  "strengths": ["<quantified strength 1>", "<quantified strength 2>", ...],
  "gaps": ["<specific gap 1>", "<specific gap 2>", ...],
  "recommendation": "Interview|Consider|Not recommended",
  "verdict": "STRONG_MATCH|GOOD_MATCH|MODERATE_MATCH|WEAK_MATCH|NOT_SUITABLE"
}

Guidelines:
- Be specific and quantified in strengths/gaps (e.g., "5/7 required skills", "3 years below requirement")
- Reference actual skills from the JD and CV
- Verdict must align with match_score brackets
- Keep strengths and gaps to 2-4 items each
```
**User prompt format:**

```
Analyze the following CV against the Job Description and provide a structured assessment.

=== JOB DESCRIPTION ===
{jd_text}

=== CANDIDATE CV ===
{cv_text}

Respond with JSON only:
```
**Verdict-to-score mapping:**
| Verdict | Score Range |
|---|---|
| STRONG_MATCH | 85-100 |
| GOOD_MATCH | 70-84 |
| MODERATE_MATCH | 50-69 |
| WEAK_MATCH | 30-49 |
| NOT_SUITABLE | 0-29 |
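Since the system prompt requires the verdict to align with the score brackets, the mapping above can be checked programmatically (hypothetical helper name, mirroring the table):

```python
def score_to_verdict(score: float) -> str:
    """Map a match_score (0-100) to its verdict bracket per the table above."""
    if score >= 85:
        return "STRONG_MATCH"
    if score >= 70:
        return "GOOD_MATCH"
    if score >= 50:
        return "MODERATE_MATCH"
    if score >= 30:
        return "WEAK_MATCH"
    return "NOT_SUITABLE"
```

A post-processing step could compare the model's `verdict` field against `score_to_verdict(match_score)` and flag disagreements.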
### Task 2: Keyword Extraction

**System prompt:**

```
You are an expert recruitment AI that extracts structured keywords from documents.
You MUST respond with valid JSON only. No additional text before or after the JSON.

Output schema:
{
  "skills": ["<skill 1>", "<skill 2>", ...],
  "experience_years": <integer>,
  "education": "<phd|master|bachelor|associate|diploma|certificate|high_school|none>",
  "certifications": ["<cert 1>", "<cert 2>", ...],
  "domain": "<2-4 word domain>",
  "seniority": "<intern|junior|mid|senior|lead|principal|director|manager>"
}

Guidelines:
- Extract only explicitly stated skills, not inferred ones
- For CVs: infer experience_years from work history dates
- For JDs: use the stated requirement, or 0 if not specified
- Skills should be lowercase
- Keep domain to 2-4 words
```
**User prompt format (for JDs):**

```
Extract structured keywords from the following Job Description.

=== JOB DESCRIPTION ===
{jd_text}

Respond with JSON only:
```
**User prompt format (for CVs):**

```
Extract structured keywords from the following CV/Resume.

=== CANDIDATE CV ===
{cv_text}

Respond with JSON only:
```
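The user prompt templates above are plain string assembly; a minimal sketch (helper names are illustrative, not part of the model card):

```python
def build_jd_keyword_prompt(jd_text: str) -> str:
    """Fill the JD keyword-extraction user prompt template shown above."""
    return (
        "Extract structured keywords from the following Job Description.\n\n"
        "=== JOB DESCRIPTION ===\n"
        f"{jd_text}\n\n"
        "Respond with JSON only:"
    )

def build_cv_keyword_prompt(cv_text: str) -> str:
    """Fill the CV keyword-extraction user prompt template shown above."""
    return (
        "Extract structured keywords from the following CV/Resume.\n\n"
        "=== CANDIDATE CV ===\n"
        f"{cv_text}\n\n"
        "Respond with JSON only:"
    )
```

The result goes into the `user` message of the chat template, alongside the Task 2 system prompt.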
## Limitations
- Low-score bias: Scores in the 0-20 range tend to be overestimated by ~8 points (model struggles to score below ~17)
- Blue collar granularity: Keyword extraction for trade/blue collar roles sometimes outputs overly verbose skill descriptions
- Training data domains: Primarily trained on tech, healthcare, and finance — generalizes to other domains but with slightly lower quality
- Context length: Long CVs or JDs may need truncation to stay within the 2048-token prompt budget
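To stay within the ~2048-token prompt budget, long documents can be truncated by token count before templating; a minimal sketch that keeps the beginning of the text (`tokenizer` is any object with `encode`/`decode`, e.g. the model's Hugging Face tokenizer):

```python
def truncate_to_budget(text: str, tokenizer, max_tokens: int = 2048) -> str:
    """Truncate text to at most max_tokens tokens, keeping the start.
    Sketch only: a real pipeline should also budget for the other
    document and the template boilerplate sharing the same prompt."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])
```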
## Citation

```bibtex
@misc{gaztrab2026moat,
  title={LFM2.5-1.2B-MOAT: Multi-task Optimized Assessment Tool for Recruitment},
  author={GazTrab},
  year={2026},
  url={https://huggingface.co/GazTrab/LFM2.5-1.2B-MOAT}
}
```