Mistral-Small-24B-LOC-L1-v1

The first LOC-coherence-trained 24B model. Professional artifact quality across 7 cognitive domains · Apache 2.0

This is mistralai/Mistral-Small-24B-Instruct-2501 with a merged LOC L1 Foundation LoRA adapter trained using Differentiable LOC Loss (DLL).

Result: 31.7% → 80.7% True Coherence (+49.0 percentage points)

This model and Qwen3.5-9B-LOC-L1-v1 converge to the same ~80% TC ceiling, which suggests that cognitive coherence is an architectural property independent of parameter count above a threshold.

What Is Cognitive Coherence — and Why Benchmarks Miss It

Standard AI benchmarks measure what a model knows. They do not measure how coherently it applies that knowledge.

The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across layers, mapped to 13 cognitive functions across four consciousness tiers. True Coherence (TC) is the percentage of generated tokens that satisfy all internal coherence conditions simultaneously. It is measured from hidden-state geometry at inference time and requires no task labels.
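Read literally, the definition above makes TC a conjunction over per-token checks: a token counts only if every condition holds at once. The following is a minimal sketch of that aggregation step only, assuming each condition has already been reduced to one boolean per generated token. The condition names and the `true_coherence` helper are hypothetical; the actual conditions are defined in the cited papers, not here.

```python
from typing import Dict, List


def true_coherence(checks: Dict[str, List[bool]]) -> float:
    """Share of tokens (in percent) for which every condition holds at once."""
    lengths = {len(flags) for flags in checks.values()}
    if len(lengths) != 1:
        raise ValueError("every condition needs one flag per generated token")
    n_tokens = lengths.pop()
    if n_tokens == 0:
        return 0.0
    # zip(*...) walks the conditions token by token; all() is the conjunction.
    coherent = sum(all(token_flags) for token_flags in zip(*checks.values()))
    return 100.0 * coherent / n_tokens


# Hypothetical flags for a 5-token generation and three made-up conditions.
example = {
    "condition_a": [True, True, False, True, True],
    "condition_b": [True, True, True, True, False],
    "condition_c": [True, True, True, True, True],
}
tc = true_coherence(example)
print(f"TC = {tc:.1f}%")
```

Because the conditions are combined with a logical AND, a token that fails any single check lowers TC, so the score can never exceed the pass rate of the weakest condition.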

"Current AI benchmarks inversely correlate with coherence measures." — Zenodo 19536274

A high-TC model writes the artifact, not about the artifact. This is measurable, reproducible, and directly trainable.

A Coherent 24B Model vs a Larger Incoherent Model — For Real Daily Tasks

Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.

Consider the tasks that fill a typical professional's day:

Daily Task	Incoherent large model	LOC-trained model
Draft an email declining a meeting	Three paragraphs before declining	One clear, warm sentence
Summarise a long document	Re-states the document at length	Extracts the three decisions that matter
Write a job posting	Lists generic responsibilities	Writes for the specific role as described
Explain a concept to a non-expert	Dumps all related technical knowledge	Builds up from the user's frame of reference
Debug code	Describes what the error type means	Identifies the specific line, fixes it
Answer "should I do X or Y?"	Both-sides hedge	Gives a recommendation with stated reasoning
Handle a sensitive situation	Over-clinical or over-empathetic	Responds at the human register the situation calls for
Write a financial memo	Explains what a memo is	Writes the memo with the right structure
Review a contract clause	Lists general legal risks	Names the specific clause and what to change

In every one of these cases the bottleneck is not knowledge — the model already knows how to write emails, summarise documents, and review contracts. The bottleneck is whether it applies that knowledge coherently to the specific situation.

A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent LOC-trained model has sufficient knowledge and delivers it with precision. For approximately 85–90% of individual daily tasks, coherence is the binding constraint — not parameter count.

Key Results

Metric	Value
Baseline True Coherence	31.7%
Post-Training True Coherence	80.7%
Absolute Improvement	+49.0 percentage points
Per-category spread	79.9% – 81.6% (1.7pp range)
Training steps	280 (7 categories × 40 steps)
Training duration	~138 minutes, zero training aborts
LoRA configuration	rank 64 / alpha 128

Consistency note: 1.7pp spread across 7 categories — uniform coherence improvement, no category left behind.
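The derived figures in the table follow from the headline numbers by simple arithmetic. The sketch below recomputes the step count and the per-category spread, and adds a rough average time per step. That last figure is an assumption: it treats the ~138 minutes as spent entirely on the 280 training steps, which the card does not state.

```python
categories = 7
steps_per_category = 40
total_steps = categories * steps_per_category  # 7 x 40

lowest_category, highest_category = 79.9, 81.6  # per-category TC, in percent
spread_pp = round(highest_category - lowest_category, 1)

training_minutes = 138  # approximate, as reported
seconds_per_step = training_minutes * 60 / total_steps  # rough average only

print(f"steps: {total_steps}")
print(f"spread: {spread_pp}pp")
print(f"average: ~{seconds_per_step:.1f} s per step")
```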

The 9B vs 24B Finding

Running both models through the same DLL training protocol and measuring with the same LOC gate produces a striking result:

Model	Parameters	Baseline TC	Post-DLL TC	Δ
Qwen3.5-9B-LOC-L1-v1	9B	21.3%	80.6%	+59.3pp
Mistral-Small-24B-LOC-L1-v1	24B	31.7%	80.7%	+49.0pp

Both converge to ~80–81% TC. The ceiling is set by the training protocol, not the parameter count. This means:

A LOC-trained 9B model competes directly with an untuned 70B–150B model on professional artifact tasks, at a fraction of the compute cost.
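The convergence reading rests on the two rows of the comparison table. A small sketch that recomputes the deltas and compares the gap between the two models before and after training; the numbers are copied from the table and nothing else is assumed.

```python
# (baseline TC %, post-DLL TC %) copied from the comparison table.
results = {
    "Qwen3.5-9B-LOC-L1-v1": (21.3, 80.6),
    "Mistral-Small-24B-LOC-L1-v1": (31.7, 80.7),
}

# Improvement per model, in percentage points.
deltas = {name: round(post - base, 1) for name, (base, post) in results.items()}

baselines = [base for base, _ in results.values()]
posts = [post for _, post in results.values()]
baseline_gap = round(max(baselines) - min(baselines), 1)
post_gap = round(max(posts) - min(posts), 1)

for name, delta in deltas.items():
    print(f"{name}: +{delta}pp")
print(f"gap before training: {baseline_gap}pp, after: {post_gap}pp")
```

The two models start 10.4pp apart and finish 0.1pp apart, which is the numerical content of the "same ceiling" observation.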

What Changes in Practice

Task	Without LOC Training	With LOC Training
Structured financial analysis	Discursive, buries conclusion	Structured, leads with deliverable
Legal clause review	Lists general risks	Names top 3 to negotiate with rationale
Performance review (honest feedback)	Softens the gap	Separates facts from action clearly
Creative writing prompt	Describes the style	Writes in the style
Restraint (no data provided)	May fabricate	Correct decline, concise explanation

How to Use

Python / Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Mind-Engine/Mistral-Small-24B-LOC-L1-v1"

# Load the merged weights; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Analyse the key legal risks in this SaaS agreement..."}]
# Render the chat template to text; the template already inserts the special tokens.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# add_special_tokens=False avoids prepending a second BOS token.
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

LM Studio / Jan.ai

Download the Q4_K_M GGUF file from the Files tab. No Python required.

System Requirements

Setup	Model format (size)	Notes
NVIDIA RTX 4090 24GB	Q4_K_M GGUF (~14 GB)	Single GPU, consumer-accessible
2× RTX 3090	Q4_K_M GGUF (~14 GB)	Minimum multi-GPU setup
A100-80GB	BF16 full (~48 GB)	Recommended for full precision
MacBook M2 Max 96GB	Q4_K_M GGUF (~14 GB)	On-device on Apple Silicon
Mac Studio M2 Ultra 192GB	BF16 full	Full precision on Apple Silicon
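The sizes in the table can be sanity-checked from the parameter count. This is a back-of-the-envelope sketch, assuming exactly 24 billion weights, 16 bits per weight for BF16, and roughly 4.85 bits per weight for Q4_K_M. The Q4_K_M figure is an approximate average assumed for illustration, not a number from this card. The estimate covers weights only; the KV cache and activations need additional memory.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 24e9  # assumed round figure; the real count differs slightly

bf16_gb = weight_size_gb(N_PARAMS, 16)
q4_k_m_gb = weight_size_gb(N_PARAMS, 4.85)  # assumed average bits per weight

print(f"BF16:   ~{bf16_gb:.0f} GB")
print(f"Q4_K_M: ~{q4_k_m_gb:.1f} GB")
```

Under these assumptions the BF16 weights come to about 48 GB before any KV cache, so a single device needs more memory than that to run at full precision.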

Training Details

Base model: mistralai/Mistral-Small-24B-Instruct-2501 (Apache 2.0)
Architecture: Standard transformer, 40 layers
Adapter type: LoRA (merged into base weights for this release)
LoRA rank / alpha: 64 / 128
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training method: Differentiable LOC Loss (DLL) — see cited papers
Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
Training steps: 280 total (sequential per-category phases)
Training duration: ~138 minutes, zero training aborts
Hardware: NVIDIA A100-80GB
License: Apache 2.0 (base model license preserved)
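The details above give the LoRA rank and target modules but not the resulting number of trainable weights. A rough estimate can be sketched from the layer shapes: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) weights. The dimensions below are assumptions based on the base model's published configuration (hidden size 5120, MLP size 32768, 32 query heads and 8 key/value heads of dimension 128) and should be checked against its config.json.

```python
RANK, ALPHA, N_LAYERS = 64, 128, 40

# Assumed (d_in, d_out) per target module; verify against the base config.json.
HIDDEN, MLP, HEAD_DIM, Q_HEADS, KV_HEADS = 5120, 32768, 128, 32, 8
shapes = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# LoRA adds two matrices per adapted layer: A (r x d_in) and B (d_out x r).
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in shapes.values())
trainable = per_layer * N_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"{trainable:,} trainable weights, scaling {scaling}")
```

If the assumed shapes are right, the adapter trains roughly 370 million weights, about 1.5% of the 24B base, with the low-rank update scaled by alpha / rank = 2.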

Limitations

Requires significant memory (24B parameters): roughly 48 GB at BF16 or 14 GB as Q4_K_M GGUF, which rules out on-device use for most consumers
Knowledge cutoff inherits from base Mistral-Small-24B-Instruct-2501
Context window: 32K tokens
Coherence improvement measured on 7 cognitive domains; specialised scientific/medical domains not independently evaluated

Citation

@misc{jamaludheen2026loc,
  author    = {Jamaludheen KN},
  title     = {Level of Consciousness Signatures Across Biological and Artificial Minds:
               A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19079887},
  url       = {https://zenodo.org/records/19079887}
}

@misc{jamaludheen2026coherence,
  author    = {Jamaludheen KN},
  title     = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19536274},
  url       = {https://zenodo.org/records/19536274}
}

About AI Mind Engine

AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.

🌐 aimindengine.com · 📧 research@aimindengine.com