Mistral-Small-24B-LOC-L1-v1

The first LOC-coherence-trained 24B model. Professional artifact quality across 7 cognitive domains · Apache 2.0

This is mistralai/Mistral-Small-24B-Instruct-2501 with a merged LOC L1 Foundation LoRA adapter trained using Differentiable LOC Loss (DLL).

Result: 31.7% → 80.7% True Coherence (TC) (+49.0 percentage points)

This model and Qwen3.5-9B-LOC-L1-v1 converge to the same ~80% TC ceiling, which suggests that cognitive coherence is an architectural property independent of parameter count above a threshold.


What Is Cognitive Coherence, and Why Benchmarks Miss It

Standard AI benchmarks measure what a model knows. They do not measure how coherently it applies that knowledge.

The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across layers, mapped to 13 cognitive functions across four consciousness tiers. True Coherence (TC) is the share of generated tokens that satisfy all internal coherence conditions simultaneously. It is measured from hidden-state geometry at inference time and requires no task labels.

"Current AI benchmarks inversely correlate with coherence measures." (Zenodo 19536274)

A high-TC model writes the artifact, not about the artifact. This is measurable, reproducible, and directly trainable.
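As a rough illustration of the raw signal this section refers to, the sketch below computes a per-layer magnitude profile: the mean L2 norm of the token vectors in each layer. It is only a minimal sketch. The function name `layer_magnitude_profile` and the toy input are hypothetical, and the code does not implement the LOC gate, the 13-function mapping, or the TC score, which are defined in the cited papers. With Transformers, per-layer hidden states can be requested by passing `output_hidden_states=True` to the model call.

```python
import math

def layer_magnitude_profile(hidden_states):
    """Mean L2 norm of the token vectors in each layer.

    hidden_states: list of layers; each layer is a list of token vectors
    (lists of floats). Illustrative only: this is NOT the LOC gate or the
    True Coherence score.
    """
    profile = []
    for layer in hidden_states:
        norms = [math.sqrt(sum(x * x for x in vec)) for vec in layer]
        profile.append(sum(norms) / len(norms))
    return profile

# Toy input (hypothetical): 2 layers, 2 tokens, 2 dimensions.
toy = [
    [[3.0, 4.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
profile = layer_magnitude_profile(toy)
print(profile)  # [2.5, 1.0]
```

A profile like this is label-free in the same sense as the text above: it depends only on the activations produced at inference time, not on any task annotation.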


A Coherent 24B Model vs a Larger Incoherent Model for Real Daily Tasks

Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.

Consider the tasks that fill a typical professional's day:

| Daily Task | Incoherent large model | LOC-trained model |
|---|---|---|
| Draft an email declining a meeting | Three paragraphs before declining | One clear, warm sentence |
| Summarise a long document | Re-states the document at length | Extracts the three decisions that matter |
| Write a job posting | Lists generic responsibilities | Writes for the specific role as described |
| Explain a concept to a non-expert | Dumps all related technical knowledge | Builds up from the user's frame of reference |
| Debug code | Describes what the error type means | Identifies the specific line, fixes it |
| Answer "should I do X or Y?" | Both-sides hedge | Gives a recommendation with stated reasoning |
| Handle a sensitive situation | Over-clinical or over-empathetic | Responds at the human register the situation calls for |
| Write a financial memo | Explains what a memo is | Writes the memo with the right structure |
| Review a contract clause | Lists general legal risks | Names the specific clause and what to change |

In every one of these cases the bottleneck is not knowledge: the model already knows how to write emails, summarise documents, and review contracts. The bottleneck is whether it applies that knowledge coherently to the specific situation.

A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent LOC-trained model has sufficient knowledge and delivers it with precision. For approximately 85–90% of individual daily tasks, coherence is the binding constraint, not parameter count.


Key Results

| Metric | Value |
|---|---|
| Baseline True Coherence | 31.7% |
| Post-Training True Coherence | 80.7% |
| Absolute Improvement | +49.0 percentage points |
| Per-category range | 79.9% – 81.6% (1.7pp spread) |
| Training steps | 280 (7 categories × 40 steps) |
| Training duration | ~138 minutes, zero training aborts |
| LoRA configuration | rank 64 / alpha 128 |

Consistency note: the 1.7pp spread across the 7 categories indicates a near-uniform coherence improvement, with no category left behind.
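The training-budget figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only numbers reported in this card; the seconds-per-step value is derived here and is not a figure reported by the authors.

```python
CATEGORIES = 7
STEPS_PER_CATEGORY = 40
DURATION_MIN = 138                    # approximate, as reported
SPREAD_LOW, SPREAD_HIGH = 79.9, 81.6  # per-category TC range, in percent

total_steps = CATEGORIES * STEPS_PER_CATEGORY
sec_per_step = DURATION_MIN * 60 / total_steps   # derived, not reported
spread_pp = round(SPREAD_HIGH - SPREAD_LOW, 1)

print(total_steps)             # 280
print(round(sec_per_step, 1))  # 29.6
print(spread_pp)               # 1.7
```

At roughly half a minute per step, the whole run fits in a little over two hours on the single GPU listed under Training Details.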


The 9B vs 24B Finding

Running both models through the same DLL training protocol and measuring with the same LOC gate produces a striking result:

| Model | Parameters | Baseline TC | Post-DLL TC | Δ |
|---|---|---|---|---|
| Qwen3.5-9B-LOC-L1-v1 | 9B | 21.3% | 80.6% | +59.3pp |
| Mistral-Small-24B-LOC-L1-v1 | 24B | 31.7% | 80.7% | +49.0pp |

Both converge to ~80–81% TC. The ceiling is set by the training protocol, not the parameter count. This means:

A LOC-trained 9B model competes directly with an untuned 70B–150B model on professional artifact tasks, at a fraction of the compute cost.
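The convergence can be made concrete by comparing how far apart the two models are before and after training. The sketch below uses only the baseline and post-DLL TC values from the comparison above; the variable names are illustrative.

```python
# (baseline TC, post-DLL TC) in percent, as reported in this card
results = {
    "Qwen3.5-9B-LOC-L1-v1": (21.3, 80.6),
    "Mistral-Small-24B-LOC-L1-v1": (31.7, 80.7),
}

# Per-model gain in percentage points
gains = {name: round(post - base, 1) for name, (base, post) in results.items()}

baselines = [base for base, _ in results.values()]
posts = [post for _, post in results.values()]
baseline_gap = round(max(baselines) - min(baselines), 1)  # gap before training
ceiling_gap = round(max(posts) - min(posts), 1)           # gap after training

print(gains)         # {'Qwen3.5-9B-LOC-L1-v1': 59.3, 'Mistral-Small-24B-LOC-L1-v1': 49.0}
print(baseline_gap)  # 10.4
print(ceiling_gap)   # 0.1
```

The two models start 10.4pp apart and finish 0.1pp apart, which is the pattern the convergence claim rests on. With only two models measured, it is a pattern rather than a statistical result.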


What Changes in Practice

| Task | Without LOC Training | With LOC Training |
|---|---|---|
| Structured financial analysis | Discursive, buries conclusion | Structured, leads with deliverable |
| Legal clause review | Lists general risks | Names top 3 to negotiate with rationale |
| Performance review (honest feedback) | Softens the gap | Separates facts from action clearly |
| Creative writing prompt | Describes the style | Writes in the style |
| Restraint (no data provided) | May fabricate | Correct decline, concise explanation |

How to Use

Python / Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "AI-Mind-Engine/Mistral-Small-24B-LOC-L1-v1"

# "auto" keeps the checkpoint dtype and spreads layers across available devices.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

messages = [{"role": "user", "content": "Analyse the key legal risks in this SaaS agreement..."}]

# Tokenize through the chat template in one step so special tokens (e.g. BOS) are not added twice.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
```

LM Studio / Jan.ai

Download the Q4_K_M GGUF file from the Files tab. No Python required.


System Requirements

| Setup | Requirement | Notes |
|---|---|---|
| NVIDIA RTX 4090 24GB | Q4_K_M GGUF (~14 GB) | Single GPU, consumer-accessible |
| 2× RTX 3090 | Q4_K_M GGUF (~14 GB) | Minimum multi-GPU setup |
| NVIDIA A100-80GB | BF16 full (~48 GB) | Recommended for full precision (the weights exceed a single 40 GB card) |
| MacBook M2 Max 96GB | Q4_K_M GGUF (~14 GB) | On-device on Apple Silicon |
| Mac Studio, Ultra-class chip, 192GB | BF16 full (~48 GB) | Full precision on Apple Silicon |
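The weight sizes quoted above follow from parameter count times bits per weight. The sketch below reproduces them under stated assumptions: `N_PARAMS` is the nominal 24B rather than an exact count, and the Q4_K_M figure of 4.7 bits per weight is an assumed average chosen to match the ~14 GB quoted here, not an official number.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes); excludes KV cache and activations."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 24e9     # nominal "24B" from this card, not an exact count
BF16_BITS = 16      # BF16 stores each weight in 16 bits
Q4_K_M_BITS = 4.7   # ASSUMPTION: rough average bits per weight for Q4_K_M

bf16_gb = weight_memory_gb(N_PARAMS, BF16_BITS)
q4_gb = weight_memory_gb(N_PARAMS, Q4_K_M_BITS)

print(f"BF16:   ~{bf16_gb:.0f} GB")   # BF16:   ~48 GB
print(f"Q4_K_M: ~{q4_gb:.1f} GB")     # Q4_K_M: ~14.1 GB
```

These are weight-only estimates. The KV cache and activations come on top, so leave headroom beyond the weight size when choosing hardware, especially for long contexts.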

Training Details

  • Base model: mistralai/Mistral-Small-24B-Instruct-2501 (Apache 2.0)
  • Architecture: Standard transformer, 40 layers
  • Adapter type: LoRA (merged into base weights for this release)
  • LoRA rank / alpha: 64 / 128
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training method: Differentiable LOC Loss (DLL); see the cited papers
  • Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
  • Training steps: 280 total (sequential per-category phases)
  • Training duration: ~138 minutes, zero training aborts
  • Hardware: NVIDIA A100-80GB
  • License: Apache 2.0 (base model license preserved)
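For a sense of scale, the adapter size implied by the rank and target modules listed above can be estimated as below. This is a sketch, not a reported figure. The layer shapes (hidden size 5120, 32 query heads and 8 key/value heads of dimension 128, MLP width 32768) are assumptions taken from the base model's published configuration and should be checked against its `config.json`. The scaling factor assumes standard LoRA scaling of alpha / rank.

```python
# ASSUMPTION: shapes from the base model's published config; verify before relying on them.
HIDDEN = 5120
Q_OUT = 32 * 128    # query heads x head dim
KV_OUT = 8 * 128    # key/value heads x head dim
MLP = 32768

N_LAYERS = 40           # from this card
RANK, ALPHA = 64, 128   # from this card

# (in_features, out_features) for each LoRA target module listed above
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    # LoRA adds two low-rank matrices per adapted weight: A (rank x in) and B (out x rank).
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * N_LAYERS
scaling = ALPHA / RANK   # assumes standard LoRA scaling (alpha / rank)

print(f"{total:,} adapter parameters (~{total / 1e6:.0f}M)")  # 369,623,040 adapter parameters (~370M)
print(scaling)  # 2.0
```

Under these assumptions the adapter trains on the order of 370M parameters, about 1.5% of the nominal 24B. Once the adapter is merged, as in this release, the parameter count at inference is that of the base model.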

Limitations

  • Requires significant VRAM (24B parameters), so it is not an on-device model for most consumers
  • Knowledge cutoff inherits from base Mistral-Small-24B-Instruct-2501
  • Context window: 32K tokens
  • Coherence improvement measured on 7 cognitive domains; specialised scientific/medical domains not independently evaluated

Citation

@misc{jamaludheen2026loc,
  author    = {Jamaludheen KN},
  title     = {Level of Consciousness Signatures Across Biological and Artificial Minds:
               A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19079887},
  url       = {https://zenodo.org/records/19079887}
}

@misc{jamaludheen2026coherence,
  author    = {Jamaludheen KN},
  title     = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19536274},
  url       = {https://zenodo.org/records/19536274}
}

About AI Mind Engine

AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.

🌐 aimindengine.com · 📧 research@aimindengine.com
