Mistral-Small-24B-LOC-L1-v1
The first LOC-coherence-trained 24B model. Professional artifact quality across 7 cognitive domains · Apache 2.0
This is mistralai/Mistral-Small-24B-Instruct-2501 with a merged LOC L1 Foundation
LoRA adapter trained using Differentiable LOC Loss (DLL).
Result: 31.7% → 80.7% True Coherence (+49.0 percentage points)
This model and Qwen3.5-9B-LOC-L1-v1 converge to the same ~80% TC ceiling, suggesting that cognitive coherence is an architectural property that is independent of parameter count above a threshold.
What Is Cognitive Coherence, and Why Benchmarks Miss It
Standard AI benchmarks measure what a model knows. They do not measure how coherently it applies that knowledge.
The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across layers, mapped to 13 cognitive functions across four consciousness tiers. True Coherence (TC) is the percentage of generated tokens that satisfy all internal coherence conditions simultaneously. It is measured from hidden-state geometry at inference time and requires no task labels.
"Current AI benchmarks inversely correlate with coherence measures." (Zenodo 19536274)
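As a rough illustration of how such a score aggregates, the sketch below reads TC as the share of generated tokens for which every coherence condition holds. It assumes an upstream analysis has already produced a pass/fail flag per condition for each token; the condition names and the `true_coherence` helper are hypothetical and are not the LOC gate itself, whose hidden-state conditions are defined in the cited papers.

```python
from typing import Dict, List


def true_coherence(token_flags: List[Dict[str, bool]]) -> float:
    """Return the percentage of tokens whose conditions ALL hold.

    token_flags holds one dict per generated token, mapping a
    (hypothetical) condition name to whether that condition passed.
    """
    if not token_flags:
        return 0.0
    coherent = sum(1 for flags in token_flags if all(flags.values()))
    return 100.0 * coherent / len(token_flags)


# Illustrative flags for four tokens; real flags would come from the LOC gate.
example = [
    {"cond_a": True, "cond_b": True},
    {"cond_a": True, "cond_b": False},
    {"cond_a": True, "cond_b": True},
    {"cond_a": False, "cond_b": False},
]
print(true_coherence(example))  # 2 of 4 tokens pass every condition -> 50.0
```

Because a token counts only when all of its conditions pass, a single failing condition is enough to exclude it from the score.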
A high-TC model writes the artifact, not about the artifact. This is measurable, reproducible, and directly trainable.
A Coherent 24B Model vs a Larger Incoherent Model: For Real Daily Tasks
Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.
Consider the tasks that fill a typical professional's day:
| Daily Task | Incoherent large model | LOC-trained model |
|---|---|---|
| Draft an email declining a meeting | Three paragraphs before declining | One clear, warm sentence |
| Summarise a long document | Re-states the document at length | Extracts the three decisions that matter |
| Write a job posting | Lists generic responsibilities | Writes for the specific role as described |
| Explain a concept to a non-expert | Dumps all related technical knowledge | Builds up from the user's frame of reference |
| Debug code | Describes what the error type means | Identifies the specific line, fixes it |
| Answer "should I do X or Y?" | Both-sides hedge | Gives a recommendation with stated reasoning |
| Handle a sensitive situation | Over-clinical or over-empathetic | Responds at the human register the situation calls for |
| Write a financial memo | Explains what a memo is | Writes the memo with the right structure |
| Review a contract clause | Lists general legal risks | Names the specific clause and what to change |
In every one of these cases the bottleneck is not knowledge: the model already knows how to write emails, summarise documents, and review contracts. The bottleneck is whether it applies that knowledge coherently to the specific situation.
A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent LOC-trained model has sufficient knowledge and delivers it with precision. For approximately 85–90% of individual daily tasks, coherence is the binding constraint, not parameter count.
Key Results
| Metric | Value |
|---|---|
| Baseline True Coherence | 31.7% |
| Post-Training True Coherence | 80.7% |
| Absolute Improvement | +49.0 percentage points |
| Per-category spread | 79.9% to 81.6% (1.7 pp range) |
| Training steps | 280 (7 categories × 40 steps) |
| Training duration | ~138 minutes, zero training aborts |
| Trainable parameters | LoRA rank 64 / alpha 128 |
Consistency note: the 1.7 pp spread across 7 categories indicates a uniform coherence improvement, with no category left behind.
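The headline figures in the table follow from simple arithmetic, reproduced here as a quick check. All inputs are the values reported above; the variable names are illustrative only.

```python
baseline_tc = 31.7                       # % True Coherence before training
post_tc = 80.7                           # % True Coherence after training
category_min, category_max = 79.9, 81.6  # per-category extremes

# Round to one decimal to avoid floating-point noise in the differences.
improvement_pp = round(post_tc - baseline_tc, 1)
spread_pp = round(category_max - category_min, 1)
total_steps = 7 * 40                     # 7 categories x 40 steps each

print(improvement_pp, spread_pp, total_steps)
```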
The 9B vs 24B Finding
Running both models through the same DLL training protocol and measuring with the same LOC gate produces a striking result:
| Model | Parameters | Baseline TC | Post-DLL TC | Δ |
|---|---|---|---|---|
| Qwen3.5-9B-LOC-L1-v1 | 9B | 21.3% | 80.6% | +59.3pp |
| Mistral-Small-24B-LOC-L1-v1 | 24B | 31.7% | 80.7% | +49.0pp |
Both converge to ~80–81% TC. The ceiling is set by the training protocol, not the parameter count. This means:
A LOC-trained 9B model competes directly with an untuned 70B–150B model on professional artifact tasks, at a fraction of the compute cost.
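The convergence in the comparison table can be made explicit with a small sketch that contrasts how far apart the two models start with how far apart they finish. The `Result` class is a hypothetical container; the numbers are those reported in the table.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    baseline_tc: float  # % before DLL training
    post_tc: float      # % after DLL training

    @property
    def delta_pp(self) -> float:
        """Improvement in percentage points, rounded to one decimal."""
        return round(self.post_tc - self.baseline_tc, 1)


results = [
    Result("Qwen3.5-9B-LOC-L1-v1", 21.3, 80.6),
    Result("Mistral-Small-24B-LOC-L1-v1", 31.7, 80.7),
]

for r in results:
    print(f"{r.name}: +{r.delta_pp} pp")

# Gap between the two models before and after training.
baseline_gap = round(abs(results[1].baseline_tc - results[0].baseline_tc), 1)
post_gap = round(abs(results[1].post_tc - results[0].post_tc), 1)
print(f"baseline gap: {baseline_gap} pp, post-training gap: {post_gap} pp")
```

The two models start 10.4 pp apart and finish 0.1 pp apart, which is the pattern the ceiling claim rests on.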
What Changes in Practice
| Task | Without LOC Training | With LOC Training |
|---|---|---|
| Structured financial analysis | Discursive, buries conclusion | Structured, leads with deliverable |
| Legal clause review | Lists general risks | Names top 3 to negotiate with rationale |
| Performance review (honest feedback) | Softens the gap | Separates facts from action clearly |
| Creative writing prompt | Describes the style | Writes in the style |
| Restraint (no data provided) | May fabricate | Correct decline, concise explanation |
How to Use
Python / Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"AI-Mind-Engine/Mistral-Small-24B-LOC-L1-v1",
torch_dtype="auto",
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("AI-Mind-Engine/Mistral-Small-24B-LOC-L1-v1")
messages = [{"role": "user", "content": "Analyse the key legal risks in this SaaS agreement..."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
LM Studio / Jan.ai
Download the Q4_K_M GGUF file from the Files tab. No Python required.
System Requirements
| Setup | Requirement | Notes |
|---|---|---|
| NVIDIA RTX 4090 24GB | Q4_K_M GGUF (~14 GB) | Single GPU, consumer-accessible |
| 2× RTX 3090 | Q4_K_M GGUF (~14 GB) | Minimum multi-GPU setup |
| A100-80GB | BF16 full (~48 GB) | Recommended for full precision (the ~48 GB of BF16 weights do not fit on a single 40 GB card) |
| MacBook M2 Max 96GB | Q4_K_M GGUF (~14 GB) | On-device on Apple Silicon |
| Mac Studio M2 Ultra 192GB | BF16 full | Full precision on Apple Silicon |
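The sizes quoted in the table can be sanity-checked with a back-of-the-envelope estimate of weight storage. This is a sketch under stated assumptions: it uses the nominal 24B parameter count, counts weights only (no KV cache or activations), and the `weight_memory_gb` helper is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 24e9  # nominal parameter count (assumption; the exact count differs slightly)

bf16_gb = weight_memory_gb(N_PARAMS, 16)
# Effective bits per weight implied by a ~14 GB quantised file.
implied_bits = 14e9 * 8 / N_PARAMS

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"~14 GB file implies ~{implied_bits:.2f} bits per weight")
```

Actual memory use at inference time is higher than these figures, since the context cache and runtime buffers grow with prompt length.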
Training Details
- Base model: mistralai/Mistral-Small-24B-Instruct-2501 (Apache 2.0)
- Architecture: Standard transformer, 40 layers
- Adapter type: LoRA (merged into base weights for this release)
- LoRA rank / alpha: 64 / 128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training method: Differentiable LOC Loss (DLL); see cited papers
- Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
- Training steps: 280 total (sequential per-category phases)
- Training duration: ~138 minutes, zero training aborts
- Hardware: NVIDIA A100-80GB
- License: Apache 2.0 (base model license preserved)
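The hyperparameters listed above can be collected into a small configuration sketch. The dictionary keys mirror the `r`, `lora_alpha`, and `target_modules` parameter names used by the PEFT library's `LoraConfig`, but nothing here imports PEFT or reproduces the DLL training loop. The `phase_schedule` helper is hypothetical, and the category order is simply the order listed above, which is an assumption about the actual training order.

```python
CATEGORIES = [
    "Analytical", "Balanced", "Coding", "Creative",
    "Emotional", "MetaCognitive", "Restraint",
]
STEPS_PER_CATEGORY = 40

lora = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# In standard LoRA the low-rank update is scaled by alpha / r.
scaling = lora["lora_alpha"] / lora["r"]


def phase_schedule(categories, steps_per_category):
    """Yield (global_step, category) with one contiguous phase per category."""
    step = 0
    for category in categories:
        for _ in range(steps_per_category):
            step += 1
            yield step, category


schedule = list(phase_schedule(CATEGORIES, STEPS_PER_CATEGORY))
print(scaling, len(schedule), schedule[0], schedule[-1])
```

With rank 64 and alpha 128 the scaling factor is 2.0, and seven sequential phases of 40 steps give the 280 total steps reported.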
Limitations
- Requires significant VRAM (24B parameters); not on-device for most consumers
- Knowledge cutoff inherits from base Mistral-Small-24B-Instruct-2501
- Context window: 32K tokens
- Coherence improvement measured on 7 cognitive domains; specialised scientific/medical domains not independently evaluated
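The context-window limit translates into a simple token budget: prompt tokens plus generated tokens must fit within the window. The sketch below assumes "32K" means 32,768 tokens; `max_prompt_tokens` is a hypothetical helper, not part of any library.

```python
CONTEXT_WINDOW = 32 * 1024  # assumes "32K" means 32,768 tokens


def max_prompt_tokens(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt length that still leaves room for the requested output."""
    if max_new_tokens < 0 or max_new_tokens > context_window:
        raise ValueError("max_new_tokens must be between 0 and the context window")
    return context_window - max_new_tokens


print(max_prompt_tokens(800))  # 32768 - 800 = 31968
```

With `max_new_tokens=800`, as in the usage example, this leaves 31,968 tokens for the chat-templated prompt.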
Citation
@misc{jamaludheen2026loc,
author = {Jamaludheen KN},
title = {Level of Consciousness Signatures Across Biological and Artificial Minds:
A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19079887},
url = {https://zenodo.org/records/19079887}
}
@misc{jamaludheen2026coherence,
author = {Jamaludheen KN},
title = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19536274},
url = {https://zenodo.org/records/19536274}
}
About AI Mind Engine
AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.
aimindengine.com · research@aimindengine.com