HENLA-CONFED-3B · PREFIX-V4 · Step 300

Balanced short-prompt demo branch: best grammar/article probes in the HENLA-CONFED family

⚠️ Research artifact. This checkpoint is part of an experimental constrained-compute research project. It is not a production model, not an AGI claim, and not a leaderboard entry. Read the limitations section before using it.


Model summary

HENLA-CONFED-3B-PREFIX-V4-STEP300 is a prefix-specialized branch of the HENLA-CONFED-3B causal language model family. It was fine-tuned from the general CLEANLM-STEP70000 base using prefix-only loss over a small set of grammar/article and HENLA-identity continuations, for 300 steps.

Within the HENLA-CONFED family it is the best-balanced demonstrator: strongest on targeted grammar/article probes, competitive on HENLA-identity completions, and reasonable on short general continuations, at the cost of some reduction in open-ended diversity compared to the base.

| Checkpoint | Best use | Limitation |
|---|---|---|
| PREFIX-V4-STEP300 ← this one | Balanced controlled demo; grammar/article probes; short HENLA completions | More specialized than CLEANLM; prefix conditioning can reduce open-ended diversity |
| CLEANLM-STEP70000 | General base for further training and continuation | Weak HENLA identity; grammar/article instability |
| HENLA-PREFIX-150 | Focused HENLA-description prompts | Strongly biased toward HENLA vocabulary on unrelated prompts |

Architecture

HENLA-CONFED is a non-standard causal LM. Each Transformer block replaces the single feed-forward module with K parallel cognitive-area MLPs. A learned softmax gate weights the K area outputs, and the weighted sum is added to the residual stream before the next block.

token + position embeddings
→ N × HenlaConfederatedBlock
     → causal self-attention
     → K parallel cognitive-area MLPs   ← the confederated routing
     → learned gate (softmax over K areas)
     → weighted area fusion
     → residual stream
→ tied LM head
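
The gate-and-fuse step in the diagram above can be illustrated with a toy numeric sketch. This is not the repository's `HenlaConfederatedBlock` code: the function names `softmax` and `fuse_areas` are hypothetical, the vectors are plain Python lists for a single token, and the area MLP outputs are supplied directly instead of being computed.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_areas(hidden, area_outputs, gate_logits):
    """Weight K area outputs by the softmax gate and add the sum to the residual.

    hidden:       residual-stream vector for one token
    area_outputs: K vectors, one per cognitive-area MLP (given, not computed here)
    gate_logits:  K raw scores from the learned gate
    """
    weights = softmax(gate_logits)
    fused = [
        sum(w * out[i] for w, out in zip(weights, area_outputs))
        for i in range(len(hidden))
    ]
    new_hidden = [h + f for h, f in zip(hidden, fused)]
    return new_hidden, weights

# Toy case: 2-dimensional hidden state, K = 2 areas, equal gate logits.
hidden = [1.0, 2.0]
area_outputs = [[0.0, 2.0], [4.0, 0.0]]
new_hidden, weights = fuse_areas(hidden, area_outputs, [0.0, 0.0])
print(weights, new_hidden)
```

With equal logits each area contributes half of its output, so the gate acts as a soft mixture over the K areas for each token.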

Configuration

| Parameter | Value |
|---|---|
| Approximate parameters | ~2.844 B |
| Layers | 24 |
| Hidden size | 1280 |
| Attention heads | 16 |
| Cognitive areas per block | 8 |
| Context length | 512 |
| Vocabulary | GPT-2 style, 50257 tokens |
| Pad / EOS token | 50256 |
| Framework | PyTorch / Hugging Face Transformers (remote code) |

Training lineage

Bootstrap (local corpus, smoke test)
  └─► FineWeb-Edu streaming (steps ~30k → 57k)
        └─► LR3E5 branch (plateaued ~57k, degenerate samples)
              └─► GATEFIX 65k (auxiliary gate-entropy + area-usage loss)
                    └─► CLEANLM 70k ← general base  ●
                          └─► PREFIX-V4 300 steps ← this checkpoint  ●
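
The GATEFIX stage in the lineage above mentions an auxiliary gate-entropy and area-usage loss without giving its form. The sketch below shows one common way such quantities are computed, purely as an assumption: per-token Shannon entropy of the gate distribution, and the KL divergence of mean area usage from uniform. The function names, and how these terms are signed or weighted in the actual training loss, are not taken from the source.

```python
import math

def gate_entropy(weights):
    """Shannon entropy (nats) of one token's gate distribution over K areas."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)

def area_usage(batch_weights):
    """Mean gate weight per area, averaged over all tokens in the batch."""
    n = len(batch_weights)
    k = len(batch_weights[0])
    return [sum(tok[j] for tok in batch_weights) / n for j in range(k)]

def usage_imbalance(batch_weights):
    """KL divergence of mean area usage from uniform; 0.0 when usage is even."""
    usage = area_usage(batch_weights)
    k = len(usage)
    return sum(u * math.log(u * k) for u in usage if u > 0.0)

# Two tokens, K = 2 areas.
collapsed = [[1.0, 0.0], [1.0, 0.0]]  # every token routed to area 0
balanced = [[1.0, 0.0], [0.0, 1.0]]   # tokens split evenly across areas
print(usage_imbalance(collapsed), usage_imbalance(balanced))
```

A collapsed gate yields a positive imbalance while an evenly used gate yields zero, which is the kind of signal an area-usage penalty would need.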

PREFIX-V4 fine-tuning details:

| Setting | Value |
|---|---|
| Base | CLEANLM-STEP70000 |
| Loss | Prefix-only (loss computed only on the forced continuation, not the prompt) |
| Steps | 300 (checkpoint published at step 300) |
| Learning rate | 2e-5 |
| Batch size | 2 |
| Sequence length | 128 |
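
The prefix-only loss in the table above can be sketched as label masking. This is a minimal illustration, not the project's training code: `build_labels` and `masked_mean_nll` are hypothetical names, the per-token loss values are made up, and the one-position label shift that causal LM training applies is left out for brevity. The value -100 is the default `ignore_index` of `torch.nn.CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_labels(input_ids, prompt_len):
    """Copy input_ids as labels, masking out the prompt positions."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

def masked_mean_nll(token_nll, labels):
    """Average per-token negative log-likelihood over unmasked positions only."""
    kept = [nll for nll, lab in zip(token_nll, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("all positions are masked; nothing to train on")
    return sum(kept) / len(kept)

# 2 prompt tokens followed by 3 continuation tokens (illustrative ids).
input_ids = [10, 11, 12, 13, 14]
labels = build_labels(input_ids, prompt_len=2)

# Illustrative per-token losses; the two prompt positions are ignored.
loss = masked_mean_nll([5.0, 5.0, 1.0, 2.0, 3.0], labels)
print(labels, loss)
```

Only the continuation tokens contribute to the gradient, so the prompt text conditions the model without being trained on.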

Progression observed during training: at step 50 behavior was partial; at step 200 HENLA identity was strong; between steps 250 and 300 the best grammar/identity balance was reached.


Expected behaviors (short-prompt probes)

These are illustrative examples from internal diagnostics. They are not guaranteed outputs and can vary with temperature, sampling parameters, and context.

| Prompt | Expected completion style |
|---|---|
| HENLA is | experimental neuro-symbolic cognitive architecture … |
| HENLA is not | not conscious / not human-level … |
| Artificial intelligence is | an important tool … |
| The researchers used | a device … |
| The solar energy system is | an important source … |

Internal benchmark results

HENLA family benchmark

Categories: HENLA identity, grammar/article probes, short general continuation, repetition, bad-pattern checks, top-token sanity.

Verdict within HENLA family:

  • Best overall: PREFIX-V4-STEP300
  • Best HENLA identity: HENLA-PREFIX-150
  • Best general base: CLEANLM-70K

Small-LM heuristic comparison

Non-HENLA short prompts, heuristic deterministic scoring. Not a standardized leaderboard.

| Model | Overall (mean) | General (mean) | Grammar/article (mean) |
|---|---|---|---|
| Phi-3.5-mini-instruct | 2.25 | 3.00 | 1.00 |
| HENLA PREFIX-V4 ← this | 2.13 | 2.00 | 2.33 |
| Qwen2.5-3B | 1.81 | 2.30 | 1.00 |
| HENLA CLEANLM 70K | 1.50 | 2.10 | 0.50 |

Correct interpretation: PREFIX-V4 ranked second overall and first on grammar/article probes under this heuristic. Phi-3.5-mini is stronger on general prompts. This comparison uses a small, targeted prompt set and should not be read as a broad claim of superiority over mature small LMs.
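
As a reading aid for the table above, the overall column is consistent with a weighted mean of the two category means. The 5:3 general-to-grammar weighting used below is inferred from the published numbers, not stated in the source, so treat it as an assumption; the helper name `overall` is likewise illustrative.

```python
# (general mean, grammar/article mean, reported overall mean) from the table.
rows = {
    "Phi-3.5-mini-instruct": (3.00, 1.00, 2.25),
    "HENLA PREFIX-V4": (2.00, 2.33, 2.13),
    "Qwen2.5-3B": (2.30, 1.00, 1.81),
    "HENLA CLEANLM 70K": (2.10, 0.50, 1.50),
}

W_GENERAL = 5 / 8  # assumed share of general prompts (inferred, not documented)

def overall(general, grammar, w=W_GENERAL):
    """Weighted mean of the two category means."""
    return w * general + (1 - w) * grammar

# Largest gap between the recomputed and reported overall scores.
max_gap = max(abs(overall(g, a) - o) for g, a, o in rows.values())
print(f"max gap: {max_gap:.4f}")
```

Under this assumption every row is reproduced to within rounding of the two-decimal inputs, which suggests the general prompts carry more weight in the overall score than the grammar/article probes.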


Inference economics

HENLA-CONFED-3B-PREFIX-V4 is a constrained-compute experimental confederated neuro-symbolic language model produced with approximately €50 total compute cost.

On a rented NVIDIA A40 48 GB using Hugging Face Transformers bf16 greedy decoding, HENLA reaches 142.4 decode tokens/s in a short-prompt batch-4 scenario, with ~5.34 GB peak VRAM and an estimated €0.78 per 1M output tokens at €0.40/GPU-hour.

In a heavier telemetry run using a longer technical prompt, batch size 4, and 128 forced output tokens, HENLA reaches 92.3 decode tokens/s with ~5.37 GB allocated VRAM, 94% average GPU utilization, ~287 W average power draw, and an estimated €1.20 per 1M output tokens.

| Scenario | Tokens/s | Peak VRAM | Cost / 1M tokens |
|---|---|---|---|
| Short prompt, batch 4, greedy | 142.4 | ~5.34 GB | ~€0.78 |
| Long technical prompt, batch 4, 128 tokens | 92.3 | ~5.37 GB | ~€1.20 |

Measured on NVIDIA A40 48 GB, CUDA 12.8, bf16, HF Transformers 4.44.2, greedy decoding. Cost estimate assumes €0.40/GPU-hour.
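
The cost figures follow directly from throughput and the hourly GPU price. The short sketch below reproduces the arithmetic, assuming the reported tokens/s is the aggregate decode rate across the batch; the function name is illustrative.

```python
def cost_per_million_tokens(tokens_per_second, eur_per_gpu_hour):
    """Cost in EUR to generate one million output tokens on one GPU."""
    seconds = 1_000_000 / tokens_per_second
    gpu_hours = seconds / 3600
    return gpu_hours * eur_per_gpu_hour

short = cost_per_million_tokens(142.4, 0.40)  # short-prompt scenario
heavy = cost_per_million_tokens(92.3, 0.40)   # heavy telemetry scenario
print(f"short: EUR {short:.2f}  heavy: EUR {heavy:.2f}")
```

At a different rental price the estimate scales linearly, so the same function can be reused with another `eur_per_gpu_hour`.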

Local diagnostics show that HENLA is weaker than mature external baselines on general and technical text quality, but obtains the lowest local perplexity on a small HENLA-domain corpus. The appropriate positioning is low-cost experimental architecture, low-memory inference, and HENLA-domain specialization, not general benchmark leadership.


How to load

The custom architecture requires trust_remote_code=True when loading both the tokenizer and the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "RthItalia/HENLA-CONFED-3B-PREFIX-V4-STEP300"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

Basic generation

prompt = "HENLA is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

Stable software environment

Component Version
PyTorch 2.4.1
Transformers 4.44.2
Tokenizers 0.19.1
Accelerate 0.33.0

Note: Updating Transformers beyond 4.44.2 can trigger compatibility issues with this checkpoint family. Pin the versions above for reproducible loading.
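
One way to catch version drift before loading is a small check against the pinned versions in the table above. This is a sketch with hypothetical helper names; it assumes the PyPI distribution names `torch`, `transformers`, `tokenizers`, and `accelerate`, and it ignores local build suffixes such as `+cu121`.

```python
from importlib.metadata import PackageNotFoundError, version

PINS = {
    "torch": "2.4.1",
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
    "accelerate": "0.33.0",
}

def installed_versions(names):
    """Return {name: installed version or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        got = installed.get(name)
        # Drop a local build suffix such as "+cu121" before comparing.
        base = got.split("+")[0] if got else None
        if base != wanted:
            mismatches[name] = (wanted, got)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, got) in find_mismatches(PINS, installed_versions(PINS)).items():
        print(f"{name}: expected {wanted}, found {got}")
```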

Serialization note: The tied embedding / LM-head weight requires safe_serialization=False at save time. Remote-code loading handles this transparently.
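
When re-saving a fine-tuned copy, the serialization note translates to passing `safe_serialization=False` to `save_pretrained`. The wrapper below is a minimal sketch: `save_checkpoint` is a hypothetical helper, and `model` and `tokenizer` are assumed to be the objects returned by the loading code in this card.

```python
def save_checkpoint(model, tokenizer, out_dir):
    """Save a tied-weight checkpoint using PyTorch .bin serialization."""
    # safe_serialization=False makes save_pretrained write PyTorch .bin
    # weights instead of safetensors, as the note above requires.
    model.save_pretrained(out_dir, safe_serialization=False)
    tokenizer.save_pretrained(out_dir)
```

A call such as `save_checkpoint(model, tokenizer, "henla-local-copy")` would write the weights and tokenizer files into that directory; the directory name is only an example.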


Inference economics

Measured on a rented NVIDIA A40 48 GB, HF Transformers, bf16, greedy decoding.

Short-prompt scenario (batch 4)

| Metric | Value |
|---|---|
| Decode throughput | 142.4 tokens / s |
| Peak VRAM | ~5.34 GB |
| Estimated cost | ~€0.78 / 1M output tokens (at €0.40 / GPU-hour) |

Heavy telemetry scenario (longer technical prompt, batch 4, 128 generated tokens)

| Metric | Value |
|---|---|
| Decode throughput | 92.3 tokens / s |
| Peak VRAM allocated | ~5.37 GB |
| Average GPU utilization | ~94 % |
| Average power draw | ~287 W |
| Estimated cost | ~€1.20 / 1M output tokens (at €0.40 / GPU-hour) |

Notes

  • ~5.3 GB peak VRAM is close to the theoretical minimum for a 2.84B bf16 model, enabled by the tied embedding / LM-head architecture.
  • The model fits comfortably on consumer GPUs with ≥8 GB VRAM (RTX 3070/4060 Ti class and above).
  • HENLA is not competitive with mature external baselines on general or technical text quality. It achieves the lowest perplexity on a small HENLA-domain corpus. The appropriate positioning is low-cost experimental architecture, low-memory inference, and HENLA-domain specialization, not general benchmark leadership.
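
The "close to the theoretical minimum" note can be checked with a weights-only estimate at 2 bytes per bf16 parameter. The sketch below assumes the ~2.844 B parameter count from the configuration and reports both decimal GB and binary GiB, since the card does not say which unit the VRAM figures use.

```python
BYTES_PER_BF16 = 2

def weight_footprint(num_params, bytes_per_param=BYTES_PER_BF16):
    """Return the weights-only memory footprint as (decimal GB, binary GiB)."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30

gb, gib = weight_footprint(2.844e9)
print(f"{gb:.2f} GB  /  {gib:.2f} GiB")
```

The weights alone come to roughly 5.69 GB, or about 5.30 GiB, so the reported ~5.34 figure is plausible as weights plus a small activation and cache overhead if it was measured in binary units. That interpretation is an inference, not something the source states.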

Limitations

  • Internal validation only. All benchmarks are internal diagnostics, not independent external evaluations.
  • Prefix specialization. The 300-step prefix run narrows the conditional distribution. Open-ended generation on prompts far from the training prefixes may drift or degrade compared to the CLEANLM base.
  • Linguistically weak compared to mature small LMs. HENLA-CONFED was trained at a fraction of the compute and data scale of Phi, Qwen, Llama, or Gemma.
  • Short context. Maximum context length is 512 tokens.
  • No instruction-following. This is not an instruction-tuned model. It does not follow chat templates or system prompts.
  • Grammar and article errors. Expect residual grammatical instability, especially on longer continuations.
  • No multi-seed confidence. Results are single-run diagnostics without statistical confidence intervals.
  • No external human evaluation.
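
Because of the 512-token context limit listed above, the prompt and the generated tokens have to share one budget. The helper below is a hypothetical sketch that trims a list of prompt token ids so the prompt plus `max_new_tokens` fits; keeping the most recent tokens is a design choice made for this example, not documented behavior of the model.

```python
MAX_CONTEXT = 512  # maximum context length from the configuration table

def fit_prompt(prompt_ids, max_new_tokens, max_context=MAX_CONTEXT):
    """Trim prompt token ids so prompt + generated tokens fit in the context."""
    budget = max_context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the most recent tokens; shorter prompts are returned unchanged.
    return prompt_ids[-budget:]

long_prompt = list(range(600))  # stand-in for 600 token ids
trimmed = fit_prompt(long_prompt, max_new_tokens=64)
print(len(trimmed), trimmed[0], trimmed[-1])
```

With `max_new_tokens=64` the prompt budget is 448 tokens, so a 600-token prompt loses its first 152 tokens.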

Project context and related checkpoints

HENLA (Hypergraph Embodied Neural Learning Architecture) is a constrained-compute research program that progressed from an embodied hypergraph learner (HENLA-0) to a modular evidence-routing cognitive architecture (HENLA-MoC) and finally to this confederated-area causal LM family. The full development trajectory is documented in the companion white paper.

Total external compute for the HENLA-CONFED line: ~EUR 325 of rented GPU time (NVIDIA A40, CUDA 12.8), ~2-day development cycle.

Family

| Repository | Role |
|---|---|
| RthItalia/HENLA-CONFED-3B-FINEWEB-CLEANLM-STEP70000 | General base: recommended for continuation and further fine-tuning |
| RthItalia/HENLA-CONFED-3B-PREFIX-V4-STEP300 ← you are here | Balanced demo branch: grammar/article probes, controlled HENLA identity |
| RthItalia/HENLA-CONFED-3B-HENLA-PREFIX-150 | HENLA-identity branch: use only for HENLA-description prompts |

License

HENLA Research and Education Non-Commercial License

license: other
license_name: henla-research-education-non-commercial

Permitted: academic research · independent research · educational use · student projects · evaluation and benchmarking · non-commercial experimentation.

Not permitted without prior written permission: commercial use · paid products or services · hosted commercial inference · resale · integration into commercial applications · training / distillation / fine-tuning for commercial deployment.


Citation / acknowledgment

If you use this model in research, please cite the companion white paper or link to this repository and the HENLA project logs.

Road / RthItalia (2026). HENLA: a constrained-compute study of hypergraph memory,
federated cognitive routing, and a 3B confederated language model. Draft white paper,
May 2026. https://huggingface.co/RthItalia

Ethics and safety

This checkpoint is a non-commercial research artifact. It is not intended for medical, legal, financial, safety-critical, or automated decision-making use. It does not represent AGI, consciousness, or human-level intelligence. Correct use is research, education, benchmarking, and non-commercial experimentation.
