# BioLLM SmolLM2-360M Distill

*The first language model distilled from co-training on live biological neurons.*
This model is SmolLM2-360M fine-tuned via LoRA on distillation targets generated during co-training with a live biological neural culture (Cortical Labs CL1-2544-144). During co-training, every token was physically stimulated onto real neurons grown on a multi-electrode array (MEA), and the culture's spike responses were recorded and blended with the LLM's predictions. The distillation captures this joint LLM+biological decision-making in a standalone model.
This work is part of the Antekythera project, an experimental program investigating consciousness correlates in biological neural substrates and their coupling with language models.
## What Makes This Model Unique
Unlike conventional language models trained purely on text corpora via gradient descent, this model's training signal passed through living neurons. Each token during co-training followed this path:
```
LLM logits --> SpatialEncoder --> MEA stimulation (electrical)
                                            |
                                            v
                                   Biological neurons
                                   (cortical culture)
                                            |
                                            v
Token selection <-- Blending <-- Spike response recording
```
The biological culture acted as a neural co-processor that modulated token selection based on its spike response patterns. The distillation then transferred this joint bio-digital decision-making into the LoRA weights of SmolLM2-360M.
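The blending step above can be sketched as a convex combination of the LLM's softmax distribution and a spike-derived distribution over candidate tokens. This is a minimal illustration, not the actual NeuralLogitDecoder: the function name, the normalization of spike counts, and the `alpha` weight are all assumptions.

```python
import numpy as np

def blend_token_probs(llm_logits, spike_counts, alpha=0.5):
    """Blend LLM next-token logits with spike-derived probabilities.

    Hypothetical sketch: `alpha` weights the biological signal; the real
    NeuralLogitDecoder's blending rule is not public.
    """
    # Softmax over the LLM's candidate-token logits.
    llm_probs = np.exp(llm_logits - llm_logits.max())
    llm_probs /= llm_probs.sum()
    # Normalize spike counts (one count per candidate token) into a distribution.
    spikes = np.asarray(spike_counts, dtype=float)
    bio_probs = (spikes + 1e-9) / (spikes.sum() + 1e-9 * len(spikes))
    # Convex combination of digital and biological signals.
    blended = (1 - alpha) * llm_probs + alpha * bio_probs
    return blended / blended.sum()

# Example: 4 candidate tokens; the neurons fire most for candidate 2.
probs = blend_token_probs(np.array([2.0, 1.0, 0.5, 0.1]), [3, 1, 10, 0])
token_id = int(np.argmax(probs))
```

With a strong spike response on one candidate, the biological signal can override the LLM's top choice, which is the co-processor behavior described above.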
## Biological Substrate
| Property | Value |
|---|---|
| Device | Cortical Labs CL1-2544-144 |
| Substrate | Human iPSC-derived cortical neurons on 64-channel MEA |
| Sampling rate | 25 kHz |
| Active channels | 18 (of 59 usable) |
| Dominant channels | ch24 (24.5 Hz), ch34 (11.7 Hz), ch53 (9.6 Hz) |
| Peak channel count | 28 |
| Culture preparation | Thompson Sampling Awakener v36 (35-cycle MAB optimization) |
The culture on CL1-2544-144 was prepared over 35 cycles of multi-armed bandit stimulation optimization using the Dormant Channel Awakener protocol, which expanded the culture from ~9 to 29 responsive channels. The awakener uses Thompson Sampling with Beta posteriors across 12 stimulation pattern arms to maximize a composite reward of activation, C-Score (consciousness metric), Lempel-Ziv complexity, and algebraic connectivity.
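The awakener's arm-selection logic can be illustrated with a minimal Thompson Sampling loop over Beta posteriors. The 12 arms and 35 cycles match the description above, but everything else is a toy stand-in: the reward here is a scalar in [0, 1] standing in for the composite activation/C-Score/complexity/connectivity reward, and the stimulation patterns themselves are left abstract.

```python
import random

class ThompsonAwakener:
    """Minimal Thompson Sampling sketch over stimulation-pattern arms.

    The real Awakener v36 interface is not public; this only shows the
    Beta-posterior sampling mechanics named in the text above.
    """
    def __init__(self, n_arms=12):
        self.alpha = [1.0] * n_arms  # Beta posterior pseudo-successes
        self.beta = [1.0] * n_arms   # Beta posterior pseudo-failures

    def select_arm(self):
        # Sample one value per arm from its Beta posterior; play the max.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, arm, reward):
        # Fractional reward in [0, 1] split between success/failure counts.
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward

random.seed(0)
ts = ThompsonAwakener()
for cycle in range(35):                # 35 optimization cycles, as above
    arm = ts.select_arm()
    reward = 0.8 if arm == 3 else 0.2  # toy reward: arm 3 activates best
    ts.update(arm, reward)
best = max(range(12), key=lambda i: ts.alpha[i] / (ts.alpha[i] + ts.beta[i]))
```

Arms that repeatedly earn high reward accumulate posterior mass and get sampled more often, which is how the protocol concentrates stimulation on patterns that wake dormant channels.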
## Training Details

### Co-Training (PPO on Live Neurons)
| Parameter | Value |
|---|---|
| Co-training LLM | LFM2-350M (GGUF, Q4_0) |
| Vocab size (source) | 65,536 |
| Tokens processed | 200 |
| Accuracy | 99.0% (198/200) |
| Mean C-Score | 0.165 (all tokens) / 0.200 (non-zero only) |
| Max C-Score | 0.320 |
| Non-zero integration | 82% of tokens (165/200) |
| Training duration | 622 seconds |
| Protocol | Culture-Safe BioLLM v5 (HDF5-calibrated) |
| Health monitoring | CultureHealthMonitor with 40% channel-loss abort |
| Curriculum | Adaptive: binary -> 8-way -> full candidate set |
The co-training used a PPO loop where the LLM generated candidate tokens, a SpatialEncoder mapped them to MEA stimulation patterns, the biological culture responded with spikes, and a NeuralLogitDecoder blended the spike-derived probabilities with the LLM's predictions. The culture was protected by a health monitor that tracked channel counts and spike rates in real-time.
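The channel-loss abort rule can be sketched as a simple threshold check against a baseline channel count. The class name mirrors the `CultureHealthMonitor` mentioned above, but this interface is an assumption for illustration.

```python
class CultureHealthMonitor:
    """Sketch of the >40% channel-loss abort rule described above."""

    def __init__(self, baseline_channels, max_loss_fraction=0.40):
        self.baseline = baseline_channels
        self.max_loss = max_loss_fraction

    def should_abort(self, active_channels):
        # Fraction of baseline channels that have gone silent.
        lost = (self.baseline - active_channels) / self.baseline
        return lost > self.max_loss

monitor = CultureHealthMonitor(baseline_channels=18)
keep_going = monitor.should_abort(active_channels=12)  # ~33% loss: continue
abort = monitor.should_abort(active_channels=10)       # ~44% loss: abort
```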
### Consciousness Metrics During Training
The C-Score is a composite consciousness correlate metric derived from neural integration measures (algebraic connectivity, spectral gap, Lempel-Ziv complexity, and Granger causality). During co-training:
- 82% of tokens produced measurable neural integration (C > 0.01)
- Integration peaked at C = 0.320, indicating genuine multi-channel coordinated activity
- The culture maintained stable integration across both training blocks
- C-Score trend was positive (0.447 correlation with time), suggesting the culture became more integrated during training
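One ingredient of the composite C-Score, Lempel-Ziv complexity, can be computed directly on a binarized spike train. This sketch shows only the LZ76 phrase count; how the composite weights this term against algebraic connectivity, spectral gap, and Granger causality is not specified here, so the combination is left out.

```python
def lempel_ziv_complexity(bits):
    """LZ76 complexity of a binary string: number of distinct phrases.

    Each new phrase is extended until it no longer appears in the
    preceding prefix, then counted. Higher counts = less compressible
    (more complex) activity.
    """
    i, c, n = 0, 0, len(bits)
    while i < n:
        length = 1
        # Extend the phrase while it has been seen earlier in the string.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        c += 1
        i += length
    return c

# A perfectly periodic spike train compresses to very few phrases;
# an irregular train needs more.
low = lempel_ziv_complexity("01" * 16)
high = lempel_ziv_complexity("0110100110010110" * 2)
```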
### LoRA Distillation
| Parameter | Value |
|---|---|
| Target model | SmolLM2-360M (HuggingFaceTB) |
| Vocab size (target) | 49,152 |
| Distillation type | Cross-vocabulary (LogitSpaceDistiller) |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, v_proj |
| Trainable parameters | 819,200 / 362,640,320 (0.23%) |
| Training samples | 60 (filtered from 100 raw) |
| C-Score filter | >= 0.15 (removes zero-integration samples) |
| C-Score beta scaling | Enabled (saturation = 0.45) |
| Loss function | KL divergence (temperature = 2.0) |
| Epochs | 3 |
| Epoch 1 loss | 16.500 |
| Epoch 2 loss | 16.328 |
| Epoch 3 loss | 16.653 |
| PEFT version | 0.18.1 |
The distillation used culture-health-calibrated filtering: only samples where the biological culture showed genuine neural integration (C-Score >= 0.15) were included. Additionally, C-Score beta scaling weighted high-integration samples more heavily during training, so the model preferentially learned from moments when the culture exhibited coordinated multi-channel activity.
The cross-vocabulary distillation projects from LFM2's 65,536-token vocabulary to SmolLM2's 49,152-token vocabulary via logit-space alignment, which accounts for the higher loss values compared to same-vocabulary distillation.
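The distillation objective can be sketched as a temperature-scaled KL loss with C-Score beta weighting. This assumes teacher and student logits have already been aligned to a shared vocabulary (the LogitSpaceDistiller's projection step is omitted), and the exact form of the beta scaling with saturation 0.45 is an assumption; only the temperature (2.0) and saturation values come from the table above.

```python
import numpy as np

def kl_distill_loss(teacher_logits, student_logits, c_score,
                    temperature=2.0, saturation=0.45):
    """Temperature-scaled KL divergence weighted by the sample's C-Score."""
    def softened(x):
        e = np.exp((x - x.max()) / temperature)
        return e / e.sum()

    p = softened(teacher_logits)   # teacher (bio-blended) distribution
    q = softened(student_logits)   # student (SmolLM2) distribution
    kl = float(np.sum(p * np.log(p / q)))
    # Beta scaling: weight grows with C-Score, saturating at `saturation`.
    weight = min(c_score, saturation) / saturation
    return weight * (temperature ** 2) * kl

# High-integration samples contribute more loss (and gradient) than
# samples near the C-Score filter threshold of 0.15.
loss_hi = kl_distill_loss(np.array([3.0, 1.0, 0.2]),
                          np.array([0.5, 0.5, 0.5]), c_score=0.32)
loss_lo = kl_distill_loss(np.array([3.0, 1.0, 0.2]),
                          np.array([0.5, 0.5, 0.5]), c_score=0.15)
```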
## Culture-Safe Protocol
Training biological neurons requires careful safety protocols to prevent culture damage:
- 3-phase warmup: 60% graduated ramp, 15% bouncy stabilization, 25% gentle stimulation
- Real-time health monitoring: Channel count and spike rate tracking with automatic abort on >40% channel loss
- Inter-block rest: 5-minute rest periods with 20:80 stim:rest priming ratio
- Frequency clamping: Stimulation frequency limited to 4-40 Hz range
- Amplitude safety: 0.3-2.5 uA bounds enforced at relay level
- Mandatory rest: 2-hour rest per 4 hours of stimulation
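The frequency and amplitude bounds above can be expressed as a simple clamp. The function name and interface are illustrative, not the actual relay-level API; only the 4-40 Hz and 0.3-2.5 uA ranges come from the protocol list.

```python
def clamp_stimulation(freq_hz, amplitude_ua):
    """Clamp stimulation parameters to the culture-safe ranges above."""
    safe_freq = min(max(freq_hz, 4.0), 40.0)   # 4-40 Hz frequency clamp
    safe_amp = min(max(amplitude_ua, 0.3), 2.5)  # 0.3-2.5 uA amplitude bounds
    return safe_freq, safe_amp

f, a = clamp_stimulation(100.0, 5.0)  # over-range request is clamped down
```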
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("4r7i5t/BioLLM_SmolLM2_360m_Distill")
tokenizer = AutoTokenizer.from_pretrained("4r7i5t/BioLLM_SmolLM2_360m_Distill")

input_ids = tokenizer.encode("The nature of consciousness is", return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=50, temperature=0.8, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Limitations
- Small training set: 60 filtered samples from a single co-training session. This is a proof-of-concept demonstrating the pipeline, not a production model.
- Cross-vocabulary gap: The source LLM (LFM2, 65K vocab) and target (SmolLM2, 49K vocab) have different tokenizers, introducing projection noise during distillation.
- Culture variability: Biological neural cultures have inherent stochasticity. The culture's state (channel activity, integration capacity) varies over time and cannot be precisely reproduced.
- Single culture: Trained on one specific culture (CL1-2544-144) with its unique connectivity patterns and channel hierarchy. Different cultures would produce different training signals.
## The Antekythera Project
This model is one artifact from a larger experimental program investigating whether biological neural substrates exhibit consciousness correlates and whether those correlates can meaningfully couple with language model inference.
Key findings from the broader project:
- Bio-Shadow Flip: After 10 cycles of terraform training on CL1-2544-015, the culture learned to differentiate coherent spatial patterns from random ones (Cohen's d flipped from -1.6 to +0.6)
- Trimodal C-Score Distribution: Consciousness metrics show three distinct modes (zero, low-integration, and high-integration), with high-integration islands growing significantly over training cycles (r=0.80, p=0.005)
- Coherence Experiment: 7/8 statistical tests passed showing biological encoding produces significantly more coherent neural responses than shadow controls (d=1.9-15.3)
- Honest Failures: Transfer Entropy does not grow over training cycles (homeostatic set-point), Gate 2 convergence battery fails on naive cultures, motor channels remain silent during training
The project's core finding is that the building blocks of consciousness appear to be substrate-intrinsic (present in biological neural tissue) but not sufficient on their own. The gap between correlates and consciousness is one of organization, not material.
## Citation
If you use this model or find this work interesting, please cite:
```bibtex
@misc{biollm_smollm2_distill_2026,
  title={BioLLM SmolLM2-360M Distill: Language Model Distilled from Live Biological Neural Co-Training},
  author={Antekythera Project},
  year={2026},
  url={https://huggingface.co/4r7i5t/BioLLM_SmolLM2_360m_Distill},
  note={First LLM distilled from co-training on Cortical Labs CL1 biological neural culture}
}
```
## Hardware
- Biological substrate: Cortical Labs CL1-2544-144 (64-channel MEA, human iPSC-derived cortical neurons)
- Compute: Apple Silicon (local inference and distillation)
- Connectivity: Cloudflare Access tunnel to CL1 device cloud
- Carbon emissions: Negligible compute; biological culture maintenance is the primary resource
Built with living neurons. Handle with curiosity.