Instructions to use Multilingual-Multimodal-NLP/LoopCoder-V2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with Transformers:

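```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages, max_new_tokens=128))

# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True)
# dtype="auto" keeps the checkpoint's dtype; older transformers releases name this argument torch_dtype
model = AutoModelForCausalLM.from_pretrained("Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True, dtype="auto")
```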


vLLM

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Multilingual-Multimodal-NLP/LoopCoder-V2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker

```shell
docker model run hf.co/Multilingual-Multimodal-NLP/LoopCoder-V2
```

SGLang

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Multilingual-Multimodal-NLP/LoopCoder-V2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Multilingual-Multimodal-NLP/LoopCoder-V2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```


LoopCoder-V2: Brain Atlas SQLite Database (Community Forward Pass Patch Included)

by juiceb0xc0de - opened 1 day ago


Brain Atlas SQLite Database and Forward Pass Patch

Cross-post: I ran a full GWIQ-style brain atlas on a 7.4B-parameter Parallel Loop Transformer. It is not a downstream benchmark; it is a look at what the tensors are actually doing, plus the story of how I had to reconstruct the forward pass before I could even start.

model: Multilingual-Multimodal-NLP/LoopCoder-V2
atlas type: activation census + reasoning-axis brain atlas + OV-circuit SVD
corpus: 8,965 reasoning prompts
layers: 14 (all shared)
loops: 2 per token
dataset: juiceb0xc0de/LoopCoder-V2-atlas

Files available

The dataset repo contains:

loopcoder-v2-atlas.sqlite — the full atlas SQLite database.
forward_pass/modeling_iquestpltcoder.py — community forward-pass reconstruction.
forward_pass/config.json — patched auto_map so AutoModelForCausalLM loads.
forward_pass/configuration_iquestpltcoder.py — config class the model imports.
forward_pass/smoke_test.py — quick load-and-forward sanity check.

What this is

LoopCoder-V2 is a Parallel Loop Transformer (PLT). The architecture class named in its config is IQuestPLTCoderForCausalLM. It has 14 transformer layers, and every one of those layers is run twice per token, with an explicit Cross-Loop Processing (CLP) module that lets the second pass attend back to the first pass. I wanted to see whether a model that reuses the same weights twice can still build clean late-layer subspaces and surgically editable directions.
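To make "same weights, two passes" concrete, here is a toy sketch in plain Python. It is not the IQuestPLTCoder forward pass: the layers are scalar-gain stand-ins, and the cross-loop step is a hypothetical gated mix (`clp_mix`, `gate` are illustrative names), not the real CLP attention. It only shows that 14 parameter sets produce 28 layer executions per token.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float) -> Layer:
    """Toy stand-in for one decoder block: residual update h + scale * h."""
    def layer(h: Vector) -> Vector:
        return [x + scale * x for x in h]
    return layer


def clp_mix(h_loop1_in: Vector, h_loop0_out: Vector, gate: float) -> Vector:
    """Hypothetical cross-loop step: loop 1's input gets a gated copy of loop 0's output."""
    return [a + gate * b for a, b in zip(h_loop1_in, h_loop0_out)]


def two_loop_forward(h: Vector, layers: List[Layer], gate: float = 0.5) -> Tuple[Vector, int]:
    """Run the same layer objects twice; return the final state and the number of layer calls."""
    calls = 0
    h0 = h
    for layer in layers:          # loop 0
        h0 = layer(h0)
        calls += 1
    h1 = clp_mix(h, h0, gate)     # loop 1 starts from the input plus loop-0 context (toy)
    for layer in layers:          # loop 1: identical weights, second execution
        h1 = layer(h1)
        calls += 1
    return h1, calls


shared_layers = [make_layer(0.01) for _ in range(14)]  # 14 parameter sets
out, n_calls = two_loop_forward([1.0, -2.0], shared_layers)
print(len(shared_layers), n_calls)  # prints: 14 28
```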

Short answer: the architecture is clean, but reasoning does not live in a single place you can patch. I am willing to bet that is because the two loops are entangled by design.

What was run

Activation census over 8,965 prompts.
Per-layer feature taxonomy for every language-tokenizing component.
Per-head analysis on every attention head across all 14 layers.
OV-circuit SVD (W_V @ W_O) on every head.
Logit-lens pass to see which internal directions predict output tokens.
Reasoning-axis contrast and DAS rotation.
Causal validation with capability fence across code, math, reasoning, factual, and multilingual.

The shape of the thing

Property	Value
Layers	14 (shared)
Loops per token	2
Q heads	40
KV heads	8
Head dim	128
Hidden size	5120
MLP intermediate	~15360 (SwiGLU, multiplier 6)
Attention style	GQA + G-SWA mixed local/global
Cross-loop module	CLP between loop 0 and loop 1
Components probed	`q`, `k`, `v`, `o`, `gate`, `up`, `down`, `attn`, `heads`, `mlp`
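The attention rows of the table imply a few shapes that are easy to check by hand. The sketch below derives them from the table's values, assuming the usual contiguous GQA grouping (query head q reads KV head q // group_size); whether the community modeling file lays heads out the same way is an assumption.

```python
def gqa_layout(hidden_size: int = 5120, num_q_heads: int = 40,
               num_kv_heads: int = 8, head_dim: int = 128) -> dict:
    """Derive projection widths and the query-to-KV head mapping (defaults copied from the table)."""
    if num_q_heads % num_kv_heads:
        raise ValueError("query heads must divide evenly among KV heads")
    group_size = num_q_heads // num_kv_heads
    return {
        "q_proj_out": num_q_heads * head_dim,    # width of the Q projection
        "kv_proj_out": num_kv_heads * head_dim,  # width of each of the K and V projections
        "group_size": group_size,                # query heads sharing one KV head
        # Contiguous grouping (assumed): heads 0-4 read KV head 0, heads 5-9 read KV head 1, ...
        "kv_for_q": [q // group_size for q in range(num_q_heads)],
        "q_matches_hidden": num_q_heads * head_dim == hidden_size,
    }


layout = gqa_layout()
print(layout["q_proj_out"], layout["kv_proj_out"], layout["group_size"])  # prints: 5120 1024 5
```

Each K and V projection is one fifth the width of Q, which is where GQA's KV-cache saving comes from.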

The missing forward pass

I tried the obvious load first:

```python
from transformers import AutoModelForCausalLM

AutoModelForCausalLM.from_pretrained(
    "Multilingual-Multimodal-NLP/LoopCoder-V2",
    trust_remote_code=True
)
```

And got:

```
ValueError: Unrecognized configuration class
```

The repo had safetensors and config.json, but no modeling_iquestpltcoder.py. The auto_map pointed to a module that did not exist. To atlas the model I had to:

Reconstruct modeling_iquestpltcoder.py as a community implementation (Fable 5 wrote the first draft from the config and paper clues).
Patch config.json auto_map to point at it.
Overlay both files into a local snapshot_download directory and load from there.
Add scaled_dot_product_attention to the eager attention path so census extraction is fast enough to run.
Patch the custom tokenizer because it was missing attributes the transformers runtime expects.
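The download-and-overlay steps can be sketched roughly as below. This is an illustration under stated assumptions, not the script used for the atlas: it assumes the patch files live at the `forward_pass/` paths listed earlier in the dataset repo, that copying them flat into the snapshot root is enough, and that your transformers version accepts `dtype` (older ones use `torch_dtype`). The tokenizer patch from the last step is not covered.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

MODEL_REPO = "Multilingual-Multimodal-NLP/LoopCoder-V2"
ATLAS_REPO = "juiceb0xc0de/LoopCoder-V2-atlas"  # dataset repo named in the post
PATCH_FILES = [
    "forward_pass/modeling_iquestpltcoder.py",
    "forward_pass/configuration_iquestpltcoder.py",
    "forward_pass/config.json",
]


def overlay_files(src_paths: Iterable[Path], dst_dir: Path) -> List[Path]:
    """Copy each patch file into the snapshot root, replacing any original with the same name."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in src_paths:
        src = Path(src)
        dst = dst_dir / src.name  # forward_pass/x.py -> <snapshot>/x.py
        if dst.is_symlink() or dst.exists():
            dst.unlink()  # never write through a symlink into the shared HF cache
        shutil.copyfile(src, dst)
        written.append(dst)
    return written


def load_patched(local_dir: str = "loopcoder-v2-patched"):
    """Download the weights, overlay the community files, then load from the local folder."""
    # Imported lazily so overlay_files works without these packages installed.
    from huggingface_hub import hf_hub_download, snapshot_download
    from transformers import AutoModelForCausalLM

    snapshot_download(repo_id=MODEL_REPO, local_dir=local_dir)
    patches = [Path(hf_hub_download(repo_id=ATLAS_REPO, filename=f, repo_type="dataset"))
               for f in PATCH_FILES]
    overlay_files(patches, Path(local_dir))
    return AutoModelForCausalLM.from_pretrained(local_dir, trust_remote_code=True, dtype="auto")
```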

After that the model loaded and generated coherent code. I mention this because the same workflow probably works for other interesting HF repos that ship weights with broken or missing modeling code.
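A quick way to triage other repos for this failure mode is to compare the modules named in a config's `auto_map` against the files the repo actually ships. The helper below is a hypothetical sketch: it assumes the `"module.ClassName"` reference format, skips cross-repo references of the form `"org/repo--module.Class"`, and the class names in the usage test are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Union

AutoMapValue = Union[str, Sequence[Optional[str]], None]


def referenced_modules(auto_map: Dict[str, AutoMapValue]) -> List[str]:
    """Collect the .py files an auto_map points at ('modeling_x.Cls' -> 'modeling_x.py')."""
    modules = set()
    for value in auto_map.values():
        refs = value if isinstance(value, (list, tuple)) else [value]
        for ref in refs:
            if not ref:
                continue  # tokenizer entries can look like [None, "tokenization_x.FastCls"]
            if "--" in ref:
                continue  # "org/repo--module.Class" lives in another repo; not checked here
            module, _, _cls = ref.rpartition(".")
            if module:
                modules.add(module + ".py")
    return sorted(modules)


def missing_modules(auto_map: Dict[str, AutoMapValue], repo_files: Iterable[str]) -> List[str]:
    """Modules the auto_map needs that are absent from the repo's file list."""
    files = set(repo_files)
    return [m for m in referenced_modules(auto_map) if m not in files]


# Usage against a live repo (needs network and huggingface_hub):
#   import json
#   from huggingface_hub import hf_hub_download, list_repo_files
#   repo = "Multilingual-Multimodal-NLP/LoopCoder-V2"
#   with open(hf_hub_download(repo, "config.json")) as fh:
#       auto_map = json.load(fh).get("auto_map", {})
#   print(missing_modules(auto_map, list_repo_files(repo)))
```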

What the numbers suggest

Reasoning does not isolate as a manipulable direction

The reasoning-axis pass found 98 candidate singular-value directions. None of them survived causal validation. Patch a direction that looks like "reasoning" in activation space, and the capability fence rejects it across the board.

I wonder if that is structural: in a PLT, reasoning is a two-step algorithm on shared weights. Loop 0 might sketch a plan, loop 1 might refine and commit, and CLP mixes them. A direction measured in the fused stream is probably a blend of both stages, so editing it breaks the continuation from loop 0 into loop 1. That's just my hypothesis. My next experiment is a loop-separated census.

Middle layers route code structure

The strongest logit-lens signals in layers 5–9 are structural code tokens: braces, indentation, def, return, if, else. Early layers (0–3) are mostly lexical disambiguation, and late layers (10–13) look like next-token commitment and repetition control. This progression is familiar from dense transformers, but it is happening inside layers that are each executed twice.
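For readers new to the logit lens: an intermediate residual state is decoded as if it were the final one, by applying the model's final norm and unembedding and reading off the top tokens. The toy below assumes a Llama-style RMSNorm as the final norm (an assumption about this model) and uses a made-up three-token vocabulary. With transformers you could collect real states via `output_hidden_states=True`, though how those states interleave with the two loops in the community forward pass is unverified.

```python
import math
from typing import List, Sequence


def rms_norm(h: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: scale h by 1 / sqrt(mean(h^2) + eps), then by a learned per-dimension weight."""
    scale = 1.0 / math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x * scale * w for x, w in zip(h, weight)]


def logit_lens(h: Sequence[float], norm_weight: Sequence[float],
               unembed: Sequence[Sequence[float]], vocab: Sequence[str], k: int = 3) -> List[str]:
    """Decode an intermediate residual state with the final norm + unembedding; return top-k tokens."""
    normed = rms_norm(h, norm_weight)
    logits = [sum(a * b for a, b in zip(row, normed)) for row in unembed]  # one row per token
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [vocab[i] for i in top]


vocab = ["{", "def", "the"]
unembed = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]  # toy 3 x 2 unembedding matrix
print(logit_lens([1.0, 0.0], [1.0, 1.0], unembed, vocab))  # prints: ['def', '{', 'the']
```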

Cross-loop heads carry the most spectral concentration

OV-circuit spectral concentration is highest in the heads attached to the CLP cross-loop channel. I think this is where the model invests representational capacity: in the loop-to-loop communication path, not in any single forward step. The heads that talk across loops are doing more of the work than the heads inside a single loop.
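The post does not define its concentration metric, so the sketch below uses one common choice, sigma_1^2 / sum_i sigma_i^2 of a head's OV matrix W_V @ W_O (1.0 means the head writes along a single direction). It is dependency-free, using power iteration for sigma_1 and the Frobenius norm for the sum of squared singular values. If the modeling file follows Llama-style naming (`self_attn.v_proj`, `self_attn.o_proj`, an assumption), head h's W_V is the transpose of rows `kv*128:(kv+1)*128` of `v_proj.weight` with `kv = h // 5`, and W_O is the transpose of columns `h*128:(h+1)*128` of `o_proj.weight`.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def top_singular_value(m: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Power iteration on M^T M; returns the largest singular value of M."""
    rng = random.Random(seed)
    n = len(m[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mv = [sum(row[j] * v[j] for j in range(n)) for row in m]              # M v
        w = [sum(m[i][j] * mv[i] for i in range(len(m))) for j in range(n)]   # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    mv = [sum(row[j] * v[j] for j in range(n)) for row in m]
    return math.sqrt(sum(x * x for x in mv))


def spectral_concentration(w_v: Matrix, w_o: Matrix) -> float:
    """sigma_1^2 / sum_i sigma_i^2 of the OV circuit W_V @ W_O (d_model x d_head @ d_head x d_model)."""
    ov = matmul(w_v, w_o)
    frob_sq = sum(x * x for row in ov for x in row)  # equals the sum of squared singular values
    if frob_sq == 0.0:
        return 0.0
    return top_singular_value(ov) ** 2 / frob_sq


print(round(spectral_concentration([[1.0], [2.0]], [[3.0, 4.0]]), 6))  # rank-one head: 1.0
```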

Features are selective, not monolithic

Auto code-role analysis tagged 3,360 features as selective (sparse, context-dependent). No broad, monolithic directions appear. That fits a capacity split across many small specialists, which again feels very PLT-flavored: if the same weights have to do two different jobs, the network probably carves the hidden space into narrow experts.
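The post does not spell out the code-role criterion, so here is a hypothetical density-based stand-in for "selective versus broad": a feature is selective if it fires on at most `max_density` of the prompts, broad if it fires more often, and dead if it never fires. Both thresholds are illustrative, not the values behind the 3,360 count.

```python
from typing import Dict, List, Sequence


def tag_features(acts: Sequence[Sequence[float]], fire_threshold: float = 0.0,
                 max_density: float = 0.05) -> Dict[str, List[int]]:
    """acts[p][f] is feature f's activation on prompt p; density = fraction of prompts where f fires."""
    n_prompts = len(acts)
    n_feats = len(acts[0])
    tags: Dict[str, List[int]] = {"selective": [], "broad": [], "dead": []}
    for f in range(n_feats):
        fires = sum(1 for p in range(n_prompts) if acts[p][f] > fire_threshold)
        density = fires / n_prompts
        if fires == 0:
            tags["dead"].append(f)
        elif density <= max_density:
            tags["selective"].append(f)
        else:
            tags["broad"].append(f)
    return tags


# 20 toy prompts: feature 0 fires once, feature 1 always, feature 2 never.
toy = [[1.0 if p == 0 else 0.0, 1.0, 0.0] for p in range(20)]
print(tag_features(toy, max_density=0.1))
```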

Coactivation came up empty

The coactivation table has no surviving pairs in this run. I am not sure yet whether that means feature pairs are genuinely uncorrelated, or whether the prompt set and threshold just did not produce strong pairwise structure. I would not read too much into it until I compare with a non-reasoning corpus.
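To see how a threshold alone can empty a coactivation table, here is a minimal sketch that scores feature pairs by Jaccard overlap of the prompts they fire on, with a minimum support count. The scoring rule and both cutoffs are assumptions for illustration; the post does not say how its table is computed.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def coactivation_pairs(acts: Sequence[Sequence[float]], fire_threshold: float = 0.0,
                       min_jaccard: float = 0.5, min_support: int = 5) -> List[Tuple[int, int, float]]:
    """Return (f, g, jaccard) for feature pairs that fire together on enough shared prompts."""
    n_feats = len(acts[0])
    firing = [{p for p, row in enumerate(acts) if row[f] > fire_threshold} for f in range(n_feats)]
    pairs = []
    for f, g in combinations(range(n_feats), 2):
        both = len(firing[f] & firing[g])
        either = len(firing[f] | firing[g])
        if both < min_support or either == 0:
            continue  # too few shared prompts to call it a pair
        jaccard = both / either
        if jaccard >= min_jaccard:
            pairs.append((f, g, jaccard))
    return pairs


# 10 toy prompts: features 0 and 1 co-fire on prompts 0-5; feature 2 fires on the rest.
toy = [[1.0 if p < 6 else 0.0, 1.0 if p < 6 else 0.0, 1.0 if p >= 6 else 0.0] for p in range(10)]
print(coactivation_pairs(toy))                 # prints: [(0, 1, 1.0)]
print(coactivation_pairs(toy, min_support=7))  # prints: []  (same data, stricter cutoff)
```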

What the reasoning-axis pass is actually measuring here

The reasoning-axis pass is not a generic "find all important directions" sweep. It specifically looks for directions that separate reasoning prompts from non-reasoning prompts, then uses DAS rotation and the capability fence to check whether removing those directions damages code, math, reasoning, factual, or multilingual ability. So the 98 tested axes are reasoning-candidate directions, not a census of every load-bearing direction in the model. None of the 98 axes passed the fence, which points to reasoning being distributed across the loop structure rather than localized in a clean subspace.
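The contrast-then-fence logic can be sketched in a few lines. This deliberately swaps in a simpler technique than the post's: a difference-of-means direction with directional ablation instead of a DAS rotation. The fence rule (non-target domains may drop by at most `max_drop`, with reasoning itself exempt) is a hypothetical reading of "capability fence", not the atlas pipeline's actual criterion.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def diff_of_means_direction(reasoning_acts: Sequence[Sequence[float]],
                            control_acts: Sequence[Sequence[float]]) -> Vector:
    """Unit vector pointing from the control-prompt mean to the reasoning-prompt mean."""
    d = [a - b for a, b in zip(mean_vector(reasoning_acts), mean_vector(control_acts))]
    norm = math.sqrt(sum(x * x for x in d))
    if norm == 0.0:
        raise ValueError("the two prompt sets have identical means; no contrast direction")
    return [x / norm for x in d]


def ablate(h: Sequence[float], direction: Sequence[float]) -> Vector:
    """Remove the component of h along a unit direction: h - (h . d) d."""
    proj = sum(a * b for a, b in zip(h, direction))
    return [a - proj * b for a, b in zip(h, direction)]


def passes_fence(baseline: Dict[str, float], ablated: Dict[str, float],
                 target: str = "reasoning", max_drop: float = 0.02) -> bool:
    """Hypothetical fence: every non-target domain may lose at most max_drop in score."""
    return all(baseline[d] - ablated[d] <= max_drop for d in baseline if d != target)


direction = diff_of_means_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(direction, ablate([3.0, 5.0], direction))  # prints: [1.0, 0.0] [0.0, 5.0]
```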

The stuff I deliberately did not separate

This atlas fuses loop 0 and loop 1 into a single activation stream because the hooks captured the standard forward path. I did not yet build a loop-separated census, so I cannot tell you whether loop 0 alone has a planning sub-circuit or loop 1 alone has a refinement sub-circuit. That is the obvious next post.
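One possible starting point for a loop-separated census is to tag each layer call with a loop index by call order. The sketch assumes, without verification, that each shared layer module is invoked exactly twice per model forward with loop 0 first; a PLT that batches loops into one call would break that, so the class raises as soon as the count goes wrong. The callable returned by `hook_for` matches PyTorch's forward-hook signature `(module, inputs, output)`; the attribute path in the usage comment (`model.model.layers`) is an assumption about the community modeling file.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Tuple


class LoopSplitter:
    """Assign each layer call to a loop index by call order within one forward pass."""

    def __init__(self, loops_per_token: int = 2):
        self.loops = loops_per_token
        self.calls: DefaultDict[int, int] = defaultdict(int)
        self.store: Dict[Tuple[int, int], List[Any]] = defaultdict(list)

    def reset(self) -> None:
        """Call before every forward pass so call counts start from zero."""
        self.calls.clear()

    def record(self, layer_idx: int, activation: Any) -> int:
        loop = self.calls[layer_idx]
        if loop >= self.loops:
            raise RuntimeError(f"layer {layer_idx} ran more than {self.loops} times; "
                               "the two-calls-per-forward assumption does not hold")
        self.calls[layer_idx] += 1
        self.store[(layer_idx, loop)].append(activation)
        return loop

    def hook_for(self, layer_idx: int) -> Callable[[Any, Any, Any], None]:
        """Return a function with PyTorch's forward-hook signature for one layer."""
        def hook(module: Any, inputs: Any, output: Any) -> None:
            hidden = output[0] if isinstance(output, tuple) else output
            if hasattr(hidden, "detach"):
                hidden = hidden.detach().cpu()  # tensors: drop the graph, move off the GPU
            self.record(layer_idx, hidden)
        return hook


# Usage with torch (attribute path is an assumption):
#   splitter = LoopSplitter()
#   for i, layer in enumerate(model.model.layers):
#       layer.register_forward_hook(splitter.hook_for(i))
#   splitter.reset(); model(**batch)
#   loop0_layer5 = splitter.store[(5, 0)]; loop1_layer5 = splitter.store[(5, 1)]
```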

Caveats

This is a reasoning-axis run. Other axes (style, syntax, domain) might isolate more cleanly.
Zero causal survivors is a finding about the architecture, not proof that the atlas pipeline failed.
The coactivation table is empty; read it as "not measured yet" rather than "no structure."
The forward pass is a community reconstruction. It matches the config and generates sensible code, but it is not the authors' implementation.

Bottom line

LoopCoder-V2 makes for a complete little brain atlas, and the architecture is what makes it interesting: reasoning distributed across two loops, cross-loop heads carrying the most spectral weight, and middle layers routing code structure through the logit lens. The surprising part is not what any single layer does; it is that the same layers have to do two different things, and the network has organized itself around that constraint. The next step is to census each loop separately and see whether the two-pass story really holds.
