LoopCoder-V2: Brain Atlas SQLite Database (Community Forward Pass Patch Included)
Brain Atlas SQLite Database and Forward Pass Patch
Cross-post: I ran a full GWIQ-style brain atlas on a 7.4B-parameter Parallel Loop Transformer. It is not a downstream benchmark. It is a look at what the tensors are actually doing, plus the story of how I had to reconstruct the forward pass before I could even start.
model: Multilingual-Multimodal-NLP/LoopCoder-V2
atlas type: activation census + reasoning-axis brain atlas + OV-circuit SVD
corpus: 8,965 reasoning prompts
layers: 14 (all shared)
loops: 2 per token
dataset: juiceb0xc0de/LoopCoder-V2-atlas
Files available
The dataset repo contains:
- `loopcoder-v2-atlas.sqlite`: the full atlas SQLite database.
- `forward_pass/modeling_iquestpltcoder.py`: community forward-pass reconstruction.
- `forward_pass/config.json`: patched `auto_map` so `AutoModelForCausalLM` loads.
- `forward_pass/configuration_iquestpltcoder.py`: config class the model imports.
- `forward_pass/smoke_test.py`: quick load-and-forward sanity check.
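The post does not describe the database schema, so a schema-agnostic first look is the safest place to start. Below is a minimal sketch using Python's built-in `sqlite3`. `describe_atlas` is a hypothetical helper name. It only reads `sqlite_master`, so it assumes nothing about the atlas tables themselves.

```python
import sqlite3
from pathlib import Path


def describe_atlas(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Read-only URI: a wrong path fails loudly instead of creating an empty DB.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Table names come from the file itself, so quote them as SQL identifiers.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        con.close()
```

Usage: call `describe_atlas("loopcoder-v2-atlas.sqlite")` after downloading the file. The database is opened in read-only URI mode, so a mistyped path raises `sqlite3.OperationalError` instead of silently creating an empty database.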
What this is
LoopCoder-V2 is a Parallel Loop Transformer (PLT). The model class named in its config is IQuestPLTCoderForCausalLM. It has 14 transformer layers, and every one of those layers runs twice per token. An explicit Cross-Loop Processing (CLP) module lets the second pass attend back to the first pass. I wanted to see whether a model that reuses the same weights twice can still build clean late-layer subspaces and surgical directions.
Short answer: the architecture is clean, but reasoning does not live in a single place you can patch. I am willing to bet that is because the two loops are entangled by design.
What was run
- Activation census over 8,965 prompts.
- Per-layer feature taxonomy for every language-tokenizing component.
- Per-head analysis on every attention head across all 14 layers.
- OV-circuit SVD (`W_V @ W_O`) on every head.
- Logit-lens pass to see which internal directions predict output tokens.
- Reasoning-axis contrast and DAS rotation.
- Causal validation with a capability fence across code, math, reasoning, factual, and multilingual.
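The OV-circuit step has one detail that is easy to get wrong under GQA. 40 query heads share 8 value heads, so each query head's OV map pairs its own slice of `W_O` with its group's slice of `W_V`.

Below is a minimal NumPy sketch. It assumes Llama-style `nn.Linear` weight layouts and a contiguous head-to-group mapping. `v_proj` and `o_proj` are the conventional Llama names, not confirmed for this checkpoint.

```python
import numpy as np


def ov_singular_values(v_proj, o_proj, num_heads, num_kv_heads, head_dim):
    """Singular values of each query head's OV circuit W_V @ W_O.

    Assumed layouts (Llama-style nn.Linear weights):
      v_proj: (num_kv_heads * head_dim, hidden)
      o_proj: (hidden, num_heads * head_dim)
    Query head h reads value head h // (num_heads // num_kv_heads).
    """
    group = num_heads // num_kv_heads
    out = []
    for h in range(num_heads):
        g = h // group
        w_v = v_proj[g * head_dim:(g + 1) * head_dim, :].T  # (hidden, head_dim)
        w_o = o_proj[:, h * head_dim:(h + 1) * head_dim].T  # (head_dim, hidden)
        # w_v = q_v @ r_v and w_o = r_o.T @ q_o.T with orthonormal q's, so the
        # product shares its nonzero singular values with the small r_v @ r_o.T.
        _, r_v = np.linalg.qr(w_v)
        _, r_o = np.linalg.qr(w_o.T)
        out.append(np.linalg.svd(r_v @ r_o.T, compute_uv=False))
    return np.stack(out)  # (num_heads, head_dim), each row sorted descending
```

Each OV map has rank at most `head_dim`. Taking thin QR factors and decomposing the small core therefore gives the same singular values as an SVD of the full 5120 x 5120 product, at a fraction of the cost.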
The shape of the thing
| Property | Value |
|---|---|
| Layers | 14 (shared) |
| Loops per token | 2 |
| Q heads | 40 |
| KV heads | 8 |
| Head dim | 128 |
| Hidden size | 5120 |
| MLP intermediate | ~15360 (SwiGLU, multiplier 6) |
| Attention style | GQA + G-SWA mixed local/global |
| Cross-loop module | CLP between loop 0 and loop 1 |
| Components probed | q, k, v, o, gate, up, down, attn, heads, mlp |
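The table implies a few derived sizes that are worth keeping in mind when reading the per-head and per-layer results. The sketch below uses only the table's values. The class name is hypothetical, and the parameter count assumes bias-free projections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopCoderShape:
    # Values copied from the shape table; MLP size is left out on purpose.
    num_layers: int = 14
    loops_per_token: int = 2
    num_q_heads: int = 40
    num_kv_heads: int = 8
    head_dim: int = 128
    hidden_size: int = 5120

    @property
    def q_width(self) -> int:
        return self.num_q_heads * self.head_dim

    @property
    def kv_width(self) -> int:
        return self.num_kv_heads * self.head_dim

    @property
    def q_heads_per_kv_head(self) -> int:
        return self.num_q_heads // self.num_kv_heads

    @property
    def layer_applications_per_token(self) -> int:
        # Shared weights: parameters are stored once, but each layer runs once per loop.
        return self.num_layers * self.loops_per_token

    @property
    def attn_proj_params_per_layer(self) -> int:
        # q and o map hidden <-> q_width; k and v map hidden -> kv_width (no biases assumed).
        return 2 * self.hidden_size * self.q_width + 2 * self.hidden_size * self.kv_width
```

`LoopCoderShape().q_heads_per_kv_head` gives 5, and `layer_applications_per_token` gives 28. The weights are stored once, but every token goes through 28 layer applications. A hook on a single layer module can therefore fire once per loop.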
The missing forward pass
I tried the obvious load first:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Multilingual-Multimodal-NLP/LoopCoder-V2",
    trust_remote_code=True,
)
```
And got:
ValueError: Unrecognized configuration class
The repo had safetensors and config.json, but no modeling_iquestpltcoder.py. The auto_map pointed to a module that did not exist. To atlas the model I had to:
- Reconstruct `modeling_iquestpltcoder.py` as a community implementation (Fable 5 wrote the first draft from the config and paper clues).
- Patch the `auto_map` in `config.json` to point at it.
- Overlay both files into a local `snapshot_download` directory and load from there.
- Add `scaled_dot_product_attention` to the eager attention path so census extraction is fast enough to run.
- Patch the custom tokenizer, which was missing attributes the `transformers` runtime expects.
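The overlay steps can be sketched with `huggingface_hub.snapshot_download` and `shutil`. This is a hedged sketch, not the `smoke_test.py` shipped in the dataset. The local directory name and helper names are hypothetical. It assumes the three patch files sit under `forward_pass/` in the dataset repo, and it does not cover the tokenizer patch or the SDPA change.

```python
import shutil
from pathlib import Path

# The patched files published in the dataset repo's forward_pass/ folder.
PATCH_FILES = ("modeling_iquestpltcoder.py", "configuration_iquestpltcoder.py", "config.json")


def overlay_patch(model_dir, patch_dir, files=PATCH_FILES):
    """Copy the community forward-pass files over a local model snapshot."""
    model_dir, patch_dir = Path(model_dir), Path(patch_dir)
    copied = []
    for name in files:
        src = patch_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing patch file: {src}")
        shutil.copy2(src, model_dir / name)  # overwrites config.json in place
        copied.append(name)
    return copied


def load_patched_model(local_dir="loopcoder-v2-local"):
    # Lazy imports so overlay_patch works without these packages installed.
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForCausalLM

    model_dir = snapshot_download("Multilingual-Multimodal-NLP/LoopCoder-V2", local_dir=local_dir)
    atlas_dir = snapshot_download(
        "juiceb0xc0de/LoopCoder-V2-atlas",
        repo_type="dataset",
        allow_patterns=["forward_pass/*"],
    )
    overlay_patch(model_dir, Path(atlas_dir) / "forward_pass")
    return AutoModelForCausalLM.from_pretrained(
        model_dir, trust_remote_code=True, torch_dtype="auto"
    )
```

The sketch downloads into an explicit `local_dir` so there are real files to overwrite. Copying over the default cache instead could write through its symlinks into shared blobs.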
After that, the model loaded and generated coherent code. I mention this because the same workflow probably works for other interesting HF repos that ship weights with broken or missing modeling code.
What the numbers suggest
Reasoning does not isolate as a manipulable direction
The reasoning-axis pass found 98 candidate singular-value directions. None of them survived causal validation. Patch a direction that looks like "reasoning" in activation space, and the capability fence rejects it across the board.
I wonder if that is structural: in a PLT, reasoning is a two-step algorithm on shared weights. Loop 0 might sketch a plan, loop 1 might refine and commit, and CLP mixes them. A direction measured in the fused stream is probably a blend of both stages, so editing it breaks the continuation from loop 0 into loop 1. That's just my hypothesis. My next experiment is a loop-separated census.
Middle layers route code structure
The strongest logit-lens signals in layers 5-9 are structural code tokens: braces, indentation, `def`, `return`, `if`, `else`. Early layers (0-3) are mostly lexical disambiguation, and late layers (10-13) look like next-token commitment and repetition control. This progression is familiar from dense transformers, but it is happening inside layers that are each executed twice.
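The logit lens decodes an intermediate residual-stream state with the model's own final norm and unembedding. Below is a minimal NumPy sketch. It assumes a Llama-style RMSNorm before `lm_head`, and the argument names are hypothetical, not taken from the community modeling file.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over the last axis (assumed to match the model's final norm)."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps) * weight


def logit_lens(hidden, norm_weight, unembed, k=5):
    """Top-k token ids predicted from an intermediate residual-stream state.

    hidden:      (seq, hidden_size) activations captured after some layer
    norm_weight: (hidden_size,) final-norm scale
    unembed:     (vocab, hidden_size), i.e. the lm_head weight
    """
    logits = rms_norm(hidden, norm_weight) @ unembed.T
    return np.argsort(-logits, axis=-1, kind="stable")[:, :k]
```

The layer-by-layer progression (lexical, then structural, then commitment) comes from decoding every captured layer with the same final norm and unembedding.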
Cross-loop heads carry the most spectral concentration
OV-circuit spectral concentration is highest in the heads attached to the CLP cross-loop channel. I think this is where the model invests representational capacity: in the loop-to-loop communication path, not in any single forward step. The heads that talk across loops are doing more of the work than the heads inside a single loop.
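"Spectral concentration" is not defined in the post. This sketch assumes a common choice: the share of a head's squared singular-value energy in its top-k OV directions. Given per-head singular values (for example from `np.linalg.svd(..., compute_uv=False)`), the metric and a head ranking are short. Both function names are hypothetical.

```python
import numpy as np


def spectral_concentration(s, k=1):
    """Fraction of squared singular-value energy in the top-k directions.

    s: (..., r) singular values sorted in descending order, as returned by
    np.linalg.svd. Values near 1 mean the head's OV map is close to rank k.
    """
    energy = np.asarray(s, dtype=float) ** 2
    return energy[..., :k].sum(axis=-1) / energy.sum(axis=-1)


def rank_heads(s, k=1):
    """Head indices ordered from most to least concentrated."""
    return np.argsort(-spectral_concentration(s, k), kind="stable")
```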
Features are selective, not monolithic
Auto code-role analysis tagged 3,360 features as selective (sparse, context-dependent). No broad monolithic directions appear. That fits a capacity split across many small specialists, which again feels very PLT-flavored: if the same weights have to do two different jobs, the network probably carves the hidden space into narrow experts.
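The post does not define "selective" beyond sparse and context-dependent. One simple, hypothetical way to operationalize it is a firing-rate cutoff: a feature counts as selective if it fires on some prompts, but on no more than a small fraction of them. The threshold values below are placeholders, not the atlas settings.

```python
import numpy as np


def tag_selective(acts, act_threshold=0.0, max_firing_rate=0.05):
    """Flag sparse, context-dependent features by firing rate.

    acts: (num_prompts, num_features) feature activations.
    Returns (ids of selective features, per-feature firing rate).
    """
    firing_rate = (np.asarray(acts) > act_threshold).mean(axis=0)
    # Dead features (rate 0) are excluded; broad features exceed the cutoff.
    selective = np.flatnonzero((firing_rate > 0) & (firing_rate <= max_firing_rate))
    return selective, firing_rate
```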
Coactivation came up empty
The coactivation table has no surviving pairs in this run. I am not sure yet whether that means feature pairs are genuinely uncorrelated, or whether the prompt set and threshold just did not produce strong pairwise structure. I would not read too much into it until I compare with a non-reasoning corpus.
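An empty coactivation table could mean "uncorrelated" or "below threshold", and the two are easy to tell apart offline. This sketch assumes coactivation is scored as the phi coefficient (Pearson correlation of binarized firing) between features. The atlas's actual scoring rule and thresholds are not stated, so both thresholds here are hypothetical knobs.

```python
import numpy as np


def coactivation_pairs(acts, act_threshold, min_corr):
    """Feature pairs (i, j, phi) whose binarized firing correlates >= min_corr.

    acts: (num_prompts, num_features) activations.
    """
    fired = (np.asarray(acts) > act_threshold).astype(float)
    # Features that never (or always) fire have undefined correlation; skip them.
    idx = np.flatnonzero(fired.std(axis=0) > 0)
    if idx.size < 2:
        return []
    corr = np.corrcoef(fired[:, idx], rowvar=False)
    i, j = np.triu_indices(idx.size, k=1)
    keep = corr[i, j] >= min_corr
    return [(int(idx[a]), int(idx[b]), float(corr[a, b])) for a, b in zip(i[keep], j[keep])]
```

Sweeping `act_threshold` and `min_corr` on the same activations shows whether pairs vanish because the structure is absent or because the cutoffs are too strict.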
What the reasoning-axis pass is actually measuring here
The reasoning-axis pass is not a generic "find all important directions" sweep. It specifically looks for directions that separate reasoning prompts from non-reasoning prompts. It then uses DAS rotation and the capability fence to check whether removing those directions damages code, math, reasoning, factual, or multilingual ability. So the 98 tested axes are reasoning-candidate directions, not a census of every load-bearing direction in the model. None of them passed the fence, which points to reasoning being distributed across the loop structure rather than localized in a clean subspace.
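The contrast-then-fence logic can be sketched as follows, but this is not the atlas pipeline. A difference-of-means direction stands in for the SVD candidates and DAS rotation. The fence rule and its thresholds are also assumptions chosen for illustration: the ablation must hurt the target domain by at least `min_effect`, while every other domain stays within `max_drop` of baseline. Scores are assumed to come from separate evaluation runs before and after the ablation.

```python
import numpy as np


def contrast_direction(reasoning_acts, other_acts):
    """Unit difference-of-means direction between two prompt sets, shape (d_model,)."""
    u = reasoning_acts.mean(axis=0) - other_acts.mean(axis=0)
    return u / np.linalg.norm(u)


def ablate(h, u):
    """Project the unit direction u out of every row of h."""
    return h - np.outer(h @ u, u)


def passes_fence(baseline, ablated, target="reasoning", min_effect=0.05, max_drop=0.02):
    """baseline/ablated: dict of domain -> score (higher is better).

    A direction survives only if ablating it measurably hurts the target
    domain while leaving every other fenced domain within max_drop.
    """
    target_drop = baseline[target] - ablated[target]
    off_target_ok = all(baseline[d] - ablated[d] <= max_drop for d in baseline if d != target)
    return target_drop >= min_effect and off_target_ok
```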
The stuff I deliberately did not separate
This atlas fuses loop 0 and loop 1 into a single activation stream because the hooks captured the standard forward path. I did not yet build a loop-separated census, so I cannot tell you whether loop 0 alone has a planning sub-circuit or loop 1 alone has a refinement sub-circuit. That is the obvious next post.
Caveats
- This is a reasoning-axis run. Other axes (style, syntax, domain) might isolate more cleanly.
- 0 causal survivors is a finding about the architecture, not proof that the atlas pipeline failed.
- The coactivation table is empty; read it as "not measured yet" rather than "no structure."
- The forward pass is a community reconstruction. It matches the config and generates sensible code, but it is not the authors' implementation.
Bottom line
This is a complete, if small, brain atlas of LoopCoder-V2, and the architecture makes it interesting. It shows reasoning distributed across two loops, cross-loop heads carrying the most spectral weight, and middle layers routing code structure, as seen through the logit lens. The surprising part is not what any single layer does. It is that the same layers have to do two different things, and the network has organized itself around that constraint. The next step is to census each loop separately and see whether the two-pass story is really what is going on.