Instructions for using Multilingual-Multimodal-NLP/LoopCoder-V2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

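# Load model and tokenizer directly
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Multilingual-Multimodal-NLP/LoopCoder-V2", trust_remote_code=True, dtype="auto")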

Notebooks
Google Colab
Kaggle

vLLM

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Multilingual-Multimodal-NLP/LoopCoder-V2" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
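The curl call above can also be made from Python. The sketch below uses only the standard library. It assumes the server is reachable at the default `vllm serve` address (`http://localhost:8000`). The SGLang examples in this document listen on port 30000 instead, so pass that as `base_url` there. The helper names `build_chat_request` and `chat` are hypothetical and are not part of vLLM or SGLang.

```python
import json
import urllib.request

# Assumption: default address of the `vllm serve` example; SGLang examples use port 30000.
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "Multilingual-Multimodal-NLP/LoopCoder-V2"


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST mirroring the curl call (one user message)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, base_url: str = DEFAULT_BASE_URL, model: str = MODEL_ID) -> str:
    """Hypothetical helper: send the request and return the first reply's text."""
    req = build_chat_request(base_url, model, content)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` sends the same request as the curl example. The `openai` Python package can also target an OpenAI-compatible server through its `base_url` setting.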

Use Docker

docker model run hf.co/Multilingual-Multimodal-NLP/LoopCoder-V2

SGLang

How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Multilingual-Multimodal-NLP/LoopCoder-V2" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Multilingual-Multimodal-NLP/LoopCoder-V2" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Multilingual-Multimodal-NLP/LoopCoder-V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Multilingual-Multimodal-NLP/LoopCoder-V2 with Docker Model Runner:
```
docker model run hf.co/Multilingual-Multimodal-NLP/LoopCoder-V2
```

This model is extremely good

by JohnMolotov - opened about 4 hours ago

Discussion


I don't know if this is the right place to put this, but I'd like to share my own feedback on the model based on my local testing. Benchmarks can be overfitted; private, subjective testing can't be, though of course it is less precise.

All models were run at 4-bit quantization. I made some modifications to yxing-bj/vllm to get it running and to fix bugs affecting the Turing architecture. The most notable changes were using fp16 instead of bf16 and Triton attention instead of FlashAttention-2, both required by my hardware. The variant I tested was plt_num_loops=2. All other models were run with stock Ollama (Q4_K_M).
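The fp16/Triton fallback described above follows from hardware limits. Native bf16 support and FlashAttention-2 both target CUDA compute capability 8.0 (Ampere) and newer, while Turing GPUs are 7.5. The sketch below shows that decision, assuming the caller supplies the compute capability. `choose_kernels` and the returned labels are illustrative and are not flags of yxing-bj/vllm or any other library.

```python
from typing import NamedTuple


class KernelChoice(NamedTuple):
    dtype: str      # numeric precision for weights and activations
    attention: str  # illustrative label for the attention implementation


def choose_kernels(major: int, minor: int) -> KernelChoice:
    """Hypothetical helper: pick precision and attention kernel.

    Assumption: native bf16 and FlashAttention-2 need compute capability
    8.0 (Ampere) or newer; Turing is 7.5.
    """
    if (major, minor) >= (8, 0):
        return KernelChoice(dtype="bfloat16", attention="flash_attention_2")
    # Pre-Ampere (e.g. Turing, sm_75): fall back to fp16 and a Triton kernel.
    return KernelChoice(dtype="float16", attention="triton")
```

With PyTorch installed, `choose_kernels(*torch.cuda.get_device_capability())` would pass in the current GPU's capability. One caveat of the fp16 fallback is that fp16 has a much narrower exponent range than bf16 (maximum value about 65504). Activations overflowing is therefore a known risk when running a model trained in bf16.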

I tested models in two main size categories, ~8b and ~30b, plus one 80b model: gemma4:e4b, Qwen2.5-Coder:7B, granite-code:8b, ministral-3:8b, Yi-Coder:9b, qwen3.5:9b, devstral-small-2:24b, gemma4:26b, glm-4.7-flash:30b, qwen3-coder:30b, nemotron-3-nano:30b, qwen3.6:35b, and Qwen3-Coder-Next:80b. Each model was tasked with solving 10 problems from my codebases across 3 languages. The problems covered bugfixing, refactoring, writing greenfield projects and algorithms, rewriting code in other languages, writing tests, and planning. All models were first tested one-shot. A subset was then tested in aider, and a smaller subset in opencode. Results were first checked against automated tests and then blind-ranked pairwise by me.
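The post does not say how the pairwise judgments were aggregated into a ranking. The sketch below is a hypothetical illustration only. It gates solutions on automated tests, shows each remaining pair to a judge in random order without model names, and ranks by win count. The function name, the data layout, and the win-count aggregation are all assumptions, not the author's procedure.

```python
import itertools
import random
from typing import Callable, Dict, List


def blind_pairwise_rank(
    outputs: Dict[str, str],
    passes_tests: Callable[[str], bool],
    prefer_first: Callable[[str, str], bool],
    seed: int = 0,
) -> List[str]:
    """Hypothetical sketch: rank models by pairwise wins after a test gate.

    outputs: model name -> generated solution (assumed layout).
    passes_tests: automated check applied to every solution first.
    prefer_first: blind judge; it sees the two solutions, never the names.
    """
    rng = random.Random(seed)
    # Step 1: only solutions that pass the automated tests are compared.
    finalists = [name for name, code in outputs.items() if passes_tests(code)]
    wins = {name: 0 for name in finalists}
    # Step 2: each pair is judged once, with left/right order randomized so
    # position does not hint at which model produced which solution.
    for a, b in itertools.combinations(finalists, 2):
        left, right = (a, b) if rng.random() < 0.5 else (b, a)
        winner = left if prefer_first(outputs[left], outputs[right]) else right
        wins[winner] += 1
    # Step 3 (assumed aggregation): sort by win count, ties broken by name.
    ranked = sorted(finalists, key=lambda n: (-wins[n], n))
    # Solutions that failed the tests go last, in their original order.
    failed = [name for name in outputs if name not in wins]
    return ranked + failed
```

In practice `prefer_first` would be the human judgment. Win counts are the simplest aggregation; a rating model such as Bradley-Terry is a common alternative when not every pair gets compared.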

Results:
One-shot, LoopCoder-V2 was ahead of every ~8b model. (I initially ranked ministral above it on apparent code quality, but ministral's code often didn't actually compile.) It was behind every larger model except glm-4.7 and nemotron. That is very good, given that some of these are leading models or significantly larger, but not yet something to write home about. The agentic performance, however, is truly mind-blowing. In opencode it was beaten only by Qwen3-Coder-Next:80b, while its aider results beat every other model's results in either harness. I am truly staggered that it beat an 80b model in a blind ranking, even if the ranking is somewhat subjective.

However, I should note some specific areas where it was lacking. The biggest is that without an agentic harness and test cases, it completely failed at bugfixing and performed poorly at refactoring, though it was good at both when run agentically. Beyond that, its weakest areas were planning (which is somewhat inherently one-shot) and test generation, where it wasn't bad but wasn't notably good either. I also ran separate tests on codebase exploration alone and found it middle of the pack among the ~8b models. Its strength seems to lie precisely in agentic code generation. It isn't a general model, and as long as it's approached with that in mind, it's absolutely fantastic.
