Instructions for using srivarenya/MoM-python-slm with libraries and local apps.

Libraries

How to use srivarenya/MoM-python-slm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="srivarenya/MoM-python-slm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm")
model = AutoModelForCausalLM.from_pretrained("srivarenya/MoM-python-slm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use srivarenya/MoM-python-slm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srivarenya/MoM-python-slm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
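
The same OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative (they are not part of vLLM), and the default `base_url` assumes the server started above is listening on port 8000.

```python
import json
import urllib.request


def build_chat_request(prompt, model="srivarenya/MoM-python-slm",
                       base_url="http://localhost:8000"):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a function that reverses a string.")` should return the model's reply as a string. Passing a different `base_url` points the same helper at another OpenAI-compatible server.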

Use Docker

docker model run hf.co/srivarenya/MoM-python-slm

SGLang

How to use srivarenya/MoM-python-slm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "srivarenya/MoM-python-slm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "srivarenya/MoM-python-slm" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use srivarenya/MoM-python-slm with Docker Model Runner:
```
docker model run hf.co/srivarenya/MoM-python-slm
```

MoM-python-slm / README.md

srivarenya

Upload README.md with huggingface_hub

a29c68f verified 6 days ago


	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags: [code, python, qwen2.5-coder, dora, mixture-of-models, code-generation]
	language: [en]
	---

	# MoM-Python-SLM (1.5B)

	The Python code-generation node of a Mixture-of-Models (MoM) mesh — a set of small,
	specialized Qwen2.5-Coder SLMs (shared tokenizer) coordinated by a lightweight router, aiming to beat
	frontier generalists on coding by specialization depth rather than parameter count.

	This node is a single-turn code generator (not an agent): given a Python task (optionally with an
	upstream context packet), it returns reasoning followed by code. It shares the Qwen2.5-Coder
	tokenizer with the other generative nodes, which is what makes logit-space fusion across the mesh
	valid.
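
To illustrate why the shared tokenizer matters, here is a minimal sketch of one possible fusion rule. This card does not specify how the mesh actually fuses logits, so the weighted sum of per-node log-probabilities below (and the names `fuse_next_token`, `per_node_logits`, `weights`) are assumptions for illustration only. The point is that index `i` must mean the same token in every node, which only holds when the vocabulary is shared.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def fuse_next_token(per_node_logits, weights):
    """Weighted sum of per-node log-probabilities (ASSUMED fusion rule).

    Returns (best_token_id, fused_scores). Every node must score the same
    vocabulary, otherwise token ids would not line up across nodes.
    """
    vocab_size = len(per_node_logits[0])
    if any(len(logits) != vocab_size for logits in per_node_logits):
        raise ValueError("nodes must share one vocabulary to fuse logits")
    fused = [0.0] * vocab_size
    for logits, w in zip(per_node_logits, weights):
        for i, lp in enumerate(log_softmax(logits)):
            fused[i] += w * lp
    best = max(range(vocab_size), key=fused.__getitem__)
    return best, fused


# Toy 3-token vocabulary: node A prefers token 0, node B prefers token 2.
node_a = [2.0, 0.5, 0.1]
node_b = [0.2, 0.3, 3.0]
token_when_a_dominates, _ = fuse_next_token([node_a, node_b], [0.8, 0.2])
token_when_b_dominates, _ = fuse_next_token([node_a, node_b], [0.2, 0.8])
```

In the toy example the router-style weights decide which node's preference wins: token 0 when node A carries most of the weight, token 2 when node B does.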

	- Base: [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)
	- Method: DoRA r=64 (≈4.6% of parameters trainable), SFT (Phase A: 1 epoch, then Phase B: 2 epochs), then merged.
	- Data: 476K instances (decontaminated against HumanEval/MBPP, 0 overlap) built from the complete
	  CPython docs plus Flask/Requests source, issues/PRs, CVEs, and execution-verified synthetic problems.
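
As a back-of-the-envelope check on the ≈4.6% figure, the sketch below counts DoRA parameters at r=64. The card does not state which modules were adapted, so the target set (all seven linear projections in every transformer block) is an assumption; the layer dimensions are taken from the base model's published configuration.

```python
# Qwen2.5-Coder-1.5B dimensions (from the base model's config).
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128                 # 2 key/value heads x head_dim 128
BASE_PARAMS = 1_543_714_304      # total base parameters (tied embeddings)
RANK = 64

# (in_features, out_features) per block.
# ASSUMPTION: every linear projection is adapted; the card does not say.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def dora_trainable_params(rank=RANK):
    """Count low-rank A/B matrices plus DoRA's per-output magnitude vector."""
    per_layer = 0
    for d_in, d_out in PROJECTIONS.values():
        per_layer += rank * (d_in + d_out)  # A is (rank x in), B is (out x rank)
        per_layer += d_out                  # magnitude vector
    return per_layer * LAYERS


trainable = dora_trainable_params()
share = trainable / (BASE_PARAMS + trainable)
print(f"{trainable:,} trainable parameters, {share:.1%} of the adapted model")
# prints: 74,504,192 trainable parameters, 4.6% of the adapted model
```

Under that assumption the count lands on the reported share, which suggests (but does not prove) that the adapters covered all linear projections.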

	## Benchmarks (greedy pass@1)

	| Suite | Metric | Base | This model |
	|---|---|---|---|
	| HumanEval | pass@1 (%) | 68.9 | 70.7 |
	| MBPP | pass@1 (%) | 66.7 | 69.6 |
	| Domain (held-out) | `spec_to_code` exec | 0.632 | 0.714 (+8.2 pts) |
	| Domain (held-out) | `api_signature` param-recall | 0.217 | 0.299 (+8.2 pts) |
	| Domain (held-out) | `problem_solving` exec | 0.700 | 0.713 (parity) |

	The largest gains are in library/API capability (writing correct code from a spec, recalling API
	signatures), a dimension that the saturated HumanEval/MBPP suites cannot measure. The repo's
	self-contained domain-eval notebook reproduces these results.
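
The table mixes percentages (HumanEval, MBPP) with fractions (domain suites), so the gains are easiest to compare in percentage points. A small sketch that recomputes them from the table values; the helper name `gain_in_points` is illustrative.

```python
# (base, this model) pairs copied from the benchmark table.
RESULTS = {
    "HumanEval pass@1": (68.9, 70.7),        # already in percent
    "MBPP pass@1": (66.7, 69.6),             # already in percent
    "spec_to_code exec": (0.632, 0.714),     # fraction of 1
    "api_signature param-recall": (0.217, 0.299),
    "problem_solving exec": (0.700, 0.713),
}


def gain_in_points(base, tuned):
    """Difference in percentage points, whatever scale the inputs use."""
    scale = 100 if max(base, tuned) <= 1 else 1
    return round((tuned - base) * scale, 1)


gains = {name: gain_in_points(*pair) for name, pair in RESULTS.items()}
for name, points in gains.items():
    print(f"{name}: +{points} pts")
```

This gives +1.8 and +2.9 points on HumanEval and MBPP, +8.2 points on both `spec_to_code` and `api_signature`, and +1.3 points on `problem_solving`.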

	## Recipe findings (load-bearing)
	- Lower DoRA rank wins: r=64 specializes without forgetting; r=256 regressed catastrophically
	(HumanEval 60.4, below the base's 68.9).
	- Moderate reasoning wins: the ~25%-reasoning recipe (this model) beat a 98%-reasoning sibling,
	whose HumanEval collapsed to 47 (always-reason prose fights the signature-completion format).

	## Usage
	```python
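	from transformers import AutoModelForCausalLM, AutoTokenizer
	# Load the tokenizer and the merged weights from the Hub.
	tok = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm")
	# Older transformers releases may expect `torch_dtype` instead of `dtype`.
	model = AutoModelForCausalLM.from_pretrained(
	    "srivarenya/MoM-python-slm", dtype="bfloat16", device_map="auto"
	)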
	```
	Prompt the model with the training system prompt plus a Python task; it returns reasoning followed by code.
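
A minimal sketch of that flow, reusing `tok` and `model` from the snippet above. Several things here are assumptions, not part of this card: `SYSTEM_PROMPT` is a placeholder (the training system prompt is not reproduced here), the way an upstream context packet is joined to the task is a guess, and `extract_code` assumes the model wraps its code in a Markdown fence.

```python
import re

# PLACEHOLDER: substitute the actual training system prompt.
SYSTEM_PROMPT = "<training system prompt goes here>"

FENCE = "`" * 3  # three backticks, built this way to keep this example readable
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def build_messages(task, context=None, system_prompt=SYSTEM_PROMPT):
    """Single-turn chat: system prompt, then the task (ASSUMED context layout)."""
    user = task if context is None else f"{context}\n\n{task}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]


def extract_code(reply):
    """Return the last fenced code block in the reply, or None if there is none."""
    blocks = CODE_BLOCK.findall(reply)
    return blocks[-1].strip() if blocks else None


def generate(task, tok, model, max_new_tokens=512):
    """Greedy single-turn generation; returns the decoded reply text."""
    inputs = tok.apply_chat_template(
        build_messages(task),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

For example, `extract_code(generate("Write a function that parses an ISO 8601 date.", tok, model))` would return just the code portion when the reply contains a fenced block.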

	Next step in the pipeline: GRPO/RLVR against an execution-grounded reward to push past the
	instruct-tuning ceiling. Code, training recipe, and eval harnesses: project repository.
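
To make "execution-grounded reward" concrete, here is a hedged sketch of the simplest form such a reward could take: run the generated code together with its tests and score by exit status. This is not the project's reward function; the name `execution_reward`, the binary 1.0/0.0 scoring, and the timeout are all assumptions, and a real setup would need proper sandboxing before running untrusted model output.

```python
import os
import subprocess
import sys
import tempfile


def execution_reward(candidate_code, test_code, timeout_s=10.0):
    """Return 1.0 if candidate + tests run and exit cleanly, else 0.0.

    WARNING: this runs the code in a plain subprocess with no sandbox.
    """
    program = candidate_code + "\n\n" + test_code + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(program)
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # hung or too slow counts as a failure
    return 1.0 if result.returncode == 0 else 0.0
```

A failing assertion, an exception, or a syntax error all produce a non-zero exit code and therefore a reward of 0.0, which is what makes the signal verifiable rather than judged.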