Instructions to use MorphMind-AI/CFM-Methods-3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MorphMind-AI/CFM-Methods-3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MorphMind-AI/CFM-Methods-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Methods-3B")
model = AutoModelForCausalLM.from_pretrained("MorphMind-AI/CFM-Methods-3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MorphMind-AI/CFM-Methods-3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MorphMind-AI/CFM-Methods-3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MorphMind-AI/CFM-Methods-3B

SGLang

How to use MorphMind-AI/CFM-Methods-3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MorphMind-AI/CFM-Methods-3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MorphMind-AI/CFM-Methods-3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MorphMind-AI/CFM-Methods-3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MorphMind-AI/CFM-Methods-3B with Docker Model Runner:
```
docker model run hf.co/MorphMind-AI/CFM-Methods-3B
```

CFM-Methods-3B / README.md

Joe-Davis

Refresh card: clean two-panel benchmark figure, worked example, multi-domain framing

957cff8 verified 17 days ago


	---
	license: other
	license_name: morphmind-cfm-research-license
	license_link: LICENSE
	base_model: Qwen/Qwen2.5-3B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	inference: false
	tags:
	- control-foundation-model
	- scientific-ai
	- methodology-review
	- peer-review
	- rlvr
	- morphmind
	---

	# CFM-Methods-3B · MorphMind

	**A tiny control model that reads a methods section and tells you exactly where the methodology is
	unsound.** Give it a methods or experimental-design block from any empirical-science paper
	(**statistics, machine learning, quantitative biology, econometrics, materials science, or chemical
	physics**) and it returns a structured verdict, **support or refute**, pinpoints the offending
	statement, and explains why. It is a high-recall screen: it surfaces methodological red flags such as
	data leakage, p-hacking, uncorrected multiple comparisons, train/test contamination, optional stopping,
	correlation-as-causation, post-hoc outlier removal, and unblinded scoring, so that a human reviewer
	misses almost nothing.

	At just 3B parameters, CFM-Methods-3B delivers frontier-level methodology screening that runs
	on a single GPU, on-premise, at a tiny fraction of the cost of a frontier API. It is the compact member
	of MorphMind's Control Foundation Model (CFM) line --- models whose job is not to generate
	science but to check it.

	By [MorphMind](https://morphmind.ai). Research preview.

	## Benchmark --- methodology-flaw detection vs. frontier models

	![methodology benchmark](benchmark.png)

	Evaluated on flaw types the model never trained on (24 flaw families used for training, **12 held
	out for evaluation**), benchmarked head-to-head against frontier commercial models on the same
	held-out set:

	| Model | Recall | Precision | Localization | False-positive rate (clean) |
	|---|---|---|---|---|
	| base Qwen2.5-3B | 0.30 | n/a | 0.42 | 0.07 |
	| GPT-4o | 0.86 | 0.64 | 0.94 | 0.47 |
	| Claude Opus 4 | 0.96 | 0.78 | 0.97 | 0.28 |
	| CFM-Methods-3B (ours) | 0.98 | 1.00 | 0.97 | 0.005 |

	**CFM-Methods-3B matches frontier recall and localization, with the lowest false-alarm rate in the
	table: 0.5% on clean methods.** It catches **98% of methodological flaws it has never seen** and
	pinpoints the exact flawed statement 97% of the time, level with Claude Opus 4 (97%) and ahead of
	GPT-4o (94%), while the frontier models over-flag clean methods heavily (Opus 28%, GPT-4o 47%
	false-positive rate). The result is **frontier-grade methodology screening with the precision of a
	careful expert: on-prem, in a 3B model, at a tiny fraction of the cost.**
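
For readers who want to compute the same kind of columns on their own labelled set, the following is a minimal sketch of the standard definitions of recall, precision, and false-positive rate. The function name and the example counts are hypothetical and are not taken from the benchmark. The card does not state how localization is scored, so that column is left out.

```python
def screening_metrics(tp, fn, fp, tn):
    """Standard binary-screening metrics from confusion counts.

    tp: flawed blocks flagged "refute"    fn: flawed blocks passed as "support"
    fp: clean blocks flagged "refute"     tn: clean blocks passed as "support"
    """
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
m = screening_metrics(tp=98, fn=2, fp=1, tn=199)
print(m)
```

With these made-up counts, a single false alarm among 200 clean blocks gives a 0.5% false-positive rate while precision still rounds to 0.99.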

	## Worked example

	Example 1: it catches an uncorrected multiple-comparisons flaw. Given this methods block:

	> *"We screened 60 candidate protein markers for association with disease status. Each marker was
	> tested individually with a univariate logistic regression at the 0.05 level. The 14 markers reaching
	> p < 0.05 in univariate tests are reported as significant and carried forward as the disease signature."*

	CFM-Methods-3B returns (verbatim model output):

	```json
	{
	  "analysis": "There is a methodological flaw: Many tests without correction inflate the false-positive rate.",
	  "verdict": "refute",
	  "error_spans": [
	    {
	      "text": "The 14 markers reaching p < 0.05 in univariate tests are reported as significant",
	      "why": "Many tests without correction inflate the false-positive rate."
	    }
	  ],
	  "action": "suggest_edit"
	}
	```

	It pinpoints the exact offending sentence and names the failure mode: 60 simultaneous tests at α = 0.05
	with no correction.
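
The arithmetic behind this flag can be checked directly. The sketch below assumes, for illustration only, that all 60 tests are independent and that no marker is truly associated with disease (the global null). Neither assumption comes from the quoted methods block.

```python
n_tests = 60
alpha = 0.05

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = n_tests * alpha

# Probability of at least one false positive across independent tests.
familywise_error_rate = 1 - (1 - alpha) ** n_tests

# Bonferroni-corrected per-test threshold that keeps the family-wise rate at alpha.
bonferroni_threshold = alpha / n_tests

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"family-wise error rate:   {familywise_error_rate:.3f}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.5f}")
```

Under these assumptions about 3 hits would be expected by chance alone, and at least one false positive is almost certain (roughly 95%), which is why reporting every p < 0.05 marker as significant is unsafe without a correction.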

	Example 2: it passes clean methods without crying wolf. Given a sound instrumental-variables design:

	> *"We estimate the causal effect of schooling on wages with two-stage least squares, instrumenting
	> years of education with quarter-of-birth. We report first-stage F-statistics to confirm instrument
	> strength and cluster standard errors at the state level."*

	```json
	{
	  "analysis": "The methodology is sound: instrument strength is verified and standard errors are clustered appropriately.",
	  "verdict": "support",
	  "error_spans": [],
	  "action": "accept"
	}
	```

	No false flag: this is what the near-zero false-positive rate in the benchmark above looks like in practice.

	## When & how to use it
	Use it as a fast, private, first-pass methodology screen --- a pre-submission self-check for
	researchers, triage for journals / reviewers / grant panels, QA over a stack of submissions, or a check
	on AI-generated experimental designs. Review one methods block at a time (split a paper into its
	method / experiment / analysis sections and run each). Because it is tuned for recall, treat its flags
	as "worth a human's 30 seconds."

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	tok = AutoTokenizer.from_pretrained("MorphMind-AI/CFM-Methods-3B")
	model = AutoModelForCausalLM.from_pretrained(
	    "MorphMind-AI/CFM-Methods-3B", torch_dtype=torch.bfloat16, device_map="auto"
	)

	# System prompt requesting the JSON schema shown in the worked examples.
	SYS = ("You are a scientific methodology reviewer. Review the methods and respond ONLY with JSON: "
	       "{\"analysis\":...,\"verdict\":\"support|refute\","
	       "\"error_spans\":[{\"text\":...,\"why\":...}],\"action\":\"accept|suggest_edit\"}")

	def review(methods):
	    """Return the model's raw JSON string for one methods block."""
	    msgs = [{"role": "system", "content": SYS},
	            {"role": "user", "content": "METHODS:\n" + methods}]
	    inputs = tok.apply_chat_template(msgs, add_generation_prompt=True,
	                                     return_dict=True, return_tensors="pt").to(model.device)
	    out = model.generate(**inputs, max_new_tokens=320, do_sample=False)
	    # Decode only the newly generated tokens, not the prompt.
	    return tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
	```
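
Since the model is prompted to reply with JSON only, downstream code will usually want to parse and sanity-check that reply. The helper below is a minimal sketch: `parse_review` is a hypothetical name, and the validation rules are inferred from the schema in the system prompt rather than from any MorphMind tooling.

```python
import json

ALLOWED_VERDICTS = {"support", "refute"}
ALLOWED_ACTIONS = {"accept", "suggest_edit"}

def parse_review(raw):
    """Parse one model reply; return a dict, or None if it does not fit the schema."""
    # Tolerate stray text around the JSON object by slicing from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if data.get("verdict") not in ALLOWED_VERDICTS:
        return None
    if data.get("action") not in ALLOWED_ACTIONS:
        return None
    if not isinstance(data.get("error_spans"), list):
        return None
    return data

# Example reply, copied from the multiple-comparisons worked example.
reply = """{
  "analysis": "There is a methodological flaw: Many tests without correction inflate the false-positive rate.",
  "verdict": "refute",
  "error_spans": [
    {
      "text": "The 14 markers reaching p < 0.05 in univariate tests are reported as significant",
      "why": "Many tests without correction inflate the false-positive rate."
    }
  ],
  "action": "suggest_edit"
}"""

result = parse_review(reply)
for span in result["error_spans"]:
    print(result["verdict"], "|", span["text"], "|", span["why"])
```

Returning `None` on malformed replies keeps a batch run going and lets those blocks be routed to a human rather than silently counted as clean.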

	## How it was built
	A full-parameter fine-tune of Qwen2.5-3B-Instruct, trained with RLVR (Reinforcement Learning from
	Verifiable Rewards) under a localization-gated reward --- a verdict is reinforced only if the model
	also points to the actual flawed statement, which teaches genuine reasoning rather than blanket
	flagging. Trained on public arXiv methods sections across statistics, machine learning, quantitative
	biology, econometrics, materials science, and chemical physics, with injected, paraphrased
	methodological flaws; evaluated on held-out flaw families.
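
To make the gating idea concrete, here is a minimal sketch of what a localization-gated reward could look like. It is not MorphMind's training code: the function name, the 0/1 reward values, and the substring test used to decide whether the model "points to" the flawed statement are all assumptions made for illustration.

```python
def localization_gated_reward(pred_verdict, pred_spans, true_verdict, true_flaw_text):
    """Hypothetical reward: a correct verdict pays out only when it is also localized.

    pred_spans is a list of strings the model quoted as flawed.
    true_flaw_text is the injected flawed statement, or None for a clean block.
    """
    if pred_verdict != true_verdict:
        return 0.0
    if true_verdict == "support":
        # Clean block: reward a pass that flags nothing.
        return 1.0 if not pred_spans else 0.0
    # Flawed block: the gate. A "refute" earns reward only if some quoted span
    # overlaps the injected flaw (here: one string contains the other).
    localized = any(
        span and (span in true_flaw_text or true_flaw_text in span)
        for span in pred_spans
    )
    return 1.0 if localized else 0.0

# Made-up example block with one injected flaw.
flaw = "Outliers were removed after inspecting the results."
r_localized = localization_gated_reward("refute", [flaw], "refute", flaw)
r_blanket = localization_gated_reward("refute", ["We used a t-test."], "refute", flaw)
r_clean = localization_gated_reward("support", [], "support", None)
print(r_localized, r_blanket, r_clean)
```

In this sketch a "refute" that quotes the wrong sentence earns nothing, so always answering "refute" is not a winning strategy, which matches the stated goal of discouraging blanket flagging.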

	## Notes
	- A high-recall screen for first-pass review: ~98% of flaws surfaced with a near-zero false-alarm
	rate, designed to keep an expert in the loop for the final call.
	- Generalizes to methodological flaws it has never seen, across six empirical-science families.
	- Part of MorphMind's growing Control Foundation Model family.

	## License
	Released under the MorphMind CFM Research License (see `LICENSE`), incorporating the **Qwen Research
	License** of the Qwen2.5-3B base. Research / non-commercial use, with attribution to MorphMind and Qwen.
	For commercial licensing, contact MorphMind (morphmind.ai).