Instructions to use bknyaz/Qwen3-Coder-Next-REAM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use bknyaz/Qwen3-Coder-Next-REAM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bknyaz/Qwen3-Coder-Next-REAM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bknyaz/Qwen3-Coder-Next-REAM")
model = AutoModelForCausalLM.from_pretrained("bknyaz/Qwen3-Coder-Next-REAM")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use bknyaz/Qwen3-Coder-Next-REAM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bknyaz/Qwen3-Coder-Next-REAM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bknyaz/Qwen3-Coder-Next-REAM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
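
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "bknyaz/Qwen3-Coder-Next-REAM"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000", timeout=120):
    """Send the request and return the assistant's reply text.

    Assumes the server answers in the OpenAI chat-completions format.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` should print the reply. For the SGLang server described further down, pass `base_url="http://localhost:30000"` instead.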

Use Docker

docker model run hf.co/bknyaz/Qwen3-Coder-Next-REAM

SGLang

How to use bknyaz/Qwen3-Coder-Next-REAM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bknyaz/Qwen3-Coder-Next-REAM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bknyaz/Qwen3-Coder-Next-REAM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bknyaz/Qwen3-Coder-Next-REAM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bknyaz/Qwen3-Coder-Next-REAM",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use bknyaz/Qwen3-Coder-Next-REAM with Docker Model Runner:
```
docker model run hf.co/bknyaz/Qwen3-Coder-Next-REAM
```

bknyaz committed on Feb 11

Commit 2cfa165 (verified) · 1 Parent(s): 5eed4eb

Upload Qwen3NextForCausalLM

Files changed (29)

README.md +199 -0
config.json +108 -0
generation_config.json +12 -0
model-00001-of-00025.safetensors +3 -0
model-00002-of-00025.safetensors +3 -0
model-00003-of-00025.safetensors +3 -0
model-00004-of-00025.safetensors +3 -0
model-00005-of-00025.safetensors +3 -0
model-00006-of-00025.safetensors +3 -0
model-00007-of-00025.safetensors +3 -0
model-00008-of-00025.safetensors +3 -0
model-00009-of-00025.safetensors +3 -0
model-00010-of-00025.safetensors +3 -0
model-00011-of-00025.safetensors +3 -0
model-00012-of-00025.safetensors +3 -0
model-00013-of-00025.safetensors +3 -0
model-00014-of-00025.safetensors +3 -0
model-00015-of-00025.safetensors +3 -0
model-00016-of-00025.safetensors +3 -0
model-00017-of-00025.safetensors +3 -0
model-00018-of-00025.safetensors +3 -0
model-00019-of-00025.safetensors +3 -0
model-00020-of-00025.safetensors +3 -0
model-00021-of-00025.safetensors +3 -0
model-00022-of-00025.safetensors +3 -0
model-00023-of-00025.safetensors +3 -0
model-00024-of-00025.safetensors +3 -0
model-00025-of-00025.safetensors +3 -0
model.safetensors.index.json +0 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,108 @@

+{
+  "architectures": [
+    "Qwen3NextForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0,
+  "bos_token_id": 151643,
+  "decoder_sparse_step": 1,
+  "dtype": "bfloat16",
+  "eos_token_id": 151645,
+  "full_attention_interval": 4,
+  "head_dim": 256,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_types": [
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention",
+    "linear_attention",
+    "linear_attention",
+    "linear_attention",
+    "full_attention"
+  ],
+  "linear_conv_kernel_dim": 4,
+  "linear_key_head_dim": 128,
+  "linear_num_key_heads": 16,
+  "linear_num_value_heads": 32,
+  "linear_value_head_dim": 128,
+  "max_position_embeddings": 262144,
+  "merge_args": {
+    "balance_group_size": 32,
+    "dataset": "c4+math+the-stack-smol",
+    "expert_saliency": "reap",
+    "gate_softmax": true,
+    "group": "freq_logits",
+    "merge": "align_logits_weights",
+    "merge_size": 384,
+    "merger_bs": 3072,
+    "merger_seq_len": 512,
+    "pca_dim": 64,
+    "precompute_input": false,
+    "use_gate_output": true
+  },
+  "mlp_only_layers": [],
+  "model_type": "qwen3_next",
+  "moe_intermediate_size": 512,
+  "norm_topk_prob": true,
+  "num_attention_heads": 16,
+  "num_experts": 384,
+  "num_experts_per_tok": 10,
+  "num_hidden_layers": 48,
+  "num_key_value_heads": 2,
+  "output_router_logits": false,
+  "partial_rotary_factor": 0.25,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 5000000,
+  "router_aux_loss_coef": 0.001,
+  "shared_expert_intermediate_size": 512,
+  "tie_word_embeddings": false,
+  "transformers_version": "4.57.6",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
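
As a reading aid for the config above, the following sketch recomputes a few derived quantities from the listed values. It assumes the `layer_types` list follows a simple "every `full_attention_interval`-th layer uses full attention" rule, which matches the 48 entries shown, and that rotary embeddings cover `head_dim * partial_rotary_factor` dimensions. The helper `build_layer_types` is illustrative, not a Transformers API.

```python
# Values copied from the config.json in this commit.
config = {
    "num_hidden_layers": 48,
    "full_attention_interval": 4,
    "num_attention_heads": 16,
    "num_key_value_heads": 2,
    "head_dim": 256,
    "partial_rotary_factor": 0.25,
    "num_experts": 384,
    "num_experts_per_tok": 10,
}


def build_layer_types(num_layers, interval):
    """Every `interval`-th layer (1-based) is full attention, the rest linear."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


layer_types = build_layer_types(
    config["num_hidden_layers"], config["full_attention_interval"]
)
num_full = layer_types.count("full_attention")
num_linear = layer_types.count("linear_attention")

# Projection widths of the full-attention layers (grouped-query attention).
query_width = config["num_attention_heads"] * config["head_dim"]
kv_width = config["num_key_value_heads"] * config["head_dim"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]

# Assumption: only this many dimensions of each head receive rotary embeddings.
rotary_dims = int(config["head_dim"] * config["partial_rotary_factor"])

# Fraction of routed experts that are active for each token.
active_expert_fraction = config["num_experts_per_tok"] / config["num_experts"]

print(num_full, num_linear, query_width, kv_width, rotary_dims)
print(f"{active_expert_fraction:.1%} of routed experts active per token")
```

Under these assumptions the model has 12 full-attention and 36 linear-attention layers, and each token is routed to 10 of the 384 experts.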

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "bos_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "pad_token_id": 151643,
+  "top_k": 40,
+  "top_p": 0.95,
+  "transformers_version": "4.57.6"
+}
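
To illustrate what `do_sample`, `top_k` and `top_p` mean in the file above, here is a simplified pure-Python sketch of top-k followed by nucleus (top-p) filtering. It is not the Transformers implementation, and the function names `top_k_top_p_filter` and `sample_token` are made up for this example; the defaults mirror the values in generation_config.json.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=40, top_p=0.95):
    """Return {token_id: probability} after top-k, then top-p filtering."""
    # Keep the top_k highest-scoring token ids, best first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax over the surviving logits (shifted by the max for stability).
    max_logit = logits[order[0]]
    exps = [math.exp(logits[i] - max_logit) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept = {}
    cumulative = 0.0
    for token_id, p in zip(order, probs):
        kept[token_id] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(kept.values())
    return {token_id: p / mass for token_id, p in kept.items()}


def sample_token(logits, top_k=40, top_p=0.95, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_k_top_p_filter(logits, top_k, top_p)
    return rng.choices(list(dist), weights=list(dist.values()))[0]
```

For example, `sample_token([0.1, 2.0, -1.0, 0.5])` returns one of the token ids that survive both filters, with higher-scoring tokens drawn more often.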

model-00001-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d05ac9ef4cb9ecfebb9ea0c07d827e411710e8ec89f7f8ea20336e3079805ae
+size 4998958552

model-00002-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:587958e37e3e94cdf137746cafa7fceb1eddaf9d06109531788e903cbce61f16
+size 4999205184

model-00003-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5656cec0107b77810aad983828687464050a035b9a12db63698522ffb1afd6c4
+size 4999531600

model-00004-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:78c712b841084783baff43780d57d0f595e88555fc3d3b32df8d3fafcf29cc54
+size 4999205184

model-00005-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72d34e466df6b17ee70aae29ada78ff2d467b3329542d9dbd7733239e7f9ec4e
+size 4999531600

model-00006-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9f12cd2d23e2dfa8cfc2aecbbb1935a757e28806cc72f1d94208c828b63b5d9
+size 4999207320

model-00007-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60e78d15e0860175eff4f0f51412cc8673e4c36f1e164d39f3b2895eec9e47e7
+size 4999533936

model-00008-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf5937689e84a15395c84c67f5946a1d990224b008ec9bb7a9775a3b2eb1a97a
+size 4999207528

model-00009-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:16ab7edfa0d90d63ea8812ccc29a2b09e4f97ca5a191bc25b4664e6851e81789
+size 4999533936

model-00010-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07664dbe56c474d7d2e5047935c4675c1c294bca8d90b8f283c767dffd7cd5c9
+size 4999207528

model-00011-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:de94a5395cabf9d7b98bd21dfc00aeaca32444161ab08bca4004a2f8b355d794
+size 4999533936

model-00012-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:350aa2c65beaa9e29772ae22aac53c5d967bbdc1e058ec162f343979b1a2f1db
+size 4999207528

model-00013-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a349a47a46964ce72e4ed46053cd5638f895aa8950d451c39a0abdc1293c636e
+size 4999533936

model-00014-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:06d62ea7565b30c8eba6e9664018577c468922f71a63ecd1ff2e2273e3d26372
+size 4999207528

model-00015-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:791a16c32f3344407d05614cc3cb671cb60e772614c0030b916404572006b53a
+size 4999533936

model-00016-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d660551fbe91884f25d154a50076d13f7b75b18033b30140a8731f0832e057c
+size 4999207528

model-00017-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7d0fd2ab432804d4c0c68ae845ea0cc36a3644e051c57f2fae153310767be999
+size 4999533936

model-00018-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e860e5241fe4810ff8ee5739c5745e275b4dea2cc105d4b385e45f2a67042fcc
+size 4999207528

model-00019-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1326dfed8b8cda9f02ced8113a5a777ea0685bd141a7a2358f0eb89536bf7303
+size 4999533936

model-00020-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:792f4b7d700430d5fe5cb7501071ef136ddf35fa5647887dba40a2da51c8e272
+size 4999207528

model-00021-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18a7f7f7078c5d97cfbaf4fe632eeff10cd66afc85d94b3892889fc3abd112d5
+size 4999533936

model-00022-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:167016a0db48a2f56a215e62341b09381c7be59152c83acaa310a2f8e1946292
+size 4999207528

model-00023-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba06f442b5fb52063a048124528f583ee6c9fe5d3bf868ac269c911b5016a121
+size 4999533936

model-00024-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c73a7c3b68e2696d06efdeb937717341fb90a95d208be1e795943510013b9b21
+size 4999207512

model-00025-of-00025.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a666f2749a7131fc27f23420e248fd43bbbdc67a8dcb8d852c88480220d2c92d
+size 691556896
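
The `size` fields of the 25 Git LFS pointers above give the download footprint of the checkpoint. The sketch below simply sums them. The parameter estimate at the end is an assumption, not a figure from the repository: it supposes every stored tensor uses 2 bytes per value (the config lists `"dtype": "bfloat16"`) and ignores the small safetensors headers.

```python
# Shard sizes in bytes, copied from the LFS pointer files in this commit.
SHARD_SIZES = [
    4998958552, 4999205184, 4999531600, 4999205184, 4999531600,
    4999207320, 4999533936, 4999207528, 4999533936, 4999207528,
    4999533936, 4999207528, 4999533936, 4999207528, 4999533936,
    4999207528, 4999533936, 4999207528, 4999533936, 4999207528,
    4999533936, 4999207528, 4999533936, 4999207512, 691556896,
]

total_bytes = sum(SHARD_SIZES)
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2**30    # binary gibibytes

# Assumption: all weights are 2-byte bfloat16 values.
approx_params_billion = total_bytes / 2 / 1e9

print(f"{len(SHARD_SIZES)} shards, {total_gb:.1f} GB ({total_gib:.1f} GiB)")
print(f"~{approx_params_billion:.0f}B parameters if everything is bfloat16")
```

The total comes to roughly 120.7 GB, so plan disk space and accelerator memory accordingly before running the Transformers, vLLM, or SGLang commands.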

model.safetensors.index.json ADDED

The diff for this file is too large to render. See raw diff