LiangJiang committed on Sep 8, 2025

Commit 19461fd (verified) · 1 Parent(s): db274d2

Update README.md

README.md +153 -3

@@ -1,3 +1,153 @@
----
-license: mit
----

+---
+license: mit
+language:
+- zh
+- en
+base_model:
+- inclusionAI/Ling-lite-base-1.5
+---
+# Ring-mini-2.0
+<p align="center">
+    <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
+</p>
+<p align="center">
+          🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>
+</p>
+## Introduction
+We present **Ring-mini-2.0**, a compact yet powerful reasoning model. It has 16B total parameters, of which 1.4B are activated per input token (789M non-embedding). Trained on more than 20T tokens of high-quality data and enhanced through long-CoT supervised fine-tuning and multi-stage reinforcement learning, **Ring-mini-2.0** reaches the top tier of sub-10B dense LLMs and even matches or surpasses much larger MoE models.
+## Model Downloads
+<div align="center">
+|     **Model**      | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
+| :----------------: | :---------------: | :-------------------: | :----------------: | :----------: |
+| Ring-mini-2.0 |       16.8B       |         1.4B         |        128K         |      [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ring-mini-2.0) |
+</div>
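As a quick check on the figures above, the share of parameters used per token follows directly from the table. A small sketch using the 16.8B total and 1.4B activated values listed there (the introduction rounds the total to 16B); the variable names are illustrative:

```python
# Values taken from the Model Downloads table, in billions of parameters.
total_params_b = 16.8
activated_params_b = 1.4

# Fraction of the model's parameters that take part in each forward pass.
activated_fraction = activated_params_b / total_params_b
print(f"Activated per token: {activated_fraction:.1%}")
```

Roughly one parameter in twelve is active for any given token.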
+## Evaluation
+To evaluate our reasoning models comprehensively, we ran automatic benchmarks covering math, code, and science.
+<p align="center">
+    <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*5F9KR7Tm4MAAAAAARzAAAAgAemJ7AQ/original" width="1000"/>
+</p>
+To compare Ring-lite-2507 with Ring-lite, we evaluate the two models on a broader range of reasoning and general-purpose benchmarks covering knowledge understanding, math, coding, reasoning and agentic tasks, and alignment.
+### Knowledge Understanding
+| **Benchmark** | **Ring-mini-2.0** | **Ring-lite-2507** | **Qwen3-8B-Thinking** |
+| :-------------: | :---------------: | :-----------: | :-------------------: |
+| MMLU-Pro (EM)         | 71.52 | 72.50 | 72.56 |
+| GPQA-Diamond (Pass@1) | 68.24 | 69.35 | 62.00 |
+| SuperGPQA (EM)        | 36.21 | 39.57 | 42.42 |
+| Phybench (Pass@1)     | 25.80 | 28.51 | 22.14 |
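Several rows in these tables are reported as Pass@1, but the README does not say how the metric was computed. One common convention (an assumption here, not the authors' stated method) is the unbiased pass@k estimator: with n samples per problem, of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n when k = 1. A minimal sketch; the function name and the sample counts are illustrative:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts (not taken from the README): 16 samples, 12 correct.
score = pass_at_k(n=16, c=12, k=1)
print(f"pass@1 = {score:.4f}")
```

A benchmark-level score would then be the mean of this value over all problems.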
+### Math
+| **Benchmark** | **Ring-lite-2507** | **Ring-lite-2506** | **Qwen3-8B-Thinking** |
+| :-------------: | :---------------: | :-----------: | :-------------------: |
+| MATH-500 (Pass@1)             | 97.60 | 76.95 | 97.30 |
+| CNMO 2024 (Pass@1)            | 76.91 | 77.78 | 75.09 |
+| AIME 2024 (Pass@1)            | 79.69 | 84.06 | 79.27 |
+| AIME 2025 (Pass@1)            | 74.06 | 79.74 | 71.25 |
+| LiveMathBench (Pass@1)        | 83.98 | 84.94 | 82.92 |
+| TheoremQA (Pass@1)            | 70.09 | 70.00 | 68.81 |
+| OlympiadBench (math) (Pass@1) | 82.91 | 84.94 | 82.27 |
+### Coding
+| **Benchmark** | **Ring-lite-2507** | **Ring-lite-2506** | **Qwen3-8B-Thinking** |
+| :-------------: | :---------------: | :-----------: | :-------------------: |
+| LiveCodeBench (2408-2505) (Pass@1) | 62.56 | 63.27 | 56.94 |
+| Codeforces                         | 84.80 | 89.09 | 73.31 |
+### Reasoning & Agentic
+| **Benchmark** | **Ring-lite-2507** | **Ring-lite-2506** | **Qwen3-8B-Thinking** |
+| :-------------: | :---------------: | :-----------: | :-------------------: |
+| DROP (zero-shot F1) | 88.55 | 89.27 | 87.13 |
+| BBH (EM)            | 87.59 | 88.65 | 87.30 |
+| ARCPrize (Pass@1)   | 20.12 | 21.25 | 4.38 |
+| MuSR (EM)           | 75.99 | 77.19 | 76.92 |
+| BFCL_Live (Pass@1)  | 74.26 | 74.81 | 75.99 |
+### Alignment
+| **Benchmark** | **Ring-lite-2507** | **Ring-lite-2506** | **Qwen3-8B-Thinking** |
+| :-------------: | :---------------: | :-----------: | :-------------------: |
+| IFEval (Prompt Strict)    | 78.93 | 82.99 | 85.0 |
+| AlignBench v1.1 (gpt-4.1) | 80.69 | 80.90 | 74.70 |
+| FoFo (gpt-4-turbo)        | 84.11 | 85.02 | 81.93 |
+| ArenaHard (gpt-4.1)       | 85.19 | 88.85 | 86.14 |
+## Quickstart
+### 🤗 Hugging Face Transformers
+Here is a code snippet to show you how to use the chat model with `transformers`:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Repository ID of this model on the Hugging Face Hub.
+model_name = "inclusionAI/Ring-mini-2.0"
+# trust_remote_code=True lets transformers run the repository's custom model code.
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto",
+    trust_remote_code=True
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+prompt = "Give me a short introduction to large language models."
+messages = [
+    {"role": "system", "content": "You are Ring, an assistant created by inclusionAI"},
+    {"role": "user", "content": prompt}
+]
+# Render the conversation into a single prompt string using the model's chat template.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    enable_thinking=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=8192
+)
+# generate() returns prompt + completion; keep only the newly generated tokens.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
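The list comprehension near the end of the Quickstart strips the prompt from each generated sequence, because `generate` returns the prompt tokens followed by the new ones. A minimal sketch of that slicing step, with plain Python lists standing in for tensors; the helper name and the token IDs are made up for illustration:

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Hypothetical token IDs: each output begins with a copy of its prompt.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 55]]

new_tokens = strip_prompt(prompts, outputs)
print(new_tokens)
```

Without this step, decoding the output would repeat the system and user messages ahead of the model's reply.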
+## Deployment
+Please refer to the [GitHub README](https://github.com/inclusionAI/Ring/blob/main/README.md) for deployment instructions.
+## License
+This code repository is licensed under [the MIT License](https://huggingface.co/inclusionAI/Ring-lite-2507/blob/main/LICENSE).
+## Citation
+```
+@misc{ringteam2025ringlitescalablereasoningc3postabilized,
+      title={Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs},
+      author={Ling Team},
+      year={2025},
+      eprint={2506.14731},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2506.14731},
+}
+```