Only 9.3 on the English SimpleQA despite 143b total parameters

by phil111 - opened Jun 7, 2025

Discussion

phil111

Jun 7, 2025 • edited Jun 8, 2025

Edit: I played around with this model a bit, and it has broader knowledge than I expected considering its low English SimpleQA score of 9.3.

Still, a 143 billion total parameter model should at least achieve a score of 20. Even Mistral Small 24b and Gemma 3 27b score a little higher.

AnA202

Jun 7, 2025 • edited Jun 7, 2025

Hey, I'm not from the team, but I think I have a theory for this result.
First, we need to understand that they trained the model without synthetic data. Then we also need to acknowledge that they are based in China and that their rednote app is dominated by Chinese users.
With only this information we can already infer that most of their data will be in Chinese. I don't see this as a negative point.

But of course I hope they can improve it later on while continuing to value non-synthetic data, so the model keeps the distinct feel it has compared to the models we have right now.

phil111

Jun 7, 2025

This comment has been hidden (marked as Resolved)

redmoe-ai-v1 changed discussion status to closed Jun 9, 2025

phil111

Jun 9, 2025

According to your paper, you maintained a 1:1 training-token ratio between English and Chinese. At first glance this seems fair and reasonable. However, since there are more English training tokens available from sources like the web and digitized books than for all other languages combined, the only way to reach a 1:1 English-to-Chinese ratio is to filter the English tokens far more aggressively. I assume this is why the model achieved a good Chinese SimpleQA score relative to its total parameter count but a very low English SimpleQA score (<10) for its size.

The point is that, since far more English tokens are available, you either need to raise the ratio of English to Chinese or improve the filtering, so that the damage caused by filtering the English tokens so much more aggressively is mitigated.
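
To make the arithmetic behind this argument concrete, here is a minimal sketch. The pool sizes and the per-language target are made-up placeholder numbers, not figures from the paper, and `required_retention` is a hypothetical helper. It only shows that a 1:1 target forces a much lower keep rate on the larger pool.

```python
def required_retention(raw_tokens: float, target_tokens: float) -> float:
    """Fraction of a raw token pool that must survive filtering to hit a target size."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    # Cannot keep more than the whole pool.
    return min(1.0, target_tokens / raw_tokens)


# Hypothetical numbers (in trillions of tokens), chosen only for illustration.
raw_english = 50.0
raw_chinese = 10.0
target_per_language = 5.0  # a 1:1 ratio means the same target for both languages

english_keep = required_retention(raw_english, target_per_language)
chinese_keep = required_retention(raw_chinese, target_per_language)

print(f"English keep rate: {english_keep:.0%}")
print(f"Chinese keep rate: {chinese_keep:.0%}")
print(f"English is filtered {chinese_keep / english_keep:.1f}x more aggressively")
```

With these placeholder values, English has to be cut to a tenth of its raw pool while Chinese keeps half. Whether the stricter cut actually removes useful knowledge depends on what the filter discards, which this sketch does not model.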

redmoe-ai-v1 changed discussion status to open Jun 10, 2025

redmoe-ai-v1

rednote-hilab org Jun 10, 2025

Thank you for your feedback! I’ve reopened the channel for further discussion.

Your point about enhancing the quality and value of English tokens is insightful and much appreciated. We are actively working on processing larger volumes of data and implementing more fine-grained data filtering methods for pretraining.

phil111

Jun 10, 2025

Thanks for reopening this discussion, but this model's general knowledge appears to be better than its 9.3 SimpleQA score suggests. One issue with the test is that nearly all of the questions are esoteric, so gaining knowledge in the covered domains rarely adds points until a threshold is crossed. This is probably why so many models plateau around 10, then pick up again between 20 and 65.

phil111 changed discussion status to closed Jun 10, 2025
