Instructions to use rednote-hilab/dots.llm1.inst with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use rednote-hilab/dots.llm1.inst with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rednote-hilab/dots.llm1.inst")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rednote-hilab/dots.llm1.inst")
model = AutoModelForCausalLM.from_pretrained("rednote-hilab/dots.llm1.inst")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use rednote-hilab/dots.llm1.inst with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "rednote-hilab/dots.llm1.inst"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.llm1.inst",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker
docker model run hf.co/rednote-hilab/dots.llm1.inst
- SGLang
How to use rednote-hilab/dots.llm1.inst with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rednote-hilab/dots.llm1.inst" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.llm1.inst",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rednote-hilab/dots.llm1.inst" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "rednote-hilab/dots.llm1.inst",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use rednote-hilab/dots.llm1.inst with Docker Model Runner:
docker model run hf.co/rednote-hilab/dots.llm1.inst
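Both the vLLM and SGLang servers above expose an OpenAI-compatible `/v1/chat/completions` route, so the curl calls can also be made from Python. The sketch below uses only the standard library (`urllib.request`, `json`). The helper names `build_chat_request` and `chat` are hypothetical; it assumes the port 8000 used in the vLLM curl example (point `BASE_URL` at port 30000 for the SGLang server) and assumes the response follows the OpenAI chat-completions shape, with the reply in `choices[0].message.content`.

```python
import json
import urllib.request

# Assumption: the vLLM server from the example above on port 8000.
# For the SGLang server, use "http://localhost:30000" instead.
BASE_URL = "http://localhost:8000"
MODEL = "rednote-hilab/dots.llm1.inst"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-style response: the reply is in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

For example, once a server is running, `chat(BASE_URL, MODEL, "What is the capital of France?")` sends the same request as the curl examples and returns the reply text.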
Only 9.3 on English SimpleQA despite 143B total parameters
Edit: I played around with this model a bit, and it has broader knowledge than I expected given its low English SimpleQA score of 9.3.
Still, a model with 143 billion total parameters should score at least 20. Even Mistral Small 24B and Gemma 3 27B score a little higher.
Hey, I'm not from the team, but I think I have a theory for this result.
First, we need to understand that they trained the model without synthetic data. We also need to acknowledge that they are based in China and that their RedNote app is dominated by Chinese users.
From this information alone, we can infer that most of their data is in Chinese. I don't see this as a negative.
But of course, I hope they can improve this later while continuing to value non-synthetic data, so the model keeps the different feel it has compared to the models we have right now.
According to your paper, you maintained a 1:1 ratio of English to Chinese training tokens. At first glance this seems fair and reasonable. However, since more English training tokens are available from sources like the web and digitized books than for all other languages combined, the only way to achieve a 1:1 English-to-Chinese ratio is to filter the English tokens far more aggressively. I assume this is why the model achieved a good Chinese SimpleQA score relative to its total parameter count, but a very low English SimpleQA score (<10) for its size.
Point being, since there are far more English tokens available, you either need to raise the ratio of English to Chinese tokens or improve the filtering so that the damage caused by filtering English tokens far more aggressively is mitigated.
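The ratio argument above comes down to simple arithmetic. As a rough sketch, assuming the Chinese pool is used in full and the English pool is filtered down to match the target mix, the fraction of English tokens that survives is (Chinese tokens used × target English:Chinese ratio) / available English tokens. The function name and the pool sizes in the usage note are hypothetical illustrations, not figures from the dots.llm1 paper.

```python
def english_retention(english_pool: float, chinese_pool: float,
                      chinese_kept: float = 1.0, en_to_zh_ratio: float = 1.0) -> float:
    """Fraction of the available English tokens kept after filtering to hit the mix.

    english_pool, chinese_pool: tokens available before mix filtering (any unit).
    chinese_kept: fraction of the Chinese pool used (assumed 1.0, i.e. all of it).
    en_to_zh_ratio: target English:Chinese token ratio (1.0 means 1:1).
    """
    chinese_used = chinese_pool * chinese_kept
    english_used = chinese_used * en_to_zh_ratio
    # Cannot keep more English tokens than actually exist.
    return min(1.0, english_used / english_pool)
```

With a hypothetical English pool five times the size of the Chinese pool, `english_retention(10, 2)` gives 0.2 (only 20% of the English tokens kept at 1:1), while raising the mix to 2:1 with `english_retention(10, 2, en_to_zh_ratio=2.0)` keeps 40%.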
Thank you for your feedback! I’ve reopened the channel for further discussion.
Your point about enhancing the quality and value of English tokens is insightful and much appreciated. We are actively working on processing larger volumes of data and implementing more fine-grained data filtering methods for pretraining.
Thanks for reopening this discussion, but this model's general knowledge appears to be better than its 9.3 SimpleQA score suggests. An issue with the test is that nearly all of its questions are esoteric, so gaining knowledge in the covered domains rarely adds points until a threshold is crossed. This is probably why so many models plateau around 10, then pick up again between 20 and 65.
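The threshold effect described above can be illustrated with a toy model. It assumes, purely for illustration, that each question has a single obscurity value, that a model has a single knowledge-coverage value, and that a question is answered correctly exactly when coverage reaches the question's obscurity. The function name and the question mix (10 common questions and 90 clustered at high obscurity) are invented and do not describe SimpleQA's real distribution.

```python
def simpleqa_style_score(coverage: float, obscurities: list[float]) -> float:
    """Percent of questions answered correctly under the toy threshold rule:
    a question counts only if the model's coverage reaches its obscurity."""
    correct = sum(1 for d in obscurities if d <= coverage)
    return 100.0 * correct / len(obscurities)


# Hypothetical question set: 10 common questions at low obscurity plus
# 90 esoteric questions spread between 0.70 and about 0.97.
questions = [0.1] * 10 + [0.7 + 0.003 * i for i in range(90)]

for coverage in (0.2, 0.4, 0.6, 0.75, 0.9):
    score = simpleqa_style_score(coverage, questions)
    print(f"coverage={coverage:.2f} -> score={score:.1f}")
```

Raising coverage from 0.2 to 0.6 leaves the score flat at 10, because no question sits in that range; the score only climbs once coverage reaches the cluster of esoteric questions.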