Overview
daVinci-LLM-3B is a 3B-parameter base language model aimed at making pretraining a transparent and reproducible scientific process. We release not only the final weights but also training trajectories, intermediate checkpoints, data processing decisions, and 200+ ablation studies covering data quality, mixture design, training dynamics, and evaluation validity. The model reaches an overall score of 51.72 across 19 benchmarks, approaching or matching larger 7B-scale models such as OLMo-3 7B.
The model follows a two-stage curriculum over ~8T tokens:
- Stage 1 (6T tokens): broad pretraining over diverse web-scale corpora.
- Stage 2 (2T tokens): structured QA and reasoning-heavy data to amplify math and code reasoning.
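The two-stage budget above can be written down as a minimal schedule table; the stage labels here are illustrative, not the release's internal names:

```python
# Illustrative two-stage token budget (in trillions of tokens),
# matching the curriculum described in the card.
stages = {
    "stage1_web_pretrain": 6.0,   # broad pretraining over diverse web-scale corpora
    "stage2_qa_reasoning": 2.0,   # structured QA and reasoning-heavy data
}

total_tokens_t = sum(stages.values())
print(f"total: ~{total_tokens_t:.0f}T tokens")  # ~8T, as stated
```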
Key Features
- Fully transparent pretraining pipeline: data processing logic, mixtures, logs, and checkpoints are publicly documented.
- Data Darwinism framework: a systematic L0–L9 taxonomy for data processing depth.
- Large-scale ablations: 200+ controlled experiments with both positive and negative results.
Intended Use
- Research: pretraining science, data quality studies, training dynamics, evaluation stability.
- General capabilities: broad language understanding, math/science reasoning, and code generation.
This is a base model and is not instruction- or safety-aligned. Additional safety evaluation and alignment are required for production deployment.
Architecture
- Type: Decoder-only Transformer (Qwen2 family)
- Parameters: ~3.09B
- Layers: 36
- Hidden size: 2048
- Attention heads: 16 (GQA, KV heads = 2)
- MLP: SwiGLU, intermediate size 11008
- Position encoding: RoPE (base = 10000)
- Context length: 4096
- Tokenizer: Qwen2 tokenizer (151,936 vocab)
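The listed dimensions can be sanity-checked against the ~3.09B parameter count with a back-of-envelope calculation. Note this sketch assumes Qwen2-style QKV biases and tied input/output embeddings, which are assumptions rather than documented details of this release:

```python
# Back-of-envelope parameter count from the architecture listed above.
vocab, hidden, layers = 151_936, 2048, 36
heads, kv_heads, inter = 16, 2, 11008

head_dim = hidden // heads            # 128
kv_dim = kv_heads * head_dim          # 256 (GQA: shared K/V heads)

embed = vocab * hidden                           # input embedding (assumed tied with output)
attn = 2 * hidden * hidden + 2 * hidden * kv_dim # q_proj, o_proj + k_proj, v_proj
attn_bias = hidden + 2 * kv_dim                  # Qwen2-style QKV biases (assumption)
mlp = 3 * hidden * inter                         # SwiGLU: gate, up, down projections
norms = 2 * hidden                               # two RMSNorms per layer

per_layer = attn + attn_bias + mlp + norms
total = embed + layers * per_layer + hidden      # + final RMSNorm
print(f"~{total / 1e9:.2f}B parameters")         # ~3.09B, matching the stated size
```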
Data and Processing
The training corpus spans general web text, code, science, and QA sources. Each dataset is annotated with a Data Darwinism level (L0–L9), and multiple sources receive L4/L5 generative refinement and cognitive completion.
Major categories:
- General: Common Crawl–based corpora (L3).
- Code: GitHub crawls + Nemotron code datasets (L3/L5).
- Science/Math: MegaMath, Nemotron-CC-Math, and Darwin-Science series (L3–L5).
- QA: multi-source QA data with rejection sampling (L5).
Training Recipe (Summary)
- Stage 1: 6T tokens with progressively adjusted mixtures (shifting weight from web text to code/science).
- Stage 2: 2T tokens, with the structured-QA share increased from 30% to 70% for stronger reasoning and problem-solving.
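The 30% → 70% QA ramp can be sketched as a schedule function over Stage 2's token budget. The linear shape here is an assumption for illustration; the release does not specify the interpolation:

```python
def qa_weight(tokens_seen_t: float, start: float = 0.30, end: float = 0.70,
              stage_tokens_t: float = 2.0) -> float:
    """Illustrative linear ramp of the structured-QA mixture weight
    across Stage 2 (schedule shape is an assumption, not documented)."""
    frac = min(max(tokens_seen_t / stage_tokens_t, 0.0), 1.0)
    return start + (end - start) * frac

print(qa_weight(0.0))  # 0.30 at the start of Stage 2
print(qa_weight(2.0))  # 0.70 at the end of Stage 2
```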
Evaluation (Summary)
- Benchmarks: 19 tasks spanning General, Code, and Science/Math.
- Tooling: lm-eval-harness.
- Result: Overall average 51.72, comparable to OLMo-3 7B.
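A typical lm-eval-harness invocation might look like the following; the repo id, task list, and settings here are placeholders for illustration, not the release's actual evaluation config:

```shell
# Hypothetical example: repo id and tasks are assumptions.
lm_eval \
  --model hf \
  --model_args pretrained=daVinci-LLM-3B,dtype=bfloat16 \
  --tasks mmlu,gsm8k,humaneval \
  --batch_size 8
```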
License
Apache-2.0
Citation