Instructions for using ba144220/cs224r-default-project-sft with libraries and local apps.

Libraries

How to use ba144220/cs224r-default-project-sft with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ba144220/cs224r-default-project-sft")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ba144220/cs224r-default-project-sft")
model = AutoModelForCausalLM.from_pretrained("ba144220/cs224r-default-project-sft", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use ba144220/cs224r-default-project-sft with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ba144220/cs224r-default-project-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ba144220/cs224r-default-project-sft",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/ba144220/cs224r-default-project-sft
```

SGLang

How to use ba144220/cs224r-default-project-sft with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ba144220/cs224r-default-project-sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ba144220/cs224r-default-project-sft",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ba144220/cs224r-default-project-sft" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ba144220/cs224r-default-project-sft",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use ba144220/cs224r-default-project-sft with Docker Model Runner:
```
docker model run hf.co/ba144220/cs224r-default-project-sft
```

Access to this model is gated: the repository is publicly listed, but you must agree to share your contact information and accept the conditions before you can access its files and content.

cs224r-default-project-sft

A supervised fine-tuned (SFT) version of Qwen2.5-0.5B for the Countdown arithmetic reasoning task, trained as part of Stanford CS224R (Spring 2026).

Model Description

This model is fine-tuned to solve Countdown problems: given a target number and a set of allowed numbers, produce a valid arithmetic equation that equals the target. The model generates chain-of-thought reasoning inside <think> tags and a final answer inside <answer> tags.
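To make the task concrete, here is a small brute-force solver that searches for a valid equation. It is an illustration only, not part of the model or its training. It assumes standard Countdown rules that this card does not spell out: only +, -, *, / are allowed, each given number must be used exactly once, and (a simplification) negative or fractional intermediate values are permitted. `solve_countdown` and `_search` are hypothetical names.

```python
from fractions import Fraction


def solve_countdown(numbers, target):
    """Return an expression using every number exactly once that equals target, or None."""
    # Each item pairs an exact value (Fraction keeps division exact) with its expression text.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: one value left; it is a solution only if it hits the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    # Combine every ordered pair (covers both a - b and b - a, a / b and b / a).
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            candidates = [
                (a + b, f"({ea} + {eb})"),
                (a - b, f"({ea} - {eb})"),
                (a * b, f"({ea} * {eb})"),
            ]
            if b != 0:
                candidates.append((a / b, f"({ea} / {eb})"))
            # Replace the pair with its combination and recurse on the smaller set.
            for value, expr in candidates:
                found = _search(rest + [(value, expr)], target)
                if found is not None:
                    return found
    return None
```

For example, `solve_countdown([44, 19, 35], 98)` returns an expression such as `(35 + (44 + 19))`. The search is exponential in the number of inputs, which is fine for the 3 to 4 numbers in this task but not for large instances.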

Training Details

| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B |
| Dataset | Asap7772/cog_behav_all_strategies |
| Epochs | 6 |
| Learning rate | 5e-5 |
| LR schedule | Cosine with 5% warmup |
| Batch size | 64 (gradient accumulation = 8) |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Precision | bfloat16 |
| Gradient checkpointing | Enabled |
| Hardware | 1× NVIDIA H100 (Modal) |
| Max prompt length | 512 tokens |
| Max response length | 1024 tokens |
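The "cosine with 5% warmup" schedule in the table can be sketched as a function of the optimizer step: the learning rate rises linearly from 0 to the peak (5e-5) over the first 5% of steps, then follows a half cosine down to 0. This follows the shape of `get_cosine_schedule_with_warmup` in Transformers, but it is a sketch: the total step count depends on dataset size, epochs, and batch size, which the card does not give, and rounding the warmup length up is an assumption.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-5, warmup_ratio=0.05):
    """Learning rate at `step`: linear warmup to peak_lr, then cosine decay to 0."""
    # Assumption: warmup length is rounded up to a whole number of steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: 1.0 at progress 0, 0.0 at progress 1.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical `total_steps=1000`, warmup lasts 50 steps, the rate peaks at 5e-5 at step 50, and it decays to 0 by step 1000.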

Loss is computed only on response tokens (assistant turn), with prompt tokens masked out.
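A minimal sketch of response-only masking, assuming the prompt and response are already tokenized into id lists. The helper names are hypothetical, not taken from the project's code. Prompt positions get the label -100, which PyTorch's `cross_entropy` ignores by default (`ignore_index=-100`) and which Hugging Face causal-LM models also skip when `labels` are passed; those models shift labels internally so that each position predicts the next token.

```python
# Label value skipped by PyTorch cross_entropy and Hugging Face causal-LM losses.
IGNORE_INDEX = -100


def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response ids; mask prompt positions out of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def response_only_mean(per_token_loss, labels):
    """Average per-position losses over unmasked positions.

    Assumes per_token_loss is already aligned with labels (any next-token shift done).
    """
    kept = [loss for loss, label in zip(per_token_loss, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0
```

For example, a 3-token prompt and 2-token response yield labels `[-100, -100, -100, r1, r2]`, so only the two response positions contribute to the averaged loss.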

Evaluation

Evaluated on the test split of asingh15/countdown_tasks_3to4 (40 prompts) using vLLM, sampling K = 16 responses per prompt with temperature 0.6, top-k 20, and top-p 0.95.

| Metric | Value |
|---|---|
| Average Score | 0.3660 |
| Pass@1 | 0.30 |
| Pass@16 | 0.75 (30/40 prompts) |
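The card does not say which pass@k estimator was used. A common choice is the unbiased estimator from the Codex paper (Chen et al., 2021), sketched below under that assumption: for a prompt with n samples of which c are correct, pass@k = 1 - C(n - c, k) / C(n, k), averaged over prompts. With n = k = 16 it equals 1 whenever any sample is correct, so the dataset average is simply the fraction of solved prompts, which is consistent with the reported 30/40. Pass@1 reduces to the mean per-sample accuracy c / n.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one prompt with n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset of samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over prompts; correct_counts[i] is the correct-sample count for prompt i."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

As a hypothetical illustration, per-prompt counts `[16] * 10 + [1] * 20 + [0] * 10` with n = 16 give a mean pass@16 of 0.75.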

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ba144220/cs224r-default-project-sft")
tokenizer = AutoTokenizer.from_pretrained("ba144220/cs224r-default-project-sft")

messages = [{"role": "user", "content": "Using the numbers [44, 19, 35], create an equation that equals 98."}]
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.6, top_k=20, top_p=0.95, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Example Output

Prompt: Using the numbers [44, 19, 35], create an equation that equals 98.

Model output:

```
<think>
Let me try to find a path to 98.
First, let's look at the numbers: 44, 19, and 35.
Working backwards from 98: 98 is even, so it might be helpful to look for factors.
Let's try some operations:
44 + 19 = 63 (getting closer)
63 + 35 = 98 (this is our target!)
We can do: 19 + 44 = 63, 63 + 35 = 98.
Let's verify: (19 + 44) + 35 = 98 ✓
</think>
<answer> (19 + 44) + 35 </answer>
```

Score: 1.0 (correct)
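A hypothetical scorer for outputs like the one above is sketched here; it is not the CS224R grader, and `score_response` and `_evaluate` are illustrative names. It takes the last `<answer>` block, evaluates it with Python's `ast` module (accepting only integer literals, parentheses, and + - * /, so unary minus is rejected), and assumes each allowed number must be used exactly once. It returns only 1.0 or 0.0; the card does not specify how Average Score is computed.

```python
import ast
import re
from collections import Counter
from fractions import Fraction

# Binary operators this sketch accepts (assumption: + - * / only).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate an AST of integer literals and + - * /, recording literals in `used`."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def score_response(response, numbers, target):
    """Return 1.0 if the last <answer> block is a valid equation for target, else 0.0."""
    matches = re.findall(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if not matches:
        return 0.0
    used = []
    try:
        value = _evaluate(ast.parse(matches[-1].strip(), mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Assumption: each allowed number must appear exactly once.
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0
```

Applied to the example output with numbers `[44, 19, 35]` and target 98, this sketch returns 1.0. Using `ast` instead of `eval` avoids executing arbitrary model-generated code.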

Limitations

- Trained and evaluated only on the Countdown arithmetic task; not intended for general-purpose use.
- Performance degrades on harder problems with more numbers or larger targets.
- The 0.5B parameter size limits reasoning capacity compared to larger models.

Authors

Yuchi Hsu (yuchihsu@stanford.edu) and Ryan He (ryanhe@stanford.edu), Stanford CS224R Spring 2026.


Safetensors
Model size: 0.5B params
Tensor type: BF16

Model tree for ba144220/cs224r-default-project-sft
Base model: Qwen/Qwen2.5-0.5B
Finetuned (685): this model
Finetunes: 2 models

Dataset used to train ba144220/cs224r-default-project-sft: Asap7772/cog_behav_all_strategies

Evaluation results (test set, self-reported)

Average Score on Countdown Tasks 3-to-4: 0.366
Pass@16 on Countdown Tasks 3-to-4: 0.750