Model Card for general_knowledge_model
Post-trained version of Qwen/Qwen3-1.7B
for the General Knowledge benchmark of EPFL CS-552 — Modern NLP (Spring 2026),
team CentraleSupéchec.
The task is closed-book multiple-choice QA (2–20 options). The model reasons
inside a <think> ... </think> block and ends its reply with the answer wrapped in
\boxed{LETTER}, which is parsed for pass@1 scoring.
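Scoring depends on pulling the letter out of \boxed{LETTER}. Below is a minimal Python sketch of such a parser. It is not the course's grader. The names `extract_answer` and `is_correct` are hypothetical, and so are the rules they apply: only text after the last </think> is searched, the last boxed letter wins, and a letter outside the valid range for a 2 to 20 option question (A to T at most) counts as no answer.

```python
import re
import string
from typing import Optional

# A single capital letter inside \boxed{...}, tolerating inner spaces (assumed format)
BOXED_RE = re.compile(r"\\boxed\{\s*([A-Z])\s*\}")


def extract_answer(reply: str, num_options: int) -> Optional[str]:
    """Return the answer letter from a reply, or None if there is no valid one.

    Hypothetical rules: only text after the last </think> is searched, the last
    boxed letter wins, and a letter beyond the number of options counts as no answer.
    """
    if not 2 <= num_options <= 20:
        raise ValueError("questions have between 2 and 20 options")
    # rpartition returns the whole reply as the tail when </think> is absent
    _, _, answer_text = reply.rpartition("</think>")
    matches = BOXED_RE.findall(answer_text)
    if not matches:
        return None
    letter = matches[-1]
    valid_letters = string.ascii_uppercase[:num_options]
    return letter if letter in valid_letters else None


def is_correct(reply: str, gold: str, num_options: int) -> bool:
    """Single-sample correctness, the quantity averaged into pass@1."""
    return extract_answer(reply, num_options) == gold
```

Searching only after </think> keeps a tentative \boxed{} written mid-reasoning from being scored as the final answer.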
Training
The model is trained with Rejection Fine-Tuning (RFT), a STaR-style self-distillation method, using an answer-only loss:
- Sample n=8 completions (T=0.7) from the base model over a ~4.7k-question pool of GPQA and MMLU-Pro (excluding Math/CS).
- Keep the 722 questions the base model fails at pass@1 but solves under repeated sampling, which yields self-generated correct reasoning traces.
- Fine-tune a LoRA adapter (r=16, α=32) with the cross-entropy loss masked to the \boxed{} answer span only: the <think> reasoning conditions the forward pass but receives no gradient.

This preserves the model's pretrained reasoning while sharpening answer commitment and output formatting.
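The selection and masking steps can be sketched in plain Python. This is not the team's training code. `Question`, `select_rft_questions`, and `answer_only_labels` are hypothetical names. "Fails at pass@1" is assumed to mean a wrong single-shot answer. The mask is assumed to be built from per-token character offsets, such as a fast Hugging Face tokenizer returns with `return_offsets_mapping=True`. A correct sample from each kept question would then serve as its training trace.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch / Hugging Face cross-entropy


@dataclass
class Question:
    qid: str
    gold: str                           # gold answer letter
    pass1_pred: Optional[str]           # base model's single-shot answer (hypothetical probe)
    sampled_preds: List[Optional[str]]  # answers parsed from the n=8 samples at T=0.7


def select_rft_questions(pool: List[Question]) -> List[Question]:
    """Keep questions missed at pass@1 but solved by at least one of the n samples."""
    return [
        q for q in pool
        if q.pass1_pred != q.gold and any(p == q.gold for p in q.sampled_preds)
    ]


def answer_only_labels(
    text: str, input_ids: List[int], offsets: List[Tuple[int, int]]
) -> List[int]:
    """Keep labels only for tokens overlapping the final \\boxed{...} span.

    `offsets` holds per-token (start, end) character spans. Every other token,
    including the <think> reasoning, gets IGNORE_INDEX: it still conditions the
    forward pass but contributes no loss and hence no gradient.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        raise ValueError("completion has no boxed answer")
    end = text.index("}", start) + 1  # exclusive end of the boxed span
    labels = []
    for token_id, (tok_start, tok_end) in zip(input_ids, offsets):
        overlaps = tok_start < end and tok_end > start
        labels.append(token_id if overlaps else IGNORE_INDEX)
    return labels
```

Because the labels exist only on the answer span, gradient flows only from predicting the answer tokens, while the reasoning tokens still shape the hidden states those predictions are made from.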
The chat template (bundled with the tokenizer) asks for a strict \boxed{LETTER}
output and a 16,384-token reasoning budget; the budget itself is applied at generation time through max_new_tokens.
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "cs-552-2026-centralesupechec/general_knowledge_model"
tok = AutoTokenizer.from_pretrained(repo)
# Load bf16 weights onto the GPU
model = AutoModelForCausalLM.from_pretrained(repo, dtype="bfloat16", device_map="cuda")
question = (
    "Which of the following is the capital of Australia?\n\n"
    "Choices:\nA. Sydney\nB. Melbourne\nC. Canberra\nD. Perth"
)
# Render the prompt with the bundled chat template; return_dict=True yields input_ids and attention_mask
inputs = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)
# Same sampling settings as the model's generation config
out = model.generate(**inputs, max_new_tokens=16384, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
# Decode only the newly generated tokens, not the prompt
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
# ... reasoning ... \boxed{C}
For vLLM, mirror the evaluation CI: apply the model's chat template and use seed=42,
a 16,384-token generation limit (max_tokens in vLLM), temperature=0.6, top_p=0.95, top_k=20.
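One way to reproduce these settings is to query vLLM's OpenAI-compatible server, which applies the model's chat template for chat completions. The sketch below assumes a server started with `vllm serve cs-552-2026-centralesupechec/general_knowledge_model` on the default port 8000; `build_chat_request` and `query_server` are hypothetical helpers, and passing the seed per request is an assumption about how the CI seeds sampling.

```python
import json
import urllib.request

REPO = "cs-552-2026-centralesupechec/general_knowledge_model"


def build_chat_request(question: str) -> dict:
    """Chat-completions body carrying the evaluation sampling settings."""
    return {
        "model": REPO,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 16384,  # OpenAI/vLLM name for the generation budget
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,          # vLLM accepts this extra sampling parameter
        "seed": 42,
    }


def query_server(question: str, url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST the request to a running vLLM server and return the reply text."""
    body = json.dumps(build_chat_request(question)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Only the standard library is used, so the request body can be inspected without a GPU; `query_server` needs the server to be running.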
Generation config
max_new_tokens: 16384 · temperature: 0.6 · top_p: 0.95 · top_k: 20 ·
do_sample: true. The 16k budget is essential: it removes the format failures
that occur when reasoning is truncated before the boxed answer.
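To check whether a budget is large enough, one can bucket replies by failure mode. The heuristic below is an assumption, not part of the course tooling: a reply whose <think> block never closes is treated as truncated, and a closed block with no \boxed{ after it as a missing answer. It also assumes </think> survives decoding as plain text. `classify_reply` and `format_failure_rate` are hypothetical names.

```python
from collections import Counter
from typing import Iterable


def classify_reply(reply: str) -> str:
    """Bucket one reply as "truncated", "no_answer", or "ok" (heuristic)."""
    if "</think>" not in reply:
        return "truncated"  # budget ran out mid-reasoning
    tail = reply.rsplit("</think>", 1)[1]
    return "ok" if "\\boxed{" in tail else "no_answer"


def format_failure_rate(replies: Iterable[str]) -> float:
    """Fraction of replies that lack a boxed answer after the reasoning block."""
    counts = Counter(classify_reply(r) for r in replies)
    total = sum(counts.values())
    return 0.0 if total == 0 else 1 - counts["ok"] / total
```

Comparing this rate at different max_new_tokens values shows whether the failures come from truncation or from the model omitting the boxed answer.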
Evaluation
pass@1 on held-out sets disjoint from training (n=4 samples per question, 16k-token budget):
| Set | pass@1 |
|---|---|
| 650-question MMLU sweep (26 subjects) | ~0.74 |
| Internal 100-question expert set | ~0.59 |
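With n=4 samples per question, pass@1 can be estimated with the standard unbiased pass@k estimator (Chen et al., 2021). For k=1 it reduces to the fraction of correct samples. The sketch assumes the table averages that per-question value over the set. `pass_at_k` and `benchmark_pass_at_1` are hypothetical names.

```python
from math import comb
from statistics import mean
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_per_question: Sequence[Sequence[bool]]) -> float:
    """Mean per-question pass@1; with k=1 each term equals c / n."""
    return mean(pass_at_k(len(s), sum(s), 1) for s in correct_per_question)
```

For example, a question answered correctly in 2 of its 4 samples contributes 0.5, and one answered correctly in all 4 contributes 1.0.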
See the project report and code for the full comparison against the base model, full-trace SFT, and GRPO.
Framework versions
- Transformers 5.7.0
- PyTorch 2.10.0+cu128
- TRL 0.12, PEFT 0.13
Citation
@inproceedings{zelikman2022star,
title = {{STaR}: Bootstrapping Reasoning With Reasoning},
author = {Zelikman, Eric and Wu, Yuhuai and Mu, Jesse and Goodman, Noah D.},
booktitle = {Advances in Neural Information Processing Systems},
year = {2022}
}