Instructions for using cs-552-2026-catma/general_knowledge_model with Transformers, vLLM, SGLang, and Docker Model Runner.

Libraries

How to use cs-552-2026-catma/general_knowledge_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cs-552-2026-catma/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cs-552-2026-catma/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-catma/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use cs-552-2026-catma/general_knowledge_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cs-552-2026-catma/general_knowledge_model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
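
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is reachable at http://localhost:8000 and returns the usual OpenAI-style chat completion response. The helper names `build_chat_payload` and `ask`, and the `max_tokens` and `temperature` values, are illustrative choices and not part of the model card.

```python
import json
import urllib.request

MODEL_ID = "cs-552-2026-catma/general_knowledge_model"


def build_chat_payload(question: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
        # Assumed decoding settings; adjust to taste.
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def ask(question: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the question to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(ask("What is the capital of France?"))
```

For the SGLang server described further down, the same sketch should work with `base_url="http://localhost:30000/v1"`.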

Use Docker

docker model run hf.co/cs-552-2026-catma/general_knowledge_model

SGLang

How to use cs-552-2026-catma/general_knowledge_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cs-552-2026-catma/general_knowledge_model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cs-552-2026-catma/general_knowledge_model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cs-552-2026-catma/general_knowledge_model with Docker Model Runner:
```
docker model run hf.co/cs-552-2026-catma/general_knowledge_model
```

General Knowledge Model

This model is a fine-tuned version of Qwen/Qwen3-1.7B for the CS-552 Modern NLP course project.

The model targets the General Knowledge benchmark, where it answers closed-book multiple-choice factual and reasoning questions. It was trained to return the final answer as a single option letter inside a LaTeX \boxed{} expression.

Intended output format

The model should produce answers in the following format:

\boxed{C}

Anything outside \boxed{} is treated as reasoning and is not used for scoring by the evaluation pipeline.
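
A minimal sketch of how a boxed answer could be pulled out of a model response is shown below. This is not the course evaluation pipeline, whose exact rules are not given here. The function name `extract_boxed_letter` is hypothetical, and taking the last boxed letter when several appear is an assumption.

```python
import re
from typing import Optional

# Matches \boxed{X} where X is a single letter, allowing spaces inside the braces.
BOXED_RE = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_boxed_letter(text: str) -> Optional[str]:
    """Return the option letter from the last \\boxed{...} in text, or None."""
    matches = BOXED_RE.findall(text)
    # Assumption: if the model boxes more than one letter, the last one wins.
    return matches[-1].upper() if matches else None


print(extract_boxed_letter(r"Paris is the capital of France. \boxed{C}"))
print(extract_boxed_letter("The answer is C"))
```

The first call prints `C`; the second prints `None` because the response contains no boxed expression, which would count as a failed extraction.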

Training procedure

This checkpoint was trained using Supervised Fine-Tuning (SFT) with LoRA on top of Qwen/Qwen3-1.7B.
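
For readers unfamiliar with LoRA, the sketch below shows how an adapter is typically attached to the base model with PEFT. The card does not state the LoRA hyperparameters, so the rank, alpha, dropout, and target modules below are placeholder assumptions, and `build_lora_model` is a hypothetical helper rather than the training code used for this checkpoint.

```python
from typing import Optional

# Placeholder hyperparameters: NOT the values used to train this checkpoint.
LORA_HPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Attention projection layers, a common choice for Qwen-style models.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def build_lora_model(base_id: str = "Qwen/Qwen3-1.7B", hparams: Optional[dict] = None):
    """Load the base model and wrap it with a LoRA adapter (illustrative only)."""
    # Imported lazily so the sketch can be read without the libraries installed.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    config = LoraConfig(task_type="CAUSAL_LM", **(hparams or LORA_HPARAMS))
    return get_peft_model(base_model, config)


# Usage (downloads the base model):
# model = build_lora_model()
# model.print_trainable_parameters()
```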

The SFT data was formatted as instruction-style multiple-choice examples:

Q: ...
A) ...
B) ...
C) ...
D) ...

Answer: \boxed{C}
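
The template above can be reproduced with a small formatter. This is a sketch, assuming exactly four options labeled A to D as in the template; the function name `format_sft_example` and the sample question are illustrative and not taken from the training data.

```python
from typing import Sequence

LETTERS = "ABCD"


def format_sft_example(question: str, options: Sequence[str], answer_index: int) -> str:
    """Render one multiple-choice item in the Q / A) to D) / Answer layout."""
    if len(options) != len(LETTERS):
        raise ValueError("expected exactly four options")
    if not 0 <= answer_index < len(options):
        raise ValueError("answer_index out of range")
    lines = [f"Q: {question}"]
    lines += [f"{letter}) {option}" for letter, option in zip(LETTERS, options)]
    lines.append("")  # blank line before the answer, as in the template
    lines.append(f"Answer: \\boxed{{{LETTERS[answer_index]}}}")
    return "\n".join(lines)


print(format_sft_example(
    "What is the capital of France?",
    ["Berlin", "Madrid", "Paris", "Rome"],
    2,
))
```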

The current checkpoint was trained on a processed General Knowledge dataset derived from MMLU-style multiple-choice examples.

Model behavior

The model is optimized for:

closed-book factual question answering
multiple-choice reasoning
final-answer extraction through \boxed{}
concise option-letter responses

The tokenizer chat template was configured with non-thinking mode to encourage concise answers.

Local validation

On the provided General Knowledge validation snapshot from the course starter repository, this checkpoint achieved:

Extraction rate: 10/10
Accuracy: 6/10

These validation samples are only a small sanity-check set and are not the hidden evaluation benchmark.
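
The two metrics can be computed as sketched below. This is an illustration, not the course scoring script: it assumes accuracy is measured over all samples (so an unextractable answer counts as wrong), and the function names and the three sample outputs are made up for the example.

```python
import re
from typing import Optional, Sequence, Tuple

BOXED_RE = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_boxed_letter(text: str) -> Optional[str]:
    """Return the letter from the last \\boxed{...} in text, or None."""
    matches = BOXED_RE.findall(text)
    return matches[-1].upper() if matches else None


def score(outputs: Sequence[str], gold: Sequence[str]) -> Tuple[float, float]:
    """Return (extraction_rate, accuracy) over all samples."""
    if len(outputs) != len(gold) or not outputs:
        raise ValueError("outputs and gold must be non-empty and the same length")
    predictions = [extract_boxed_letter(output) for output in outputs]
    extracted = sum(prediction is not None for prediction in predictions)
    correct = sum(p == g.upper() for p, g in zip(predictions, gold))
    return extracted / len(outputs), correct / len(outputs)


# Made-up outputs for illustration only.
sample_outputs = [r"\boxed{A}", "I am not sure.", r"Some reasoning. \boxed{b}"]
sample_gold = ["A", "B", "C"]
extraction_rate, accuracy = score(sample_outputs, sample_gold)
print(f"Extraction rate: {extraction_rate:.2f}, accuracy: {accuracy:.2f}")
```

In this toy case two of three outputs contain a boxed letter and one of three matches the gold label.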

Framework versions

Transformers
PEFT
PyTorch
Datasets
Hugging Face Hub

Limitations

This is an intermediate SFT baseline, not the final model. It was trained mainly to establish a working General Knowledge pipeline and verify that the model can produce extractable boxed answers. Performance may vary on broader or harder factual reasoning tasks.

Model size: 2B params (Safetensors)
Tensor type: BF16

Model tree for cs-552-2026-catma/general_knowledge_model

Base model: Qwen/Qwen3-1.7B-Base
Fine-tuned: Qwen/Qwen3-1.7B
Fine-tuned: cs-552-2026-catma/general_knowledge_model (this model)