Instructions to use cs-552-2026-catma/general_knowledge_model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use cs-552-2026-catma/general_knowledge_model with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cs-552-2026-catma/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cs-552-2026-catma/general_knowledge_model")
model = AutoModelForCausalLM.from_pretrained("cs-552-2026-catma/general_knowledge_model")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use cs-552-2026-catma/general_knowledge_model with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cs-552-2026-catma/general_knowledge_model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/cs-552-2026-catma/general_knowledge_model

SGLang

How to use cs-552-2026-catma/general_knowledge_model with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cs-552-2026-catma/general_knowledge_model" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cs-552-2026-catma/general_knowledge_model" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cs-552-2026-catma/general_knowledge_model",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
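
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a server is already running locally and exposes the OpenAI-compatible `/v1/chat/completions` route shown above. The helper names `build_chat_request` and `ask` are illustrative, not part of vLLM or SGLang, and the default port matches the SGLang examples (use 8000 for the vLLM example).

```python
import json
import urllib.request

MODEL_ID = "cs-552-2026-catma/general_knowledge_model"


def build_chat_request(content: str, base_url: str = "http://localhost:30000") -> urllib.request.Request:
    """Build the same POST request that the curl examples send."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(content: str, base_url: str = "http://localhost:30000", timeout: float = 60.0) -> str:
    """Send the request and return the assistant message text.

    Assumes the response follows the OpenAI chat-completions layout
    (choices[0].message.content).
    """
    request = build_chat_request(content, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(ask("What is the capital of France?"))
```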

Docker Model Runner
How to use cs-552-2026-catma/general_knowledge_model with Docker Model Runner:
```
docker model run hf.co/cs-552-2026-catma/general_knowledge_model
```

general_knowledge_model / README.md

TuanNguyen2003

Create README.md

515e131 verified 18 days ago


metadata

license: apache-2.0
language:
  - en
base_model: Qwen/Qwen3-1.7B
pipeline_tag: text-generation
library_name: transformers
tags:
  - qwen3
  - sft
  - general-knowledge
  - multiple-choice
  - cs-552
datasets:
  - cais/mmlu
metrics:
  - accuracy

General Knowledge Model

This model is a fine-tuned version of Qwen/Qwen3-1.7B for the CS-552 Modern NLP course project.

The model targets the General Knowledge benchmark, where it answers closed-book multiple-choice factual and reasoning questions. It was trained to return the final answer as a single option letter inside a LaTeX \boxed{} expression.

Intended output format

The model should produce answers in the following format:

\boxed{C}

Anything outside \boxed{} is treated as reasoning and is not used for scoring by the evaluation pipeline.
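
As an illustration of how such an answer could be pulled out of a generation, here is a minimal sketch of a boxed-answer extractor. It is not the course's evaluation code: the function name `extract_boxed_answer`, the restriction to letters A through D, and the choice to take the last `\boxed{}` when several appear are all assumptions.

```python
import re
from typing import Optional

# Matches \boxed{X} where X is a single option letter. The A-D range is an
# assumption based on the four-option format used in this card.
BOXED_RE = re.compile(r"\\boxed\{\s*([A-D])\s*\}")


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the letter from the last \\boxed{} in text, or None if absent."""
    matches = BOXED_RE.findall(text)
    # Assumption: if the model boxes more than one letter, the last one wins.
    return matches[-1] if matches else None


print(extract_boxed_answer("Paris is the capital, so \\boxed{C}"))  # C
print(extract_boxed_answer("no final answer here"))  # None
```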

Training procedure

This checkpoint was trained using Supervised Fine-Tuning (SFT) with LoRA on top of Qwen/Qwen3-1.7B.

The SFT data was formatted as instruction-style multiple-choice examples:

Q: ...
A) ...
B) ...
C) ...
D) ...

Answer: \boxed{C}
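
A sketch of how one record might be rendered into this layout is shown below. The function name `format_mc_example` and its arguments are hypothetical; the actual preprocessing script is not part of this card.

```python
from typing import Sequence


def format_mc_example(question: str, options: Sequence[str], answer_index: int) -> str:
    """Render a four-option question in the Q / A)-D) / Answer layout above."""
    letters = "ABCD"
    if len(options) != len(letters):
        raise ValueError("expected exactly four options")
    if not 0 <= answer_index < len(letters):
        raise ValueError("answer_index must be between 0 and 3")
    lines = [f"Q: {question}"]
    lines += [f"{letter}) {option}" for letter, option in zip(letters, options)]
    # Blank line, then the target answer wrapped in \boxed{}.
    lines += ["", f"Answer: \\boxed{{{letters[answer_index]}}}"]
    return "\n".join(lines)


print(format_mc_example("What is 2 + 2?", ["1", "2", "4", "8"], 2))
```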

The current checkpoint was trained on a processed General Knowledge dataset derived from MMLU-style multiple-choice examples.

Model behavior

The model is optimized for:

- closed-book factual question answering
- multiple-choice reasoning
- final-answer extraction through \boxed{}
- concise option-letter responses

The tokenizer's chat template is configured for non-thinking mode to encourage concise answers.

Local validation

On the provided General Knowledge validation snapshot from the course starter repository, this checkpoint achieved:

- Extraction rate: 10/10
- Accuracy: 6/10

These 10 validation samples are only a sanity check; they are not the hidden evaluation benchmark.
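
For reference, the two metrics can be computed along the following lines. This is a sketch under stated assumptions, not the starter repository's scorer: the function name `score_predictions` is hypothetical, answers are assumed to be letters A through D, and an output with no extractable answer is counted as incorrect. The sample inputs are made up for illustration and are not the validation data.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Assumption: a valid answer is a single letter A-D inside \boxed{}.
BOXED_RE = re.compile(r"\\boxed\{\s*([A-D])\s*\}")


def score_predictions(outputs: Sequence[str], gold: Sequence[str]) -> Tuple[float, float]:
    """Return (extraction_rate, accuracy) over model outputs and gold letters."""
    if len(outputs) != len(gold) or not gold:
        raise ValueError("outputs and gold must be non-empty and the same length")
    extracted: List[Optional[str]] = []
    for text in outputs:
        matches = BOXED_RE.findall(text)
        extracted.append(matches[-1] if matches else None)
    n_extracted = sum(letter is not None for letter in extracted)
    # Outputs without a boxed letter never match a gold letter, so they count as wrong.
    n_correct = sum(letter == answer for letter, answer in zip(extracted, gold))
    return n_extracted / len(gold), n_correct / len(gold)


# Made-up outputs, for illustration only.
rate, accuracy = score_predictions(
    ["\\boxed{A}", "I am not sure.", "\\boxed{C}", "\\boxed{B}"],
    ["A", "B", "C", "D"],
)
print(rate, accuracy)  # 0.75 0.5
```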

Framework versions

- Transformers
- PEFT
- PyTorch
- Datasets
- Hugging Face Hub

Limitations

This is an intermediate SFT baseline, not the final model. It was trained mainly to establish a working General Knowledge pipeline and verify that the model can produce extractable boxed answers. Performance may vary on broader or harder factual reasoning tasks.