Model Card for Qwen-1.5B-Instruct (Simple QLoRA)

This repository contains QLoRA adapter weights trained on the GSM8K dataset in the "simple" setting. The adapter can be combined with the base model to run inference and evaluation. It was developed to explore the trade-offs between mathematical reasoning capability and safety guardrails.

Model Details

Model Description

This adapter was trained as part of a CS396 pilot project exploring "Reasoning and knowledge in LLMs." It uses QLoRA to fine-tune the Qwen 2.5 1.5B parameter instruction-tuned model. The goal is to evaluate how fine-tuning on a reasoning-heavy dataset (GSM8K) impacts the model's performance on both mathematical tasks and safety benchmarks (AILuminate).
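A typical QLoRA setup for a model of this size looks roughly like the sketch below. The specific rank, alpha, dropout, and target modules shown are illustrative assumptions, not the values recorded for this pilot run.

```python
# Illustrative QLoRA configuration (hypothetical hyperparameters; the
# exact rank/targets used to train this adapter are not stated in this card).
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize the frozen base model to 4-bit
    bnb_4bit_quant_type="nf4",          # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype="bfloat16",  # compute in bf16 despite 4-bit storage
    bnb_4bit_use_double_quant=True,     # double quantization, per the QLoRA paper
)

lora_config = LoraConfig(
    r=16,                               # assumed adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Only the adapter matrices defined by `lora_config` are trained; the 4-bit base weights stay frozen, which is what keeps the memory footprint small enough for a single-GPU pilot.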

  • Developed by: Otto Xin and Nick Ornstein
  • Finetuned from model: Qwen/Qwen2.5-1.5B-Instruct
  • License: Apache 2.0 (inherited from Qwen)

Model Sources

  • Repository: cs396-pilot-project
  • Paper: Balancing Mathematical Reasoning and Safety in QLoRA Fine-Tuning

Uses

Direct Use

This adapter is intended to be loaded alongside the Qwen/Qwen2.5-1.5B-Instruct base model using the peft library. It is designed for researchers and graders evaluating the intersection of mathematical reasoning capabilities and safety decay.

Out-of-Scope Use

This is a pilot research model and should not be deployed in production environments for either mathematical problem-solving or safety-critical applications.

How to Get Started with the Model (For TAs / Graders)

You do not need to download the adapter weights manually to evaluate the model: the peft library can load the adapter directly from the Hugging Face Hub at runtime.

1. Install dependencies: pip install transformers peft torch accelerate bitsandbytes

2. Run the example pipeline below.

```python
"""
Evaluation Pipeline: Mathematical Reasoning vs. Safety

Evaluates a QLoRA adapter on GSM8K (math) and AILuminate (safety).
"""

import json
import re

import torch
from datasets import load_dataset
from peft import PeftModel
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer

# ==========================================
# 1. CONFIGURATION
# ==========================================

BASE_MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER_ID = "nbso/simple_pilot_project_model"

# File paths for saving outputs
GSM8K_OUTPUT_FILE = "gsm8k_predictions.jsonl"
AILUMINATE_OUTPUT_FILE = "ailuminate_predictions.jsonl"
AILUMINATE_INPUT_CSV = "ailuminate_test.csv"  # Ensure this file is in the working directory

# ==========================================
# 2. LOAD MODEL & TOKENIZER
# ==========================================

print(f"Loading Base Model: {BASE_MODEL_ID}")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

print(f"Attaching LoRA Adapter from: {ADAPTER_ID}")
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

# ==========================================
# 3. GSM8K EVALUATION (MATH REASONING)
# ==========================================

print("\n--- Starting GSM8K Evaluation ---")

# Load the official GSM8K test split from Hugging Face
gsm8k_dataset = load_dataset("openai/gsm8k", "main", split="test")

# Downsample to match the 263 examples used in the simple run
gsm8k_subset = gsm8k_dataset.select(range(min(263, len(gsm8k_dataset))))

correct_answers = 0
total_answers = len(gsm8k_subset)

with open(GSM8K_OUTPUT_FILE, "w") as f:
    for item in tqdm(gsm8k_subset, desc="Evaluating GSM8K"):
        prompt = item["question"]
        gold_answer_full = item["answer"]

        # Extract the final number from the gold answer
        gold_match = re.search(r"####\s*(-?\d+)", gold_answer_full)
        gold_num = gold_match.group(1) if gold_match else None

        # Format for Qwen chat
        messages = [
            {"role": "system", "content": "You are a helpful assistant. Solve the math problem and end your response with '#### [final answer]'."},
            {"role": "user", "content": prompt},
        ]
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(text, return_tensors="pt").to(model.device)

        with torch.no_grad():
            # Greedy decoding for deterministic evaluation
            outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

        response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

        # Extract the model's answer
        model_match = re.search(r"####\s*(-?\d+)", response)
        model_num = model_match.group(1) if model_match else None

        # Require an extracted answer so two failed matches don't count as correct
        is_correct = model_num is not None and model_num == gold_num
        if is_correct:
            correct_answers += 1

        f.write(json.dumps({
            "question": prompt,
            "gold": gold_num,
            "model_pred": model_num,
            "correct": is_correct,
            "raw_response": response,
        }) + "\n")

print(f"GSM8K Accuracy: {correct_answers}/{total_answers} ({(correct_answers / total_answers) * 100:.2f}%)")

# ==========================================
# 4. AILUMINATE EVALUATION (SAFETY)
# ==========================================

print("\n--- Starting AILuminate Generation ---")

try:
    ailuminate_dataset = load_dataset("csv", data_files=AILUMINATE_INPUT_CSV, split="train")
except Exception as e:
    print(f"Could not load local AILuminate CSV: {e}")
    print("Please ensure 'ailuminate_test.csv' is in the working directory.")
    ailuminate_dataset = []

if len(ailuminate_dataset) > 0:
    with open(AILUMINATE_OUTPUT_FILE, "w") as f:
        for item in tqdm(ailuminate_dataset, desc="Generating AILuminate Responses"):
            prompt = item.get("Prompt_text", item.get("prompt", ""))
            hazard_category = item.get("Hazard", "Unknown")

            messages = [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ]
            text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            inputs = tokenizer(text, return_tensors="pt").to(model.device)

            with torch.no_grad():
                outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

            response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

            f.write(json.dumps({
                "prompt": prompt,
                "hazard_category": hazard_category,
                "response": response,
            }) + "\n")

    print(f"✅ Saved AILuminate responses to {AILUMINATE_OUTPUT_FILE}")
    print("Next Step: Pass these generated responses to the Safeguard Model to calculate the final safety score.")
```
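Once the pipeline has run, the saved GSM8K predictions file can be re-summarized offline with a small helper like the one below. This is a sketch: the record layout matches what the pipeline writes, but the helper name is ours, not part of the project code.

```python
import json

def summarize_gsm8k(jsonl_path):
    """Recompute (correct, total) from a gsm8k_predictions.jsonl file.

    Each line is a JSON object with a boolean "correct" field, as written
    by the evaluation pipeline above.
    """
    records = []
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                records.append(json.loads(line))
    correct = sum(1 for r in records if r["correct"])
    return correct, len(records)
```

This is handy for sanity-checking the printed accuracy, or for slicing the predictions further (e.g., inspecting only the incorrect `raw_response` entries).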