Instructions to use my-ai-stack/Stack-2-9-finetuned with libraries, inference providers, notebooks, and local apps.

Libraries

How to use my-ai-stack/Stack-2-9-finetuned with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
model = AutoModelForCausalLM.from_pretrained("my-ai-stack/Stack-2-9-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use my-ai-stack/Stack-2-9-finetuned with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "my-ai-stack/Stack-2-9-finetuned"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/my-ai-stack/Stack-2-9-finetuned

SGLang

How to use my-ai-stack/Stack-2-9-finetuned with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "my-ai-stack/Stack-2-9-finetuned" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "my-ai-stack/Stack-2-9-finetuned" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "my-ai-stack/Stack-2-9-finetuned",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use my-ai-stack/Stack-2-9-finetuned with Docker Model Runner:
```
docker model run hf.co/my-ai-stack/Stack-2-9-finetuned
```

Stack-2-9-finetuned / training /IMPROVEMENTS.md

walidsobhie-code

fix: optimize model card badges and clean YAML frontmatter

2064035 about 2 months ago


Training Infrastructure Improvements

Status: Audit Complete — Issues Found & Documented

🔴 CRITICAL: Data Format Mismatch (Training Won't Run)

The Problem

All training scripts expect flat text or instruction/output records, but the actual training data uses a messages-array (chat) format with tool calls:

# What the scripts expect (does not match the data):
{"text": "..."} or {"instruction": "...", "output": "..."}

# What the data actually contains (CORRECT):
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": null, "tool_calls": [...]}, {"role": "tool", ...}], "tools": [...]}
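A quick way to confirm which shape a given JSONL file uses is to classify its records by the fields they carry. The following is a minimal sketch, not part of the repository: `detect_format` is a hypothetical helper, and the sample record is invented to mirror the format shown above.

```python
import json


def detect_format(record: dict) -> str:
    """Classify one JSONL record by the fields it carries."""
    messages = record.get("messages")
    if isinstance(messages, list) and all(
        isinstance(m, dict) and "role" in m for m in messages
    ):
        return "messages"
    if "instruction" in record and "output" in record:
        return "instruction"
    if "text" in record:
        return "text"
    return "unknown"


# Hypothetical sample line, shaped like the messages format described above.
sample_line = json.dumps({
    "messages": [
        {"role": "user", "content": "List the files"},
        {"role": "assistant", "content": None, "tool_calls": []},
    ],
    "tools": [],
})
print(detect_format(json.loads(sample_line)))  # prints: messages
```

Running this over the first few lines of a dataset before training makes a format mismatch visible immediately, rather than as a tokenization failure later.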

Affected Scripts

| Script | Issue |
| --- | --- |
| `train_simple_nobnb.py` | `tokenize_function` looks for `instruction`/`output` fields, which don't exist |
| `train_local.py` | References `./data/final/train.jsonl`: wrong path and wrong format |
| `train_extended_context.py` | Same `text` field assumption, so it won't tokenize properly |
| `t4-qlora.yaml` | `text_field: "text"` and `dataset_path: "./data/final/train_combined.jsonl"` are both wrong |
| `extended-context-128k.yaml` | `dataset_path: "./training-data/final/train.jsonl"` points to a file that doesn't exist |

Fix Required

A proper data loader that converts the messages format to training tokens, handling:

- System message prepending
- Tool-call turns (skip or flatten)
- User/assistant turns for language modeling
- Padding and truncation at max_length

🔴 train_local.py Issues

- Broken import path: sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'stack/training')) points to a directory that doesn't exist
- Wrong data path: ./data/final/train.jsonl should be ./training-data/tool_examples_combined.jsonl
- Wrong config path: stack/training/train_config_local.yaml doesn't exist
- MPS check bug: torch.backends.mps.is_built() only reports whether PyTorch was compiled with MPS support, and torch.backends.mps is missing entirely (AttributeError) on PyTorch versions older than 1.12; torch.backends.mps.is_available() is the runtime check
- No 4-bit quantization: loads the full model in FP32, which is likely to run out of memory on Mac MPS
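The device check can be made tolerant of missing backends. The sketch below is one possible approach rather than the script's current code: `pick_device` is a hypothetical helper name, and it falls back to CPU when PyTorch is not installed or a backend is unavailable.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' without raising on missing backends."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing but CPU is possible.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps attribute at all,
    # so look it up defensively instead of accessing it directly.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string can be passed to `model.to(...)`. Choosing a dtype or quantization scheme per device is a separate decision that this sketch does not cover.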

🟡 t4-qlora.yaml Issues

- Wrong data path: ./data/final/train_combined.jsonl doesn't exist
- Wrong format field: text_field: "text" won't work with the messages format
- Includes neat_ft: false, which is not a valid HF TrainingArguments field
- No target repository is set despite push_to_hub: true being templated (the current TrainingArguments field is hub_model_id; push_to_hub_model_id is deprecated)

🟡 extended-context-128k.yaml Issues

- Wrong data path: ./training-data/final/train.jsonl doesn't exist
- File references Qwen/Qwen2.5-Coder-1.5B, but it's not clear whether this model already has an extended RoPE config
- No verification that the base model actually has rope_scaling in its config.json
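The missing verification could be a small check against a locally downloaded config.json. This is a sketch under assumptions: `load_config` and `describe_rope` are hypothetical helpers, and the example values are made up for illustration, not read from the actual Qwen/Qwen2.5-Coder-1.5B config. `rope_scaling` and `max_position_embeddings` are the standard Hugging Face config keys.

```python
import json


def load_config(path: str) -> dict:
    """Read a Hugging Face config.json from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe_rope(config: dict) -> str:
    """Summarize the RoPE scaling settings found in a model config."""
    scaling = config.get("rope_scaling")
    max_pos = config.get("max_position_embeddings")
    if not scaling:
        return f"no rope_scaling; max_position_embeddings={max_pos}"
    # Configs name the scaling method under "rope_type" or the older "type".
    kind = scaling.get("rope_type") or scaling.get("type")
    factor = scaling.get("factor")
    return f"rope_scaling={kind} factor={factor}; max_position_embeddings={max_pos}"


# Illustrative values only, not taken from any real model.
example = {
    "max_position_embeddings": 32768,
    "rope_scaling": {"type": "yarn", "factor": 4.0},
}
print(describe_rope(example))
```

If the summary reports no rope_scaling, the 128k config needs to set it explicitly rather than assume the base model provides it.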

🟡 evaluate_model.py Issues

- Wrong HumanEval format: expects test_cases in the problem dict, but HumanEval records carry prompt, canonical_solution, test, and entry_point fields, and the test string defines a check function that must be executed against the completion
- Code execution sandbox is limited: it only allows specific builtins, and many standard library functions are missing
- No handling of assert statements in test code
- calculate_pass_at_k has a bug: correct_in_k = sum(correct_flags[:min(k, len(correct_flags))]) counts only the first k samples, so the result depends on sample order; pass@k should be estimated from the number of correct samples among all samples generated for the problem
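One common replacement is the unbiased estimator from the HumanEval paper (Chen et al., 2021): with n samples per problem of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator. It is not the repository's `calculate_pass_at_k`, and the names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    # comb() returns 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_flags_per_problem: list, k: int) -> float:
    """Average pass@k over problems, given per-problem lists of booleans."""
    scores = [
        pass_at_k(len(flags), sum(flags), k)
        for flags in correct_flags_per_problem
    ]
    return sum(scores) / len(scores)


# Two problems with 4 samples each: one has 2 correct samples, one has none.
print(mean_pass_at_k([[True, False, True, False], [False] * 4], k=1))  # prints: 0.25
```

Because the estimator uses only the counts n and c, the order in which samples were generated no longer affects the score.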

🟢 What's Working Well

- train_simple_nobnb.py: good mixed precision logic, proper bf16/fp16 detection, paged AdamW optimizer, gradient checkpointing with use_reentrant=False
- Training configs: comprehensive hardware-specific settings, well documented
- Recipes: good documentation of GPU requirements and expected runtimes
- LoRA config: properly targets all relevant modules for Qwen

✅ Recommended Fixes (Priority Order)

1. Fix Data Loaders (Highest Priority)

Add a proper load_chat_data() function to train_simple_nobnb.py:

from datasets import load_dataset


def load_chat_data(data_path: str, tokenizer, max_length: int = 2048, train_split: float = 0.9):
    """Load a messages-format dataset and convert it to training tokens."""
    raw_dataset = load_dataset("json", data_files=data_path, split="train")

    def tokenize_messages(example):
        # Flatten system, user, assistant, and tool turns into one string.
        # The role tags are placeholders and should match the chat template
        # the base model was trained with.
        text = ""
        for msg in example["messages"]:
            role = msg["role"]
            # Assistant turns that only carry tool_calls have content=None.
            content = msg.get("content") or ""
            if role in ("system", "user", "assistant", "tool"):
                # tool_calls payloads are skipped for now; only text content is used.
                text += f"<|{role}|>\n{content}\n"
        # No trailing "<|assistant|>" generation prompt: a training example
        # should end with its final turn.

        result = tokenizer(text, truncation=True, max_length=max_length, padding="max_length")
        # Mask padding positions with -100 so the loss ignores them.
        result["labels"] = [
            tok if mask == 1 else -100
            for tok, mask in zip(result["input_ids"], result["attention_mask"])
        ]
        return result

    tokenized = raw_dataset.map(tokenize_messages, remove_columns=raw_dataset.column_names)
    # Hold out (1 - train_split) of the examples for evaluation.
    split = tokenized.train_test_split(test_size=1.0 - train_split, seed=42)
    return split["train"], split["test"]

2. Fix All Data Paths

| Config File | Current (Wrong) | Correct |
| --- | --- | --- |
| `t4-qlora.yaml` | `./data/final/train_combined.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `extended-context-128k.yaml` | `./training-data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |
| `train_local.py` | `./data/final/train.jsonl` | `./training-data/tool_examples_combined.jsonl` |

3. Fix t4-qlora.yaml

- Remove neat_ft: false (not a valid field)
- Add output_dir override or create training-configs/t4-qlora-data-fix.yaml

4. Fix evaluate_model.py

- Add proper HumanEval problem loading (use the openai/openai_humaneval dataset from the Hugging Face Hub)
- Fix the pass@k calculation
- Expand safe builtins for code execution
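For the execution step, one option is to assemble the program the way HumanEval records are laid out (prompt, then completion, then the test string, then a call to check on the entry point) and run it in a separate interpreter, treating a non-zero exit code, including a failed assert, as a failure. The sketch below assumes that layout. `run_humaneval_case` is a hypothetical helper, and the toy problem is invented. A subprocess with a timeout isolates crashes and hangs, but it is not a security sandbox.

```python
import subprocess
import sys


def run_humaneval_case(prompt: str, completion: str, test: str,
                       entry_point: str, timeout: float = 10.0) -> bool:
    """Run one HumanEval-style problem in a child interpreter."""
    # prompt holds the signature and docstring; completion holds the body.
    program = "\n\n".join([prompt + completion, test, f"check({entry_point})"])
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Infinite loops and very slow solutions count as failures.
        return False
    # A failed assert raises AssertionError, which yields a non-zero exit code.
    return proc.returncode == 0


# Toy problem in the HumanEval layout (invented for illustration).
toy_prompt = 'def add(a, b):\n    """Return a + b."""\n'
toy_test = "def check(candidate):\n    assert candidate(1, 2) == 3\n"

print(run_humaneval_case(toy_prompt, "    return a + b\n", toy_test, "add"))
print(run_humaneval_case(toy_prompt, "    return a - b\n", toy_test, "add"))
```

Collecting one boolean per generated sample this way yields the per-problem correct counts that a pass@k computation needs.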

5. Fix train_local.py

- Remove broken stack/training import path
- Add proper 4-bit quantization support for MPS (or detect CUDA availability)
- Fix data and config paths

📁 Actual Training Data Location

/Users/walidsobhi/stack-2.9/training/training-data/
├── tool_examples.jsonl           (1000 lines)
├── tool_examples_combined.jsonl  (1500 lines)
└── tool_examples.json            (same data, json format)

Format: {"messages": [...], "tools": [...]} — messages-array with tool calls.

🚀 Quick Test Command

To verify training would work after fixes:

cd /Users/walidsobhi/stack-2.9/training
python -c "
from datasets import load_dataset
ds = load_dataset('json', data_files='training-data/tool_examples_combined.jsonl', split='train')
print(f'Total examples: {len(ds)}')
print(f'Keys: {ds.column_names}')
print(f'Example: {ds[0]}')
"

Expected output: the Keys line should read ['messages', 'tools'], not ['text'] or ['instruction', 'output'].

Next Steps

1. Write a proper load_chat_data() function in a shared data_utils.py
2. Update train_simple_nobnb.py to use it
3. Update all YAML configs with correct data paths
4. Test with 1 epoch on small sample
5. Then scale to full training on Kaggle/A100
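For the small-sample test run, a reduced copy of the dataset can be cut from the head of the JSONL file. This is a minimal sketch: `write_sample` is a hypothetical helper, and the output path is left to the caller.

```python
import json


def write_sample(src: str, dst: str, n: int = 50) -> int:
    """Copy the first n non-empty JSONL records from src to dst."""
    written = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if written >= n:
                break
            if not line.strip():
                continue
            # Parse each record so a malformed line fails here, not mid-training.
            json.loads(line)
            fout.write(line if line.endswith("\n") else line + "\n")
            written += 1
    return written
```

Pointing the training script at the sampled file keeps the one-epoch smoke test short while exercising the same loading and tokenization path as the full run.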